diff mbox series

[drm-misc-next,v3,6/7] drm/gpuvm: generalize dma_resv/extobj handling and GEM validation

Message ID 20230909153125.30032-7-dakr@redhat.com (mailing list archive)
State New, archived
Series DRM GPUVA Manager GPU-VM features

Commit Message

Danilo Krummrich Sept. 9, 2023, 3:31 p.m. UTC
So far the DRM GPUVA manager offers common infrastructure to track GPU VA
allocations and mappings, generically connect GPU VA mappings to their
backing buffers and perform more complex mapping operations on the GPU VA
space.

However, there are more design patterns commonly used by drivers, which
can potentially be generalized in order to make the DRM GPUVA manager
represent a basic GPU-VM implementation. In this context, this patch aims
at generalizing the following elements.

1) Provide a common dma-resv for GEM objects not being used outside of
   this GPU-VM.

2) Provide tracking of external GEM objects (GEM objects which are
   shared with other GPU-VMs).

3) Provide functions to efficiently lock all GEM objects dma-resv the
   GPU-VM contains mappings of.

4) Provide tracking of evicted GEM objects the GPU-VM contains mappings
   of, such that validation of evicted GEM objects is accelerated.

5) Provide some convenience functions for common patterns.

Rather than being designed as a "framework", the target is to make all
features appear as a collection of optional helper functions, such that
drivers are free to make use of the DRM GPUVA manager's basic
functionality and opt-in for other features without setting any feature
flags, just by making use of the corresponding functions.

Big kudos to Boris Brezillon for his help in figuring out locking for drivers
updating the GPU VA space within the fence signalling path.
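
For instance, a (simplified) driver-side lock-and-validate flow could look
like the sketch below; the drm_gpuvm_exec wrapper and the exact signatures
are abbreviated from this series and may differ in detail:

	int driver_vm_exec(struct drm_gpuvm *gpuvm)
	{
		struct drm_gpuvm_exec vm_exec = { .vm = gpuvm };
		int ret;

		/* Lock the GPU-VM's common dma-resv and the dma-resv of all
		 * external GEM objects the GPU-VM keeps track of.
		 */
		ret = drm_gpuvm_exec_lock(&vm_exec, 1, true);
		if (ret)
			return ret;

		/* Re-validate all evicted GEM objects the GPU-VM contains
		 * mappings of.
		 */
		ret = drm_gpuvm_validate(gpuvm);

		/* ... add job fences and submit ... */

		drm_gpuvm_exec_unlock(&vm_exec);
		return ret;
	}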

Suggested-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Danilo Krummrich <dakr@redhat.com>
---
 drivers/gpu/drm/drm_gpuvm.c | 516 ++++++++++++++++++++++++++++++++++++
 include/drm/drm_gpuvm.h     | 197 ++++++++++++++
 2 files changed, 713 insertions(+)

Comments

kernel test robot Sept. 9, 2023, 8:16 p.m. UTC | #1
Hi Danilo,

kernel test robot noticed the following build warnings:

[auto build test WARNING on 6bd3d8da51ca1ec97c724016466606aec7739b9f]

url:    https://github.com/intel-lab-lkp/linux/commits/Danilo-Krummrich/drm-gpuvm-rename-struct-drm_gpuva_manager-to-struct-drm_gpuvm/20230909-233346
base:   6bd3d8da51ca1ec97c724016466606aec7739b9f
patch link:    https://lore.kernel.org/r/20230909153125.30032-7-dakr%40redhat.com
patch subject: [PATCH drm-misc-next v3 6/7] drm/gpuvm: generalize dma_resv/extobj handling and GEM validation
config: riscv-defconfig (https://download.01.org/0day-ci/archive/20230910/202309100424.uNXGR9d4-lkp@intel.com/config)
compiler: riscv64-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20230910/202309100424.uNXGR9d4-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202309100424.uNXGR9d4-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> drivers/gpu/drm/drm_gpuvm.c:734: warning: Function parameter or member '__gpuvm' not described in 'for_each_vm_bo_in_list'
>> drivers/gpu/drm/drm_gpuvm.c:734: warning: Function parameter or member '__list_name' not described in 'for_each_vm_bo_in_list'
>> drivers/gpu/drm/drm_gpuvm.c:734: warning: Function parameter or member '__local_list' not described in 'for_each_vm_bo_in_list'
>> drivers/gpu/drm/drm_gpuvm.c:734: warning: Function parameter or member '__vm_bo' not described in 'for_each_vm_bo_in_list'


vim +734 drivers/gpu/drm/drm_gpuvm.c

    32	
    33	/**
    34	 * DOC: Overview
    35	 *
    36	 * The DRM GPU VA Manager, represented by struct drm_gpuvm keeps track of a
    37	 * GPU's virtual address (VA) space and manages the corresponding virtual
    38	 * mappings represented by &drm_gpuva objects. It also keeps track of the
    39	 * mapping's backing &drm_gem_object buffers.
    40	 *
    41	 * &drm_gem_object buffers maintain a list of &drm_gpuva objects representing
    42	 * all existent GPU VA mappings using this &drm_gem_object as backing buffer.
    43	 *
    44	 * GPU VAs can be flagged as sparse, such that drivers may use GPU VAs to also
    45	 * keep track of sparse PTEs in order to support Vulkan 'Sparse Resources'.
    46	 *
    47	 * The GPU VA manager internally uses a rb-tree to manage the
    48	 * &drm_gpuva mappings within a GPU's virtual address space.
    49	 *
    50	 * The &drm_gpuvm structure contains a special &drm_gpuva representing the
    51	 * portion of VA space reserved by the kernel. This node is initialized together
    52	 * with the GPU VA manager instance and removed when the GPU VA manager is
    53	 * destroyed.
    54	 *
    55	 * In a typical application, drivers would embed struct drm_gpuvm and
    56	 * struct drm_gpuva within their own driver specific structures; this way,
    57	 * neither the &drm_gpuvm itself nor the &drm_gpuva entries require any
    58	 * memory allocations of their own.
    59	 *
    60	 * The data structures needed to store &drm_gpuvas within the &drm_gpuvm are
    61	 * contained within struct drm_gpuva already. Hence, for inserting &drm_gpuva
    62	 * entries from within dma-fence signalling critical sections it is enough to
    63	 * pre-allocate the &drm_gpuva structures.
    64	 *
    65	 * In order to connect a struct drm_gpuva to its backing &drm_gem_object, each
    66	 * &drm_gem_object maintains a list of &drm_gpuvm_bo structures, and each
    67	 * &drm_gpuvm_bo contains a list of &drm_gpuva structures.
    68	 *
    69	 * A &drm_gpuvm_bo is an abstraction that represents a combination of a
    70	 * &drm_gpuvm and a &drm_gem_object. Every such combination should be unique.
    71	 * This is ensured by the API through drm_gpuvm_bo_obtain() and
    72	 * drm_gpuvm_bo_obtain_prealloc() which first look into the corresponding
    73	 * &drm_gem_object list of &drm_gpuvm_bos for an existing instance of this
    74	 * particular combination. If not existent, a new instance is created and linked
    75	 * to the &drm_gem_object.
    76	 *
    77	 * &drm_gpuvm_bo structures, since unique for a given &drm_gpuvm, are also used
    78	 * as entries for the &drm_gpuvm's lists of external and evicted objects. Those
    79	 * lists are maintained in order to accelerate locking of dma-resv locks and
    80	 * validation of evicted objects bound in a &drm_gpuvm. For instance, all
    81	 * &drm_gem_objects' &dma_resv locks of a given &drm_gpuvm can be locked by calling
    82	 * drm_gpuvm_exec_lock(). Once locked, drivers can call drm_gpuvm_validate() in
    83	 * order to validate all evicted &drm_gem_objects. It is also possible to lock
    84	 * additional &drm_gem_objects by providing the corresponding parameters to
    85	 * drm_gpuvm_exec_lock() as well as open code the &drm_exec loop while making
    86	 * use of helper functions such as drm_gpuvm_prepare_range() or
    87	 * drm_gpuvm_prepare_objects().
    88	 *
    89	 * Every bound &drm_gem_object is treated as external object when its &dma_resv
    90	 * structure is different from the &drm_gpuvm's common &dma_resv structure.
    91	 */
    92	
    93	/**
    94	 * DOC: Split and Merge
    95	 *
    96	 * Besides its capability to manage and represent a GPU VA space, the
    97	 * GPU VA manager also provides functions to let the &drm_gpuvm calculate a
    98	 * sequence of operations to satisfy a given map or unmap request.
    99	 *
   100	 * Therefore the DRM GPU VA manager provides an algorithm implementing splitting
   101	 * and merging of existent GPU VA mappings with the ones that are requested to
   102	 * be mapped or unmapped. This feature is required by the Vulkan API to
   103	 * implement Vulkan 'Sparse Memory Bindings' - drivers UAPIs often refer to this
   104	 * as VM BIND.
   105	 *
   106	 * Drivers can call drm_gpuvm_sm_map() to receive a sequence of callbacks
   107	 * containing map, unmap and remap operations for a given newly requested
   108	 * mapping. The sequence of callbacks represents the set of operations to
   109	 * execute in order to integrate the new mapping cleanly into the current state
   110	 * of the GPU VA space.
   111	 *
   112	 * Depending on how the new GPU VA mapping intersects with the existent mappings
   113	 * of the GPU VA space, the &drm_gpuvm_ops callbacks contain an arbitrary number
   114	 * of unmap operations, a maximum of two remap operations and a single map
   115	 * operation. The caller might receive no callback at all if no operation is
   116	 * required, e.g. if the requested mapping already exists in the exact same way.
   117	 *
   118	 * The single map operation represents the original map operation requested by
   119	 * the caller.
   120	 *
   121	 * &drm_gpuva_op_unmap contains a 'keep' field, which indicates whether the
   122	 * &drm_gpuva to unmap is physically contiguous with the original mapping
   123	 * request. Optionally, if 'keep' is set, drivers may keep the actual page table
   124	 * entries for this &drm_gpuva, adding only the missing page table entries and
   125	 * updating the &drm_gpuvm's view of things accordingly.
   126	 *
   127	 * Drivers may do the same optimization, namely delta page table updates, also
   128	 * for remap operations. This is possible since &drm_gpuva_op_remap consists of
   129	 * one unmap operation and one or two map operations, such that drivers can
   130	 * derive the page table update delta accordingly.
   131	 *
   132	 * Note that there can't be more than two existent mappings to split up, one at
   133	 * the beginning and one at the end of the new mapping, hence there is a
   134	 * maximum of two remap operations.
   135	 *
   136	 * Analogous to drm_gpuvm_sm_map() drm_gpuvm_sm_unmap() uses &drm_gpuvm_ops to
   137	 * call back into the driver in order to unmap a range of GPU VA space. The
   138	 * logic behind this function is way simpler though: For all existent mappings
   139	 * enclosed by the given range unmap operations are created. For mappings which
   140	 * are only partially located within the given range, remap operations are
   141	 * created such that those mappings are split up and re-mapped partially.
   142	 *
   143	 * As an alternative to drm_gpuvm_sm_map() and drm_gpuvm_sm_unmap(),
   144	 * drm_gpuvm_sm_map_ops_create() and drm_gpuvm_sm_unmap_ops_create() can be used
   145	 * to directly obtain an instance of struct drm_gpuva_ops containing a list of
   146	 * &drm_gpuva_op, which can be iterated with drm_gpuva_for_each_op(). This list
   147	 * contains the &drm_gpuva_ops analogous to the callbacks one would receive when
   148	 * calling drm_gpuvm_sm_map() or drm_gpuvm_sm_unmap(). While this way requires
   149	 * more memory (to allocate the &drm_gpuva_ops), it provides drivers a way to
   150	 * iterate the &drm_gpuva_op multiple times, e.g. once in a context where memory
   151	 * allocations are possible (e.g. to allocate GPU page tables) and once in the
   152	 * dma-fence signalling critical path.
   153	 *
   154	 * To update the &drm_gpuvm's view of the GPU VA space drm_gpuva_insert() and
   155	 * drm_gpuva_remove() may be used. These functions can safely be used from
   156	 * &drm_gpuvm_ops callbacks originating from drm_gpuvm_sm_map() or
   157	 * drm_gpuvm_sm_unmap(). However, it might be more convenient to use the
   158	 * provided helper functions drm_gpuva_map(), drm_gpuva_remap() and
   159	 * drm_gpuva_unmap() instead.
   160	 *
   161	 * The following diagram depicts the basic relationships of existent GPU VA
   162	 * mappings, a newly requested mapping and the resulting mappings as implemented
   163	 * by drm_gpuvm_sm_map() - it doesn't cover any arbitrary combinations of these.
   164	 *
   165	 * 1) Requested mapping is identical. Replace it, but indicate the backing PTEs
   166	 *    could be kept.
   167	 *
   168	 *    ::
   169	 *
   170	 *	     0     a     1
   171	 *	old: |-----------| (bo_offset=n)
   172	 *
   173	 *	     0     a     1
   174	 *	req: |-----------| (bo_offset=n)
   175	 *
   176	 *	     0     a     1
   177	 *	new: |-----------| (bo_offset=n)
   178	 *
   179	 *
   180	 * 2) Requested mapping is identical, except for the BO offset, hence replace
   181	 *    the mapping.
   182	 *
   183	 *    ::
   184	 *
   185	 *	     0     a     1
   186	 *	old: |-----------| (bo_offset=n)
   187	 *
   188	 *	     0     a     1
   189	 *	req: |-----------| (bo_offset=m)
   190	 *
   191	 *	     0     a     1
   192	 *	new: |-----------| (bo_offset=m)
   193	 *
   194	 *
   195	 * 3) Requested mapping is identical, except for the backing BO, hence replace
   196	 *    the mapping.
   197	 *
   198	 *    ::
   199	 *
   200	 *	     0     a     1
   201	 *	old: |-----------| (bo_offset=n)
   202	 *
   203	 *	     0     b     1
   204	 *	req: |-----------| (bo_offset=n)
   205	 *
   206	 *	     0     b     1
   207	 *	new: |-----------| (bo_offset=n)
   208	 *
   209	 *
   210	 * 4) Existent mapping is a left aligned subset of the requested one, hence
   211	 *    replace the existent one.
   212	 *
   213	 *    ::
   214	 *
   215	 *	     0  a  1
   216	 *	old: |-----|       (bo_offset=n)
   217	 *
   218	 *	     0     a     2
   219	 *	req: |-----------| (bo_offset=n)
   220	 *
   221	 *	     0     a     2
   222	 *	new: |-----------| (bo_offset=n)
   223	 *
   224	 *    .. note::
   225	 *       We expect to see the same result for a request with a different BO
   226	 *       and/or non-contiguous BO offset.
   227	 *
   228	 *
   229	 * 5) Requested mapping's range is a left aligned subset of the existent one,
   230	 *    but backed by a different BO. Hence, map the requested mapping and split
   231	 *    the existent one adjusting its BO offset.
   232	 *
   233	 *    ::
   234	 *
   235	 *	     0     a     2
   236	 *	old: |-----------| (bo_offset=n)
   237	 *
   238	 *	     0  b  1
   239	 *	req: |-----|       (bo_offset=n)
   240	 *
   241	 *	     0  b  1  a' 2
   242	 *	new: |-----|-----| (b.bo_offset=n, a.bo_offset=n+1)
   243	 *
   244	 *    .. note::
   245	 *       We expect to see the same result for a request with a different BO
   246	 *       and/or non-contiguous BO offset.
   247	 *
   248	 *
   249	 * 6) Existent mapping is a superset of the requested mapping. Split it up, but
   250	 *    indicate that the backing PTEs could be kept.
   251	 *
   252	 *    ::
   253	 *
   254	 *	     0     a     2
   255	 *	old: |-----------| (bo_offset=n)
   256	 *
   257	 *	     0  a  1
   258	 *	req: |-----|       (bo_offset=n)
   259	 *
   260	 *	     0  a  1  a' 2
   261	 *	new: |-----|-----| (a.bo_offset=n, a'.bo_offset=n+1)
   262	 *
   263	 *
   264	 * 7) Requested mapping's range is a right aligned subset of the existent one,
   265	 *    but backed by a different BO. Hence, map the requested mapping and split
   266	 *    the existent one, without adjusting the BO offset.
   267	 *
   268	 *    ::
   269	 *
   270	 *	     0     a     2
   271	 *	old: |-----------| (bo_offset=n)
   272	 *
   273	 *	           1  b  2
   274	 *	req:       |-----| (bo_offset=m)
   275	 *
   276	 *	     0  a  1  b  2
   277	 *	new: |-----|-----| (a.bo_offset=n,b.bo_offset=m)
   278	 *
   279	 *
   280	 * 8) Existent mapping is a superset of the requested mapping. Split it up, but
   281	 *    indicate that the backing PTEs could be kept.
   282	 *
   283	 *    ::
   284	 *
   285	 *	      0     a     2
   286	 *	old: |-----------| (bo_offset=n)
   287	 *
   288	 *	           1  a  2
   289	 *	req:       |-----| (bo_offset=n+1)
   290	 *
   291	 *	     0  a' 1  a  2
   292	 *	new: |-----|-----| (a'.bo_offset=n, a.bo_offset=n+1)
   293	 *
   294	 *
   295	 * 9) Existent mapping is overlapped at the end by the requested mapping backed
   296	 *    by a different BO. Hence, map the requested mapping and split up the
   297	 *    existent one, without adjusting the BO offset.
   298	 *
   299	 *    ::
   300	 *
   301	 *	     0     a     2
   302	 *	old: |-----------|       (bo_offset=n)
   303	 *
   304	 *	           1     b     3
   305	 *	req:       |-----------| (bo_offset=m)
   306	 *
   307	 *	     0  a  1     b     3
   308	 *	new: |-----|-----------| (a.bo_offset=n,b.bo_offset=m)
   309	 *
   310	 *
   311	 * 10) Existent mapping is overlapped by the requested mapping, both having the
   312	 *     same backing BO with a contiguous offset. Indicate the backing PTEs of
   313	 *     the old mapping could be kept.
   314	 *
   315	 *     ::
   316	 *
   317	 *	      0     a     2
   318	 *	 old: |-----------|       (bo_offset=n)
   319	 *
   320	 *	            1     a     3
   321	 *	 req:       |-----------| (bo_offset=n+1)
   322	 *
   323	 *	      0  a' 1     a     3
   324	 *	 new: |-----|-----------| (a'.bo_offset=n, a.bo_offset=n+1)
   325	 *
   326	 *
   327	 * 11) Requested mapping's range is a centered subset of the existent one
   328	 *     having a different backing BO. Hence, map the requested mapping and split
   329	 *     up the existent one in two mappings, adjusting the BO offset of the right
   330	 *     one accordingly.
   331	 *
   332	 *     ::
   333	 *
   334	 *	      0        a        3
   335	 *	 old: |-----------------| (bo_offset=n)
   336	 *
   337	 *	            1  b  2
   338	 *	 req:       |-----|       (bo_offset=m)
   339	 *
   340	 *	      0  a  1  b  2  a' 3
   341	 *	 new: |-----|-----|-----| (a.bo_offset=n,b.bo_offset=m,a'.bo_offset=n+2)
   342	 *
   343	 *
   344	 * 12) Requested mapping is a contiguous subset of the existent one. Split it
   345	 *     up, but indicate that the backing PTEs could be kept.
   346	 *
   347	 *     ::
   348	 *
   349	 *	      0        a        3
   350	 *	 old: |-----------------| (bo_offset=n)
   351	 *
   352	 *	            1  a  2
   353	 *	 req:       |-----|       (bo_offset=n+1)
   354	 *
   355	 *	      0  a' 1  a  2 a'' 3
   356	 *	 new: |-----|-----|-----| (a'.bo_offset=n, a.bo_offset=n+1, a''.bo_offset=n+2)
   357	 *
   358	 *
   359	 * 13) Existent mapping is a right aligned subset of the requested one, hence
   360	 *     replace the existent one.
   361	 *
   362	 *     ::
   363	 *
   364	 *	            1  a  2
   365	 *	 old:       |-----| (bo_offset=n+1)
   366	 *
   367	 *	      0     a     2
   368	 *	 req: |-----------| (bo_offset=n)
   369	 *
   370	 *	      0     a     2
   371	 *	 new: |-----------| (bo_offset=n)
   372	 *
   373	 *     .. note::
   374	 *        We expect to see the same result for a request with a different bo
   375	 *        and/or non-contiguous bo_offset.
   376	 *
   377	 *
   378	 * 14) Existent mapping is a centered subset of the requested one, hence
   379	 *     replace the existent one.
   380	 *
   381	 *     ::
   382	 *
   383	 *	            1  a  2
   384	 *	 old:       |-----| (bo_offset=n+1)
   385	 *
   386	 *	      0        a       3
   387	 *	 req: |----------------| (bo_offset=n)
   388	 *
   389	 *	      0        a       3
   390	 *	 new: |----------------| (bo_offset=n)
   391	 *
   392	 *     .. note::
   393	 *        We expect to see the same result for a request with a different bo
   394	 *        and/or non-contiguous bo_offset.
   395	 *
   396	 *
   397	 * 15) Existent mapping is overlapped at the beginning by the requested mapping
   398	 *     backed by a different BO. Hence, map the requested mapping and split up
   399	 *     the existent one, adjusting its BO offset accordingly.
   400	 *
   401	 *     ::
   402	 *
   403	 *	            1     a     3
   404	 *	 old:       |-----------| (bo_offset=n)
   405	 *
   406	 *	      0     b     2
   407	 *	 req: |-----------|       (bo_offset=m)
   408	 *
   409	 *	      0     b     2  a' 3
   410	 *	 new: |-----------|-----| (b.bo_offset=m,a'.bo_offset=n+2)
   411	 */
   412	
   413	/**
   414	 * DOC: Locking
   415	 *
   416	 * Generally, the GPU VA manager does not take care of locking itself, it is
   417	 * Generally, the GPU VA manager does not take care of locking itself; it is
   418	 * the driver's responsibility to take care of locking. Drivers might want to
   419	 * &drm_gpuva objects as well as generating all kinds of operations, such as
   420	 * split / merge or prefetch.
   421	 *
   422	 * The GPU VA manager also does not take care of the locking of the backing
   423	 * &drm_gem_object buffers GPU VA lists and &drm_gpuvm_bo abstractions by
   424	 * itself; drivers are responsible for enforcing mutual exclusion using either
   425	 * the GEM's dma_resv lock or alternatively a driver specific external lock. For the
   426	 * latter see also drm_gem_gpuva_set_lock().
   427	 *
   428	 * However, the GPU VA manager contains lockdep checks to ensure callers of its
   429	 * API hold the corresponding lock whenever the &drm_gem_object's GPU VA list is
   430	 * accessed by functions such as drm_gpuva_link() or drm_gpuva_unlink(), but
   431	 * also drm_gpuvm_bo_obtain() and drm_gpuvm_bo_put().
   432	 *
   433	 * The latter is required since on creation and destruction of a &drm_gpuvm_bo
   434	 * the &drm_gpuvm_bo is attached to / removed from the &drm_gem_object's gpuva list.
   435	 * Subsequent calls to drm_gpuvm_bo_obtain() for the same &drm_gpuvm and
   436	 * &drm_gem_object must be able to observe previous creations and destructions
   437	 * of &drm_gpuvm_bos in order to keep instances unique.
   438	 *
   439	 * The &drm_gpuvm's lists for keeping track of external and evicted objects are
   440	 * protected against concurrent insertion / removal and iteration internally.
   441	 *
   442	 * However, drivers still need to protect concurrent calls to functions
   443	 * iterating those lists, such as drm_gpuvm_validate() and
   444	 * drm_gpuvm_prepare_objects(). Every such function contains a particular
   445	 * comment and lockdep checks if possible.
   446	 *
   447	 * Functions adding or removing entries from those lists, such as
   448	 * drm_gpuvm_bo_evict() or drm_gpuvm_bo_extobj_add() may be called with external
   449	 * locks being held, e.g. in order to avoid the corresponding list being
   450	 * (safely) modified while potentially being iterated by other API functions.
   451	 * However, this is entirely optional.
   452	 */
   453	
   454	/**
   455	 * DOC: Examples
   456	 *
   457	 * This section gives two examples of how to let the DRM GPUVA Manager generate
   458	 * &drm_gpuva_op in order to satisfy a given map or unmap request and how to
   459	 * make use of them.
   460	 *
   461	 * The below code is strictly limited to illustrate the generic usage pattern.
   462	 * To maintain simplicity, it doesn't make use of any abstractions for common
   463	 * code, different (asynchronous) stages with fence signalling critical paths,
   464	 * any other helpers or error handling in terms of freeing memory and dropping
   465	 * previously taken locks.
   466	 *
   467	 * 1) Obtain a list of &drm_gpuva_op to create a new mapping::
   468	 *
   469	 *	// Allocates a new &drm_gpuva.
   470	 *	struct drm_gpuva * driver_gpuva_alloc(void);
   471	 *
   472	 *	// Typically drivers would embed the &drm_gpuvm and &drm_gpuva
   473	 *	// structure in individual driver structures and lock the dma-resv with
   474	 *	// drm_exec or similar helpers.
   475	 *	int driver_mapping_create(struct drm_gpuvm *gpuvm,
   476	 *				  u64 addr, u64 range,
   477	 *				  struct drm_gem_object *obj, u64 offset)
   478	 *	{
   479	 *		struct drm_gpuva_ops *ops;
   480	 *		struct drm_gpuva_op *op;
   481	 *		struct drm_gpuvm_bo *vm_bo;
   482	 *
   483	 *		driver_lock_va_space();
   484	 *		ops = drm_gpuvm_sm_map_ops_create(gpuvm, addr, range,
   485	 *						  obj, offset);
   486	 *		if (IS_ERR(ops))
   487	 *			return PTR_ERR(ops);
   488	 *
   489	 *		vm_bo = drm_gpuvm_bo_obtain(gpuvm, obj);
   490	 *		if (IS_ERR(vm_bo))
   491	 *			return PTR_ERR(vm_bo);
   492	 *
   493	 *		drm_gpuva_for_each_op(op, ops) {
   494	 *			struct drm_gpuva *va;
   495	 *
   496	 *			switch (op->op) {
   497	 *			case DRM_GPUVA_OP_MAP:
   498	 *				va = driver_gpuva_alloc();
   499	 *				if (!va)
   500	 *					; // unwind previous VA space updates,
   501	 *					  // free memory and unlock
   502	 *
   503	 *				driver_vm_map();
   504	 *				drm_gpuva_map(gpuvm, va, &op->map);
   505	 *				drm_gpuva_link(va, vm_bo);
   506	 *
   507	 *				break;
   508	 *			case DRM_GPUVA_OP_REMAP: {
   509	 *				struct drm_gpuva *prev = NULL, *next = NULL;
   510	 *
   511	 *				va = op->remap.unmap->va;
   512	 *
   513	 *				if (op->remap.prev) {
   514	 *					prev = driver_gpuva_alloc();
   515	 *					if (!prev)
   516	 *						; // unwind previous VA space
   517	 *						  // updates, free memory and
   518	 *						  // unlock
   519	 *				}
   520	 *
   521	 *				if (op->remap.next) {
   522	 *					next = driver_gpuva_alloc();
   523	 *					if (!next)
   524	 *						; // unwind previous VA space
   525	 *						  // updates, free memory and
   526	 *						  // unlock
   527	 *				}
   528	 *
   529	 *				driver_vm_remap();
   530	 *				drm_gpuva_remap(prev, next, &op->remap);
   531	 *
   532	 *				if (prev)
   533	 *					drm_gpuva_link(prev, va->vm_bo);
   534	 *				if (next)
   535	 *					drm_gpuva_link(next, va->vm_bo);
   536	 *				drm_gpuva_unlink(va);
   537	 *
   538	 *				break;
   539	 *			}
   540	 *			case DRM_GPUVA_OP_UNMAP:
   541	 *				va = op->unmap->va;
   542	 *
   543	 *				driver_vm_unmap();
   544	 *				drm_gpuva_unlink(va);
   545	 *				drm_gpuva_unmap(&op->unmap);
   546	 *
   547	 *				break;
   548	 *			default:
   549	 *				break;
   550	 *			}
   551	 *		}
   552	 *		drm_gpuvm_bo_put(vm_bo);
   553	 *		driver_unlock_va_space();
   554	 *
   555	 *		return 0;
   556	 *	}
   557	 *
   558	 * 2) Receive a callback for each &drm_gpuva_op to create a new mapping::
   559	 *
   560	 *	struct driver_context {
   561	 *		struct drm_gpuvm *gpuvm;
   562	 *		struct drm_gpuvm_bo *vm_bo;
   563	 *		struct drm_gpuva *new_va;
   564	 *		struct drm_gpuva *prev_va;
   565	 *		struct drm_gpuva *next_va;
   566	 *	};
   567	 *
   568	 *	// ops to pass to drm_gpuvm_init()
   569	 *	static const struct drm_gpuvm_ops driver_gpuvm_ops = {
   570	 *		.sm_step_map = driver_gpuva_map,
   571	 *		.sm_step_remap = driver_gpuva_remap,
   572	 *		.sm_step_unmap = driver_gpuva_unmap,
   573	 *	};
   574	 *
   575	 *	// Typically drivers would embed the &drm_gpuvm and &drm_gpuva
   576	 *	// structure in individual driver structures and lock the dma-resv with
   577	 *	// drm_exec or similar helpers.
   578	 *	int driver_mapping_create(struct drm_gpuvm *gpuvm,
   579	 *				  u64 addr, u64 range,
   580	 *				  struct drm_gem_object *obj, u64 offset)
   581	 *	{
   582	 *		struct driver_context ctx;
   583	 *		struct drm_gpuvm_bo *vm_bo;
   584	 *		struct drm_gpuva_ops *ops;
   585	 *		struct drm_gpuva_op *op;
   586	 *		int ret = 0;
   587	 *
   588	 *		ctx.gpuvm = gpuvm;
   589	 *
   590	 *		ctx.new_va = kzalloc(sizeof(*ctx.new_va), GFP_KERNEL);
   591	 *		ctx.prev_va = kzalloc(sizeof(*ctx.prev_va), GFP_KERNEL);
   592	 *		ctx.next_va = kzalloc(sizeof(*ctx.next_va), GFP_KERNEL);
   593	 *		ctx.vm_bo = drm_gpuvm_bo_create(gpuvm, obj);
   594	 *		if (!ctx.new_va || !ctx.prev_va || !ctx.next_va || !ctx.vm_bo) {
   595	 *			ret = -ENOMEM;
   596	 *			goto out;
   597	 *		}
   598	 *
   599	 *		// Typically protected with a driver specific GEM gpuva lock
   600	 *		// used in the fence signaling path for drm_gpuva_link() and
   601	 *		// drm_gpuva_unlink(), hence pre-allocate.
   602	 *		ctx.vm_bo = drm_gpuvm_bo_obtain_prealloc(ctx.vm_bo);
   603	 *
   604	 *		driver_lock_va_space();
   605	 *		ret = drm_gpuvm_sm_map(gpuvm, &ctx, addr, range, obj, offset);
   606	 *		driver_unlock_va_space();
   607	 *
   608	 *	out:
   609	 *		drm_gpuvm_bo_put(ctx.vm_bo);
   610	 *		kfree(ctx.new_va);
   611	 *		kfree(ctx.prev_va);
   612	 *		kfree(ctx.next_va);
   613	 *		return ret;
   614	 *	}
   615	 *
   616	 *	int driver_gpuva_map(struct drm_gpuva_op *op, void *__ctx)
   617	 *	{
   618	 *		struct driver_context *ctx = __ctx;
   619	 *
   620	 *		drm_gpuva_map(ctx->gpuvm, ctx->new_va, &op->map);
   621	 *
   622	 *		drm_gpuva_link(ctx->new_va, ctx->vm_bo);
   623	 *
   624	 *		// prevent the new GPUVA from being freed in
   625	 *		// driver_mapping_create()
   626	 *		ctx->new_va = NULL;
   627	 *
   628	 *		return 0;
   629	 *	}
   630	 *
   631	 *	int driver_gpuva_remap(struct drm_gpuva_op *op, void *__ctx)
   632	 *	{
   633	 *		struct driver_context *ctx = __ctx;
   634	 *		struct drm_gpuva *va = op->remap.unmap->va;
   635	 *
   636	 *		drm_gpuva_remap(ctx->prev_va, ctx->next_va, &op->remap);
   637	 *
   638	 *		if (op->remap.prev) {
   639	 *			drm_gpuva_link(ctx->prev_va, va->vm_bo);
   640	 *			ctx->prev_va = NULL;
   641	 *		}
   642	 *
   643	 *		if (op->remap.next) {
   644	 *			drm_gpuva_link(ctx->next_va, va->vm_bo);
   645	 *			ctx->next_va = NULL;
   646	 *		}
   647	 *
   648	 *		drm_gpuva_unlink(va);
   649	 *		kfree(va);
   650	 *
   651	 *		return 0;
   652	 *	}
   653	 *
   654	 *	int driver_gpuva_unmap(struct drm_gpuva_op *op, void *__ctx)
   655	 *	{
   656	 *		drm_gpuva_unlink(op->unmap.va);
   657	 *		drm_gpuva_unmap(&op->unmap);
   658	 *		kfree(op->unmap.va);
   659	 *
   660	 *		return 0;
   661	 *	}
   662	 */
   663	
   664	/**
   665	 * get_next_vm_bo_from_list() - get the next vm_bo element
   666	 * @__gpuvm: The GPU VM
   667	 * @__list_name: The name of the list we're iterating on
   668	 * @__local_list: A pointer to the local list used to store already iterated items
   669	 * @__prev_vm_bo: The previous element we got from drm_gpuvm_get_next_cached_vm_bo()
   670	 *
   671	 * This helper is here to provide lockless list iteration. Lockless as in, the
   672	 * iterator releases the lock immediately after picking the first element from
   673	 * the list, so list insertion and deletion can happen concurrently.
   674	 *
   675	 * Elements popped from the original list are kept in a local list, so removal
   676	 * and is_empty checks can still happen while we're iterating the list.
   677	 */
   678	#define get_next_vm_bo_from_list(__gpuvm, __list_name, __local_list, __prev_vm_bo)	\
   679		({										\
   680			struct drm_gpuvm_bo *__vm_bo;						\
   681												\
   682			drm_gpuvm_bo_put(__prev_vm_bo);						\
   683												\
   684			spin_lock(&(__gpuvm)->__list_name.lock);				\
   685			while (!list_empty(&(__gpuvm)->__list_name.list)) {			\
   686				__vm_bo = list_first_entry(&(__gpuvm)->__list_name.list,	\
   687							   struct drm_gpuvm_bo,			\
   688							   list.entry.__list_name);		\
   689				if (drm_gpuvm_bo_get_unless_zero(__vm_bo)) {			\
   690					list_move_tail(&(__vm_bo)->list.entry.__list_name,	\
   691						       __local_list);				\
   692					break;							\
   693				} else {							\
   694					list_del_init(&(__vm_bo)->list.entry.__list_name);	\
   695					__vm_bo = NULL;						\
   696				}								\
   697			}									\
   698			spin_unlock(&(__gpuvm)->__list_name.lock);				\
   699												\
   700			__vm_bo;								\
   701		})
   702	
   703	/**
   704	 * for_each_vm_bo_in_list() - internal vm_bo list iterator
   705	 *
   706	 * This helper is here to provide lockless list iteration. Lockless as in, the
   707	 * iterator releases the lock immediately after picking the first element from the
   708	 * list, so list insertion and deletion can happen concurrently.
   709	 *
   710	 * Typical use:
   711	 *
   712	 *	struct drm_gpuvm_bo *vm_bo;
   713	 *	LIST_HEAD(my_local_list);
   714	 *
   715	 *	ret = 0;
   716	 *	drm_gpuvm_for_each_vm_bo(gpuvm, <list_name>, &my_local_list, vm_bo) {
   717	 *		ret = do_something_with_vm_bo(..., vm_bo);
   718	 *		if (ret)
   719	 *			break;
   720	 *	}
   721	 *	drm_gpuvm_bo_put(vm_bo);
   722	 *	drm_gpuvm_restore_vm_bo_list(gpuvm, <list_name>, &my_local_list);
   723	 *
   724	 *
   725	 * Only used for internal list iterations, not meant to be exposed to the outside
   726	 * world.
   727	 */
   728	#define for_each_vm_bo_in_list(__gpuvm, __list_name, __local_list, __vm_bo)	\
   729		for (__vm_bo = get_next_vm_bo_from_list(__gpuvm, __list_name,		\
   730							__local_list, NULL);		\
   731		     __vm_bo;								\
   732		     __vm_bo = get_next_vm_bo_from_list(__gpuvm, __list_name,		\
   733							__local_list, __vm_bo))		\
 > 734
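
For reference, a usage sketch of the iterator pair above, spelled with the
macro names as defined in this excerpt (assuming the gpuvm's evict list;
do_something_with_vm_bo() is a hypothetical callback, error handling elided):

	struct drm_gpuvm_bo *vm_bo;
	LIST_HEAD(my_local_list);
	int ret = 0;

	/* Pops one element at a time under the list spinlock and parks it on
	 * my_local_list, so concurrent insertion / removal stays safe. */
	for_each_vm_bo_in_list(gpuvm, evict, &my_local_list, vm_bo) {
		ret = do_something_with_vm_bo(vm_bo);
		if (ret)
			break;
	}
	/* On an early break, vm_bo still holds a reference. */
	drm_gpuvm_bo_put(vm_bo);

	/* Splice the already iterated elements back onto the gpuvm's list. */
	restore_vm_bo_list(gpuvm, evict, &my_local_list);
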
Boris Brezillon Sept. 11, 2023, 10:35 a.m. UTC | #2
Hello Danilo,

On Sat,  9 Sep 2023 17:31:13 +0200
Danilo Krummrich <dakr@redhat.com> wrote:


> @@ -632,6 +661,131 @@
>   *	}
>   */
>  
> +/**
> + * get_next_vm_bo_from_list() - get the next vm_bo element
> + * @__gpuvm: The GPU VM
> + * @__list_name: The name of the list we're iterating on
> + * @__local_list: A pointer to the local list used to store already iterated items
> + * @__prev_vm_bo: The previous element we got from drm_gpuvm_get_next_cached_vm_bo()
> + *
> + * This helper is here to provide lockless list iteration. Lockless as in, the
> + * iterator releases the lock immediately after picking the first element from
> + * the list, so list insertion and deletion can happen concurrently.
> + *
> + * Elements popped from the original list are kept in a local list, so removal
> + * and is_empty checks can still happen while we're iterating the list.
> + */
> +#define get_next_vm_bo_from_list(__gpuvm, __list_name, __local_list, __prev_vm_bo)	\
> +	({										\
> +		struct drm_gpuvm_bo *__vm_bo;						\
> +											\
> +		drm_gpuvm_bo_put(__prev_vm_bo);						\
> +											\
> +		spin_lock(&(__gpuvm)->__list_name.lock);				\

I'm tempted to add a drm_gpuvm::<list_name>::local_list field, so we
can catch concurrent iterations with something like:

		if (!(__gpuvm)->__list_name.local_list)
			(__gpuvm)->__list_name.local_list = __local_list;
		else
			WARN_ON((__gpuvm)->__list_name.local_list != __local_list);

with (__gpuvm)->__list_name.local_list being restored to NULL
in restore_vm_bo_list().
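
and, on the restore side, something along the lines of (untested):

		WARN_ON((__gpuvm)->__list_name.local_list != __local_list);
		(__gpuvm)->__list_name.local_list = NULL;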

> +		while (!list_empty(&(__gpuvm)->__list_name.list)) {			\
> +			__vm_bo = list_first_entry(&(__gpuvm)->__list_name.list,	\
> +						   struct drm_gpuvm_bo,			\
> +						   list.entry.__list_name);		\
> +			if (drm_gpuvm_bo_get_unless_zero(__vm_bo)) {			\
> +				list_move_tail(&(__vm_bo)->list.entry.__list_name,	\
> +					       __local_list);				\
> +				break;							\
> +			} else {							\
> +				list_del_init(&(__vm_bo)->list.entry.__list_name);	\
> +				__vm_bo = NULL;						\
> +			}								\
> +		}									\
> +		spin_unlock(&(__gpuvm)->__list_name.lock);				\
> +											\
> +		__vm_bo;								\
> +	})
> +
> +/**
> + * for_each_vm_bo_in_list() - internal vm_bo list iterator
> + *
> + * This helper is here to provide lockless list iteration. Lockless as in, the
> + * iterator releases the lock immediately after picking the first element from the
> + * list, so list insertion and deletion can happen concurrently.
> + *
> + * Typical use:
> + *
> + *	struct drm_gpuvm_bo *vm_bo;
> + *	LIST_HEAD(my_local_list);
> + *
> + *	ret = 0;
> + *	drm_gpuvm_for_each_vm_bo(gpuvm, <list_name>, &my_local_list, vm_bo) {
> + *		ret = do_something_with_vm_bo(..., vm_bo);
> + *		if (ret)
> + *			break;
> + *	}
> + *	drm_gpuvm_bo_put(vm_bo);
> + *	drm_gpuvm_restore_vm_bo_list(gpuvm, <list_name>, &my_local_list);

The names in this example and the helper names don't match.

> + *
> + *
> + * Only used for internal list iterations, not meant to be exposed to the outside
> + * world.
> + */
> +#define for_each_vm_bo_in_list(__gpuvm, __list_name, __local_list, __vm_bo)	\
> +	for (__vm_bo = get_next_vm_bo_from_list(__gpuvm, __list_name,		\
> +						__local_list, NULL);		\
> +	     __vm_bo;								\
> +	     __vm_bo = get_next_vm_bo_from_list(__gpuvm, __list_name,		\
> +						__local_list, __vm_bo))		\
> +
> +/**
> + * restore_vm_bo_list() - move vm_bo elements back to their original list
> + * @__gpuvm: The GPU VM
> + * @__list_name: The name of the list we're iterating on
> + * @__local_list: A pointer to the local list used to store already iterated items
> + *
> + * When we're done iterating a vm_bo list, we should call restore_vm_bo_list()
> + * to restore the original state and let new iterations take place.
> + */
> +#define restore_vm_bo_list(__gpuvm, __list_name, __local_list)				\
> +	do {										\
> +		/* Merge back the two lists, moving local list elements to the		\
> +		 * head to preserve previous ordering, in case it matters.		\
> +		 */									\
> +		spin_lock(&(__gpuvm)->__list_name.lock);				\
> +		list_splice(__local_list, &(__gpuvm)->__list_name.list);		\
> +		spin_unlock(&(__gpuvm)->__list_name.lock);				\
> +	} while (0)
> +/**
> + * drm_gpuvm_bo_list_add() - insert a vm_bo into the given list
> + * @__vm_bo: the &drm_gpuvm_bo
> + * @__list_name: the name of the list to insert into
> + *
> + * Inserts the given @__vm_bo into the list specified by @__list_name and
> + * increases the vm_bo's reference count.
> + */
> +#define drm_gpuvm_bo_list_add(__vm_bo, __list_name)				\
> +	do {									\
> +		spin_lock(&(__vm_bo)->vm->__list_name.lock);			\
> +		if (list_empty(&(__vm_bo)->list.entry.__list_name))		\
> +			list_add_tail(&(__vm_bo)->list.entry.__list_name,	\
> +				      &(__vm_bo)->vm->__list_name.list);	\
> +		spin_unlock(&(__vm_bo)->vm->__list_name.lock);			\
> +	} while (0)
> +
> +/**
> + * drm_gpuvm_bo_list_del() - remove a vm_bo from the given list
> + * @__vm_bo: the &drm_gpuvm_bo
> + * @__list_name: the name of the list to remove from
> + *
> + * Removes the given @__vm_bo from the list specified by @__list_name and
> + * decreases the vm_bo's reference count.
> + */
> +#define drm_gpuvm_bo_list_del(__vm_bo, __list_name)				\
> +	do {									\
> +		spin_lock(&(__vm_bo)->vm->__list_name.lock);			\
> +		if (!list_empty(&(__vm_bo)->list.entry.__list_name))		\
> +			list_del_init(&(__vm_bo)->list.entry.__list_name);	\
> +		spin_unlock(&(__vm_bo)->vm->__list_name.lock);			\
> +	} while (0)
> +
> +static int __must_check
> +drm_gpuvm_bo_get_unless_zero(struct drm_gpuvm_bo *vm_bo);

I see no obvious reason to have a forward declaration for this helper;
if we decide to keep it, let's at least move the declaration here.


> @@ -807,6 +1262,14 @@ drm_gpuvm_bo_destroy(struct kref *kref)
>  
>  	drm_gem_gpuva_assert_lock_held(vm_bo->obj);
>  
> +	spin_lock(&gpuvm->extobj.lock);
> +	list_del(&vm_bo->list.entry.extobj);
> +	spin_unlock(&gpuvm->extobj.lock);
> +
> +	spin_lock(&gpuvm->evict.lock);
> +	list_del(&vm_bo->list.entry.evict);
> +	spin_unlock(&gpuvm->evict.lock);
> +
>  	list_del(&vm_bo->list.entry.gem);
>  
>  	drm_gem_object_put(obj);
> @@ -822,6 +1285,11 @@ drm_gpuvm_bo_destroy(struct kref *kref)
>   * @vm_bo: the &drm_gpuvm_bo to release the reference of
>   *
>   * This releases a reference to @vm_bo.
> + *
> + * If the reference count drops to zero, the &gpuvm_bo is destroyed, which
> + * includes removing it from the GEM's gpuva list. Hence, if a call to this
> + * function can potentially let the reference count drop to zero, the caller must
> + * hold the dma-resv or driver specific GEM gpuva lock.

Looks like this should have been part of the previous patch. I hate
the fact we have to worry about GEM gpuva lock being held when we call
_put() only if the ref drops to zero though. I think I'd feel more
comfortable if the function was named differently. Maybe _return() or
_release() to match the _obtain() function, where the object is inserted
in the GEM vm_bo list. I would also do the lock_is_held() check
unconditionally, move the list removal in this function with a del_init(),
and have a WARN_ON(!list_empty) in vm_bo_destroy().

>   */
>  void
>  drm_gpuvm_bo_put(struct drm_gpuvm_bo *vm_bo)
> @@ -831,6 +1299,12 @@ drm_gpuvm_bo_put(struct drm_gpuvm_bo *vm_bo)
>  }
>  EXPORT_SYMBOL_GPL(drm_gpuvm_bo_put);
>  
> +static int __must_check
> +drm_gpuvm_bo_get_unless_zero(struct drm_gpuvm_bo *vm_bo)
> +{
> +	return kref_get_unless_zero(&vm_bo->kref);

Not convinced this helper is needed. It's only used once, and I
don't think we'll need it elsewhere.

> +}
> +
>  static struct drm_gpuvm_bo *
>  __drm_gpuvm_bo_find(struct drm_gpuvm *gpuvm,
>  		    struct drm_gem_object *obj)


Regards,

Boris
Boris Brezillon Sept. 11, 2023, 12:54 p.m. UTC | #3
On Sat,  9 Sep 2023 17:31:13 +0200
Danilo Krummrich <dakr@redhat.com> wrote:

> +/**
> + * get_next_vm_bo_from_list() - get the next vm_bo element
> + * @__gpuvm: The GPU VM
> + * @__list_name: The name of the list we're iterating on
> + * @__local_list: A pointer to the local list used to store already iterated items
> + * @__prev_vm_bo: The previous element we got from drm_gpuvm_get_next_cached_vm_bo()
> + *
> + * This helper is here to provide lockless list iteration. Lockless as in, the
> + * iterator releases the lock immediately after picking the first element from
> + * the list, so list insertion and deletion can happen concurrently.
> + *
> + * Elements popped from the original list are kept in a local list, so removal
> + * and is_empty checks can still happen while we're iterating the list.
> + */
> +#define get_next_vm_bo_from_list(__gpuvm, __list_name, __local_list, __prev_vm_bo)	\
> +	({										\
> +		struct drm_gpuvm_bo *__vm_bo;						\

Missing NULL assignment here.

> +											\
> +		drm_gpuvm_bo_put(__prev_vm_bo);						\
> +											\
> +		spin_lock(&(__gpuvm)->__list_name.lock);				\
> +		while (!list_empty(&(__gpuvm)->__list_name.list)) {			\
> +			__vm_bo = list_first_entry(&(__gpuvm)->__list_name.list,	\
> +						   struct drm_gpuvm_bo,			\
> +						   list.entry.__list_name);		\
> +			if (drm_gpuvm_bo_get_unless_zero(__vm_bo)) {			\
> +				list_move_tail(&(__vm_bo)->list.entry.__list_name,	\
> +					       __local_list);				\
> +				break;							\
> +			} else {							\
> +				list_del_init(&(__vm_bo)->list.entry.__list_name);	\
> +				__vm_bo = NULL;						\
> +			}								\
> +		}									\
> +		spin_unlock(&(__gpuvm)->__list_name.lock);				\
> +											\
> +		__vm_bo;								\
> +	})
> +
> +/**
> + * for_each_vm_bo_in_list() - internal vm_bo list iterator
> + *
> + * This helper is here to provide lockless list iteration. Lockless as in, the
> + * iterator releases the lock immediately after picking the first element from the
> + * list, so list insertion and deletion can happen concurrently.
> + *
> + * Typical use:
> + *
> + *	struct drm_gpuvm_bo *vm_bo;
> + *	LIST_HEAD(my_local_list);
> + *
> + *	ret = 0;
> + *	drm_gpuvm_for_each_vm_bo(gpuvm, <list_name>, &my_local_list, vm_bo) {
> + *		ret = do_something_with_vm_bo(..., vm_bo);
> + *		if (ret)
> + *			break;
> + *	}
> + *	drm_gpuvm_bo_put(vm_bo);
> + *	drm_gpuvm_restore_vm_bo_list(gpuvm, <list_name>, &my_local_list);

Might be worth mentioning that the vm_bo pointer shouldn't be
re-assigned from inside the for loop, otherwise the next
get_next_vm_bo_from_list() will be passed a wrong prev_vm_bo.

> + *
> + *
> + * Only used for internal list iterations, not meant to be exposed to the outside
> + * world.
> + */
> +#define for_each_vm_bo_in_list(__gpuvm, __list_name, __local_list, __vm_bo)	\
> +	for (__vm_bo = get_next_vm_bo_from_list(__gpuvm, __list_name,		\
> +						__local_list, NULL);		\
> +	     __vm_bo;								\
> +	     __vm_bo = get_next_vm_bo_from_list(__gpuvm, __list_name,		\
> +						__local_list, __vm_bo))		\
Boris Brezillon Sept. 11, 2023, 2:45 p.m. UTC | #4
On Sat,  9 Sep 2023 17:31:13 +0200
Danilo Krummrich <dakr@redhat.com> wrote:

> @@ -807,6 +1262,14 @@ drm_gpuvm_bo_destroy(struct kref *kref)
>  
>  	drm_gem_gpuva_assert_lock_held(vm_bo->obj);
>  
> +	spin_lock(&gpuvm->extobj.lock);
> +	list_del(&vm_bo->list.entry.extobj);
> +	spin_unlock(&gpuvm->extobj.lock);
> +
> +	spin_lock(&gpuvm->evict.lock);
> +	list_del(&vm_bo->list.entry.evict);
> +	spin_unlock(&gpuvm->evict.lock);
> +
>  	list_del(&vm_bo->list.entry.gem);
>  
>  	drm_gem_object_put(obj);

I ran into a UAF situation when the drm_gpuvm_bo object is the last
owner of obj, because the lock that's supposed to be held when calling
this function (drm_gem_gpuva_assert_lock_held() call above), belongs to
obj (either obj->resv, or a driver specific lock that's attached to the
driver-specific GEM object). I worked around it by taking a ref to obj
before calling lock()+drm_gpuvm_bo_put()+unlock(), and releasing it
after I'm done with the lock, but that just feels wrong.
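
Something along those lines (sketch; whether it's obj->resv or a driver
specific lock depends on the driver):

	struct drm_gem_object *obj = vm_bo->obj;

	/* Keep obj, and thus the lock we're about to take, alive until
	 * after the unlock, in case vm_bo holds the last reference. */
	drm_gem_object_get(obj);
	dma_resv_lock(obj->resv, NULL);
	drm_gpuvm_bo_put(vm_bo);
	dma_resv_unlock(obj->resv);
	drm_gem_object_put(obj);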
Danilo Krummrich Sept. 11, 2023, 4:23 p.m. UTC | #5
On Mon, Sep 11, 2023 at 12:35:26PM +0200, Boris Brezillon wrote:
> Hello Danilo,
> 
> On Sat,  9 Sep 2023 17:31:13 +0200
> Danilo Krummrich <dakr@redhat.com> wrote:
> 
> 
> > @@ -632,6 +661,131 @@
> >   *	}
> >   */
> >  
> > +/**
> > + * get_next_vm_bo_from_list() - get the next vm_bo element
> > + * @__gpuvm: The GPU VM
> > + * @__list_name: The name of the list we're iterating on
> > + * @__local_list: A pointer to the local list used to store already iterated items
> > + * @__prev_vm_bo: The previous element we got from drm_gpuvm_get_next_cached_vm_bo()
> > + *
> > + * This helper is here to provide lockless list iteration. Lockless as in, the
> > + * iterator releases the lock immediately after picking the first element from
> > + * the list, so list insertion and deletion can happen concurrently.
> > + *
> > + * Elements popped from the original list are kept in a local list, so removal
> > + * and is_empty checks can still happen while we're iterating the list.
> > + */
> > +#define get_next_vm_bo_from_list(__gpuvm, __list_name, __local_list, __prev_vm_bo)	\
> > +	({										\
> > +		struct drm_gpuvm_bo *__vm_bo;						\
> > +											\
> > +		drm_gpuvm_bo_put(__prev_vm_bo);						\
> > +											\
> > +		spin_lock(&(__gpuvm)->__list_name.lock);				\
> 
> I'm tempted to add a drm_gpuvm::<list_name>::local_list field, so we
> can catch concurrent iterations with something like:
> 
> 		if (!(__gpuvm)->__list_name.local_list)
> 			(__gpuvm)->__list_name.local_list = __local_list;
> 		else
> 			WARN_ON((__gpuvm)->__list_name.local_list != __local_list);
> 
> with (__gpuvm)->__list_name.local_list being restored to NULL
> in restore_vm_bo_list().
> 
> > +		while (!list_empty(&(__gpuvm)->__list_name.list)) {			\
> > +			__vm_bo = list_first_entry(&(__gpuvm)->__list_name.list,	\
> > +						   struct drm_gpuvm_bo,			\
> > +						   list.entry.__list_name);		\
> > +			if (drm_gpuvm_bo_get_unless_zero(__vm_bo)) {			\
> > +				list_move_tail(&(__vm_bo)->list.entry.__list_name,	\
> > +					       __local_list);				\
> > +				break;							\
> > +			} else {							\
> > +				list_del_init(&(__vm_bo)->list.entry.__list_name);	\
> > +				__vm_bo = NULL;						\
> > +			}								\
> > +		}									\
> > +		spin_unlock(&(__gpuvm)->__list_name.lock);				\
> > +											\
> > +		__vm_bo;								\
> > +	})
> > +
> > +/**
> > + * for_each_vm_bo_in_list() - internal vm_bo list iterator
> > + *
> > + * This helper is here to provide lockless list iteration. Lockless as in, the
> > + * iterator releases the lock immediately after picking the first element from the
> > + * list, so list insertion and deletion can happen concurrently.
> > + *
> > + * Typical use:
> > + *
> > + *	struct drm_gpuvm_bo *vm_bo;
> > + *	LIST_HEAD(my_local_list);
> > + *
> > + *	ret = 0;
> > + *	drm_gpuvm_for_each_vm_bo(gpuvm, <list_name>, &my_local_list, vm_bo) {
> > + *		ret = do_something_with_vm_bo(..., vm_bo);
> > + *		if (ret)
> > + *			break;
> > + *	}
> > + *	drm_gpuvm_bo_put(vm_bo);
> > + *	drm_gpuvm_restore_vm_bo_list(gpuvm, <list_name>, &my_local_list);
> 
> The names in this example and the helper names don't match.
> 
> > + *
> > + *
> > + * Only used for internal list iterations, not meant to be exposed to the outside
> > + * world.
> > + */
> > +#define for_each_vm_bo_in_list(__gpuvm, __list_name, __local_list, __vm_bo)	\
> > +	for (__vm_bo = get_next_vm_bo_from_list(__gpuvm, __list_name,		\
> > +						__local_list, NULL);		\
> > +	     __vm_bo;								\
> > +	     __vm_bo = get_next_vm_bo_from_list(__gpuvm, __list_name,		\
> > +						__local_list, __vm_bo))		\
> > +
> > +/**
> > + * restore_vm_bo_list() - move vm_bo elements back to their original list
> > + * @__gpuvm: The GPU VM
> > + * @__list_name: The name of the list we're iterating on
> > + * @__local_list: A pointer to the local list used to store already iterated items
> > + *
> > + * When we're done iterating a vm_bo list, we should call restore_vm_bo_list()
> > + * to restore the original state and let new iterations take place.
> > + */
> > +#define restore_vm_bo_list(__gpuvm, __list_name, __local_list)				\
> > +	do {										\
> > +		/* Merge back the two lists, moving local list elements to the		\
> > +		 * head to preserve previous ordering, in case it matters.		\
> > +		 */									\
> > +		spin_lock(&(__gpuvm)->__list_name.lock);				\
> > +		list_splice(__local_list, &(__gpuvm)->__list_name.list);		\
> > +		spin_unlock(&(__gpuvm)->__list_name.lock);				\
> > +	} while (0)
> > +/**
> > + * drm_gpuvm_bo_list_add() - insert a vm_bo into the given list
> > + * @__vm_bo: the &drm_gpuvm_bo
> > + * @__list_name: the name of the list to insert into
> > + *
> > + * Inserts the given @__vm_bo into the list specified by @__list_name and
> > + * increases the vm_bo's reference count.
> > + */
> > +#define drm_gpuvm_bo_list_add(__vm_bo, __list_name)				\
> > +	do {									\
> > +		spin_lock(&(__vm_bo)->vm->__list_name.lock);			\
> > +		if (list_empty(&(__vm_bo)->list.entry.__list_name))		\
> > +			list_add_tail(&(__vm_bo)->list.entry.__list_name,	\
> > +				      &(__vm_bo)->vm->__list_name.list);	\
> > +		spin_unlock(&(__vm_bo)->vm->__list_name.lock);			\
> > +	} while (0)
> > +
> > +/**
> > + * drm_gpuvm_bo_list_del() - remove a vm_bo from the given list
> > + * @__vm_bo: the &drm_gpuvm_bo
> > + * @__list_name: the name of the list to remove from
> > + *
> > + * Removes the given @__vm_bo from the list specified by @__list_name and
> > + * decreases the vm_bo's reference count.
> > + */
> > +#define drm_gpuvm_bo_list_del(__vm_bo, __list_name)				\
> > +	do {									\
> > +		spin_lock(&(__vm_bo)->vm->__list_name.lock);			\
> > +		if (!list_empty(&(__vm_bo)->list.entry.__list_name))		\
> > +			list_del_init(&(__vm_bo)->list.entry.__list_name);	\
> > +		spin_unlock(&(__vm_bo)->vm->__list_name.lock);			\
> > +	} while (0)
> > +
> > +static int __must_check
> > +drm_gpuvm_bo_get_unless_zero(struct drm_gpuvm_bo *vm_bo);
> 
> I see no obvious reason to have a forward declaration for this helper,
> if we decide to keep it, let's at least move the declaration here.
> 
> 
> > @@ -807,6 +1262,14 @@ drm_gpuvm_bo_destroy(struct kref *kref)
> >  
> >  	drm_gem_gpuva_assert_lock_held(vm_bo->obj);
> >  
> > +	spin_lock(&gpuvm->extobj.lock);
> > +	list_del(&vm_bo->list.entry.extobj);
> > +	spin_unlock(&gpuvm->extobj.lock);
> > +
> > +	spin_lock(&gpuvm->evict.lock);
> > +	list_del(&vm_bo->list.entry.evict);
> > +	spin_unlock(&gpuvm->evict.lock);
> > +
> >  	list_del(&vm_bo->list.entry.gem);
> >  
> >  	drm_gem_object_put(obj);
> > @@ -822,6 +1285,11 @@ drm_gpuvm_bo_destroy(struct kref *kref)
> >   * @vm_bo: the &drm_gpuvm_bo to release the reference of
> >   *
> >   * This releases a reference to @vm_bo.
> > + *
> > + * If the reference count drops to zero, the &gpuvm_bo is destroyed, which
> > + * includes removing it from the GEM's gpuva list. Hence, if a call to this
> > + * function can potentially let the reference count drop to zero, the caller must
> > + * hold the dma-resv or driver specific GEM gpuva lock.
> 
> Looks like this should have been part of the previous patch. I hate
> the fact we have to worry about GEM gpuva lock being held when we call
> _put() only if the ref drops to zero though. I think I'd feel more
> comfortable if the function was named differently. Maybe _return() or
> _release() to match the _obtain() function, where the object is inserted
> in the GEM vm_bo list. I would also do the lock_is_held() check
> unconditionally, move the list removal in this function with a del_init(),
> and have a WARN_ON(!list_empty) in vm_bo_destroy().
> 

We can't move the list removal to drm_gpuvm_bo_put(), we need to make sure we
can't create duplicate drm_gpuvm_bo structures. Everything else pretty much goes
away with a dedicated GEM gpuva list lock, as I had in my first patch series
when I introduced the GPUVA manager. At that time it wasn't always needed, hence
the optional driver specific lock; however, with the VM_BO abstraction it really
makes sense to have a dedicated one.


I agree with the other feedback from this reply and will address it in a V4.

> >   */
> >  void
> >  drm_gpuvm_bo_put(struct drm_gpuvm_bo *vm_bo)
> > @@ -831,6 +1299,12 @@ drm_gpuvm_bo_put(struct drm_gpuvm_bo *vm_bo)
> >  }
> >  EXPORT_SYMBOL_GPL(drm_gpuvm_bo_put);
> >  
> > +static int __must_check
> > +drm_gpuvm_bo_get_unless_zero(struct drm_gpuvm_bo *vm_bo)
> > +{
> > +	return kref_get_unless_zero(&vm_bo->kref);
> 
> Not convinced this helper is needed. It's only used once, and I
> don't think we'll need it elsewhere.
> 
> > +}
> > +
> >  static struct drm_gpuvm_bo *
> >  __drm_gpuvm_bo_find(struct drm_gpuvm *gpuvm,
> >  		    struct drm_gem_object *obj)
> 
> 
> Regards,
> 
> Boris
>
Danilo Krummrich Sept. 11, 2023, 4:30 p.m. UTC | #6
On Mon, Sep 11, 2023 at 04:45:26PM +0200, Boris Brezillon wrote:
> On Sat,  9 Sep 2023 17:31:13 +0200
> Danilo Krummrich <dakr@redhat.com> wrote:
> 
> > @@ -807,6 +1262,14 @@ drm_gpuvm_bo_destroy(struct kref *kref)
> >  
> >  	drm_gem_gpuva_assert_lock_held(vm_bo->obj);
> >  
> > +	spin_lock(&gpuvm->extobj.lock);
> > +	list_del(&vm_bo->list.entry.extobj);
> > +	spin_unlock(&gpuvm->extobj.lock);
> > +
> > +	spin_lock(&gpuvm->evict.lock);
> > +	list_del(&vm_bo->list.entry.evict);
> > +	spin_unlock(&gpuvm->evict.lock);
> > +
> >  	list_del(&vm_bo->list.entry.gem);
> >  
> >  	drm_gem_object_put(obj);
> 
> I ran into a UAF situation when the drm_gpuvm_bo object is the last
> owner of obj, because the lock that's supposed to be held when calling
> this function (drm_gem_gpuva_assert_lock_held() call above), belongs to
> obj (either obj->resv, or a driver specific lock that's attached to the
> driver-specific GEM object). I worked around it by taking a ref to obj
> before calling lock()+drm_gpuvm_bo_put()+unlock(), and releasing it
> after I'm done with the lock, but that just feels wrong.
> 
As mentioned in a previous reply, I think we want to bring the dedicated GEM
gpuva list lock back instead of abusing the dma-resv lock. This way we can
handle locking internally and don't run into such issues.

There is also no reason for a driver to already hold the GEM gpuva list lock
when calling drm_gpuvm_bo_put(). Drivers would only acquire the lock to iterate
the GEM's list of drm_gpuvm_bos or the drm_gpuvm_bo's list of drm_gpuvas. And
dropping the drm_gpuvm_bo from within such a loop is forbidden anyways.
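
I.e. the supported pattern looks roughly like the sketch below (the driver
side lock helpers are hypothetical, the GEM-side vm_bo iterator is assumed
from earlier in this series):

	driver_gem_gpuva_lock(obj);
	drm_gem_for_each_gpuvm_bo(vm_bo, obj) {
		/* Inspect vm_bo here, but never drop the (potentially
		 * last) reference from inside the locked section. */
	}
	driver_gem_gpuva_unlock(obj);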
Thomas Hellstrom Sept. 12, 2023, 4:20 p.m. UTC | #7
Hi, Danilo,

On 9/9/23 17:31, Danilo Krummrich wrote:
> So far the DRM GPUVA manager offers common infrastructure to track GPU VA
> allocations and mappings, generically connect GPU VA mappings to their
> backing buffers and perform more complex mapping operations on the GPU VA
> space.
>
> However, there are more design patterns commonly used by drivers, which
> can potentially be generalized in order to make the DRM GPUVA manager
> represent a basic GPU-VM implementation. In this context, this patch aims
> at generalizing the following elements.
>
> 1) Provide a common dma-resv for GEM objects not being used outside of
>     this GPU-VM.
>
> 2) Provide tracking of external GEM objects (GEM objects which are
>     shared with other GPU-VMs).
>
> 3) Provide functions to efficiently lock all GEM objects dma-resv the
>     GPU-VM contains mappings of.
>
> 4) Provide tracking of evicted GEM objects the GPU-VM contains mappings
>     of, such that validation of evicted GEM objects is accelerated.
>
> 5) Provide some convenience functions for common patterns.
>
> Rather than being designed as a "framework", the target is to make all
> features appear as a collection of optional helper functions, such that
> drivers are free to make use of the DRM GPUVA manager's basic
> functionality and opt-in for other features without setting any feature
> flags, just by making use of the corresponding functions.
>
> Big kudos to Boris Brezillon for his help in figuring out locking for drivers
> updating the GPU VA space within the fence signalling path.
>
> Suggested-by: Matthew Brost <matthew.brost@intel.com>
> Signed-off-by: Danilo Krummrich <dakr@redhat.com>
> ---
>   drivers/gpu/drm/drm_gpuvm.c | 516 ++++++++++++++++++++++++++++++++++++
>   include/drm/drm_gpuvm.h     | 197 ++++++++++++++
>   2 files changed, 713 insertions(+)
>
> diff --git a/drivers/gpu/drm/drm_gpuvm.c b/drivers/gpu/drm/drm_gpuvm.c
> index f4411047dbb3..8e62a043f719 100644
> --- a/drivers/gpu/drm/drm_gpuvm.c
> +++ b/drivers/gpu/drm/drm_gpuvm.c
> @@ -73,6 +73,21 @@
>    * &drm_gem_object list of &drm_gpuvm_bos for an existing instance of this
>    * particular combination. If not existent a new instance is created and linked
>    * to the &drm_gem_object.
> + *
> + * &drm_gpuvm_bo structures, since unique for a given &drm_gpuvm, are also used
> + * as entries for the &drm_gpuvm's lists of external and evicted objects. Those
> + * lists are maintained in order to accelerate locking of dma-resv locks and
> + * validation of evicted objects bound in a &drm_gpuvm. For instance, all
> + * &drm_gem_objects' &dma_resv of a given &drm_gpuvm can be locked by calling
> + * drm_gpuvm_exec_lock(). Once locked drivers can call drm_gpuvm_validate() in
> + * order to validate all evicted &drm_gem_objects. It is also possible to lock
> + * additional &drm_gem_objects by providing the corresponding parameters to
> + * drm_gpuvm_exec_lock() as well as open code the &drm_exec loop while making
> + * use of helper functions such as drm_gpuvm_prepare_range() or
> + * drm_gpuvm_prepare_objects().
> + *
> + * Every bound &drm_gem_object is treated as an external object when its &dma_resv
> + * structure is different than the &drm_gpuvm's common &dma_resv structure.
>    */
>   
>   /**
> @@ -420,6 +435,20 @@
>    * Subsequent calls to drm_gpuvm_bo_obtain() for the same &drm_gpuvm and
>    * &drm_gem_object must be able to observe previous creations and destructions
>    * of &drm_gpuvm_bos in order to keep instances unique.
> + *
> + * The &drm_gpuvm's lists for keeping track of external and evicted objects are
> + * protected against concurrent insertion / removal and iteration internally.
> + *
> + * However, drivers still need to protect concurrent calls to functions
> + * iterating those lists, such as drm_gpuvm_validate() and
> + * drm_gpuvm_prepare_objects(). Every such function carries a corresponding
> + * comment and, where possible, a lockdep check.
> + *
> + * Functions adding or removing entries from those lists, such as
> + * drm_gpuvm_bo_evict() or drm_gpuvm_bo_extobj_add() may be called with external
> + * locks held, e.g. in order to avoid the corresponding list being
> + * modified while potentially being iterated by other API functions.
> + * However, this is entirely optional.
>    */
>   
>   /**
> @@ -632,6 +661,131 @@
>    *	}
>    */
>   
> +/**
> + * get_next_vm_bo_from_list() - get the next vm_bo element
> + * @__gpuvm: The GPU VM
> + * @__list_name: The name of the list we're iterating on
> + * @__local_list: A pointer to the local list used to store already iterated items
> + * @__prev_vm_bo: The previous element we got from get_next_vm_bo_from_list()
> + *
> + * This helper is here to provide lockless list iteration. Lockless as in, the
> + * iterator releases the lock immediately after picking the first element from
> + * the list, so list insertion and deletion can happen concurrently.

Are the list spinlocks needed for that async state update from within 
the dma-fence critical section we've discussed previously?

Otherwise it should be sufficient to protect the lists with the gpuvm's 
resv (or for the extobj list with an outer lock).

If those spinlocks are still needed in some situations, perhaps we could
have an option to set them to NULL (like IIRC the maple tree allows for)?

For such drivers, that would require anybody calling unlink to hold the 
vm's resv, though.

It seems that with that, the refcount could also be made non-atomic.

All in the spirit of the drm locking guidelines "use big locks when 
possible".
Lower level locks only when necessary for performance or locking inversion?
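
Something like the below perhaps? A purely hypothetical sketch; the flags
field, the flag itself and both helpers are made up, just to illustrate
the idea:

static void
vm_bo_list_lock(struct drm_gpuvm *gpuvm, spinlock_t *lock)
{
	/* Hypothetical: drivers serializing via the VM's resv (or an
	 * outer lock) opt out of the spinlock entirely.
	 */
	if (gpuvm->flags & DRM_GPUVM_RESV_PROTECTED)
		dma_resv_assert_held(gpuvm->resv);
	else
		spin_lock(lock);
}

static void
vm_bo_list_unlock(struct drm_gpuvm *gpuvm, spinlock_t *lock)
{
	if (!(gpuvm->flags & DRM_GPUVM_RESV_PROTECTED))
		spin_unlock(lock);
}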

/Thomas


> + *
> + * Elements popped from the original list are kept in a local list, so removal
> + * and is_empty checks can still happen while we're iterating the list.
> + */
> +#define get_next_vm_bo_from_list(__gpuvm, __list_name, __local_list, __prev_vm_bo)	\
> +	({										\
> +		struct drm_gpuvm_bo *__vm_bo;						\
> +											\
> +		drm_gpuvm_bo_put(__prev_vm_bo);						\
> +											\
> +		spin_lock(&(__gpuvm)->__list_name.lock);				\
> +		while (!list_empty(&(__gpuvm)->__list_name.list)) {			\
> +			__vm_bo = list_first_entry(&(__gpuvm)->__list_name.list,	\
> +						   struct drm_gpuvm_bo,			\
> +						   list.entry.__list_name);		\
> +			if (drm_gpuvm_bo_get_unless_zero(__vm_bo)) {			\
> +				list_move_tail(&(__vm_bo)->list.entry.__list_name,	\
> +					       __local_list);				\
> +				break;							\
> +			} else {							\
> +				list_del_init(&(__vm_bo)->list.entry.__list_name);	\
> +				__vm_bo = NULL;						\
> +			}								\
> +		}									\
> +		spin_unlock(&(__gpuvm)->__list_name.lock);				\
> +											\
> +		__vm_bo;								\
> +	})
> +
> +/**
> + * for_each_vm_bo_in_list() - internal vm_bo list iterator
> + * @__gpuvm: The GPU VM
> + * @__list_name: The name of the list we're iterating on
> + * @__local_list: A pointer to the local list used to store already iterated items
> + * @__vm_bo: The current &drm_gpuvm_bo
> + *
> + * This helper is here to provide lockless list iteration. Lockless as in, the
> + * iterator releases the lock immediately after picking the first element from the
> + * list, so list insertion and deletion can happen concurrently.
> + *
> + * Typical use:
> + *
> + *	struct drm_gpuvm_bo *vm_bo;
> + *	LIST_HEAD(my_local_list);
> + *
> + *	ret = 0;
> + *	for_each_vm_bo_in_list(gpuvm, <list_name>, &my_local_list, vm_bo) {
> + *		ret = do_something_with_vm_bo(..., vm_bo);
> + *		if (ret)
> + *			break;
> + *	}
> + *	drm_gpuvm_bo_put(vm_bo);
> + *	restore_vm_bo_list(gpuvm, <list_name>, &my_local_list);
> + *
> + *
> + * Only used for internal list iterations, not meant to be exposed to the outside
> + * world.
> + */
> +#define for_each_vm_bo_in_list(__gpuvm, __list_name, __local_list, __vm_bo)	\
> +	for (__vm_bo = get_next_vm_bo_from_list(__gpuvm, __list_name,		\
> +						__local_list, NULL);		\
> +	     __vm_bo;								\
> +	     __vm_bo = get_next_vm_bo_from_list(__gpuvm, __list_name,		\
> +						__local_list, __vm_bo))		\
> +
> +/**
> + * restore_vm_bo_list() - move vm_bo elements back to their original list
> + * @__gpuvm: The GPU VM
> + * @__list_name: The name of the list we're iterating on
> + * @__local_list: A pointer to the local list used to store already iterated items
> + *
> + * When we're done iterating a vm_bo list, we should call restore_vm_bo_list()
> + * to restore the original state and let new iterations take place.
> + */
> +#define restore_vm_bo_list(__gpuvm, __list_name, __local_list)				\
> +	do {										\
> +		/* Merge back the two lists, moving local list elements to the		\
> +		 * head to preserve previous ordering, in case it matters.		\
> +		 */									\
> +		spin_lock(&(__gpuvm)->__list_name.lock);				\
> +		list_splice(__local_list, &(__gpuvm)->__list_name.list);		\
> +		spin_unlock(&(__gpuvm)->__list_name.lock);				\
> +	} while (0)
> +/**
> + * drm_gpuvm_bo_list_add() - insert a vm_bo into the given list
> + * @__vm_bo: the &drm_gpuvm_bo
> + * @__list_name: the name of the list to insert into
> + *
> + * Inserts the given @__vm_bo into the list specified by @__list_name,
> + * unless it is on the list already.
> + */
> +#define drm_gpuvm_bo_list_add(__vm_bo, __list_name)				\
> +	do {									\
> +		spin_lock(&(__vm_bo)->vm->__list_name.lock);			\
> +		if (list_empty(&(__vm_bo)->list.entry.__list_name))		\
> +			list_add_tail(&(__vm_bo)->list.entry.__list_name,	\
> +				      &(__vm_bo)->vm->__list_name.list);	\
> +		spin_unlock(&(__vm_bo)->vm->__list_name.lock);			\
> +	} while (0)
> +
> +/**
> + * drm_gpuvm_bo_list_del() - remove a vm_bo from the given list
> + * @__vm_bo: the &drm_gpuvm_bo
> + * @__list_name: the name of the list to remove from
> + *
> + * Removes the given @__vm_bo from the list specified by @__list_name,
> + * if it is on the list.
> + */
> +#define drm_gpuvm_bo_list_del(__vm_bo, __list_name)				\
> +	do {									\
> +		spin_lock(&(__vm_bo)->vm->__list_name.lock);			\
> +		if (!list_empty(&(__vm_bo)->list.entry.__list_name))		\
> +			list_del_init(&(__vm_bo)->list.entry.__list_name);	\
> +		spin_unlock(&(__vm_bo)->vm->__list_name.lock);			\
> +	} while (0)
> +
> +static int __must_check
> +drm_gpuvm_bo_get_unless_zero(struct drm_gpuvm_bo *vm_bo);
> +
>   #define to_drm_gpuva(__node)	container_of((__node), struct drm_gpuva, rb.node)
>   
>   #define GPUVA_START(node) ((node)->va.addr)
> @@ -713,6 +867,12 @@ drm_gpuvm_init(struct drm_gpuvm *gpuvm, struct drm_device *drm,
>   	gpuvm->rb.tree = RB_ROOT_CACHED;
>   	INIT_LIST_HEAD(&gpuvm->rb.list);
>   
> +	INIT_LIST_HEAD(&gpuvm->extobj.list);
> +	spin_lock_init(&gpuvm->extobj.lock);
> +
> +	INIT_LIST_HEAD(&gpuvm->evict.list);
> +	spin_lock_init(&gpuvm->evict.lock);
> +
>   	drm_gpuva_check_overflow(start_offset, range);
>   	gpuvm->mm_start = start_offset;
>   	gpuvm->mm_range = range;
> @@ -754,10 +914,302 @@ drm_gpuvm_destroy(struct drm_gpuvm *gpuvm)
>   	WARN(!RB_EMPTY_ROOT(&gpuvm->rb.tree.rb_root),
>   	     "GPUVA tree is not empty, potentially leaking memory.\n");
>   
> +	WARN(!list_empty(&gpuvm->extobj.list), "Extobj list should be empty.\n");
> +	WARN(!list_empty(&gpuvm->evict.list), "Evict list should be empty.\n");
> +
>   	drm_gem_private_object_fini(&gpuvm->d_obj);
>   }
>   EXPORT_SYMBOL_GPL(drm_gpuvm_destroy);
>   
> +/**
> + * drm_gpuvm_prepare_objects() - prepare all associated BOs
> + * @gpuvm: the &drm_gpuvm
> + * @exec: the &drm_exec locking context
> + * @num_fences: the amount of &dma_fences to reserve
> + *
> + * Calls drm_exec_prepare_obj() for all &drm_gem_objects the given
> + * &drm_gpuvm contains mappings of.
> + *
> + * Using this function directly, it is the driver's responsibility to call
> + * drm_exec_init() and drm_exec_fini() accordingly.
> + *
> + * Note: This function is safe against concurrent insertion and removal of
> + * external objects, however it is not safe against concurrent usage itself.
> + *
> + * Drivers need to make sure to protect this case with either an outer VM lock
> + * or by calling drm_gpuvm_prepare_vm() before this function within the
> + * drm_exec_until_all_locked() loop, such that the GPUVM's dma-resv lock ensures
> + * mutual exclusion.
> + *
> + * Returns: 0 on success, negative error code on failure.
> + */
> +int
> +drm_gpuvm_prepare_objects(struct drm_gpuvm *gpuvm,
> +			  struct drm_exec *exec,
> +			  unsigned int num_fences)
> +{
> +	struct drm_gpuvm_bo *vm_bo;
> +	LIST_HEAD(extobjs);
> +	int ret = 0;
> +
> +	for_each_vm_bo_in_list(gpuvm, extobj, &extobjs, vm_bo) {
> +		ret = drm_exec_prepare_obj(exec, vm_bo->obj, num_fences);
> +		if (ret)
> +			break;
> +	}
> +	/* Drop ref in case we break out of the loop. */
> +	drm_gpuvm_bo_put(vm_bo);
> +	restore_vm_bo_list(gpuvm, extobj, &extobjs);
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(drm_gpuvm_prepare_objects);
> +
> +/**
> + * drm_gpuvm_prepare_range() - prepare all BOs mapped within a given range
> + * @gpuvm: the &drm_gpuvm
> + * @exec: the &drm_exec locking context
> + * @addr: the start address within the VA space
> + * @range: the range to iterate within the VA space
> + * @num_fences: the amount of &dma_fences to reserve
> + *
> + * Calls drm_exec_prepare_obj() for all &drm_gem_objects mapped between @addr
> + * and @addr + @range.
> + *
> + * Returns: 0 on success, negative error code on failure.
> + */
> +int
> +drm_gpuvm_prepare_range(struct drm_gpuvm *gpuvm, struct drm_exec *exec,
> +			u64 addr, u64 range, unsigned int num_fences)
> +{
> +	struct drm_gpuva *va;
> +	u64 end = addr + range;
> +	int ret;
> +
> +	drm_gpuvm_for_each_va_range(va, gpuvm, addr, end) {
> +		struct drm_gem_object *obj = va->gem.obj;
> +
> +		ret = drm_exec_prepare_obj(exec, obj, num_fences);
> +		if (ret)
> +			return ret;
> +	}
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(drm_gpuvm_prepare_range);
> +
> +/**
> + * drm_gpuvm_exec_lock() - lock all dma-resv of all associated BOs
> + * @vm_exec: the &drm_gpuvm_exec abstraction
> + * @num_fences: the amount of &dma_fences to reserve
> + * @interruptible: sleep interruptible if waiting
> + *
> + * Acquires all dma-resv locks of all &drm_gem_objects the given
> + * &drm_gpuvm contains mappings of.
> + *
> + * Additionally, when calling this function with struct drm_gpuvm_exec::extra
> + * being set the driver receives the given @fn callback to lock additional
> + * dma-resv in the context of the &drm_gpuvm_exec instance. Typically, drivers
> + * would call drm_exec_prepare_obj() from within this callback.
> + *
> + * Returns: 0 on success, negative error code on failure.
> + */
> +int
> +drm_gpuvm_exec_lock(struct drm_gpuvm_exec *vm_exec,
> +		    unsigned int num_fences,
> +		    bool interruptible)
> +{
> +	struct drm_gpuvm *gpuvm = vm_exec->vm;
> +	struct drm_exec *exec = &vm_exec->exec;
> +	uint32_t flags;
> +	int ret;
> +
> +	flags = (interruptible ? DRM_EXEC_INTERRUPTIBLE_WAIT : 0) |
> +		DRM_EXEC_IGNORE_DUPLICATES;
> +
> +	drm_exec_init(exec, flags);
> +
> +	drm_exec_until_all_locked(exec) {
> +		ret = drm_gpuvm_prepare_vm(gpuvm, exec, num_fences);
> +		drm_exec_retry_on_contention(exec);
> +		if (ret)
> +			goto err;
> +
> +		ret = drm_gpuvm_prepare_objects(gpuvm, exec, num_fences);
> +		drm_exec_retry_on_contention(exec);
> +		if (ret)
> +			goto err;
> +
> +		if (vm_exec->extra.fn) {
> +			ret = vm_exec->extra.fn(vm_exec, num_fences);
> +			drm_exec_retry_on_contention(exec);
> +			if (ret)
> +				goto err;
> +		}
> +	}
> +
> +	return 0;
> +
> +err:
> +	drm_exec_fini(exec);
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(drm_gpuvm_exec_lock);
> +
> +static int
> +fn_lock_array(struct drm_gpuvm_exec *vm_exec, unsigned int num_fences)
> +{
> +	struct {
> +		struct drm_gem_object **objs;
> +		unsigned int num_objs;
> +	} *args = vm_exec->extra.priv;
> +
> +	return drm_exec_prepare_array(&vm_exec->exec, args->objs,
> +				      args->num_objs, num_fences);
> +}
> +
> +/**
> + * drm_gpuvm_exec_lock_array() - lock all dma-resv of all associated BOs
> + * @vm_exec: the &drm_gpuvm_exec abstraction
> + * @objs: additional &drm_gem_objects to lock
> + * @num_objs: the number of additional &drm_gem_objects to lock
> + * @num_fences: the amount of &dma_fences to reserve
> + * @interruptible: sleep interruptible if waiting
> + *
> + * Acquires all dma-resv locks of all &drm_gem_objects the given &drm_gpuvm
> + * contains mappings of, plus the ones given through @objs.
> + *
> + * Returns: 0 on success, negative error code on failure.
> + */
> +int
> +drm_gpuvm_exec_lock_array(struct drm_gpuvm_exec *vm_exec,
> +			  struct drm_gem_object **objs,
> +			  unsigned int num_objs,
> +			  unsigned int num_fences,
> +			  bool interruptible)
> +{
> +	struct {
> +		struct drm_gem_object **objs;
> +		unsigned int num_objs;
> +	} args;
> +
> +	args.objs = objs;
> +	args.num_objs = num_objs;
> +
> +	vm_exec->extra.fn = fn_lock_array;
> +	vm_exec->extra.priv = &args;
> +
> +	return drm_gpuvm_exec_lock(vm_exec, num_fences, interruptible);
> +}
> +EXPORT_SYMBOL_GPL(drm_gpuvm_exec_lock_array);
> +
> +/**
> + * drm_gpuvm_exec_lock_range() - prepare all BOs mapped within a given range
> + * @vm_exec: the &drm_gpuvm_exec abstraction
> + * @addr: the start address within the VA space
> + * @range: the range to iterate within the VA space
> + * @num_fences: the amount of &dma_fences to reserve
> + * @interruptible: sleep interruptible if waiting
> + *
> + * Acquires all dma-resv locks of all &drm_gem_objects mapped between @addr and
> + * @addr + @range.
> + *
> + * Returns: 0 on success, negative error code on failure.
> + */
> +int
> +drm_gpuvm_exec_lock_range(struct drm_gpuvm_exec *vm_exec,
> +			  u64 addr, u64 range,
> +			  unsigned int num_fences,
> +			  bool interruptible)
> +{
> +	struct drm_gpuvm *gpuvm = vm_exec->vm;
> +	struct drm_exec *exec = &vm_exec->exec;
> +	uint32_t flags;
> +	int ret;
> +
> +	flags = (interruptible ? DRM_EXEC_INTERRUPTIBLE_WAIT : 0) |
> +		DRM_EXEC_IGNORE_DUPLICATES;
> +
> +	drm_exec_init(exec, flags);
> +
> +	drm_exec_until_all_locked(exec) {
> +		ret = drm_gpuvm_prepare_range(gpuvm, exec, addr, range,
> +					      num_fences);
> +		drm_exec_retry_on_contention(exec);
> +		if (ret)
> +			goto err;
> +	}
> +
> +	return ret;
> +
> +err:
> +	drm_exec_fini(exec);
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(drm_gpuvm_exec_lock_range);
> +
> +/**
> + * drm_gpuvm_validate() - validate all BOs marked as evicted
> + * @gpuvm: the &drm_gpuvm to validate evicted BOs
> + *
> + * Calls the &drm_gpuvm_ops.bo_validate callback for all evicted buffer
> + * objects being mapped in the given &drm_gpuvm.
> + *
> + * Returns: 0 on success, negative error code on failure.
> + */
> +int
> +drm_gpuvm_validate(struct drm_gpuvm *gpuvm)
> +{
> +	const struct drm_gpuvm_ops *ops = gpuvm->ops;
> +	struct drm_gpuvm_bo *vm_bo;
> +	LIST_HEAD(evict);
> +	int ret = 0;
> +
> +	if (unlikely(!ops || !ops->bo_validate))
> +		return -ENOTSUPP;
> +
> +	for_each_vm_bo_in_list(gpuvm, evict, &evict, vm_bo) {
> +		dma_resv_assert_held(vm_bo->obj->resv);
> +		ret = ops->bo_validate(vm_bo->obj);
> +		if (ret)
> +			break;
> +	}
> +	/* Drop ref in case we break out of the loop. */
> +	drm_gpuvm_bo_put(vm_bo);
> +	restore_vm_bo_list(gpuvm, evict, &evict);
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(drm_gpuvm_validate);
> +
> +/**
> + * drm_gpuvm_resv_add_fence() - add fence to private and all extobj
> + * dma-resv
> + * @gpuvm: the &drm_gpuvm to add a fence to
> + * @exec: the &drm_exec locking context
> + * @fence: fence to add
> + * @private_usage: private dma-resv usage
> + * @extobj_usage: extobj dma-resv usage
> + */
> +void
> +drm_gpuvm_resv_add_fence(struct drm_gpuvm *gpuvm,
> +			 struct drm_exec *exec,
> +			 struct dma_fence *fence,
> +			 enum dma_resv_usage private_usage,
> +			 enum dma_resv_usage extobj_usage)
> +{
> +	struct drm_gem_object *obj;
> +	unsigned long index;
> +
> +	drm_exec_for_each_locked_object(exec, index, obj) {
> +		dma_resv_assert_held(obj->resv);
> +		dma_resv_add_fence(obj->resv, fence,
> +				   drm_gpuvm_is_extobj(gpuvm, obj) ?
> +				   extobj_usage : private_usage);
> +	}
> +}
> +EXPORT_SYMBOL_GPL(drm_gpuvm_resv_add_fence);
> +
>   /**
>    * drm_gpuvm_bo_create() - create a new instance of struct drm_gpuvm_bo
>    * @gpuvm: The &drm_gpuvm the @obj is mapped in.
> @@ -790,6 +1242,9 @@ drm_gpuvm_bo_create(struct drm_gpuvm *gpuvm,
>   	INIT_LIST_HEAD(&vm_bo->list.gpuva);
>   	INIT_LIST_HEAD(&vm_bo->list.entry.gem);
>   
> +	INIT_LIST_HEAD(&vm_bo->list.entry.extobj);
> +	INIT_LIST_HEAD(&vm_bo->list.entry.evict);
> +
>   	drm_gem_object_get(obj);
>   
>   	return vm_bo;
> @@ -807,6 +1262,14 @@ drm_gpuvm_bo_destroy(struct kref *kref)
>   
>   	drm_gem_gpuva_assert_lock_held(vm_bo->obj);
>   
> +	spin_lock(&gpuvm->extobj.lock);
> +	list_del(&vm_bo->list.entry.extobj);
> +	spin_unlock(&gpuvm->extobj.lock);
> +
> +	spin_lock(&gpuvm->evict.lock);
> +	list_del(&vm_bo->list.entry.evict);
> +	spin_unlock(&gpuvm->evict.lock);
> +
>   	list_del(&vm_bo->list.entry.gem);
>   
>   	drm_gem_object_put(obj);
> @@ -822,6 +1285,11 @@ drm_gpuvm_bo_destroy(struct kref *kref)
>    * @vm_bo: the &drm_gpuvm_bo to release the reference of
>    *
>    * This releases a reference to @vm_bo.
> + *
> + * If the reference count drops to zero, the &gpuvm_bo is destroyed, which
> + * includes removing it from the GEM's gpuva list. Hence, if a call to this
> + * function can potentially let the reference count drop to zero, the caller must
> + * hold the dma-resv or driver specific GEM gpuva lock.
>    */
>   void
>   drm_gpuvm_bo_put(struct drm_gpuvm_bo *vm_bo)
> @@ -831,6 +1299,12 @@ drm_gpuvm_bo_put(struct drm_gpuvm_bo *vm_bo)
>   }
>   EXPORT_SYMBOL_GPL(drm_gpuvm_bo_put);
>   
> +static int __must_check
> +drm_gpuvm_bo_get_unless_zero(struct drm_gpuvm_bo *vm_bo)
> +{
> +	return kref_get_unless_zero(&vm_bo->kref);
> +}
> +
>   static struct drm_gpuvm_bo *
>   __drm_gpuvm_bo_find(struct drm_gpuvm *gpuvm,
>   		    struct drm_gem_object *obj)
> @@ -938,6 +1412,48 @@ drm_gpuvm_bo_obtain_prealloc(struct drm_gpuvm_bo *__vm_bo)
>   }
>   EXPORT_SYMBOL_GPL(drm_gpuvm_bo_obtain_prealloc);
>   
> +/**
> + * drm_gpuvm_bo_extobj_add() - adds the &drm_gpuvm_bo to its &drm_gpuvm's
> + * extobj list
> + * @vm_bo: The &drm_gpuvm_bo to add to its &drm_gpuvm's extobj list.
> + *
> + * Adds the given @vm_bo to its &drm_gpuvm's extobj list if it is not on the
> + * list already and the corresponding &drm_gem_object actually is an external
> + * object.
> + */
> +void
> +drm_gpuvm_bo_extobj_add(struct drm_gpuvm_bo *vm_bo)
> +{
> +	struct drm_gpuvm *gpuvm = vm_bo->vm;
> +
> +	if (drm_gpuvm_is_extobj(gpuvm, vm_bo->obj))
> +		drm_gpuvm_bo_list_add(vm_bo, extobj);
> +}
> +EXPORT_SYMBOL_GPL(drm_gpuvm_bo_extobj_add);
> +
> +/**
> + * drm_gpuvm_bo_evict() - add / remove a &drm_gem_object to / from a
> + * &drm_gpuvm's evicted list
> + * @obj: the &drm_gem_object to add or remove
> + * @evict: indicates whether the object is evicted
> + *
> + * Adds a &drm_gem_object to or removes it from the evicted list of all
> + * &drm_gpuvms containing a mapping of this &drm_gem_object.
> + */
> +void
> +drm_gpuvm_bo_evict(struct drm_gem_object *obj, bool evict)
> +{
> +	struct drm_gpuvm_bo *vm_bo;
> +
> +	drm_gem_for_each_gpuvm_bo(vm_bo, obj) {
> +		if (evict)
> +			drm_gpuvm_bo_list_add(vm_bo, evict);
> +		else
> +			drm_gpuvm_bo_list_del(vm_bo, evict);
> +	}
> +}
> +EXPORT_SYMBOL_GPL(drm_gpuvm_bo_evict);
> +
>   static int
>   __drm_gpuva_insert(struct drm_gpuvm *gpuvm,
>   		   struct drm_gpuva *va)
> diff --git a/include/drm/drm_gpuvm.h b/include/drm/drm_gpuvm.h
> index afa50b9059a2..834bb6d6617e 100644
> --- a/include/drm/drm_gpuvm.h
> +++ b/include/drm/drm_gpuvm.h
> @@ -26,10 +26,12 @@
>    */
>   
>   #include <linux/list.h>
> +#include <linux/dma-resv.h>
>   #include <linux/rbtree.h>
>   #include <linux/types.h>
>   
>   #include <drm/drm_gem.h>
> +#include <drm/drm_exec.h>
>   
>   struct drm_gpuvm;
>   struct drm_gpuvm_bo;
> @@ -259,6 +261,38 @@ struct drm_gpuvm {
>   	 * space
>   	 */
>   	struct dma_resv *resv;
> +
> +	/**
> +	 * @extobj: structure holding the extobj list
> +	 */
> +	struct {
> +		/**
> +		 * @list: &list_head storing &drm_gpuvm_bos serving as
> +		 * external object
> +		 */
> +		struct list_head list;
> +
> +		/**
> +		 * @lock: spinlock to protect the extobj list
> +		 */
> +		spinlock_t lock;
> +	} extobj;
> +
> +	/**
> +	 * @evict: structure holding the evict list and evict list lock
> +	 */
> +	struct {
> +		/**
> +		 * @list: &list_head storing &drm_gpuvm_bos currently being
> +		 * evicted
> +		 */
> +		struct list_head list;
> +
> +		/**
> +		 * @lock: spinlock to protect the evict list
> +		 */
> +		spinlock_t lock;
> +	} evict;
>   };
>   
>   void drm_gpuvm_init(struct drm_gpuvm *gpuvm, struct drm_device *drm,
> @@ -268,6 +302,21 @@ void drm_gpuvm_init(struct drm_gpuvm *gpuvm, struct drm_device *drm,
>   		    const struct drm_gpuvm_ops *ops);
>   void drm_gpuvm_destroy(struct drm_gpuvm *gpuvm);
>   
> +/**
> + * drm_gpuvm_is_extobj() - indicates whether the given &drm_gem_object is an
> + * external object
> + * @gpuvm: the &drm_gpuvm to check
> + * @obj: the &drm_gem_object to check
> + *
> + * Returns: true if the &drm_gem_object &dma_resv differs from the
> + * &drm_gpuvm's &dma_resv, false otherwise
> + */
> +static inline bool drm_gpuvm_is_extobj(struct drm_gpuvm *gpuvm,
> +				       struct drm_gem_object *obj)
> +{
> +	return obj && obj->resv != gpuvm->resv;
> +}
> +
>   static inline struct drm_gpuva *
>   __drm_gpuva_next(struct drm_gpuva *va)
>   {
> @@ -346,6 +395,128 @@ __drm_gpuva_next(struct drm_gpuva *va)
>   #define drm_gpuvm_for_each_va_safe(va__, next__, gpuvm__) \
>   	list_for_each_entry_safe(va__, next__, &(gpuvm__)->rb.list, rb.entry)
>   
> +/**
> + * struct drm_gpuvm_exec - &drm_gpuvm abstraction of &drm_exec
> + *
> + * This structure should be created on the stack as &drm_exec should be.
> + *
> + * Optionally, @extra can be set in order to lock additional &drm_gem_objects.
> + */
> +struct drm_gpuvm_exec {
> +	/**
> +	 * @exec: the &drm_exec structure
> +	 */
> +	struct drm_exec exec;
> +
> +	/**
> +	 * @vm: the &drm_gpuvm whose DMA reservations are to be locked
> +	 */
> +	struct drm_gpuvm *vm;
> +
> +	/**
> +	 * @extra: Callback and corresponding private data for the driver to
> +	 * lock arbitrary additional &drm_gem_objects.
> +	 */
> +	struct {
> +		/**
> +		 * @fn: The driver callback to lock additional &drm_gem_objects.
> +		 */
> +		int (*fn)(struct drm_gpuvm_exec *vm_exec,
> +			  unsigned int num_fences);
> +
> +		/**
> +		 * @priv: driver private data for the @fn callback
> +		 */
> +		void *priv;
> +	} extra;
> +};
> +
> +/**
> + * drm_gpuvm_prepare_vm() - prepare the GPUVM's common dma-resv
> + * @gpuvm: the &drm_gpuvm
> + * @exec: the &drm_exec context
> + * @num_fences: the amount of &dma_fences to reserve
> + *
> + * Calls drm_exec_prepare_obj() for the GPUVM's dummy &drm_gem_object.
> + *
> + * Using this function directly, it is the driver's responsibility to call
> + * drm_exec_init() and drm_exec_fini() accordingly.
> + *
> + * Returns: 0 on success, negative error code on failure.
> + */
> +static inline int
> +drm_gpuvm_prepare_vm(struct drm_gpuvm *gpuvm,
> +		     struct drm_exec *exec,
> +		     unsigned int num_fences)
> +{
> +	return drm_exec_prepare_obj(exec, &gpuvm->d_obj, num_fences);
> +}
> +
> +int drm_gpuvm_prepare_objects(struct drm_gpuvm *gpuvm,
> +			      struct drm_exec *exec,
> +			      unsigned int num_fences);
> +
> +int drm_gpuvm_prepare_range(struct drm_gpuvm *gpuvm,
> +			    struct drm_exec *exec,
> +			    u64 addr, u64 range,
> +			    unsigned int num_fences);
> +
> +int drm_gpuvm_exec_lock(struct drm_gpuvm_exec *vm_exec,
> +			unsigned int num_fences,
> +			bool interruptible);
> +
> +int drm_gpuvm_exec_lock_array(struct drm_gpuvm_exec *vm_exec,
> +			      struct drm_gem_object **objs,
> +			      unsigned int num_objs,
> +			      unsigned int num_fences,
> +			      bool interruptible);
> +
> +int drm_gpuvm_exec_lock_range(struct drm_gpuvm_exec *vm_exec,
> +			      u64 addr, u64 range,
> +			      unsigned int num_fences,
> +			      bool interruptible);
> +
> +/**
> + * drm_gpuvm_exec_unlock() - unlock all dma-resv of all associated BOs
> + * @vm_exec: the &drm_gpuvm_exec abstraction
> + *
> + * Releases all dma-resv locks of all &drm_gem_objects previously acquired
> + * through drm_gpuvm_exec_lock() or its variants.
> + */
> +static inline void
> +drm_gpuvm_exec_unlock(struct drm_gpuvm_exec *vm_exec)
> +{
> +	drm_exec_fini(&vm_exec->exec);
> +}
> +
> +int drm_gpuvm_validate(struct drm_gpuvm *gpuvm);
> +void drm_gpuvm_resv_add_fence(struct drm_gpuvm *gpuvm,
> +			      struct drm_exec *exec,
> +			      struct dma_fence *fence,
> +			      enum dma_resv_usage private_usage,
> +			      enum dma_resv_usage extobj_usage);
> +
> +/**
> + * drm_gpuvm_exec_resv_add_fence() - add fence to private and all extobj dma-resv
> + * @vm_exec: the &drm_gpuvm_exec abstraction
> + * @fence: fence to add
> + * @private_usage: private dma-resv usage
> + * @extobj_usage: extobj dma-resv usage
> + *
> + * See drm_gpuvm_resv_add_fence().
> + */
> +static inline void
> +drm_gpuvm_exec_resv_add_fence(struct drm_gpuvm_exec *vm_exec,
> +			      struct dma_fence *fence,
> +			      enum dma_resv_usage private_usage,
> +			      enum dma_resv_usage extobj_usage)
> +{
> +	drm_gpuvm_resv_add_fence(vm_exec->vm, &vm_exec->exec, fence,
> +				 private_usage, extobj_usage);
> +}
> +
>   /**
>    * struct drm_gpuvm_bo - structure representing a &drm_gpuvm and
>    * &drm_gem_object combination
> @@ -398,6 +569,18 @@ struct drm_gpuvm_bo {
>   			 * gpuva list.
>   			 */
>   			struct list_head gem;
> +
> +			/**
> +			 * @extobj: List entry to attach to the &drm_gpuvm's
> +			 * extobj list.
> +			 */
> +			struct list_head extobj;
> +
> +			/**
> +			 * @evict: List entry to attach to the &drm_gpuvm's evict
> +			 * list.
> +			 */
> +			struct list_head evict;
>   		} entry;
>   	} list;
>   };
> @@ -432,6 +615,9 @@ struct drm_gpuvm_bo *
>   drm_gpuvm_bo_find(struct drm_gpuvm *gpuvm,
>   		  struct drm_gem_object *obj);
>   
> +void drm_gpuvm_bo_evict(struct drm_gem_object *obj, bool evict);
> +void drm_gpuvm_bo_extobj_add(struct drm_gpuvm_bo *vm_bo);
> +
>   /**
>    * drm_gpuvm_bo_for_each_va() - iterator to walk over a list of &drm_gpuva
>    * @va__: &drm_gpuva structure to assign to in each iteration step
> @@ -837,6 +1023,17 @@ struct drm_gpuvm_ops {
>   	 * used.
>   	 */
>   	int (*sm_step_unmap)(struct drm_gpuva_op *op, void *priv);
> +
> +	/**
> +	 * @bo_validate: called from drm_gpuvm_validate()
> +	 *
> +	 * Drivers receive this callback for every evicted &drm_gem_object being
> +	 * mapped in the corresponding &drm_gpuvm.
> +	 *
> +	 * Typically, drivers would call their driver specific variant of
> +	 * ttm_bo_validate() from within this callback.
> +	 */
> +	int (*bo_validate)(struct drm_gem_object *obj);
>   };
>   
>   int drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm, void *priv,
Danilo Krummrich Sept. 12, 2023, 4:50 p.m. UTC | #8
On Tue, Sep 12, 2023 at 06:20:32PM +0200, Thomas Hellström wrote:
> Hi, Danilo,
> 
> On 9/9/23 17:31, Danilo Krummrich wrote:
> > So far the DRM GPUVA manager offers common infrastructure to track GPU VA
> > allocations and mappings, generically connect GPU VA mappings to their
> > backing buffers and perform more complex mapping operations on the GPU VA
> > space.
> > 
> > However, there are more design patterns commonly used by drivers, which
> > can potentially be generalized in order to make the DRM GPUVA manager
> > represent a basic GPU-VM implementation. In this context, this patch aims
> > at generalizing the following elements.
> > 
> > 1) Provide a common dma-resv for GEM objects not being used outside of
> >     this GPU-VM.
> > 
> > 2) Provide tracking of external GEM objects (GEM objects which are
> >     shared with other GPU-VMs).
> > 
> > 3) Provide functions to efficiently lock all GEM objects dma-resv the
> >     GPU-VM contains mappings of.
> > 
> > 4) Provide tracking of evicted GEM objects the GPU-VM contains mappings
> >     of, such that validation of evicted GEM objects is accelerated.
> > 
> > 5) Provide some convinience functions for common patterns.
> > 
> > Rather than being designed as a "framework", the target is to make all
> > features appear as a collection of optional helper functions, such that
> > drivers are free to make use of the DRM GPUVA managers basic
> > functionality and opt-in for other features without setting any feature
> > flags, just by making use of the corresponding functions.
> > 
> > Big kudos to Boris Brezillon for his help to figure out locking for drivers
> > updating the GPU VA space within the fence signalling path.
> > 
> > Suggested-by: Matthew Brost <matthew.brost@intel.com>
> > Signed-off-by: Danilo Krummrich <dakr@redhat.com>
> > ---
> >   drivers/gpu/drm/drm_gpuvm.c | 516 ++++++++++++++++++++++++++++++++++++
> >   include/drm/drm_gpuvm.h     | 197 ++++++++++++++
> >   2 files changed, 713 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/drm_gpuvm.c b/drivers/gpu/drm/drm_gpuvm.c
> > index f4411047dbb3..8e62a043f719 100644
> > --- a/drivers/gpu/drm/drm_gpuvm.c
> > +++ b/drivers/gpu/drm/drm_gpuvm.c
> > @@ -73,6 +73,21 @@
> >    * &drm_gem_object list of &drm_gpuvm_bos for an existing instance of this
> >    * particular combination. If not existent a new instance is created and linked
> >    * to the &drm_gem_object.
> > + *
> > + * &drm_gpuvm_bo structures, since unique for a given &drm_gpuvm, are also used
> > + * as entries for the &drm_gpuvm's lists of external and evicted objects. Those
> > + * lists are maintained in order to accelerate locking of dma-resv locks and
> > + * validation of evicted objects bound in a &drm_gpuvm. For instance, all
> > + * &drm_gem_objects' &dma_resv of a given &drm_gpuvm can be locked by calling
> > + * drm_gpuvm_exec_lock(). Once locked drivers can call drm_gpuvm_validate() in
> > + * order to validate all evicted &drm_gem_objects. It is also possible to lock
> > + * additional &drm_gem_objects by providing the corresponding parameters to
> > + * drm_gpuvm_exec_lock() as well as open code the &drm_exec loop while making
> > + * use of helper functions such as drm_gpuvm_prepare_range() or
> > + * drm_gpuvm_prepare_objects().
> > + *
> > + * Every bound &drm_gem_object is treated as an external object when its &dma_resv
> > + * structure is different than the &drm_gpuvm's common &dma_resv structure.
> >    */
> >   /**
> > @@ -420,6 +435,20 @@
> >    * Subsequent calls to drm_gpuvm_bo_obtain() for the same &drm_gpuvm and
> >    * &drm_gem_object must be able to observe previous creations and destructions
> >    * of &drm_gpuvm_bos in order to keep instances unique.
> > + *
> > + * The &drm_gpuvm's lists for keeping track of external and evicted objects are
> > + * protected against concurrent insertion / removal and iteration internally.
> > + *
> > + * However, drivers still need to protect concurrent calls to functions
> > + * iterating those lists, such as drm_gpuvm_validate() and
> > + * drm_gpuvm_prepare_objects(). Every such function carries a corresponding
> > + * comment and, where possible, a lockdep check.
> > + *
> > + * Functions adding or removing entries from those lists, such as
> > + * drm_gpuvm_bo_evict() or drm_gpuvm_bo_extobj_add() may be called with external
> > + * locks held, e.g. in order to avoid the corresponding list being
> > + * modified while potentially being iterated by other API functions.
> > + * However, this is entirely optional.
> >    */
> >   /**
> > @@ -632,6 +661,131 @@
> >    *	}
> >    */
> > +/**
> > + * get_next_vm_bo_from_list() - get the next vm_bo element
> > + * @__gpuvm: The GPU VM
> > + * @__list_name: The name of the list we're iterating on
> > + * @__local_list: A pointer to the local list used to store already iterated items
> > + * @__prev_vm_bo: The previous element we got from get_next_vm_bo_from_list()
> > + *
> > + * This helper is here to provide lockless list iteration. Lockless as in, the
> > + * iterator releases the lock immediately after picking the first element from
> > + * the list, so list insertion and deletion can happen concurrently.
> 
> Are the list spinlocks needed for that async state update from within the
> dma-fence critical section we've discussed previously?

Yes, but also for other reasons, see below.

> 
> Otherwise it should be sufficient to protect the lists with the gpuvm's resv
> (or for the extobj list with an outer lock).
> 
> If those spinlocks are still needed in some situations, perhaps we could
> have an option to set them to NULL (like IIRC the maple tree allows for)?

The evict spinlock is needed in any case, since in drm_gpuvm_bo_evict() we're
holding only the dma-resv lock of the BO this function gets called for. Hence,
the spinlock serializes concurrent drm_gpuvm_bo_evict() calls with different
BOs.
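
For illustration, consider a hypothetical TTM-based driver's eviction
hook; the hook name and its exact signature are made up here, the
locking context is the relevant part:

static void
driver_bo_move_notify(struct ttm_buffer_object *bo, bool evict)
{
	struct drm_gem_object *obj = &bo->base;

	/* Only this BO's dma-resv is held. Two BOs mapped in the same
	 * VM can hence run this path concurrently, which is exactly
	 * what gpuvm->evict.lock serializes.
	 */
	dma_resv_assert_held(obj->resv);
	drm_gpuvm_bo_evict(obj, evict);
}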

For extobjs an outer lock would be enough in case of Xe, but I really would not
like to add even more complexity just to get the spinlock out of the way in case
the driver already has an outer lock protecting this path.

> 
> For such drivers, that would require anybody calling unlink to hold the vm's
> resv, though.

In V4 I want to go back to having a dedicated lock for the GEM's gpuva list (or
VM_BO list to be more precise). We can't just use the dma-resv lock for that
with VM_BO abstractions, because on destruction of a VM_BO we otherwise wouldn't
be allowed to already hold the dma-resv lock. That's the fix I was referring to
earlier.
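
To spell out why: with the dma-resv lock protecting the list, a driver
that might drop the last VM_BO reference currently ends up with a dance
like the below; driver_vm_bo_unref() is a made-up helper sketching the
workaround Boris described:

static void
driver_vm_bo_unref(struct drm_gpuvm_bo *vm_bo)
{
	struct drm_gem_object *obj = vm_bo->obj;

	/* vm_bo might hold the last reference to obj, so keep obj (and
	 * thus the resv lock we're about to take) alive across the put.
	 */
	drm_gem_object_get(obj);
	dma_resv_lock(obj->resv, NULL);
	drm_gpuvm_bo_put(vm_bo); /* might destroy vm_bo */
	dma_resv_unlock(obj->resv);
	drm_gem_object_put(obj); /* now obj itself may go away */
}

With a dedicated, internally handled VM_BO list lock this whole
get/lock/put/unlock/put dance goes away.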

> 
> It seems that with that, the refcount could also be made non-atomic.
> 
> All in the spirit of the drm locking guidelines "use big locks when
> possible".
> Lower level locks only when necessary for performance or locking inversion?
> 
> /Thomas
> 
> 
> > + *
> > + * Elements popped from the original list are kept in a local list, so removal
> > + * and is_empty checks can still happen while we're iterating the list.
> > + */
> > +#define get_next_vm_bo_from_list(__gpuvm, __list_name, __local_list, __prev_vm_bo)	\
> > +	({										\
> > +		struct drm_gpuvm_bo *__vm_bo;						\
> > +											\
> > +		drm_gpuvm_bo_put(__prev_vm_bo);						\
> > +											\
> > +		spin_lock(&(__gpuvm)->__list_name.lock);				\
> > +		while (!list_empty(&(__gpuvm)->__list_name.list)) {			\
> > +			__vm_bo = list_first_entry(&(__gpuvm)->__list_name.list,	\
> > +						   struct drm_gpuvm_bo,			\
> > +						   list.entry.__list_name);		\
> > +			if (drm_gpuvm_bo_get_unless_zero(__vm_bo)) {			\
> > +				list_move_tail(&(__vm_bo)->list.entry.__list_name,	\
> > +					       __local_list);				\
> > +				break;							\
> > +			} else {							\
> > +				list_del_init(&(__vm_bo)->list.entry.__list_name);	\
> > +				__vm_bo = NULL;						\
> > +			}								\
> > +		}									\
> > +		spin_unlock(&(__gpuvm)->__list_name.lock);				\
> > +											\
> > +		__vm_bo;								\
> > +	})
> > +
> > +/**
> > + * for_each_vm_bo_in_list() - internal vm_bo list iterator
> > + * @__gpuvm: The GPU VM
> > + * @__list_name: The name of the list we're iterating on
> > + * @__local_list: A pointer to the local list used to store already iterated items
> > + * @__vm_bo: The current &drm_gpuvm_bo
> > + *
> > + * This helper is here to provide lockless list iteration. Lockless as in, the
> > + * iterator releases the lock immediately after picking the first element from the
> > + * list, so list insertion and deletion can happen concurrently.
> > + *
> > + * Typical use:
> > + *
> > + *	struct drm_gpuvm_bo *vm_bo;
> > + *	LIST_HEAD(my_local_list);
> > + *
> > + *	ret = 0;
> > + *	for_each_vm_bo_in_list(gpuvm, <list_name>, &my_local_list, vm_bo) {
> > + *		ret = do_something_with_vm_bo(..., vm_bo);
> > + *		if (ret)
> > + *			break;
> > + *	}
> > + *	drm_gpuvm_bo_put(vm_bo);
> > + *	restore_vm_bo_list(gpuvm, <list_name>, &my_local_list);
> > + *
> > + *
> > + * Only used for internal list iterations, not meant to be exposed to the outside
> > + * world.
> > + */
> > +#define for_each_vm_bo_in_list(__gpuvm, __list_name, __local_list, __vm_bo)	\
> > +	for (__vm_bo = get_next_vm_bo_from_list(__gpuvm, __list_name,		\
> > +						__local_list, NULL);		\
> > +	     __vm_bo;								\
> > +	     __vm_bo = get_next_vm_bo_from_list(__gpuvm, __list_name,		\
> > +						__local_list, __vm_bo))		\
> > +
> > +/**
> > + * restore_vm_bo_list() - move vm_bo elements back to their original list
> > + * @__gpuvm: The GPU VM
> > + * @__list_name: The name of the list we're iterating on
> > + * @__local_list: A pointer to the local list used to store already iterated items
> > + *
> > + * When we're done iterating a vm_bo list, we should call restore_vm_bo_list()
> > + * to restore the original state and let new iterations take place.
> > + */
> > +#define restore_vm_bo_list(__gpuvm, __list_name, __local_list)				\
> > +	do {										\
> > +		/* Merge back the two lists, moving local list elements to the		\
> > +		 * head to preserve previous ordering, in case it matters.		\
> > +		 */									\
> > +		spin_lock(&(__gpuvm)->__list_name.lock);				\
> > +		list_splice(__local_list, &(__gpuvm)->__list_name.list);		\
> > +		spin_unlock(&(__gpuvm)->__list_name.lock);				\
> > +	} while (0)
> > +/**
> > + * drm_gpuvm_bo_list_add() - insert a vm_bo into the given list
> > + * @__vm_bo: the &drm_gpuvm_bo
> > + * @__list_name: the name of the list to insert into
> > + *
> > + * Inserts the given @__vm_bo into the list specified by @__list_name,
> > + * unless it is on the list already.
> > + */
> > +#define drm_gpuvm_bo_list_add(__vm_bo, __list_name)				\
> > +	do {									\
> > +		spin_lock(&(__vm_bo)->vm->__list_name.lock);			\
> > +		if (list_empty(&(__vm_bo)->list.entry.__list_name))		\
> > +			list_add_tail(&(__vm_bo)->list.entry.__list_name,	\
> > +				      &(__vm_bo)->vm->__list_name.list);	\
> > +		spin_unlock(&(__vm_bo)->vm->__list_name.lock);			\
> > +	} while (0)
> > +
> > +/**
> > + * drm_gpuvm_bo_list_del() - remove a vm_bo from the given list
> > + * @__vm_bo: the &drm_gpuvm_bo
> > + * @__list_name: the name of the list to remove from
> > + *
> > + * Removes the given @__vm_bo from the list specified by @__list_name,
> > + * if it is on the list.
> > + */
> > +#define drm_gpuvm_bo_list_del(__vm_bo, __list_name)				\
> > +	do {									\
> > +		spin_lock(&(__vm_bo)->vm->__list_name.lock);			\
> > +		if (!list_empty(&(__vm_bo)->list.entry.__list_name))		\
> > +			list_del_init(&(__vm_bo)->list.entry.__list_name);	\
> > +		spin_unlock(&(__vm_bo)->vm->__list_name.lock);			\
> > +	} while (0)
> > +
> > +static int __must_check
> > +drm_gpuvm_bo_get_unless_zero(struct drm_gpuvm_bo *vm_bo);
> > +
> >   #define to_drm_gpuva(__node)	container_of((__node), struct drm_gpuva, rb.node)
> >   #define GPUVA_START(node) ((node)->va.addr)
> > @@ -713,6 +867,12 @@ drm_gpuvm_init(struct drm_gpuvm *gpuvm, struct drm_device *drm,
> >   	gpuvm->rb.tree = RB_ROOT_CACHED;
> >   	INIT_LIST_HEAD(&gpuvm->rb.list);
> > +	INIT_LIST_HEAD(&gpuvm->extobj.list);
> > +	spin_lock_init(&gpuvm->extobj.lock);
> > +
> > +	INIT_LIST_HEAD(&gpuvm->evict.list);
> > +	spin_lock_init(&gpuvm->evict.lock);
> > +
> >   	drm_gpuva_check_overflow(start_offset, range);
> >   	gpuvm->mm_start = start_offset;
> >   	gpuvm->mm_range = range;
> > @@ -754,10 +914,302 @@ drm_gpuvm_destroy(struct drm_gpuvm *gpuvm)
> >   	WARN(!RB_EMPTY_ROOT(&gpuvm->rb.tree.rb_root),
> >   	     "GPUVA tree is not empty, potentially leaking memory.\n");
> > +	WARN(!list_empty(&gpuvm->extobj.list), "Extobj list should be empty.\n");
> > +	WARN(!list_empty(&gpuvm->evict.list), "Evict list should be empty.\n");
> > +
> >   	drm_gem_private_object_fini(&gpuvm->d_obj);
> >   }
> >   EXPORT_SYMBOL_GPL(drm_gpuvm_destroy);
> > +/**
> > + * drm_gpuvm_prepare_objects() - prepare all associated BOs
> > + * @gpuvm: the &drm_gpuvm
> > + * @exec: the &drm_exec locking context
> > + * @num_fences: the amount of &dma_fences to reserve
> > + *
> > + * Calls drm_exec_prepare_obj() for all &drm_gem_objects the given
> > + * &drm_gpuvm contains mappings of.
> > + *
> > + * Using this function directly, it is the driver's responsibility to call
> > + * drm_exec_init() and drm_exec_fini() accordingly.
> > + *
> > + * Note: This function is safe against concurrent insertion and removal of
> > + * external objects, however it is not safe against concurrent usage itself.
> > + *
> > + * Drivers need to make sure to protect this case with either an outer VM lock
> > + * or by calling drm_gpuvm_prepare_vm() before this function within the
> > + * drm_exec_until_all_locked() loop, such that the GPUVM's dma-resv lock ensures
> > + * mutual exclusion.
> > + *
> > + * Returns: 0 on success, negative error code on failure.
> > + */
> > +int
> > +drm_gpuvm_prepare_objects(struct drm_gpuvm *gpuvm,
> > +			  struct drm_exec *exec,
> > +			  unsigned int num_fences)
> > +{
> > +	struct drm_gpuvm_bo *vm_bo;
> > +	LIST_HEAD(extobjs);
> > +	int ret = 0;
> > +
> > +	for_each_vm_bo_in_list(gpuvm, extobj, &extobjs, vm_bo) {
> > +		ret = drm_exec_prepare_obj(exec, vm_bo->obj, num_fences);
> > +		if (ret)
> > +			break;
> > +	}
> > +	/* Drop ref in case we break out of the loop. */
> > +	drm_gpuvm_bo_put(vm_bo);
> > +	restore_vm_bo_list(gpuvm, extobj, &extobjs);
> > +
> > +	return ret;
> > +}
> > +EXPORT_SYMBOL_GPL(drm_gpuvm_prepare_objects);
> > +
> > +/**
> > + * drm_gpuvm_prepare_range() - prepare all BOs mapped within a given range
> > + * @gpuvm: the &drm_gpuvm
> > + * @exec: the &drm_exec locking context
> > + * @addr: the start address within the VA space
> > + * @range: the range to iterate within the VA space
> > + * @num_fences: the amount of &dma_fences to reserve
> > + *
> > + * Calls drm_exec_prepare_obj() for all &drm_gem_objects mapped between @addr
> > + * and @addr + @range.
> > + *
> > + * Returns: 0 on success, negative error code on failure.
> > + */
> > +int
> > +drm_gpuvm_prepare_range(struct drm_gpuvm *gpuvm, struct drm_exec *exec,
> > +			u64 addr, u64 range, unsigned int num_fences)
> > +{
> > +	struct drm_gpuva *va;
> > +	u64 end = addr + range;
> > +	int ret;
> > +
> > +	drm_gpuvm_for_each_va_range(va, gpuvm, addr, end) {
> > +		struct drm_gem_object *obj = va->gem.obj;
> > +
> > +		ret = drm_exec_prepare_obj(exec, obj, num_fences);
> > +		if (ret)
> > +			return ret;
> > +	}
> > +
> > +	return 0;
> > +}
> > +EXPORT_SYMBOL_GPL(drm_gpuvm_prepare_range);
> > +
> > +/**
> > + * drm_gpuvm_exec_lock() - lock all dma-resv of all associated BOs
> > + * @vm_exec: the &drm_gpuvm_exec abstraction
> > + * @num_fences: the amount of &dma_fences to reserve
> > + * @interruptible: sleep interruptible if waiting
> > + *
> > + * Acquires all dma-resv locks of all &drm_gem_objects the given
> > + * &drm_gpuvm contains mappings of.
> > + *
> > + * Additionally, when calling this function with struct drm_gpuvm_exec::extra
> > + * being set the driver receives the given @fn callback to lock additional
> > + * dma-resv in the context of the &drm_gpuvm_exec instance. Typically, drivers
> > + * would call drm_exec_prepare_obj() from within this callback.
> > + *
> > + * Returns: 0 on success, negative error code on failure.
> > + */
> > +int
> > +drm_gpuvm_exec_lock(struct drm_gpuvm_exec *vm_exec,
> > +		    unsigned int num_fences,
> > +		    bool interruptible)
> > +{
> > +	struct drm_gpuvm *gpuvm = vm_exec->vm;
> > +	struct drm_exec *exec = &vm_exec->exec;
> > +	uint32_t flags;
> > +	int ret;
> > +
> > +	flags = (interruptible ? DRM_EXEC_INTERRUPTIBLE_WAIT : 0) |
> > +		DRM_EXEC_IGNORE_DUPLICATES;
> > +
> > +	drm_exec_init(exec, flags);
> > +
> > +	drm_exec_until_all_locked(exec) {
> > +		ret = drm_gpuvm_prepare_vm(gpuvm, exec, num_fences);
> > +		drm_exec_retry_on_contention(exec);
> > +		if (ret)
> > +			goto err;
> > +
> > +		ret = drm_gpuvm_prepare_objects(gpuvm, exec, num_fences);
> > +		drm_exec_retry_on_contention(exec);
> > +		if (ret)
> > +			goto err;
> > +
> > +		if (vm_exec->extra.fn) {
> > +			ret = vm_exec->extra.fn(vm_exec, num_fences);
> > +			drm_exec_retry_on_contention(exec);
> > +			if (ret)
> > +				goto err;
> > +		}
> > +	}
> > +
> > +	return 0;
> > +
> > +err:
> > +	drm_exec_fini(exec);
> > +	return ret;
> > +}
> > +EXPORT_SYMBOL_GPL(drm_gpuvm_exec_lock);
> > +
> > +static int
> > +fn_lock_array(struct drm_gpuvm_exec *vm_exec, unsigned int num_fences)
> > +{
> > +	struct {
> > +		struct drm_gem_object **objs;
> > +		unsigned int num_objs;
> > +	} *args = vm_exec->extra.priv;
> > +
> > +	return drm_exec_prepare_array(&vm_exec->exec, args->objs,
> > +				      args->num_objs, num_fences);
> > +}
> > +
> > +/**
> > + * drm_gpuvm_exec_lock_array() - lock all dma-resv of all associated BOs
> > + * @vm_exec: the &drm_gpuvm_exec abstraction
> > + * @objs: additional &drm_gem_objects to lock
> > + * @num_objs: the number of additional &drm_gem_objects to lock
> > + * @num_fences: the amount of &dma_fences to reserve
> > + * @interruptible: sleep interruptible if waiting
> > + *
> > + * Acquires all dma-resv locks of all &drm_gem_objects the given &drm_gpuvm
> > + * contains mappings of, plus the ones given through @objs.
> > + *
> > + * Returns: 0 on success, negative error code on failure.
> > + */
> > +int
> > +drm_gpuvm_exec_lock_array(struct drm_gpuvm_exec *vm_exec,
> > +			  struct drm_gem_object **objs,
> > +			  unsigned int num_objs,
> > +			  unsigned int num_fences,
> > +			  bool interruptible)
> > +{
> > +	struct {
> > +		struct drm_gem_object **objs;
> > +		unsigned int num_objs;
> > +	} args;
> > +
> > +	args.objs = objs;
> > +	args.num_objs = num_objs;
> > +
> > +	vm_exec->extra.fn = fn_lock_array;
> > +	vm_exec->extra.priv = &args;
> > +
> > +	return drm_gpuvm_exec_lock(vm_exec, num_fences, interruptible);
> > +}
> > +EXPORT_SYMBOL_GPL(drm_gpuvm_exec_lock_array);
> > +
> > +/**
> > + * drm_gpuvm_exec_lock_range() - prepare all BOs mapped within a given range
> > + * @vm_exec: the &drm_gpuvm_exec abstraction
> > + * @addr: the start address within the VA space
> > + * @range: the range to iterate within the VA space
> > + * @num_fences: the amount of &dma_fences to reserve
> > + * @interruptible: sleep interruptible if waiting
> > + *
> > + * Acquires all dma-resv locks of all &drm_gem_objects mapped between @addr and
> > + * @addr + @range.
> > + *
> > + * Returns: 0 on success, negative error code on failure.
> > + */
> > +int
> > +drm_gpuvm_exec_lock_range(struct drm_gpuvm_exec *vm_exec,
> > +			  u64 addr, u64 range,
> > +			  unsigned int num_fences,
> > +			  bool interruptible)
> > +{
> > +	struct drm_gpuvm *gpuvm = vm_exec->vm;
> > +	struct drm_exec *exec = &vm_exec->exec;
> > +	uint32_t flags;
> > +	int ret;
> > +
> > +	flags = (interruptible ? DRM_EXEC_INTERRUPTIBLE_WAIT : 0) |
> > +		DRM_EXEC_IGNORE_DUPLICATES;
> > +
> > +	drm_exec_init(exec, flags);
> > +
> > +	drm_exec_until_all_locked(exec) {
> > +		ret = drm_gpuvm_prepare_range(gpuvm, exec, addr, range,
> > +					      num_fences);
> > +		drm_exec_retry_on_contention(exec);
> > +		if (ret)
> > +			goto err;
> > +	}
> > +
> > +	return ret;
> > +
> > +err:
> > +	drm_exec_fini(exec);
> > +	return ret;
> > +}
> > +EXPORT_SYMBOL_GPL(drm_gpuvm_exec_lock_range);
> > +
> > +/**
> > + * drm_gpuvm_validate() - validate all BOs marked as evicted
> > + * @gpuvm: the &drm_gpuvm to validate evicted BOs
> > + *
> > + * Calls the &drm_gpuvm_ops.bo_validate callback for all evicted buffer
> > + * objects being mapped in the given &drm_gpuvm.
> > + *
> > + * Returns: 0 on success, negative error code on failure.
> > + */
> > +int
> > +drm_gpuvm_validate(struct drm_gpuvm *gpuvm)
> > +{
> > +	const struct drm_gpuvm_ops *ops = gpuvm->ops;
> > +	struct drm_gpuvm_bo *vm_bo;
> > +	LIST_HEAD(evict);
> > +	int ret = 0;
> > +
> > +	if (unlikely(!ops || !ops->bo_validate))
> > +		return -ENOTSUPP;
> > +
> > +	for_each_vm_bo_in_list(gpuvm, evict, &evict, vm_bo) {
> > +		dma_resv_assert_held(vm_bo->obj->resv);
> > +		ret = ops->bo_validate(vm_bo->obj);
> > +		if (ret)
> > +			break;
> > +	}
> > +	/* Drop ref in case we break out of the loop. */
> > +	drm_gpuvm_bo_put(vm_bo);
> > +	restore_vm_bo_list(gpuvm, evict, &evict);
> > +
> > +	return ret;
> > +}
> > +EXPORT_SYMBOL_GPL(drm_gpuvm_validate);
> > +
> > +/**
> > + * drm_gpuvm_resv_add_fence() - add fence to private and all extobj
> > + * dma-resv
> > + * @gpuvm: the &drm_gpuvm to add a fence to
> > + * @exec: the &drm_exec locking context
> > + * @fence: fence to add
> > + * @private_usage: private dma-resv usage
> > + * @extobj_usage: extobj dma-resv usage
> > + */
> > +void
> > +drm_gpuvm_resv_add_fence(struct drm_gpuvm *gpuvm,
> > +			 struct drm_exec *exec,
> > +			 struct dma_fence *fence,
> > +			 enum dma_resv_usage private_usage,
> > +			 enum dma_resv_usage extobj_usage)
> > +{
> > +	struct drm_gem_object *obj;
> > +	unsigned long index;
> > +
> > +	drm_exec_for_each_locked_object(exec, index, obj) {
> > +		dma_resv_assert_held(obj->resv);
> > +		dma_resv_add_fence(obj->resv, fence,
> > +				   drm_gpuvm_is_extobj(gpuvm, obj) ?
> > +				   extobj_usage : private_usage);
> > +	}
> > +}
> > +EXPORT_SYMBOL_GPL(drm_gpuvm_resv_add_fence);
> > +
> >   /**
> >    * drm_gpuvm_bo_create() - create a new instance of struct drm_gpuvm_bo
> >    * @gpuvm: The &drm_gpuvm the @obj is mapped in.
> > @@ -790,6 +1242,9 @@ drm_gpuvm_bo_create(struct drm_gpuvm *gpuvm,
> >   	INIT_LIST_HEAD(&vm_bo->list.gpuva);
> >   	INIT_LIST_HEAD(&vm_bo->list.entry.gem);
> > +	INIT_LIST_HEAD(&vm_bo->list.entry.extobj);
> > +	INIT_LIST_HEAD(&vm_bo->list.entry.evict);
> > +
> >   	drm_gem_object_get(obj);
> >   	return vm_bo;
> > @@ -807,6 +1262,14 @@ drm_gpuvm_bo_destroy(struct kref *kref)
> >   	drm_gem_gpuva_assert_lock_held(vm_bo->obj);
> > +	spin_lock(&gpuvm->extobj.lock);
> > +	list_del(&vm_bo->list.entry.extobj);
> > +	spin_unlock(&gpuvm->extobj.lock);
> > +
> > +	spin_lock(&gpuvm->evict.lock);
> > +	list_del(&vm_bo->list.entry.evict);
> > +	spin_unlock(&gpuvm->evict.lock);
> > +
> >   	list_del(&vm_bo->list.entry.gem);
> >   	drm_gem_object_put(obj);
> > @@ -822,6 +1285,11 @@ drm_gpuvm_bo_destroy(struct kref *kref)
> >    * @vm_bo: the &drm_gpuvm_bo to release the reference of
> >    *
> >    * This releases a reference to @vm_bo.
> > + *
> > + * If the reference count drops to zero, the &gpuvm_bo is destroyed, which
> > + * includes removing it from the GEMs gpuva list. Hence, if a call to this
> > + * function can potentially let the reference count to zero the caller must
> > + * hold the dma-resv or driver specific GEM gpuva lock.
> >    */
> >   void
> >   drm_gpuvm_bo_put(struct drm_gpuvm_bo *vm_bo)
> > @@ -831,6 +1299,12 @@ drm_gpuvm_bo_put(struct drm_gpuvm_bo *vm_bo)
> >   }
> >   EXPORT_SYMBOL_GPL(drm_gpuvm_bo_put);
> > +static int __must_check
> > +drm_gpuvm_bo_get_unless_zero(struct drm_gpuvm_bo *vm_bo)
> > +{
> > +	return kref_get_unless_zero(&vm_bo->kref);
> > +}
> > +
> >   static struct drm_gpuvm_bo *
> >   __drm_gpuvm_bo_find(struct drm_gpuvm *gpuvm,
> >   		    struct drm_gem_object *obj)
> > @@ -938,6 +1412,48 @@ drm_gpuvm_bo_obtain_prealloc(struct drm_gpuvm_bo *__vm_bo)
> >   }
> >   EXPORT_SYMBOL_GPL(drm_gpuvm_bo_obtain_prealloc);
> > +/**
> > + * drm_gpuvm_bo_extobj_add() - adds the &drm_gpuvm_bo to its &drm_gpuvm's
> > + * extobj list
> > + * @vm_bo: The &drm_gpuvm_bo to add to its &drm_gpuvm's extobj list.
> > + *
> > + * Adds the given @vm_bo to its &drm_gpuvm's extobj list if it is not on the
> > + * list already and the corresponding &drm_gem_object actually is an external
> > + * object.
> > + */
> > +void
> > +drm_gpuvm_bo_extobj_add(struct drm_gpuvm_bo *vm_bo)
> > +{
> > +	struct drm_gpuvm *gpuvm = vm_bo->vm;
> > +
> > +	if (drm_gpuvm_is_extobj(gpuvm, vm_bo->obj))
> > +		drm_gpuvm_bo_list_add(vm_bo, extobj);
> > +}
> > +EXPORT_SYMBOL_GPL(drm_gpuvm_bo_extobj_add);
> > +
> > +/**
> > + * drm_gpuvm_bo_evict() - add / remove a &drm_gem_object to / from a
> > + * &drm_gpuvm's evicted list
> > + * @obj: the &drm_gem_object to add or remove
> > + * @evict: indicates whether the object is evicted
> > + *
> > + * Adds a &drm_gem_object to or removes it from the evicted list of all
> > + * &drm_gpuvms containing a mapping of this &drm_gem_object.
> > + */
> > +void
> > +drm_gpuvm_bo_evict(struct drm_gem_object *obj, bool evict)
> > +{
> > +	struct drm_gpuvm_bo *vm_bo;
> > +
> > +	drm_gem_for_each_gpuvm_bo(vm_bo, obj) {
> > +		if (evict)
> > +			drm_gpuvm_bo_list_add(vm_bo, evict);
> > +		else
> > +			drm_gpuvm_bo_list_del(vm_bo, evict);
> > +	}
> > +}
> > +EXPORT_SYMBOL_GPL(drm_gpuvm_bo_evict);
> > +
> >   static int
> >   __drm_gpuva_insert(struct drm_gpuvm *gpuvm,
> >   		   struct drm_gpuva *va)
> > diff --git a/include/drm/drm_gpuvm.h b/include/drm/drm_gpuvm.h
> > index afa50b9059a2..834bb6d6617e 100644
> > --- a/include/drm/drm_gpuvm.h
> > +++ b/include/drm/drm_gpuvm.h
> > @@ -26,10 +26,12 @@
> >    */
> >   #include <linux/list.h>
> > +#include <linux/dma-resv.h>
> >   #include <linux/rbtree.h>
> >   #include <linux/types.h>
> >   #include <drm/drm_gem.h>
> > +#include <drm/drm_exec.h>
> >   struct drm_gpuvm;
> >   struct drm_gpuvm_bo;
> > @@ -259,6 +261,38 @@ struct drm_gpuvm {
> >   	 * space
> >   	 */
> >   	struct dma_resv *resv;
> > +
> > +	/**
> > +	 * @extobj: structure holding the extobj list
> > +	 */
> > +	struct {
> > +		/**
> > +		 * @list: &list_head storing &drm_gpuvm_bos serving as
> > +		 * external object
> > +		 */
> > +		struct list_head list;
> > +
> > +		/**
> > +		 * @lock: spinlock to protect the extobj list
> > +		 */
> > +		spinlock_t lock;
> > +	} extobj;
> > +
> > +	/**
> > +	 * @evict: structure holding the evict list and evict list lock
> > +	 */
> > +	struct {
> > +		/**
> > +		 * @list: &list_head storing &drm_gpuvm_bos currently being
> > +		 * evicted
> > +		 */
> > +		struct list_head list;
> > +
> > +		/**
> > +		 * @lock: spinlock to protect the evict list
> > +		 */
> > +		spinlock_t lock;
> > +	} evict;
> >   };
> >   void drm_gpuvm_init(struct drm_gpuvm *gpuvm, struct drm_device *drm,
> > @@ -268,6 +302,21 @@ void drm_gpuvm_init(struct drm_gpuvm *gpuvm, struct drm_device *drm,
> >   		    const struct drm_gpuvm_ops *ops);
> >   void drm_gpuvm_destroy(struct drm_gpuvm *gpuvm);
> > +/**
> > + * drm_gpuvm_is_extobj() - indicates whether the given &drm_gem_object is an
> > + * external object
> > + * @gpuvm: the &drm_gpuvm to check
> > + * @obj: the &drm_gem_object to check
> > + *
> > + * Returns: true if the &drm_gem_object &dma_resv differs from the
> > + * &drm_gpuvm's &dma_resv, false otherwise
> > + */
> > +static inline bool drm_gpuvm_is_extobj(struct drm_gpuvm *gpuvm,
> > +				       struct drm_gem_object *obj)
> > +{
> > +	return obj && obj->resv != gpuvm->resv;
> > +}
> > +
> >   static inline struct drm_gpuva *
> >   __drm_gpuva_next(struct drm_gpuva *va)
> >   {
> > @@ -346,6 +395,128 @@ __drm_gpuva_next(struct drm_gpuva *va)
> >   #define drm_gpuvm_for_each_va_safe(va__, next__, gpuvm__) \
> >   	list_for_each_entry_safe(va__, next__, &(gpuvm__)->rb.list, rb.entry)
> > +/**
> > + * struct drm_gpuvm_exec - &drm_gpuvm abstraction of &drm_exec
> > + *
> > + * This structure should be created on the stack as &drm_exec should be.
> > + *
> > + * Optionally, @extra can be set in order to lock additional &drm_gem_objects.
> > + */
> > +struct drm_gpuvm_exec {
> > +	/**
> > +	 * @exec: the &drm_exec structure
> > +	 */
> > +	struct drm_exec exec;
> > +
> > +	/**
> > +	 * @vm: the &drm_gpuvm whose DMA reservations should be locked
> > +	 */
> > +	struct drm_gpuvm *vm;
> > +
> > +	/**
> > +	 * @extra: Callback and corresponding private data for the driver to
> > +	 * lock arbitrary additional &drm_gem_objects.
> > +	 */
> > +	struct {
> > +		/**
> > +		 * @fn: The driver callback to lock additional &drm_gem_objects.
> > +		 */
> > +		int (*fn)(struct drm_gpuvm_exec *vm_exec,
> > +			  unsigned int num_fences);
> > +
> > +		/**
> > +		 * @priv: driver private data for the @fn callback
> > +		 */
> > +		void *priv;
> > +	} extra;
> > +};
> > +
> > +/**
> > + * drm_gpuvm_prepare_vm() - prepare the GPUVM's common dma-resv
> > + * @gpuvm: the &drm_gpuvm
> > + * @exec: the &drm_exec context
> > + * @num_fences: the amount of &dma_fences to reserve
> > + *
> > + * Calls drm_exec_prepare_obj() for the GPUVMs dummy &drm_gem_object.
> > + *
> > + * Using this function directly, it is the driver's responsibility to call
> > + * drm_exec_init() and drm_exec_fini() accordingly.
> > + *
> > + * Returns: 0 on success, negative error code on failure.
> > + */
> > +static inline int
> > +drm_gpuvm_prepare_vm(struct drm_gpuvm *gpuvm,
> > +		     struct drm_exec *exec,
> > +		     unsigned int num_fences)
> > +{
> > +	return drm_exec_prepare_obj(exec, &gpuvm->d_obj, num_fences);
> > +}
> > +
> > +int drm_gpuvm_prepare_objects(struct drm_gpuvm *gpuvm,
> > +			      struct drm_exec *exec,
> > +			      unsigned int num_fences);
> > +
> > +int drm_gpuvm_prepare_range(struct drm_gpuvm *gpuvm,
> > +			    struct drm_exec *exec,
> > +			    u64 addr, u64 range,
> > +			    unsigned int num_fences);
> > +
> > +int drm_gpuvm_exec_lock(struct drm_gpuvm_exec *vm_exec,
> > +			unsigned int num_fences,
> > +			bool interruptible);
> > +
> > +int drm_gpuvm_exec_lock_array(struct drm_gpuvm_exec *vm_exec,
> > +			      struct drm_gem_object **objs,
> > +			      unsigned int num_objs,
> > +			      unsigned int num_fences,
> > +			      bool interruptible);
> > +
> > +int drm_gpuvm_exec_lock_range(struct drm_gpuvm_exec *vm_exec,
> > +			      u64 addr, u64 range,
> > +			      unsigned int num_fences,
> > +			      bool interruptible);
> > +
> > +/**
> > + * drm_gpuvm_exec_unlock() - unlock all dma-resv of all associated BOs
> > + * @vm_exec: the &drm_gpuvm_exec abstraction
> > + *
> > + * Releases all dma-resv locks of all &drm_gem_objects previously acquired
> > + * through drm_gpuvm_exec_lock() or its variants.
> > + */
> > +static inline void
> > +drm_gpuvm_exec_unlock(struct drm_gpuvm_exec *vm_exec)
> > +{
> > +	drm_exec_fini(&vm_exec->exec);
> > +}
> > +
> > +int drm_gpuvm_validate(struct drm_gpuvm *gpuvm);
> > +void drm_gpuvm_resv_add_fence(struct drm_gpuvm *gpuvm,
> > +			      struct drm_exec *exec,
> > +			      struct dma_fence *fence,
> > +			      enum dma_resv_usage private_usage,
> > +			      enum dma_resv_usage extobj_usage);
> > +
> > +/**
> > + * drm_gpuvm_exec_resv_add_fence() - add fence to private and all extobj dma-resv
> > + * @vm_exec: the &drm_gpuvm_exec abstraction
> > + * @fence: fence to add
> > + * @private_usage: private dma-resv usage
> > + * @extobj_usage: extobj dma-resv usage
> > + *
> > + * See drm_gpuvm_resv_add_fence().
> > + */
> > +static inline void
> > +drm_gpuvm_exec_resv_add_fence(struct drm_gpuvm_exec *vm_exec,
> > +			      struct dma_fence *fence,
> > +			      enum dma_resv_usage private_usage,
> > +			      enum dma_resv_usage extobj_usage)
> > +{
> > +	drm_gpuvm_resv_add_fence(vm_exec->vm, &vm_exec->exec, fence,
> > +				 private_usage, extobj_usage);
> > +}
> > +
> >   /**
> >    * struct drm_gpuvm_bo - structure representing a &drm_gpuvm and
> >    * &drm_gem_object combination
> > @@ -398,6 +569,18 @@ struct drm_gpuvm_bo {
> >   			 * gpuva list.
> >   			 */
> >   			struct list_head gem;
> > +
> > +			/**
> > +			 * @extobj: List entry to attach to the &drm_gpuvm's
> > +			 * extobj list.
> > +			 */
> > +			struct list_head extobj;
> > +
> > +			/**
> > +			 * @evict: List entry to attach to the &drm_gpuvm's evict
> > +			 * list.
> > +			 */
> > +			struct list_head evict;
> >   		} entry;
> >   	} list;
> >   };
> > @@ -432,6 +615,9 @@ struct drm_gpuvm_bo *
> >   drm_gpuvm_bo_find(struct drm_gpuvm *gpuvm,
> >   		  struct drm_gem_object *obj);
> > +void drm_gpuvm_bo_evict(struct drm_gem_object *obj, bool evict);
> > +void drm_gpuvm_bo_extobj_add(struct drm_gpuvm_bo *vm_bo);
> > +
> >   /**
> >    * drm_gpuvm_bo_for_each_va() - iterator to walk over a list of &drm_gpuva
> >    * @va__: &drm_gpuva structure to assign to in each iteration step
> > @@ -837,6 +1023,17 @@ struct drm_gpuvm_ops {
> >   	 * used.
> >   	 */
> >   	int (*sm_step_unmap)(struct drm_gpuva_op *op, void *priv);
> > +
> > +	/**
> > +	 * @bo_validate: called from drm_gpuvm_validate()
> > +	 *
> > +	 * Drivers receive this callback for every evicted &drm_gem_object being
> > +	 * mapped in the corresponding &drm_gpuvm.
> > +	 *
> > +	 * Typically, drivers would call their driver-specific variant of
> > +	 * ttm_bo_validate() from within this callback.
> > +	 */
> > +	int (*bo_validate)(struct drm_gem_object *obj);
> >   };
> >   int drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm, void *priv,
>
Thomas Hellstrom Sept. 12, 2023, 7:23 p.m. UTC | #9
On 9/12/23 18:50, Danilo Krummrich wrote:
> On Tue, Sep 12, 2023 at 06:20:32PM +0200, Thomas Hellström wrote:
>> Hi, Danilo,
>>
>> On 9/9/23 17:31, Danilo Krummrich wrote:
>>> So far the DRM GPUVA manager offers common infrastructure to track GPU VA
>>> allocations and mappings, generically connect GPU VA mappings to their
>>> backing buffers and perform more complex mapping operations on the GPU VA
>>> space.
>>>
>>> However, there are more design patterns commonly used by drivers, which
>>> can potentially be generalized in order to make the DRM GPUVA manager
>>> represent a basic GPU-VM implementation. In this context, this patch aims
>>> at generalizing the following elements.
>>>
>>> 1) Provide a common dma-resv for GEM objects not being used outside of
>>>      this GPU-VM.
>>>
>>> 2) Provide tracking of external GEM objects (GEM objects which are
>>>      shared with other GPU-VMs).
>>>
>>> 3) Provide functions to efficiently lock all GEM objects dma-resv the
>>>      GPU-VM contains mappings of.
>>>
>>> 4) Provide tracking of evicted GEM objects the GPU-VM contains mappings
>>>      of, such that validation of evicted GEM objects is accelerated.
>>>
>>> 5) Provide some convenience functions for common patterns.
>>>
>>> Rather than being designed as a "framework", the target is to make all
>>> features appear as a collection of optional helper functions, such that
>>> drivers are free to make use of the DRM GPUVA manager's basic
>>> functionality and opt-in for other features without setting any feature
>>> flags, just by making use of the corresponding functions.
>>>
>>> Big kudos to Boris Brezillon for his help to figure out locking for drivers
>>> updating the GPU VA space within the fence signalling path.
>>>
>>> Suggested-by: Matthew Brost <matthew.brost@intel.com>
>>> Signed-off-by: Danilo Krummrich <dakr@redhat.com>
>>> ---
>>>    drivers/gpu/drm/drm_gpuvm.c | 516 ++++++++++++++++++++++++++++++++++++
>>>    include/drm/drm_gpuvm.h     | 197 ++++++++++++++
>>>    2 files changed, 713 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/drm_gpuvm.c b/drivers/gpu/drm/drm_gpuvm.c
>>> index f4411047dbb3..8e62a043f719 100644
>>> --- a/drivers/gpu/drm/drm_gpuvm.c
>>> +++ b/drivers/gpu/drm/drm_gpuvm.c
>>> @@ -73,6 +73,21 @@
>>>     * &drm_gem_object list of &drm_gpuvm_bos for an existing instance of this
>>>     * particular combination. If not existent a new instance is created and linked
>>>     * to the &drm_gem_object.
>>> + *
>>> + * &drm_gpuvm_bo structures, since unique for a given &drm_gpuvm, are also used
>>> + * as entry for the &drm_gpuvm's lists of external and evicted objects. Those
>>> + * lists are maintained in order to accelerate locking of dma-resv locks and
>>> + * validation of evicted objects bound in a &drm_gpuvm. For instance, all
>>> + * &drm_gem_objects' &dma_resv of a given &drm_gpuvm can be locked by calling
>>> + * drm_gpuvm_exec_lock(). Once locked drivers can call drm_gpuvm_validate() in
>>> + * order to validate all evicted &drm_gem_objects. It is also possible to lock
>>> + * additional &drm_gem_objects by providing the corresponding parameters to
>>> + * drm_gpuvm_exec_lock() as well as open code the &drm_exec loop while making
>>> + * use of helper functions such as drm_gpuvm_prepare_range() or
>>> + * drm_gpuvm_prepare_objects().
>>> + *
>>> + * Every bound &drm_gem_object is treated as an external object when its
>>> + * &dma_resv structure is different from the &drm_gpuvm's common &dma_resv structure.
>>>     */
>>>    /**
>>> @@ -420,6 +435,20 @@
>>>     * Subsequent calls to drm_gpuvm_bo_obtain() for the same &drm_gpuvm and
>>>     * &drm_gem_object must be able to observe previous creations and destructions
>>>     * of &drm_gpuvm_bos in order to keep instances unique.
>>> + *
>>> + * The &drm_gpuvm's lists for keeping track of external and evicted objects are
>>> + * protected against concurrent insertion / removal and iteration internally.
>>> + *
>>> + * However, drivers still need to protect concurrent calls to functions
>>> + * iterating those lists, such as drm_gpuvm_validate() and
>>> + * drm_gpuvm_prepare_objects(). Every such function carries a corresponding
>>> + * comment and, where possible, lockdep checks.
>>> + *
>>> + * Functions adding or removing entries from those lists, such as
>>> + * drm_gpuvm_bo_evict() or drm_gpuvm_bo_extobj_add() may be called with external
>>> + * locks being held, e.g. in order to avoid the corresponding list to be
>>> + * (safely) modified while potentially being iternated by other API functions.
>>> + * However, this is entirely optional.
>>>     */
>>>    /**
>>> @@ -632,6 +661,131 @@
>>>     *	}
>>>     */
>>> +/**
>>> + * get_next_vm_bo_from_list() - get the next vm_bo element
>>> + * @__gpuvm: The GPU VM
>>> + * @__list_name: The name of the list we're iterating on
>>> + * @__local_list: A pointer to the local list used to store already iterated items
>>> + * @__prev_vm_bo: The previous element we got from get_next_vm_bo_from_list()
>>> + *
>>> + * This helper is here to provide lockless list iteration. Lockless as in, the
>>> + * iterator releases the lock immediately after picking the first element from
>>> + * the list, so list insertion and deletion can happen concurrently.
>> Are the list spinlocks needed for that async state update from within the
>> dma-fence critical section we've discussed previously?
> Yes, but also for other reasons, see below.
>
>> Otherwise it should be sufficient to protect the lists with the gpuvm's resv
>> (or for the extobj list with an outer lock).
>>
>> If those spinlocks are still needed in some situations, perhaps could we
>> have an option to set them to NULL (Like IIRC the maple tree allows for)?
> The evict spinlock is needed in any case, since in drm_gpuvm_bo_evict() we're
> holding only the dma-resv lock from the BO this function gets called for. Hence,
> the spinlock protects concurrent drm_gpuvm_bo_evict() calls with different BOs.
No. Only if you try to add external objects to the vm's evict list from 
within the evict code. That's not necessary since you loop through all 
external objects anyway when locking them, so an "evicted" bool in the
vm_bo, protected by the bo resv, would be sufficient. The extobj locking
loop can then add the bo to the evicted list.
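
Roughly, as a sketch only (the vm_bo->evicted flag is hypothetical):

static void drm_gpuvm_bo_evict(struct drm_gem_object *obj, bool evict)
{
	struct drm_gpuvm_bo *vm_bo;

	dma_resv_assert_held(obj->resv);

	/* Only flip a flag under the object's resv; no vm-wide list,
	 * hence no spinlock, in this path.
	 */
	drm_gem_for_each_gpuvm_bo(vm_bo, obj)
		vm_bo->evicted = evict;
}

The extobj locking loop, with all resv locks held, would then do:

	/* Protected by the vm's resv now; no evict list spinlock needed. */
	if (vm_bo->evicted)
		list_move_tail(&vm_bo->list.entry.evict, &gpuvm->evict.list);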
>
> For extobjs an outer lock would be enough in case of Xe, but I really would not
> like to add even more complexity just to get the spinlock out of the way in case
> the driver already has an outer lock protecting this path.

I must disagree here. These spinlocks and atomic operations are pretty
costly and, as discussed earlier, this type of locking was the reason (at
least according to the commit message) that made Christian drop the
XArray use in drm_exec for the same set of objects: "The locking
overhead is unecessary and measurable". IMHO the spinlock is the added
complexity, and a single wide lock following the drm locking guidelines
set out by Daniel and David should really be the default choice, with an
opt-in for a spinlock when it's needed for async updates and pushing out
to a wq is not an option.

A pretty simple way that would not add much code would be

static void gpuvm_cond_spin_lock(const struct drm_gpuvm *gpuvm,
				 spinlock_t *lock)
{
	if (!gpuvm->resv_protected_lists)
		spin_lock(lock);
}
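
with the list helpers then using it, e.g. (sketch; the
gpuvm_cond_spin_unlock() counterpart is implied and equally
hypothetical):

/* gpuvm_cond_spin_unlock() mirrors gpuvm_cond_spin_lock() with
 * spin_unlock().
 */
#define drm_gpuvm_bo_list_add(__vm_bo, __list_name)				\
	do {									\
		gpuvm_cond_spin_lock((__vm_bo)->vm,				\
				     &(__vm_bo)->vm->__list_name.lock);		\
		if (list_empty(&(__vm_bo)->list.entry.__list_name))		\
			list_add_tail(&(__vm_bo)->list.entry.__list_name,	\
				      &(__vm_bo)->vm->__list_name.list);	\
		gpuvm_cond_spin_unlock((__vm_bo)->vm,				\
				       &(__vm_bo)->vm->__list_name.lock);	\
	} while (0)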

>> For such drivers, that would require anybody calling unlink to hold the vm's
>> resv, though.
> In V4 I want to go back to having a dedicated lock for the GEM's gpuva list (or
> VM_BO list to be more precise). We can't just use the dma-resv lock for that
> with VM_BO abstractions, because on destruction of a VM_BO we otherwise wouldn't
> be allowed to already hold the dma-resv lock. That's the fix I was referring to
> earlier.

Yeah, I can see the need for a dedicated lock for the GEM's gpuva list, 
but holding the vm's dma-resv lock across the unlink shouldn't be a 
problem. We may free the object and a pointer to the vm's resv during 
unlink but we don't free the vm's resv.  It'd be a matter of ensuring 
that any calls to unlink from *within* drm_gpuvm allow it to be held.

/Thomas


>> It seems that with that also the refcount could be make non-atomic.
>>
>> All in the spirit of the drm locking guidelines "use big locks when
>> possible".
>> Lower level locks only when necessary for performance or locking inversion?
>>
>> /Thomas
>>
>>
>>> + *
>>> + * Elements popped from the original list are kept in a local list, so removal
>>> + * and is_empty checks can still happen while we're iterating the list.
>>> + */
>>> +#define get_next_vm_bo_from_list(__gpuvm, __list_name, __local_list, __prev_vm_bo)	\
>>> +	({										\
>>> +		struct drm_gpuvm_bo *__vm_bo;						\
>>> +											\
>>> +		drm_gpuvm_bo_put(__prev_vm_bo);						\
>>> +											\
>>> +		spin_lock(&(__gpuvm)->__list_name.lock);				\
>>> +		while (!list_empty(&(__gpuvm)->__list_name.list)) {			\
>>> +			__vm_bo = list_first_entry(&(__gpuvm)->__list_name.list,	\
>>> +						   struct drm_gpuvm_bo,			\
>>> +						   list.entry.__list_name);		\
>>> +			if (drm_gpuvm_bo_get_unless_zero(__vm_bo)) {			\
>>> +				list_move_tail(&(__vm_bo)->list.entry.__list_name,	\
>>> +					       __local_list);				\
>>> +				break;							\
>>> +			} else {							\
>>> +				list_del_init(&(__vm_bo)->list.entry.__list_name);	\
>>> +				__vm_bo = NULL;						\
>>> +			}								\
>>> +		}									\
>>> +		spin_unlock(&(__gpuvm)->__list_name.lock);				\
>>> +											\
>>> +		__vm_bo;								\
>>> +	})
>>> +
>>> +/**
>>> + * for_each_vm_bo_in_list() - internal vm_bo list iterator
>>> + * @__gpuvm: The GPU VM
>>> + * @__list_name: The name of the list we're iterating on
>>> + * @__local_list: A pointer to the local list used to store already iterated items
>>> + * @__vm_bo: The current &drm_gpuvm_bo in each iteration step
>>> + *
>>> + * This helper is here to provide lockless list iteration. Lockless as in, the
>>> + * iterator releases the lock immediately after picking the first element from the
>>> + * list, so list insertion and deletion can happen concurrently.
>>> + *
>>> + * Typical use:
>>> + *
>>> + *	struct drm_gpuvm_bo *vm_bo;
>>> + *	LIST_HEAD(my_local_list);
>>> + *
>>> + *	ret = 0;
>>> + *	for_each_vm_bo_in_list(gpuvm, <list_name>, &my_local_list, vm_bo) {
>>> + *		ret = do_something_with_vm_bo(..., vm_bo);
>>> + *		if (ret)
>>> + *			break;
>>> + *	}
>>> + *	drm_gpuvm_bo_put(vm_bo);
>>> + *	restore_vm_bo_list(gpuvm, <list_name>, &my_local_list);
>>> + *
>>> + *
>>> + * Only used for internal list iterations, not meant to be exposed to the outside
>>> + * world.
>>> + */
>>> +#define for_each_vm_bo_in_list(__gpuvm, __list_name, __local_list, __vm_bo)	\
>>> +	for (__vm_bo = get_next_vm_bo_from_list(__gpuvm, __list_name,		\
>>> +						__local_list, NULL);		\
>>> +	     __vm_bo;								\
>>> +	     __vm_bo = get_next_vm_bo_from_list(__gpuvm, __list_name,		\
>>> +						__local_list, __vm_bo))		\
>>> +
>>> +/**
>>> + * restore_vm_bo_list() - move vm_bo elements back to their original list
>>> + * @__gpuvm: The GPU VM
>>> + * @__list_name: The name of the list we're iterating on
>>> + * @__local_list: A pointer to the local list used to store already iterated items
>>> + *
>>> + * When we're done iterating a vm_bo list, we should call restore_vm_bo_list()
>>> + * to restore the original state and let new iterations take place.
>>> + */
>>> +#define restore_vm_bo_list(__gpuvm, __list_name, __local_list)				\
>>> +	do {										\
>>> +		/* Merge back the two lists, moving local list elements to the		\
>>> +		 * head to preserve previous ordering, in case it matters.		\
>>> +		 */									\
>>> +		spin_lock(&(__gpuvm)->__list_name.lock);				\
>>> +		list_splice(__local_list, &(__gpuvm)->__list_name.list);		\
>>> +		spin_unlock(&(__gpuvm)->__list_name.lock);				\
>>> +	} while (0)
>>> +/**
>>> + * drm_gpuvm_bo_list_add() - insert a vm_bo into the given list
>>> + * @__vm_bo: the &drm_gpuvm_bo
>>> + * @__list_name: the name of the list to insert into
>>> + *
>>> + * Inserts the given @__vm_bo into the list specified by @__list_name and
>>> + * increases the vm_bo's reference count.
>>> + */
>>> +#define drm_gpuvm_bo_list_add(__vm_bo, __list_name)				\
>>> +	do {									\
>>> +		spin_lock(&(__vm_bo)->vm->__list_name.lock);			\
>>> +		if (list_empty(&(__vm_bo)->list.entry.__list_name))		\
>>> +			list_add_tail(&(__vm_bo)->list.entry.__list_name,	\
>>> +				      &(__vm_bo)->vm->__list_name.list);	\
>>> +		spin_unlock(&(__vm_bo)->vm->__list_name.lock);			\
>>> +	} while (0)
>>> +
>>> +/**
>>> + * drm_gpuvm_bo_list_del() - remove a vm_bo from the given list
>>> + * @__vm_bo: the &drm_gpuvm_bo
>>> + * @__list_name: the name of the list to remove from
>>> + *
>>> + * Removes the given @__vm_bo from the list specified by @__list_name and
>>> + * decreases the vm_bo's reference count.
>>> + */
>>> +#define drm_gpuvm_bo_list_del(__vm_bo, __list_name)				\
>>> +	do {									\
>>> +		spin_lock(&(__vm_bo)->vm->__list_name.lock);			\
>>> +		if (!list_empty(&(__vm_bo)->list.entry.__list_name))		\
>>> +			list_del_init(&(__vm_bo)->list.entry.__list_name);	\
>>> +		spin_unlock(&(__vm_bo)->vm->__list_name.lock);			\
>>> +	} while (0)
>>> +
>>> +static int __must_check
>>> +drm_gpuvm_bo_get_unless_zero(struct drm_gpuvm_bo *vm_bo);
>>> +
>>>    #define to_drm_gpuva(__node)	container_of((__node), struct drm_gpuva, rb.node)
>>>    #define GPUVA_START(node) ((node)->va.addr)
>>> @@ -713,6 +867,12 @@ drm_gpuvm_init(struct drm_gpuvm *gpuvm, struct drm_device *drm,
>>>    	gpuvm->rb.tree = RB_ROOT_CACHED;
>>>    	INIT_LIST_HEAD(&gpuvm->rb.list);
>>> +	INIT_LIST_HEAD(&gpuvm->extobj.list);
>>> +	spin_lock_init(&gpuvm->extobj.lock);
>>> +
>>> +	INIT_LIST_HEAD(&gpuvm->evict.list);
>>> +	spin_lock_init(&gpuvm->evict.lock);
>>> +
>>>    	drm_gpuva_check_overflow(start_offset, range);
>>>    	gpuvm->mm_start = start_offset;
>>>    	gpuvm->mm_range = range;
>>> @@ -754,10 +914,302 @@ drm_gpuvm_destroy(struct drm_gpuvm *gpuvm)
>>>    	WARN(!RB_EMPTY_ROOT(&gpuvm->rb.tree.rb_root),
>>>    	     "GPUVA tree is not empty, potentially leaking memory.\n");
>>> +	WARN(!list_empty(&gpuvm->extobj.list), "Extobj list should be empty.\n");
>>> +	WARN(!list_empty(&gpuvm->evict.list), "Evict list should be empty.\n");
>>> +
>>>    	drm_gem_private_object_fini(&gpuvm->d_obj);
>>>    }
>>>    EXPORT_SYMBOL_GPL(drm_gpuvm_destroy);
>>> +/**
>>> + * drm_gpuvm_prepare_objects() - prepare all associated BOs
>>> + * @gpuvm: the &drm_gpuvm
>>> + * @exec: the &drm_exec locking context
>>> + * @num_fences: the amount of &dma_fences to reserve
>>> + *
>>> + * Calls drm_exec_prepare_obj() for all &drm_gem_objects the given
>>> + * &drm_gpuvm contains mappings of.
>>> + *
>>> + * Using this function directly, it is the driver's responsibility to call
>>> + * drm_exec_init() and drm_exec_fini() accordingly.
>>> + *
>>> + * Note: This function is safe against concurrent insertion and removal of
>>> + * external objects, however it is not safe against concurrent usage itself.
>>> + *
>>> + * Drivers need to make sure to protect this case either with an outer VM lock
>>> + * or by calling drm_gpuvm_prepare_vm() before this function within the
>>> + * drm_exec_until_all_locked() loop, such that the GPUVM's dma-resv lock ensures
>>> + * mutual exclusion.
>>> + *
>>> + * Returns: 0 on success, negative error code on failure.
>>> + */
>>> +int
>>> +drm_gpuvm_prepare_objects(struct drm_gpuvm *gpuvm,
>>> +			  struct drm_exec *exec,
>>> +			  unsigned int num_fences)
>>> +{
>>> +	struct drm_gpuvm_bo *vm_bo;
>>> +	LIST_HEAD(extobjs);
>>> +	int ret = 0;
>>> +
>>> +	for_each_vm_bo_in_list(gpuvm, extobj, &extobjs, vm_bo) {
>>> +		ret = drm_exec_prepare_obj(exec, vm_bo->obj, num_fences);
>>> +		if (ret)
>>> +			break;
>>> +	}
>>> +	/* Drop ref in case we break out of the loop. */
>>> +	drm_gpuvm_bo_put(vm_bo);
>>> +	restore_vm_bo_list(gpuvm, extobj, &extobjs);
>>> +
>>> +	return ret;
>>> +}
>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_prepare_objects);
>>> +
>>> +/**
>>> + * drm_gpuvm_prepare_range() - prepare all BOs mapped within a given range
>>> + * @gpuvm: the &drm_gpuvm
>>> + * @exec: the &drm_exec locking context
>>> + * @addr: the start address within the VA space
>>> + * @range: the range to iterate within the VA space
>>> + * @num_fences: the amount of &dma_fences to reserve
>>> + *
>>> + * Calls drm_exec_prepare_obj() for all &drm_gem_objects mapped between @addr
>>> + * and @addr + @range.
>>> + *
>>> + * Returns: 0 on success, negative error code on failure.
>>> + */
>>> +int
>>> +drm_gpuvm_prepare_range(struct drm_gpuvm *gpuvm, struct drm_exec *exec,
>>> +			u64 addr, u64 range, unsigned int num_fences)
>>> +{
>>> +	struct drm_gpuva *va;
>>> +	u64 end = addr + range;
>>> +	int ret;
>>> +
>>> +	drm_gpuvm_for_each_va_range(va, gpuvm, addr, end) {
>>> +		struct drm_gem_object *obj = va->gem.obj;
>>> +
>>> +		ret = drm_exec_prepare_obj(exec, obj, num_fences);
>>> +		if (ret)
>>> +			return ret;
>>> +	}
>>> +
>>> +	return 0;
>>> +}
>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_prepare_range);
>>> +
>>> +/**
>>> + * drm_gpuvm_exec_lock() - lock all dma-resv of all associated BOs
>>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>>> + * @num_fences: the amount of &dma_fences to reserve
>>> + * @interruptible: sleep interruptible if waiting
>>> + *
>>> + * Acquires all dma-resv locks of all &drm_gem_objects the given
>>> + * &drm_gpuvm contains mappings of.
>>> + *
>>> + * Additionally, when calling this function with struct drm_gpuvm_exec::extra
>>> + * being set, the driver receives the given @fn callback to lock additional
>>> + * dma-resv in the context of the &drm_gpuvm_exec instance. Typically, drivers
>>> + * would call drm_exec_prepare_obj() from within this callback.
>>> + *
>>> + * Returns: 0 on success, negative error code on failure.
>>> + */
>>> +int
>>> +drm_gpuvm_exec_lock(struct drm_gpuvm_exec *vm_exec,
>>> +		    unsigned int num_fences,
>>> +		    bool interruptible)
>>> +{
>>> +	struct drm_gpuvm *gpuvm = vm_exec->vm;
>>> +	struct drm_exec *exec = &vm_exec->exec;
>>> +	uint32_t flags;
>>> +	int ret;
>>> +
>>> +	flags = DRM_EXEC_IGNORE_DUPLICATES |
>>> +		(interruptible ? DRM_EXEC_INTERRUPTIBLE_WAIT : 0);
>>> +
>>> +	drm_exec_init(exec, flags);
>>> +
>>> +	drm_exec_until_all_locked(exec) {
>>> +		ret = drm_gpuvm_prepare_vm(gpuvm, exec, num_fences);
>>> +		drm_exec_retry_on_contention(exec);
>>> +		if (ret)
>>> +			goto err;
>>> +
>>> +		ret = drm_gpuvm_prepare_objects(gpuvm, exec, num_fences);
>>> +		drm_exec_retry_on_contention(exec);
>>> +		if (ret)
>>> +			goto err;
>>> +
>>> +		if (vm_exec->extra.fn) {
>>> +			ret = vm_exec->extra.fn(vm_exec, num_fences);
>>> +			drm_exec_retry_on_contention(exec);
>>> +			if (ret)
>>> +				goto err;
>>> +		}
>>> +	}
>>> +
>>> +	return 0;
>>> +
>>> +err:
>>> +	drm_exec_fini(exec);
>>> +	return ret;
>>> +}
>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_exec_lock);
>>> +
>>> +static int
>>> +fn_lock_array(struct drm_gpuvm_exec *vm_exec, unsigned int num_fences)
>>> +{
>>> +	struct {
>>> +		struct drm_gem_object **objs;
>>> +		unsigned int num_objs;
>>> +	} *args = vm_exec->extra.priv;
>>> +
>>> +	return drm_exec_prepare_array(&vm_exec->exec, args->objs,
>>> +				      args->num_objs, num_fences);
>>> +}
>>> +
>>> +/**
>>> + * drm_gpuvm_exec_lock_array() - lock all dma-resv of all associated BOs
>>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>>> + * @objs: additional &drm_gem_objects to lock
>>> + * @num_objs: the number of additional &drm_gem_objects to lock
>>> + * @num_fences: the amount of &dma_fences to reserve
>>> + * @interruptible: sleep interruptible if waiting
>>> + *
>>> + * Acquires all dma-resv locks of all &drm_gem_objects the given &drm_gpuvm
>>> + * contains mappings of, plus the ones given through @objs.
>>> + *
>>> + * Returns: 0 on success, negative error code on failure.
>>> + */
>>> +int
>>> +drm_gpuvm_exec_lock_array(struct drm_gpuvm_exec *vm_exec,
>>> +			  struct drm_gem_object **objs,
>>> +			  unsigned int num_objs,
>>> +			  unsigned int num_fences,
>>> +			  bool interruptible)
>>> +{
>>> +	struct {
>>> +		struct drm_gem_object **objs;
>>> +		unsigned int num_objs;
>>> +	} args;
>>> +
>>> +	args.objs = objs;
>>> +	args.num_objs = num_objs;
>>> +
>>> +	vm_exec->extra.fn = fn_lock_array;
>>> +	vm_exec->extra.priv = &args;
>>> +
>>> +	return drm_gpuvm_exec_lock(vm_exec, num_fences, interruptible);
>>> +}
>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_exec_lock_array);
>>> +
>>> +/**
>>> + * drm_gpuvm_exec_lock_range() - prepare all BOs mapped within a given range
>>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>>> + * @addr: the start address within the VA space
>>> + * @range: the range to iterate within the VA space
>>> + * @num_fences: the amount of &dma_fences to reserve
>>> + * @interruptible: sleep interruptible if waiting
>>> + *
>>> + * Acquires all dma-resv locks of all &drm_gem_objects mapped between @addr and
>>> + * @addr + @range.
>>> + *
>>> + * Returns: 0 on success, negative error code on failure.
>>> + */
>>> +int
>>> +drm_gpuvm_exec_lock_range(struct drm_gpuvm_exec *vm_exec,
>>> +			  u64 addr, u64 range,
>>> +			  unsigned int num_fences,
>>> +			  bool interruptible)
>>> +{
>>> +	struct drm_gpuvm *gpuvm = vm_exec->vm;
>>> +	struct drm_exec *exec = &vm_exec->exec;
>>> +	uint32_t flags;
>>> +	int ret;
>>> +
>>> +	flags = DRM_EXEC_IGNORE_DUPLICATES |
>>> +		(interruptible ? DRM_EXEC_INTERRUPTIBLE_WAIT : 0);
>>> +
>>> +	drm_exec_init(exec, flags);
>>> +
>>> +	drm_exec_until_all_locked(exec) {
>>> +		ret = drm_gpuvm_prepare_range(gpuvm, exec, addr, range,
>>> +					      num_fences);
>>> +		drm_exec_retry_on_contention(exec);
>>> +		if (ret)
>>> +			goto err;
>>> +	}
>>> +
>>> +	return ret;
>>> +
>>> +err:
>>> +	drm_exec_fini(exec);
>>> +	return ret;
>>> +}
>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_exec_lock_range);
>>> +
>>> +/**
>>> + * drm_gpuvm_validate() - validate all BOs marked as evicted
>>> + * @gpuvm: the &drm_gpuvm to validate evicted BOs
>>> + *
>>> + * Calls the &drm_gpuvm_ops.bo_validate callback for all evicted buffer
>>> + * objects being mapped in the given &drm_gpuvm.
>>> + *
>>> + * Returns: 0 on success, negative error code on failure.
>>> + */
>>> +int
>>> +drm_gpuvm_validate(struct drm_gpuvm *gpuvm)
>>> +{
>>> +	const struct drm_gpuvm_ops *ops = gpuvm->ops;
>>> +	struct drm_gpuvm_bo *vm_bo;
>>> +	LIST_HEAD(evict);
>>> +	int ret = 0;
>>> +
>>> +	if (unlikely(!ops || !ops->bo_validate))
>>> +		return -ENOTSUPP;
>>> +
>>> +	for_each_vm_bo_in_list(gpuvm, evict, &evict, vm_bo) {
>>> +		dma_resv_assert_held(vm_bo->obj->resv);
>>> +		ret = ops->bo_validate(vm_bo->obj);
>>> +		if (ret)
>>> +			break;
>>> +	}
>>> +	/* Drop ref in case we break out of the loop. */
>>> +	drm_gpuvm_bo_put(vm_bo);
>>> +	restore_vm_bo_list(gpuvm, evict, &evict);
>>> +
>>> +	return ret;
>>> +}
>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_validate);
>>> +
>>> +/**
>>> + * drm_gpuvm_resv_add_fence() - add fence to private and all extobj
>>> + * dma-resv
>>> + * @gpuvm: the &drm_gpuvm to add a fence to
>>> + * @exec: the &drm_exec locking context
>>> + * @fence: fence to add
>>> + * @private_usage: private dma-resv usage
>>> + * @extobj_usage: extobj dma-resv usage
>>> + */
>>> +void
>>> +drm_gpuvm_resv_add_fence(struct drm_gpuvm *gpuvm,
>>> +			 struct drm_exec *exec,
>>> +			 struct dma_fence *fence,
>>> +			 enum dma_resv_usage private_usage,
>>> +			 enum dma_resv_usage extobj_usage)
>>> +{
>>> +	struct drm_gem_object *obj;
>>> +	unsigned long index;
>>> +
>>> +	drm_exec_for_each_locked_object(exec, index, obj) {
>>> +		dma_resv_assert_held(obj->resv);
>>> +		dma_resv_add_fence(obj->resv, fence,
>>> +				   drm_gpuvm_is_extobj(gpuvm, obj) ?
>>> +				   extobj_usage : private_usage);
>>> +	}
>>> +}
>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_resv_add_fence);
>>> +
>>>    /**
>>>     * drm_gpuvm_bo_create() - create a new instance of struct drm_gpuvm_bo
>>>     * @gpuvm: The &drm_gpuvm the @obj is mapped in.
>>> @@ -790,6 +1242,9 @@ drm_gpuvm_bo_create(struct drm_gpuvm *gpuvm,
>>>    	INIT_LIST_HEAD(&vm_bo->list.gpuva);
>>>    	INIT_LIST_HEAD(&vm_bo->list.entry.gem);
>>> +	INIT_LIST_HEAD(&vm_bo->list.entry.extobj);
>>> +	INIT_LIST_HEAD(&vm_bo->list.entry.evict);
>>> +
>>>    	drm_gem_object_get(obj);
>>>    	return vm_bo;
>>> @@ -807,6 +1262,14 @@ drm_gpuvm_bo_destroy(struct kref *kref)
>>>    	drm_gem_gpuva_assert_lock_held(vm_bo->obj);
>>> +	spin_lock(&gpuvm->extobj.lock);
>>> +	list_del(&vm_bo->list.entry.extobj);
>>> +	spin_unlock(&gpuvm->extobj.lock);
>>> +
>>> +	spin_lock(&gpuvm->evict.lock);
>>> +	list_del(&vm_bo->list.entry.evict);
>>> +	spin_unlock(&gpuvm->evict.lock);
>>> +
>>>    	list_del(&vm_bo->list.entry.gem);
>>>    	drm_gem_object_put(obj);
>>> @@ -822,6 +1285,11 @@ drm_gpuvm_bo_destroy(struct kref *kref)
>>>     * @vm_bo: the &drm_gpuvm_bo to release the reference of
>>>     *
>>>     * This releases a reference to @vm_bo.
>>> + *
>>> + * If the reference count drops to zero, the &gpuvm_bo is destroyed, which
>>> + * includes removing it from the GEMs gpuva list. Hence, if a call to this
>>> + * function can potentially let the reference count to zero the caller must
>>> + * hold the dma-resv or driver specific GEM gpuva lock.
>>>     */
>>>    void
>>>    drm_gpuvm_bo_put(struct drm_gpuvm_bo *vm_bo)
>>> @@ -831,6 +1299,12 @@ drm_gpuvm_bo_put(struct drm_gpuvm_bo *vm_bo)
>>>    }
>>>    EXPORT_SYMBOL_GPL(drm_gpuvm_bo_put);
>>> +static int __must_check
>>> +drm_gpuvm_bo_get_unless_zero(struct drm_gpuvm_bo *vm_bo)
>>> +{
>>> +	return kref_get_unless_zero(&vm_bo->kref);
>>> +}
>>> +
>>>    static struct drm_gpuvm_bo *
>>>    __drm_gpuvm_bo_find(struct drm_gpuvm *gpuvm,
>>>    		    struct drm_gem_object *obj)
>>> @@ -938,6 +1412,48 @@ drm_gpuvm_bo_obtain_prealloc(struct drm_gpuvm_bo *__vm_bo)
>>>    }
>>>    EXPORT_SYMBOL_GPL(drm_gpuvm_bo_obtain_prealloc);
>>> +/**
>>> + * drm_gpuvm_bo_extobj_add() - adds the &drm_gpuvm_bo to its &drm_gpuvm's
>>> + * extobj list
>>> + * @vm_bo: The &drm_gpuvm_bo to add to its &drm_gpuvm's extobj list.
>>> + *
>>> + * Adds the given @vm_bo to its &drm_gpuvm's extobj list if it is not on the
>>> + * list already and the corresponding &drm_gem_object actually is an external
>>> + * object.
>>> + */
>>> +void
>>> +drm_gpuvm_bo_extobj_add(struct drm_gpuvm_bo *vm_bo)
>>> +{
>>> +	struct drm_gpuvm *gpuvm = vm_bo->vm;
>>> +
>>> +	if (drm_gpuvm_is_extobj(gpuvm, vm_bo->obj))
>>> +		drm_gpuvm_bo_list_add(vm_bo, extobj);
>>> +}
>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_bo_extobj_add);
>>> +
>>> +/**
>>> + * drm_gpuvm_bo_evict() - add / remove a &drm_gem_object to / from a
>>> + * &drm_gpuvm's evicted list
>>> + * @obj: the &drm_gem_object to add or remove
>>> + * @evict: indicates whether the object is evicted
>>> + *
>>> + * Adds a &drm_gem_object to or removes it from the evicted list of each
>>> + * &drm_gpuvm containing a mapping of this &drm_gem_object.
>>> + */
>>> +void
>>> +drm_gpuvm_bo_evict(struct drm_gem_object *obj, bool evict)
>>> +{
>>> +	struct drm_gpuvm_bo *vm_bo;
>>> +
>>> +	drm_gem_for_each_gpuvm_bo(vm_bo, obj) {
>>> +		if (evict)
>>> +			drm_gpuvm_bo_list_add(vm_bo, evict);
>>> +		else
>>> +			drm_gpuvm_bo_list_del(vm_bo, evict);
>>> +	}
>>> +}
>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_bo_evict);
>>> +
>>>    static int
>>>    __drm_gpuva_insert(struct drm_gpuvm *gpuvm,
>>>    		   struct drm_gpuva *va)
>>> diff --git a/include/drm/drm_gpuvm.h b/include/drm/drm_gpuvm.h
>>> index afa50b9059a2..834bb6d6617e 100644
>>> --- a/include/drm/drm_gpuvm.h
>>> +++ b/include/drm/drm_gpuvm.h
>>> @@ -26,10 +26,12 @@
>>>     */
>>>    #include <linux/list.h>
>>> +#include <linux/dma-resv.h>
>>>    #include <linux/rbtree.h>
>>>    #include <linux/types.h>
>>>    #include <drm/drm_gem.h>
>>> +#include <drm/drm_exec.h>
>>>    struct drm_gpuvm;
>>>    struct drm_gpuvm_bo;
>>> @@ -259,6 +261,38 @@ struct drm_gpuvm {
>>>    	 * space
>>>    	 */
>>>    	struct dma_resv *resv;
>>> +
>>> +	/**
>>> +	 * @extobj: structure holding the extobj list and extobj list lock
>>> +	 */
>>> +	struct {
>>> +		/**
>>> +		 * @list: &list_head storing &drm_gpuvm_bos serving as
>>> +		 * external objects
>>> +		 */
>>> +		struct list_head list;
>>> +
>>> +		/**
>>> +		 * @lock: spinlock to protect the extobj list
>>> +		 */
>>> +		spinlock_t lock;
>>> +	} extobj;
>>> +
>>> +	/**
>>> +	 * @evict: structure holding the evict list and evict list lock
>>> +	 */
>>> +	struct {
>>> +		/**
>>> +		 * @list: &list_head storing &drm_gpuvm_bos currently being
>>> +		 * evicted
>>> +		 */
>>> +		struct list_head list;
>>> +
>>> +		/**
>>> +		 * @lock: spinlock to protect the evict list
>>> +		 */
>>> +		spinlock_t lock;
>>> +	} evict;
>>>    };
>>>    void drm_gpuvm_init(struct drm_gpuvm *gpuvm, struct drm_device *drm,
>>> @@ -268,6 +302,21 @@ void drm_gpuvm_init(struct drm_gpuvm *gpuvm, struct drm_device *drm,
>>>    		    const struct drm_gpuvm_ops *ops);
>>>    void drm_gpuvm_destroy(struct drm_gpuvm *gpuvm);
>>> +/**
>>> + * drm_gpuvm_is_extobj() - indicates whether the given &drm_gem_object is an
>>> + * external object
>>> + * @gpuvm: the &drm_gpuvm to check
>>> + * @obj: the &drm_gem_object to check
>>> + *
>>> + * Returns: true if the &drm_gem_object's &dma_resv differs from the
>>> + * &drm_gpuvm's &dma_resv, false otherwise
>>> + */
>>> +static inline bool drm_gpuvm_is_extobj(struct drm_gpuvm *gpuvm,
>>> +				       struct drm_gem_object *obj)
>>> +{
>>> +	return obj && obj->resv != gpuvm->resv;
>>> +}
>>> +
>>>    static inline struct drm_gpuva *
>>>    __drm_gpuva_next(struct drm_gpuva *va)
>>>    {
>>> @@ -346,6 +395,128 @@ __drm_gpuva_next(struct drm_gpuva *va)
>>>    #define drm_gpuvm_for_each_va_safe(va__, next__, gpuvm__) \
>>>    	list_for_each_entry_safe(va__, next__, &(gpuvm__)->rb.list, rb.entry)
>>> +/**
>>> + * struct drm_gpuvm_exec - &drm_gpuvm abstraction of &drm_exec
>>> + *
>>> + * This structure should be created on the stack as &drm_exec should be.
>>> + *
>>> + * Optionally, @extra can be set in order to lock additional &drm_gem_objects.
>>> + */
>>> +struct drm_gpuvm_exec {
>>> +	/**
>>> +	 * @exec: the &drm_exec structure
>>> +	 */
>>> +	struct drm_exec exec;
>>> +
>>> +	/**
>>> +	 * @vm: the &drm_gpuvm whose DMA reservations should be locked
>>> +	 */
>>> +	struct drm_gpuvm *vm;
>>> +
>>> +	/**
>>> +	 * @extra: Callback and corresponding private data for the driver to
>>> +	 * lock arbitrary additional &drm_gem_objects.
>>> +	 */
>>> +	struct {
>>> +		/**
>>> +		 * @fn: The driver callback to lock additional &drm_gem_objects.
>>> +		 */
>>> +		int (*fn)(struct drm_gpuvm_exec *vm_exec,
>>> +			  unsigned int num_fences);
>>> +
>>> +		/**
>>> +		 * @priv: driver private data for the @fn callback
>>> +		 */
>>> +		void *priv;
>>> +	} extra;
>>> +};
>>> +
>>> +/**
>>> + * drm_gpuvm_prepare_vm() - prepare the GPUVM's common dma-resv
>>> + * @gpuvm: the &drm_gpuvm
>>> + * @exec: the &drm_exec context
>>> + * @num_fences: the amount of &dma_fences to reserve
>>> + *
>>> + * Calls drm_exec_prepare_obj() for the GPUVMs dummy &drm_gem_object.
>>> + *
>>> + * Using this function directly, it is the driver's responsibility to call
>>> + * drm_exec_init() and drm_exec_fini() accordingly.
>>> + *
>>> + * Returns: 0 on success, negative error code on failure.
>>> + */
>>> +static inline int
>>> +drm_gpuvm_prepare_vm(struct drm_gpuvm *gpuvm,
>>> +		     struct drm_exec *exec,
>>> +		     unsigned int num_fences)
>>> +{
>>> +	return drm_exec_prepare_obj(exec, &gpuvm->d_obj, num_fences);
>>> +}
>>> +
>>> +int drm_gpuvm_prepare_objects(struct drm_gpuvm *gpuvm,
>>> +			      struct drm_exec *exec,
>>> +			      unsigned int num_fences);
>>> +
>>> +int drm_gpuvm_prepare_range(struct drm_gpuvm *gpuvm,
>>> +			    struct drm_exec *exec,
>>> +			    u64 addr, u64 range,
>>> +			    unsigned int num_fences);
>>> +
>>> +int drm_gpuvm_exec_lock(struct drm_gpuvm_exec *vm_exec,
>>> +			unsigned int num_fences,
>>> +			bool interruptible);
>>> +
>>> +int drm_gpuvm_exec_lock_array(struct drm_gpuvm_exec *vm_exec,
>>> +			      struct drm_gem_object **objs,
>>> +			      unsigned int num_objs,
>>> +			      unsigned int num_fences,
>>> +			      bool interruptible);
>>> +
>>> +int drm_gpuvm_exec_lock_range(struct drm_gpuvm_exec *vm_exec,
>>> +			      u64 addr, u64 range,
>>> +			      unsigned int num_fences,
>>> +			      bool interruptible);
>>> +
>>> +/**
>>> + * drm_gpuvm_exec_unlock() - unlock all dma-resv of all associated BOs
>>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>>> + *
>>> + * Releases all dma-resv locks of all &drm_gem_objects previously acquired
>>> + * through drm_gpuvm_exec_lock() or its variants.
>>> + */
>>> +static inline void
>>> +drm_gpuvm_exec_unlock(struct drm_gpuvm_exec *vm_exec)
>>> +{
>>> +	drm_exec_fini(&vm_exec->exec);
>>> +}
>>> +
>>> +int drm_gpuvm_validate(struct drm_gpuvm *gpuvm);
>>> +void drm_gpuvm_resv_add_fence(struct drm_gpuvm *gpuvm,
>>> +			      struct drm_exec *exec,
>>> +			      struct dma_fence *fence,
>>> +			      enum dma_resv_usage private_usage,
>>> +			      enum dma_resv_usage extobj_usage);
>>> +
>>> +/**
>>> + * drm_gpuvm_exec_resv_add_fence() - add fence to private and all extobj dma-resv
>>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>>> + * @fence: fence to add
>>> + * @private_usage: private dma-resv usage
>>> + * @extobj_usage: extobj dma-resv usage
>>> + *
>>> + * See drm_gpuvm_resv_add_fence().
>>> + */
>>> +static inline void
>>> +drm_gpuvm_exec_resv_add_fence(struct drm_gpuvm_exec *vm_exec,
>>> +			      struct dma_fence *fence,
>>> +			      enum dma_resv_usage private_usage,
>>> +			      enum dma_resv_usage extobj_usage)
>>> +{
>>> +	drm_gpuvm_resv_add_fence(vm_exec->vm, &vm_exec->exec, fence,
>>> +				 private_usage, extobj_usage);
>>> +}
>>> +
>>>    /**
>>>     * struct drm_gpuvm_bo - structure representing a &drm_gpuvm and
>>>     * &drm_gem_object combination
>>> @@ -398,6 +569,18 @@ struct drm_gpuvm_bo {
>>>    			 * gpuva list.
>>>    			 */
>>>    			struct list_head gem;
>>> +
>>> +			/**
>>> +			 * @extobj: List entry to attach to the &drm_gpuvm's
>>> +			 * extobj list.
>>> +			 */
>>> +			struct list_head extobj;
>>> +
>>> +			/**
>>> +			 * @evict: List entry to attach to the &drm_gpuvm's evict
>>> +			 * list.
>>> +			 */
>>> +			struct list_head evict;
>>>    		} entry;
>>>    	} list;
>>>    };
>>> @@ -432,6 +615,9 @@ struct drm_gpuvm_bo *
>>>    drm_gpuvm_bo_find(struct drm_gpuvm *gpuvm,
>>>    		  struct drm_gem_object *obj);
>>> +void drm_gpuvm_bo_evict(struct drm_gem_object *obj, bool evict);
>>> +void drm_gpuvm_bo_extobj_add(struct drm_gpuvm_bo *vm_bo);
>>> +
>>>    /**
>>>     * drm_gpuvm_bo_for_each_va() - iterator to walk over a list of &drm_gpuva
>>>     * @va__: &drm_gpuva structure to assign to in each iteration step
>>> @@ -837,6 +1023,17 @@ struct drm_gpuvm_ops {
>>>    	 * used.
>>>    	 */
>>>    	int (*sm_step_unmap)(struct drm_gpuva_op *op, void *priv);
>>> +
>>> +	/**
>>> +	 * @bo_validate: called from drm_gpuvm_validate()
>>> +	 *
>>> +	 * Drivers receive this callback for every evicted &drm_gem_object being
>>> +	 * mapped in the corresponding &drm_gpuvm.
>>> +	 *
>>> +	 * Typically, drivers would call their driver-specific variant of
>>> +	 * ttm_bo_validate() from within this callback.
>>> +	 */
>>> +	int (*bo_validate)(struct drm_gem_object *obj);
>>>    };
>>>    int drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm, void *priv,
Danilo Krummrich Sept. 12, 2023, 11:36 p.m. UTC | #10
On Tue, Sep 12, 2023 at 09:23:08PM +0200, Thomas Hellström wrote:
> 
> On 9/12/23 18:50, Danilo Krummrich wrote:
> > On Tue, Sep 12, 2023 at 06:20:32PM +0200, Thomas Hellström wrote:
> > > Hi, Danilo,
> > > 
> > > On 9/9/23 17:31, Danilo Krummrich wrote:
> > > > So far the DRM GPUVA manager offers common infrastructure to track GPU VA
> > > > allocations and mappings, generically connect GPU VA mappings to their
> > > > backing buffers and perform more complex mapping operations on the GPU VA
> > > > space.
> > > > 
> > > > However, there are more design patterns commonly used by drivers, which
> > > > can potentially be generalized in order to make the DRM GPUVA manager
> > > > represent a basic GPU-VM implementation. In this context, this patch aims
> > > > at generalizing the following elements.
> > > > 
> > > > 1) Provide a common dma-resv for GEM objects not being used outside of
> > > >      this GPU-VM.
> > > > 
> > > > 2) Provide tracking of external GEM objects (GEM objects which are
> > > >      shared with other GPU-VMs).
> > > > 
> > > > 3) Provide functions to efficiently lock all GEM objects dma-resv the
> > > >      GPU-VM contains mappings of.
> > > > 
> > > > 4) Provide tracking of evicted GEM objects the GPU-VM contains mappings
> > > >      of, such that validation of evicted GEM objects is accelerated.
> > > > 
> > > > 5) Provide some convenience functions for common patterns.
> > > > 
> > > > Rather than being designed as a "framework", the target is to make all
> > > > features appear as a collection of optional helper functions, such that
> > > > drivers are free to make use of the DRM GPUVA manager's basic
> > > > functionality and opt-in for other features without setting any feature
> > > > flags, just by making use of the corresponding functions.
> > > > 
> > > > Big kudos to Boris Brezillon for his help to figure out locking for drivers
> > > > updating the GPU VA space within the fence signalling path.
> > > > 
> > > > Suggested-by: Matthew Brost <matthew.brost@intel.com>
> > > > Signed-off-by: Danilo Krummrich <dakr@redhat.com>
> > > > ---
> > > >    drivers/gpu/drm/drm_gpuvm.c | 516 ++++++++++++++++++++++++++++++++++++
> > > >    include/drm/drm_gpuvm.h     | 197 ++++++++++++++
> > > >    2 files changed, 713 insertions(+)
> > > > 
> > > > diff --git a/drivers/gpu/drm/drm_gpuvm.c b/drivers/gpu/drm/drm_gpuvm.c
> > > > index f4411047dbb3..8e62a043f719 100644
> > > > --- a/drivers/gpu/drm/drm_gpuvm.c
> > > > +++ b/drivers/gpu/drm/drm_gpuvm.c
> > > > @@ -73,6 +73,21 @@
> > > >     * &drm_gem_object list of &drm_gpuvm_bos for an existing instance of this
> > > >     * particular combination. If not existent a new instance is created and linked
> > > >     * to the &drm_gem_object.
> > > > + *
> > > > + * &drm_gpuvm_bo structures, since unique for a given &drm_gpuvm, are also used
> > > > + * as entry for the &drm_gpuvm's lists of external and evicted objects. Those
> > > > + * lists are maintained in order to accelerate locking of dma-resv locks and
> > > > + * validation of evicted objects bound in a &drm_gpuvm. For instance, all
> > > > + * &drm_gem_objects' &dma_resv of a given &drm_gpuvm can be locked by calling
> > > > + * drm_gpuvm_exec_lock(). Once locked drivers can call drm_gpuvm_validate() in
> > > > + * order to validate all evicted &drm_gem_objects. It is also possible to lock
> > > > + * additional &drm_gem_objects by providing the corresponding parameters to
> > > > + * drm_gpuvm_exec_lock() as well as open code the &drm_exec loop while making
> > > > + * use of helper functions such as drm_gpuvm_prepare_range() or
> > > > + * drm_gpuvm_prepare_objects().
> > > > + *
> > > > + * Every bound &drm_gem_object is treated as an external object when its
> > > > + * &dma_resv structure is different from the &drm_gpuvm's common &dma_resv structure.
> > > >     */
> > > >    /**
> > > > @@ -420,6 +435,20 @@
> > > >     * Subsequent calls to drm_gpuvm_bo_obtain() for the same &drm_gpuvm and
> > > >     * &drm_gem_object must be able to observe previous creations and destructions
> > > >     * of &drm_gpuvm_bos in order to keep instances unique.
> > > > + *
> > > > + * The &drm_gpuvm's lists for keeping track of external and evicted objects are
> > > > + * protected against concurrent insertion / removal and iteration internally.
> > > > + *
> > > > + * However, drivers still need to protect concurrent calls to functions
> > > > + * iterating those lists, such as drm_gpuvm_validate() and
> > > > + * drm_gpuvm_prepare_objects(). Every such function carries a corresponding
> > > > + * comment and, where possible, lockdep checks.
> > > > + *
> > > > + * Functions adding or removing entries from those lists, such as
> > > > + * drm_gpuvm_bo_evict() or drm_gpuvm_bo_extobj_add() may be called with external
> > > > + * locks being held, e.g. in order to avoid the corresponding list to be
> > > > + * (safely) modified while potentially being iternated by other API functions.
> > > > + * However, this is entirely optional.
> > > >     */
> > > >    /**
> > > > @@ -632,6 +661,131 @@
> > > >     *	}
> > > >     */
> > > > +/**
> > > > + * get_next_vm_bo_from_list() - get the next vm_bo element
> > > > + * @__gpuvm: The GPU VM
> > > > + * @__list_name: The name of the list we're iterating on
> > > > + * @__local_list: A pointer to the local list used to store already iterated items
> > > > + * @__prev_vm_bo: The previous element we got from get_next_vm_bo_from_list()
> > > > + *
> > > > + * This helper is here to provide lockless list iteration. Lockless as in, the
> > > > + * iterator releases the lock immediately after picking the first element from
> > > > + * the list, so list insertion and deletion can happen concurrently.
> > > Are the list spinlocks needed for that async state update from within the
> > > dma-fence critical section we've discussed previously?
> > Yes, but also for other reasons, see below.
> > 
> > > Otherwise it should be sufficient to protect the lists with the gpuvm's resv
> > > (or for the extobj list with an outer lock).
> > > 
> > > If those spinlocks are still needed in some situations, perhaps could we
> > > have an option to set them to NULL (Like IIRC the maple tree allows for)?
> > The evict spinlock is needed in any case, since in drm_gpuvm_bo_evict() we're
> > holding only the dma-resv lock from the BO this function gets called for. Hence,
> > the spinlock protects concurrent drm_gpuvm_bo_evict() calls with different BOs.
> No. Only if you try to add external objects to the vm's evict list from
> within the evict code. That's not necessary since you loop through all
> external objects anyway when locking them, so an "evicted" bool in the vm_bo,
> protected by the bo resv, would be sufficient. The extobj locking loop can
> then add the bo to the evicted list.

And validate() can remove it while still holding all dma-resv locks, neat!
However, what if two tasks are trying to lock the VA space concurrently? What
do we do when the drm_gpuvm_bo's refcount drops to zero in drm_gpuva_unlink()?
Are we guaranteed that at this point of time the drm_gpuvm_bo is not on the
evicted list? Because otherwise we would call drm_gpuvm_bo_destroy() with the
dma-resv lock held, which wouldn't be allowed, since drm_gpuvm_bo_destroy()
might drop the last reference to the drm_gem_object and hence we'd potentially
free the dma-resv lock while holding it, at least if it's an external object.
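
To spell out the sequence I'm worried about (sketch; assuming an external
object whose resv lock is held across the unlink):

	dma_resv_lock(obj->resv, NULL);
	drm_gpuva_unlink(va);		/* drops the last vm_bo reference,   */
					/* -> drm_gpuvm_bo_destroy()         */
					/* -> drm_gem_object_put() frees obj */
					/*    and with it obj->resv          */
	dma_resv_unlock(obj->resv);	/* use-after-free                    */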

> > 
> > For extobjs an outer lock would be enough in case of Xe, but I really would not
> > like to add even more complexity just to get the spinlock out of the way in case
> > the driver already has an outer lock protecting this path.
> 
> I must disagree here. These spinlocks and atomic operations are pretty
> costly and, as discussed earlier, this type of locking was the reason (at
> least according to the commit message) that made Christian drop the XArray
> use in drm_exec for the same set of objects: "The locking overhead is
> unecessary and measurable". IMHO the spinlock is the added complexity, and a
> single wide lock following the drm locking guidelines set out by Daniel and
> David should really be the default choice, with an opt-in for a spinlock when
> it's needed for async updates and pushing out to a wq is not an option.

For the external object list an outer lock would work as long as it's not the
dma-resv lock of the corresponding GEM object, since here we actually need to
remove the list entry from the external object list on drm_gpuvm_bo_destroy().
It's just a bit weird design-wise that drivers would need to take this outer
lock on:

- drm_gpuvm_bo_extobj_add()
- drm_gpuvm_bo_destroy()	(and hence also drm_gpuvm_bo_put())
- drm_gpuva_unlink() 		(because it needs to call drm_gpuvm_bo_put())
- drm_gpuvm_exec_lock()
- drm_gpuvm_exec_lock_array()
- drm_gpuvm_prepare_range()

Given that it seems reasonable to do all the required locking internally.

In order to at least place lockdep checks, the driver would need to supply the
corresponding lock's lockdep_map, because the GPUVM otherwise doesn't know about
the lock.
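
Sketch of what I have in mind, with made-up names:

	/* In struct drm_gpuvm, supplied by the driver, e.g. through
	 * drm_gpuvm_init():
	 */
	struct lockdep_map *ext_lock_map;

	/* Hypothetical check placed in drm_gpuvm_prepare_objects() and
	 * friends:
	 */
	#define drm_gpuvm_ext_assert_held(gpuvm) \
		lockdep_assert(lock_is_held((gpuvm)->ext_lock_map))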

Out of curiosity, what is the overhead of a spin_lock() that doesn't need to
spin? 

> 
> A pretty simple way that would not add much code would be
> 
> static void gpuvm_cond_spin_lock(const struct drm_gpuvm *gpuvm,
> 				 spinlock_t *lock)
> {
> 	if (!gpuvm->resv_protected_lists)
> 		spin_lock(lock);
> }
> 
> > > For such drivers, that would require anybody calling unlink to hold the vm's
> > > resv, though.
> > In V4 I want to go back to having a dedicated lock for the GEM's gpuva list (or
> > VM_BO list to be more precise). We can't just use the dma-resv lock for that
> > with VM_BO abstractions, because on destruction of a VM_BO we otherwise wouldn't
> > be allowed to already hold the dma-resv lock. That's the fix I was referring to
> > earlier.
> 
> Yeah, I can see the need for a dedicated lock for the GEM's gpuva list, but
> holding the vm's dma-resv lock across the unlink shouldn't be a problem. We
> may free the object and a pointer to the vm's resv during unlink but we
> don't free the vm's resv.  It'd be a matter of ensuring that any calls to
> unlink from *within* drm_gpuvm allow it to be held.

Drivers calling unlink() from the fence signaling path can't use the VM's
dma-resv lock.

Also, what if the object is an external object? We can't use the VM's dma-resv
lock here. And we can't have the GEM obj's dma-resv lock held when calling
unlink(), since unlink() calls drm_gpuvm_bo_put(), which, if the refcount drops
to zero, calls drm_gpuvm_bo_destroy(), and drm_gpuvm_bo_destroy() might drop the
last reference of the GEM object. All those problems go away with a dedicated
GEM gpuva list lock.
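
I.e. roughly (sketch; the dedicated lock is hypothetical, and it'd need to
be a spinlock for drivers unlinking from the fence signalling path):

void drm_gpuva_unlink(struct drm_gpuva *va)
{
	struct drm_gem_object *obj = va->gem.obj;
	struct drm_gpuvm_bo *vm_bo = va->vm_bo;

	if (unlikely(!obj))
		return;

	spin_lock(&obj->gpuva.lock);	/* hypothetical dedicated lock */
	list_del_init(&va->gem.entry);
	spin_unlock(&obj->gpuva.lock);

	va->vm_bo = NULL;
	/* Safe: no resv lock needs to be held at this point anymore. */
	drm_gpuvm_bo_put(vm_bo);
}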

> 
> /Thomas
> 
> 
> > > It seems that with that also the refcount could be make non-atomic.
> > > 
> > > All in the spirit of the drm locking guidelines "use big locks when
> > > possible".
> > > Lower level locks only when necessary for performance or locking inversion?
> > > 
> > > /Thomas
> > > 
> > > 
> > > > + *
> > > > + * Elements popped from the original list are kept in a local list, so removal
> > > > + * and is_empty checks can still happen while we're iterating the list.
> > > > + */
> > > > +#define get_next_vm_bo_from_list(__gpuvm, __list_name, __local_list, __prev_vm_bo)	\
> > > > +	({										\
> > > > +		struct drm_gpuvm_bo *__vm_bo;						\
> > > > +											\
> > > > +		drm_gpuvm_bo_put(__prev_vm_bo);						\
> > > > +											\
> > > > +		spin_lock(&(__gpuvm)->__list_name.lock);				\
> > > > +		while (!list_empty(&(__gpuvm)->__list_name.list)) {			\
> > > > +			__vm_bo = list_first_entry(&(__gpuvm)->__list_name.list,	\
> > > > +						   struct drm_gpuvm_bo,			\
> > > > +						   list.entry.__list_name);		\
> > > > +			if (drm_gpuvm_bo_get_unless_zero(__vm_bo)) {			\
> > > > +				list_move_tail(&(__vm_bo)->list.entry.__list_name,	\
> > > > +					       __local_list);				\
> > > > +				break;							\
> > > > +			} else {							\
> > > > +				list_del_init(&(__vm_bo)->list.entry.__list_name);	\
> > > > +				__vm_bo = NULL;						\
> > > > +			}								\
> > > > +		}									\
> > > > +		spin_unlock(&(__gpuvm)->__list_name.lock);				\
> > > > +											\
> > > > +		__vm_bo;								\
> > > > +	})
> > > > +
> > > > +/**
> > > > + * for_each_vm_bo_in_list() - internal vm_bo list iterator
> > > > + * @__gpuvm: The GPU VM
> > > > + * @__list_name: The name of the list we're iterating on
> > > > + * @__local_list: A pointer to the local list used to store already iterated items
> > > > + * @__vm_bo: The current &drm_gpuvm_bo in each iteration step
> > > > + *
> > > > + * This helper is here to provide lockless list iteration. Lockless as in, the
> > > > + * iterator releases the lock immediately after picking the first element from the
> > > > + * list, so list insertion and deletion can happen concurrently.
> > > > + *
> > > > + * Typical use:
> > > > + *
> > > > + *	struct drm_gpuvm_bo *vm_bo;
> > > > + *	LIST_HEAD(my_local_list);
> > > > + *
> > > > + *	ret = 0;
> > > > + *	for_each_vm_bo_in_list(gpuvm, <list_name>, &my_local_list, vm_bo) {
> > > > + *		ret = do_something_with_vm_bo(..., vm_bo);
> > > > + *		if (ret)
> > > > + *			break;
> > > > + *	}
> > > > + *	drm_gpuvm_bo_put(vm_bo);
> > > > + *	restore_vm_bo_list(gpuvm, <list_name>, &my_local_list);
> > > > + *
> > > > + *
> > > > + * Only used for internal list iterations, not meant to be exposed to the outside
> > > > + * world.
> > > > + */
> > > > +#define for_each_vm_bo_in_list(__gpuvm, __list_name, __local_list, __vm_bo)	\
> > > > +	for (__vm_bo = get_next_vm_bo_from_list(__gpuvm, __list_name,		\
> > > > +						__local_list, NULL);		\
> > > > +	     __vm_bo;								\
> > > > +	     __vm_bo = get_next_vm_bo_from_list(__gpuvm, __list_name,		\
> > > > +						__local_list, __vm_bo))		\
> > > > +
> > > > +/**
> > > > + * restore_vm_bo_list() - move vm_bo elements back to their original list
> > > > + * @__gpuvm: The GPU VM
> > > > + * @__list_name: The name of the list we're iterating on
> > > > + * @__local_list: A pointer to the local list used to store already iterated items
> > > > + *
> > > > + * When we're done iterating a vm_bo list, we should call restore_vm_bo_list()
> > > > + * to restore the original state and let new iterations take place.
> > > > + */
> > > > +#define restore_vm_bo_list(__gpuvm, __list_name, __local_list)				\
> > > > +	do {										\
> > > > +		/* Merge back the two lists, moving local list elements to the		\
> > > > +		 * head to preserve previous ordering, in case it matters.		\
> > > > +		 */									\
> > > > +		spin_lock(&(__gpuvm)->__list_name.lock);				\
> > > > +		list_splice(__local_list, &(__gpuvm)->__list_name.list);		\
> > > > +		spin_unlock(&(__gpuvm)->__list_name.lock);				\
> > > > +	} while (0)
> > > > +/**
> > > > + * drm_gpuvm_bo_list_add() - insert a vm_bo into the given list
> > > > + * @__vm_bo: the &drm_gpuvm_bo
> > > > + * @__list_name: the name of the list to insert into
> > > > + *
> > > > + * Inserts the given @__vm_bo into the list specified by @__list_name and
> > > > + * increases the vm_bo's reference count.
> > > > + */
> > > > +#define drm_gpuvm_bo_list_add(__vm_bo, __list_name)				\
> > > > +	do {									\
> > > > +		spin_lock(&(__vm_bo)->vm->__list_name.lock);			\
> > > > +		if (list_empty(&(__vm_bo)->list.entry.__list_name))		\
> > > > +			list_add_tail(&(__vm_bo)->list.entry.__list_name,	\
> > > > +				      &(__vm_bo)->vm->__list_name.list);	\
> > > > +		spin_unlock(&(__vm_bo)->vm->__list_name.lock);			\
> > > > +	} while (0)
> > > > +
> > > > +/**
> > > > + * drm_gpuvm_bo_list_del() - remove a vm_bo from the given list
> > > > + * @__vm_bo: the &drm_gpuvm_bo
> > > > + * @__list_name: the name of the list to remove from
> > > > + *
> > > > + * Removes the given @__vm_bo from the list specified by @__list_name and
> > > > + * decreases the vm_bo's reference count.
> > > > + */
> > > > +#define drm_gpuvm_bo_list_del(__vm_bo, __list_name)				\
> > > > +	do {									\
> > > > +		spin_lock(&(__vm_bo)->vm->__list_name.lock);			\
> > > > +		if (!list_empty(&(__vm_bo)->list.entry.__list_name))		\
> > > > +			list_del_init(&(__vm_bo)->list.entry.__list_name);	\
> > > > +		spin_unlock(&(__vm_bo)->vm->__list_name.lock);			\
> > > > +	} while (0)
> > > > +
> > > > +static int __must_check
> > > > +drm_gpuvm_bo_get_unless_zero(struct drm_gpuvm_bo *vm_bo);
> > > > +
> > > >    #define to_drm_gpuva(__node)	container_of((__node), struct drm_gpuva, rb.node)
> > > >    #define GPUVA_START(node) ((node)->va.addr)
> > > > @@ -713,6 +867,12 @@ drm_gpuvm_init(struct drm_gpuvm *gpuvm, struct drm_device *drm,
> > > >    	gpuvm->rb.tree = RB_ROOT_CACHED;
> > > >    	INIT_LIST_HEAD(&gpuvm->rb.list);
> > > > +	INIT_LIST_HEAD(&gpuvm->extobj.list);
> > > > +	spin_lock_init(&gpuvm->extobj.lock);
> > > > +
> > > > +	INIT_LIST_HEAD(&gpuvm->evict.list);
> > > > +	spin_lock_init(&gpuvm->evict.lock);
> > > > +
> > > >    	drm_gpuva_check_overflow(start_offset, range);
> > > >    	gpuvm->mm_start = start_offset;
> > > >    	gpuvm->mm_range = range;
> > > > @@ -754,10 +914,302 @@ drm_gpuvm_destroy(struct drm_gpuvm *gpuvm)
> > > >    	WARN(!RB_EMPTY_ROOT(&gpuvm->rb.tree.rb_root),
> > > >    	     "GPUVA tree is not empty, potentially leaking memory.\n");
> > > > +	WARN(!list_empty(&gpuvm->extobj.list), "Extobj list should be empty.\n");
> > > > +	WARN(!list_empty(&gpuvm->evict.list), "Evict list should be empty.\n");
> > > > +
> > > >    	drm_gem_private_object_fini(&gpuvm->d_obj);
> > > >    }
> > > >    EXPORT_SYMBOL_GPL(drm_gpuvm_destroy);
> > > > +/**
> > > > + * drm_gpuvm_prepare_objects() - prepare all associated BOs
> > > > + * @gpuvm: the &drm_gpuvm
> > > > + * @exec: the &drm_exec locking context
> > > > + * @num_fences: the amount of &dma_fences to reserve
> > > > + *
> > > > + * Calls drm_exec_prepare_obj() for all &drm_gem_objects the given
> > > > + * &drm_gpuvm contains mappings of.
> > > > + *
> > > > + * Using this function directly, it is the driver's responsibility to call
> > > > + * drm_exec_init() and drm_exec_fini() accordingly.
> > > > + *
> > > > + * Note: This function is safe against concurrent insertion and removal of
> > > > + * external objects, however it is not safe against concurrent usage itself.
> > > > + *
> > > > + * Drivers need to make sure to protect this case with either an outer VM lock
> > > > + * or by calling drm_gpuvm_prepare_vm() before this function within the
> > > > + * drm_exec_until_all_locked() loop, such that the GPUVM's dma-resv lock ensures
> > > > + * mutual exclusion.
> > > > + *
> > > > + * Returns: 0 on success, negative error code on failure.
> > > > + */
> > > > +int
> > > > +drm_gpuvm_prepare_objects(struct drm_gpuvm *gpuvm,
> > > > +			  struct drm_exec *exec,
> > > > +			  unsigned int num_fences)
> > > > +{
> > > > +	struct drm_gpuvm_bo *vm_bo;
> > > > +	LIST_HEAD(extobjs);
> > > > +	int ret = 0;
> > > > +
> > > > +	for_each_vm_bo_in_list(gpuvm, extobj, &extobjs, vm_bo) {
> > > > +		ret = drm_exec_prepare_obj(exec, vm_bo->obj, num_fences);
> > > > +		if (ret)
> > > > +			break;
> > > > +	}
> > > > +	/* Drop ref in case we break out of the loop. */
> > > > +	drm_gpuvm_bo_put(vm_bo);
> > > > +	restore_vm_bo_list(gpuvm, extobj, &extobjs);
> > > > +
> > > > +	return ret;
> > > > +}
> > > > +EXPORT_SYMBOL_GPL(drm_gpuvm_prepare_objects);
> > > > +
> > > > +/**
> > > > + * drm_gpuvm_prepare_range() - prepare all BOs mapped within a given range
> > > > + * @gpuvm: the &drm_gpuvm
> > > > + * @exec: the &drm_exec locking context
> > > > + * @addr: the start address within the VA space
> > > > + * @range: the range to iterate within the VA space
> > > > + * @num_fences: the amount of &dma_fences to reserve
> > > > + *
> > > > + * Calls drm_exec_prepare_obj() for all &drm_gem_objects mapped between @addr
> > > > + * and @addr + @range.
> > > > + *
> > > > + * Returns: 0 on success, negative error code on failure.
> > > > + */
> > > > +int
> > > > +drm_gpuvm_prepare_range(struct drm_gpuvm *gpuvm, struct drm_exec *exec,
> > > > +			u64 addr, u64 range, unsigned int num_fences)
> > > > +{
> > > > +	struct drm_gpuva *va;
> > > > +	u64 end = addr + range;
> > > > +	int ret;
> > > > +
> > > > +	drm_gpuvm_for_each_va_range(va, gpuvm, addr, end) {
> > > > +		struct drm_gem_object *obj = va->gem.obj;
> > > > +
> > > > +		ret = drm_exec_prepare_obj(exec, obj, num_fences);
> > > > +		if (ret)
> > > > +			return ret;
> > > > +	}
> > > > +
> > > > +	return 0;
> > > > +}
> > > > +EXPORT_SYMBOL_GPL(drm_gpuvm_prepare_range);
> > > > +
> > > > +/**
> > > > + * drm_gpuvm_exec_lock() - lock all dma-resv of all associated BOs
> > > > + * @vm_exec: the &drm_gpuvm_exec abstraction
> > > > + * @num_fences: the amount of &dma_fences to reserve
> > > > + * @interruptible: sleep interruptible if waiting
> > > > + *
> > > > + * Acquires all dma-resv locks of all &drm_gem_objects the given
> > > > + * &drm_gpuvm contains mappings of.
> > > > + *
> > > > + * Additionally, when calling this function with struct drm_gpuvm_exec::extra
> > > > + * being set, the driver receives the &drm_gpuvm_exec::extra::fn callback to
> > > > + * lock additional dma-resvs in the context of the &drm_gpuvm_exec instance.
> > > > + * Typically, drivers would call drm_exec_prepare_obj() from within this
> > > > + * callback.
> > > > + *
> > > > + * Returns: 0 on success, negative error code on failure.
> > > > + */
> > > > +int
> > > > +drm_gpuvm_exec_lock(struct drm_gpuvm_exec *vm_exec,
> > > > +		    unsigned int num_fences,
> > > > +		    bool interruptible)
> > > > +{
> > > > +	struct drm_gpuvm *gpuvm = vm_exec->vm;
> > > > +	struct drm_exec *exec = &vm_exec->exec;
> > > > +	uint32_t flags;
> > > > +	int ret;
> > > > +
> > > > +	flags = DRM_EXEC_IGNORE_DUPLICATES |
> > > > +		(interruptible ? DRM_EXEC_INTERRUPTIBLE_WAIT : 0);
> > > > +
> > > > +	drm_exec_init(exec, flags);
> > > > +
> > > > +	drm_exec_until_all_locked(exec) {
> > > > +		ret = drm_gpuvm_prepare_vm(gpuvm, exec, num_fences);
> > > > +		drm_exec_retry_on_contention(exec);
> > > > +		if (ret)
> > > > +			goto err;
> > > > +
> > > > +		ret = drm_gpuvm_prepare_objects(gpuvm, exec, num_fences);
> > > > +		drm_exec_retry_on_contention(exec);
> > > > +		if (ret)
> > > > +			goto err;
> > > > +
> > > > +		if (vm_exec->extra.fn) {
> > > > +			ret = vm_exec->extra.fn(vm_exec, num_fences);
> > > > +			drm_exec_retry_on_contention(exec);
> > > > +			if (ret)
> > > > +				goto err;
> > > > +		}
> > > > +	}
> > > > +
> > > > +	return 0;
> > > > +
> > > > +err:
> > > > +	drm_exec_fini(exec);
> > > > +	return ret;
> > > > +}
> > > > +EXPORT_SYMBOL_GPL(drm_gpuvm_exec_lock);
> > > > +
> > > > +static int
> > > > +fn_lock_array(struct drm_gpuvm_exec *vm_exec, unsigned int num_fences)
> > > > +{
> > > > +	struct {
> > > > +		struct drm_gem_object **objs;
> > > > +		unsigned int num_objs;
> > > > +	} *args = vm_exec->extra.priv;
> > > > +
> > > > +	return drm_exec_prepare_array(&vm_exec->exec, args->objs,
> > > > +				      args->num_objs, num_fences);
> > > > +}
> > > > +
> > > > +/**
> > > > + * drm_gpuvm_exec_lock_array() - lock all dma-resv of all associated BOs
> > > > + * @vm_exec: the &drm_gpuvm_exec abstraction
> > > > + * @objs: additional &drm_gem_objects to lock
> > > > + * @num_objs: the number of additional &drm_gem_objects to lock
> > > > + * @num_fences: the amount of &dma_fences to reserve
> > > > + * @interruptible: sleep interruptible if waiting
> > > > + *
> > > > + * Acquires all dma-resv locks of all &drm_gem_objects the given &drm_gpuvm
> > > > + * contains mappings of, plus the ones given through @objs.
> > > > + *
> > > > + * Returns: 0 on success, negative error code on failure.
> > > > + */
> > > > +int
> > > > +drm_gpuvm_exec_lock_array(struct drm_gpuvm_exec *vm_exec,
> > > > +			  struct drm_gem_object **objs,
> > > > +			  unsigned int num_objs,
> > > > +			  unsigned int num_fences,
> > > > +			  bool interruptible)
> > > > +{
> > > > +	struct {
> > > > +		struct drm_gem_object **objs;
> > > > +		unsigned int num_objs;
> > > > +	} args;
> > > > +
> > > > +	args.objs = objs;
> > > > +	args.num_objs = num_objs;
> > > > +
> > > > +	vm_exec->extra.fn = fn_lock_array;
> > > > +	vm_exec->extra.priv = &args;
> > > > +
> > > > +	return drm_gpuvm_exec_lock(vm_exec, num_fences, interruptible);
> > > > +}
> > > > +EXPORT_SYMBOL_GPL(drm_gpuvm_exec_lock_array);
> > > > +
> > > > +/**
> > > > + * drm_gpuvm_exec_lock_range() - prepare all BOs mapped within a given range
> > > > + * @vm_exec: the &drm_gpuvm_exec abstraction
> > > > + * @addr: the start address within the VA space
> > > > + * @range: the range to iterate within the VA space
> > > > + * @num_fences: the amount of &dma_fences to reserve
> > > > + * @interruptible: sleep interruptible if waiting
> > > > + *
> > > > + * Acquires all dma-resv locks of all &drm_gem_objects mapped between @addr and
> > > > + * @addr + @range.
> > > > + *
> > > > + * Returns: 0 on success, negative error code on failure.
> > > > + */
> > > > +int
> > > > +drm_gpuvm_exec_lock_range(struct drm_gpuvm_exec *vm_exec,
> > > > +			  u64 addr, u64 range,
> > > > +			  unsigned int num_fences,
> > > > +			  bool interruptible)
> > > > +{
> > > > +	struct drm_gpuvm *gpuvm = vm_exec->vm;
> > > > +	struct drm_exec *exec = &vm_exec->exec;
> > > > +	uint32_t flags;
> > > > +	int ret;
> > > > +
> > > > +	flags = DRM_EXEC_IGNORE_DUPLICATES |
> > > > +		(interruptible ? DRM_EXEC_INTERRUPTIBLE_WAIT : 0);
> > > > +
> > > > +	drm_exec_init(exec, flags);
> > > > +
> > > > +	drm_exec_until_all_locked(exec) {
> > > > +		ret = drm_gpuvm_prepare_range(gpuvm, exec, addr, range,
> > > > +					      num_fences);
> > > > +		drm_exec_retry_on_contention(exec);
> > > > +		if (ret)
> > > > +			goto err;
> > > > +	}
> > > > +
> > > > +	return ret;
> > > > +
> > > > +err:
> > > > +	drm_exec_fini(exec);
> > > > +	return ret;
> > > > +}
> > > > +EXPORT_SYMBOL_GPL(drm_gpuvm_exec_lock_range);
> > > > +
> > > > +/**
> > > > + * drm_gpuvm_validate() - validate all BOs marked as evicted
> > > > + * @gpuvm: the &drm_gpuvm to validate evicted BOs
> > > > + *
> > > > + * Calls the &drm_gpuvm_ops.bo_validate callback for all evicted buffer
> > > > + * objects being mapped in the given &drm_gpuvm.
> > > > + *
> > > > + * Returns: 0 on success, negative error code on failure.
> > > > + */
> > > > +int
> > > > +drm_gpuvm_validate(struct drm_gpuvm *gpuvm)
> > > > +{
> > > > +	const struct drm_gpuvm_ops *ops = gpuvm->ops;
> > > > +	struct drm_gpuvm_bo *vm_bo;
> > > > +	LIST_HEAD(evict);
> > > > +	int ret = 0;
> > > > +
> > > > +	if (unlikely(!ops || !ops->bo_validate))
> > > > +		return -ENOTSUPP;
> > > > +
> > > > +	for_each_vm_bo_in_list(gpuvm, evict, &evict, vm_bo) {
> > > > +		dma_resv_assert_held(vm_bo->obj->resv);
> > > > +		ret = ops->bo_validate(vm_bo->obj);
> > > > +		if (ret)
> > > > +			break;
> > > > +	}
> > > > +	/* Drop ref in case we break out of the loop. */
> > > > +	drm_gpuvm_bo_put(vm_bo);
> > > > +	restore_vm_bo_list(gpuvm, evict, &evict);
> > > > +
> > > > +	return ret;
> > > > +}
> > > > +EXPORT_SYMBOL_GPL(drm_gpuvm_validate);
> > > > +
> > > > +/**
> > > > + * drm_gpuvm_resv_add_fence() - add fence to private and all extobj
> > > > + * dma-resv
> > > > + * @gpuvm: the &drm_gpuvm to add a fence to
> > > > + * @exec: the &drm_exec locking context
> > > > + * @fence: fence to add
> > > > + * @private_usage: private dma-resv usage
> > > > + * @extobj_usage: extobj dma-resv usage
> > > > + */
> > > > +void
> > > > +drm_gpuvm_resv_add_fence(struct drm_gpuvm *gpuvm,
> > > > +			 struct drm_exec *exec,
> > > > +			 struct dma_fence *fence,
> > > > +			 enum dma_resv_usage private_usage,
> > > > +			 enum dma_resv_usage extobj_usage)
> > > > +{
> > > > +	struct drm_gem_object *obj;
> > > > +	unsigned long index;
> > > > +
> > > > +	drm_exec_for_each_locked_object(exec, index, obj) {
> > > > +		dma_resv_assert_held(obj->resv);
> > > > +		dma_resv_add_fence(obj->resv, fence,
> > > > +				   drm_gpuvm_is_extobj(gpuvm, obj) ?
> > > > +				   extobj_usage : private_usage);
> > > > +	}
> > > > +}
> > > > +EXPORT_SYMBOL_GPL(drm_gpuvm_resv_add_fence);
> > > > +
> > > >    /**
> > > >     * drm_gpuvm_bo_create() - create a new instance of struct drm_gpuvm_bo
> > > >     * @gpuvm: The &drm_gpuvm the @obj is mapped in.
> > > > @@ -790,6 +1242,9 @@ drm_gpuvm_bo_create(struct drm_gpuvm *gpuvm,
> > > >    	INIT_LIST_HEAD(&vm_bo->list.gpuva);
> > > >    	INIT_LIST_HEAD(&vm_bo->list.entry.gem);
> > > > +	INIT_LIST_HEAD(&vm_bo->list.entry.extobj);
> > > > +	INIT_LIST_HEAD(&vm_bo->list.entry.evict);
> > > > +
> > > >    	drm_gem_object_get(obj);
> > > >    	return vm_bo;
> > > > @@ -807,6 +1262,14 @@ drm_gpuvm_bo_destroy(struct kref *kref)
> > > >    	drm_gem_gpuva_assert_lock_held(vm_bo->obj);
> > > > +	spin_lock(&gpuvm->extobj.lock);
> > > > +	list_del(&vm_bo->list.entry.extobj);
> > > > +	spin_unlock(&gpuvm->extobj.lock);
> > > > +
> > > > +	spin_lock(&gpuvm->evict.lock);
> > > > +	list_del(&vm_bo->list.entry.evict);
> > > > +	spin_unlock(&gpuvm->evict.lock);
> > > > +
> > > >    	list_del(&vm_bo->list.entry.gem);
> > > >    	drm_gem_object_put(obj);
> > > > @@ -822,6 +1285,11 @@ drm_gpuvm_bo_destroy(struct kref *kref)
> > > >     * @vm_bo: the &drm_gpuvm_bo to release the reference of
> > > >     *
> > > >     * This releases a reference to @vm_bo.
> > > > + *
> > > > + * If the reference count drops to zero, the &gpuvm_bo is destroyed, which
> > > > + * includes removing it from the GEM's gpuva list. Hence, if a call to this
> > > > + * function can potentially let the reference count drop to zero, the caller
> > > > + * must hold the dma-resv or driver-specific GEM gpuva lock.
> > > >     */
> > > >    void
> > > >    drm_gpuvm_bo_put(struct drm_gpuvm_bo *vm_bo)
> > > > @@ -831,6 +1299,12 @@ drm_gpuvm_bo_put(struct drm_gpuvm_bo *vm_bo)
> > > >    }
> > > >    EXPORT_SYMBOL_GPL(drm_gpuvm_bo_put);
> > > > +static int __must_check
> > > > +drm_gpuvm_bo_get_unless_zero(struct drm_gpuvm_bo *vm_bo)
> > > > +{
> > > > +	return kref_get_unless_zero(&vm_bo->kref);
> > > > +}
> > > > +
> > > >    static struct drm_gpuvm_bo *
> > > >    __drm_gpuvm_bo_find(struct drm_gpuvm *gpuvm,
> > > >    		    struct drm_gem_object *obj)
> > > > @@ -938,6 +1412,48 @@ drm_gpuvm_bo_obtain_prealloc(struct drm_gpuvm_bo *__vm_bo)
> > > >    }
> > > >    EXPORT_SYMBOL_GPL(drm_gpuvm_bo_obtain_prealloc);
> > > > +/**
> > > > + * drm_gpuvm_bo_extobj_add() - adds the &drm_gpuvm_bo to its &drm_gpuvm's
> > > > + * extobj list
> > > > + * @vm_bo: The &drm_gpuvm_bo to add to its &drm_gpuvm's the extobj list.
> > > > + *
> > > > + * Adds the given @vm_bo to its &drm_gpuvm's extobj list if it is not on the
> > > > + * list already and the corresponding &drm_gem_object actually is an external
> > > > + * object.
> > > > + */
> > > > +void
> > > > +drm_gpuvm_bo_extobj_add(struct drm_gpuvm_bo *vm_bo)
> > > > +{
> > > > +	struct drm_gpuvm *gpuvm = vm_bo->vm;
> > > > +
> > > > +	if (drm_gpuvm_is_extobj(gpuvm, vm_bo->obj))
> > > > +		drm_gpuvm_bo_list_add(vm_bo, extobj);
> > > > +}
> > > > +EXPORT_SYMBOL_GPL(drm_gpuvm_bo_extobj_add);
> > > > +
> > > > +/**
> > > > + * drm_gpuvm_bo_evict() - add / remove a &drm_gem_object to / from a
> > > > + * &drm_gpuvm's evicted list
> > > > + * @obj: the &drm_gem_object to add or remove
> > > > + * @evict: indicates whether the object is evicted
> > > > + *
> > > > + * Adds a &drm_gem_object to or removes it from the evicted list of all
> > > > + * &drm_gpuvms containing a mapping of this &drm_gem_object.
> > > > + */
> > > > +void
> > > > +drm_gpuvm_bo_evict(struct drm_gem_object *obj, bool evict)
> > > > +{
> > > > +	struct drm_gpuvm_bo *vm_bo;
> > > > +
> > > > +	drm_gem_for_each_gpuvm_bo(vm_bo, obj) {
> > > > +		if (evict)
> > > > +			drm_gpuvm_bo_list_add(vm_bo, evict);
> > > > +		else
> > > > +			drm_gpuvm_bo_list_del(vm_bo, evict);
> > > > +	}
> > > > +}
> > > > +EXPORT_SYMBOL_GPL(drm_gpuvm_bo_evict);
> > > > +
> > > >    static int
> > > >    __drm_gpuva_insert(struct drm_gpuvm *gpuvm,
> > > >    		   struct drm_gpuva *va)
> > > > diff --git a/include/drm/drm_gpuvm.h b/include/drm/drm_gpuvm.h
> > > > index afa50b9059a2..834bb6d6617e 100644
> > > > --- a/include/drm/drm_gpuvm.h
> > > > +++ b/include/drm/drm_gpuvm.h
> > > > @@ -26,10 +26,12 @@
> > > >     */
> > > >    #include <linux/list.h>
> > > > +#include <linux/dma-resv.h>
> > > >    #include <linux/rbtree.h>
> > > >    #include <linux/types.h>
> > > >    #include <drm/drm_gem.h>
> > > > +#include <drm/drm_exec.h>
> > > >    struct drm_gpuvm;
> > > >    struct drm_gpuvm_bo;
> > > > @@ -259,6 +261,38 @@ struct drm_gpuvm {
> > > >    	 * space
> > > >    	 */
> > > >    	struct dma_resv *resv;
> > > > +
> > > > +	/**
> > > > +	 * @extobj: structure holding the extobj list
> > > > +	 */
> > > > +	struct {
> > > > +		/**
> > > > +		 * @list: &list_head storing &drm_gpuvm_bos serving as
> > > > +		 * external object
> > > > +		 */
> > > > +		struct list_head list;
> > > > +
> > > > +		/**
> > > > +		 * @lock: spinlock to protect the extobj list
> > > > +		 */
> > > > +		spinlock_t lock;
> > > > +	} extobj;
> > > > +
> > > > +	/**
> > > > +	 * @evict: structure holding the evict list and evict list lock
> > > > +	 */
> > > > +	struct {
> > > > +		/**
> > > > +		 * @list: &list_head storing &drm_gpuvm_bos currently being
> > > > +		 * evicted
> > > > +		 */
> > > > +		struct list_head list;
> > > > +
> > > > +		/**
> > > > +		 * @lock: spinlock to protect the evict list
> > > > +		 */
> > > > +		spinlock_t lock;
> > > > +	} evict;
> > > >    };
> > > >    void drm_gpuvm_init(struct drm_gpuvm *gpuvm, struct drm_device *drm,
> > > > @@ -268,6 +302,21 @@ void drm_gpuvm_init(struct drm_gpuvm *gpuvm, struct drm_device *drm,
> > > >    		    const struct drm_gpuvm_ops *ops);
> > > >    void drm_gpuvm_destroy(struct drm_gpuvm *gpuvm);
> > > > +/**
> > > > + * drm_gpuvm_is_extobj() - indicates whether the given &drm_gem_object is an
> > > > + * external object
> > > > + * @gpuvm: the &drm_gpuvm to check
> > > > + * @obj: the &drm_gem_object to check
> > > > + *
> > > > + * Returns: true if the &drm_gem_object &dma_resv differs from the
> > > > + * &drm_gpuvm's &dma_resv, false otherwise
> > > > + */
> > > > +static inline bool drm_gpuvm_is_extobj(struct drm_gpuvm *gpuvm,
> > > > +				       struct drm_gem_object *obj)
> > > > +{
> > > > +	return obj && obj->resv != gpuvm->resv;
> > > > +}
> > > > +
> > > >    static inline struct drm_gpuva *
> > > >    __drm_gpuva_next(struct drm_gpuva *va)
> > > >    {
> > > > @@ -346,6 +395,128 @@ __drm_gpuva_next(struct drm_gpuva *va)
> > > >    #define drm_gpuvm_for_each_va_safe(va__, next__, gpuvm__) \
> > > >    	list_for_each_entry_safe(va__, next__, &(gpuvm__)->rb.list, rb.entry)
> > > > +/**
> > > > + * struct drm_gpuvm_exec - &drm_gpuvm abstraction of &drm_exec
> > > > + *
> > > > + * This structure should be created on the stack as &drm_exec should be.
> > > > + *
> > > > + * Optionally, @extra can be set in order to lock additional &drm_gem_objects.
> > > > + */
> > > > +struct drm_gpuvm_exec {
> > > > +	/**
> > > > +	 * @exec: the &drm_exec structure
> > > > +	 */
> > > > +	struct drm_exec exec;
> > > > +
> > > > +	/**
> > > > +	 * @vm: the &drm_gpuvm whose DMA reservations are to be locked
> > > > +	 */
> > > > +	struct drm_gpuvm *vm;
> > > > +
> > > > +	/**
> > > > +	 * @extra: Callback and corresponding private data for the driver to
> > > > +	 * lock arbitrary additional &drm_gem_objects.
> > > > +	 */
> > > > +	struct {
> > > > +		/**
> > > > +		 * @fn: The driver callback to lock additional &drm_gem_objects.
> > > > +		 */
> > > > +		int (*fn)(struct drm_gpuvm_exec *vm_exec,
> > > > +			  unsigned int num_fences);
> > > > +
> > > > +		/**
> > > > +		 * @priv: driver private data for the @fn callback
> > > > +		 */
> > > > +		void *priv;
> > > > +	} extra;
> > > > +};
> > > > +
> > > > +/**
> > > > + * drm_gpuvm_prepare_vm() - prepare the GPUVMs common dma-resv
> > > > + * @gpuvm: the &drm_gpuvm
> > > > + * @exec: the &drm_exec context
> > > > + * @num_fences: the amount of &dma_fences to reserve
> > > > + *
> > > > + * Calls drm_exec_prepare_obj() for the GPUVMs dummy &drm_gem_object.
> > > > + *
> > > > + * Using this function directly, it is the driver's responsibility to call
> > > > + * drm_exec_init() and drm_exec_fini() accordingly.
> > > > + *
> > > > + * Returns: 0 on success, negative error code on failure.
> > > > + */
> > > > +static inline int
> > > > +drm_gpuvm_prepare_vm(struct drm_gpuvm *gpuvm,
> > > > +		     struct drm_exec *exec,
> > > > +		     unsigned int num_fences)
> > > > +{
> > > > +	return drm_exec_prepare_obj(exec, &gpuvm->d_obj, num_fences);
> > > > +}
> > > > +
> > > > +int drm_gpuvm_prepare_objects(struct drm_gpuvm *gpuvm,
> > > > +			      struct drm_exec *exec,
> > > > +			      unsigned int num_fences);
> > > > +
> > > > +int drm_gpuvm_prepare_range(struct drm_gpuvm *gpuvm,
> > > > +			    struct drm_exec *exec,
> > > > +			    u64 addr, u64 range,
> > > > +			    unsigned int num_fences);
> > > > +
> > > > +int drm_gpuvm_exec_lock(struct drm_gpuvm_exec *vm_exec,
> > > > +			unsigned int num_fences,
> > > > +			bool interruptible);
> > > > +
> > > > +int drm_gpuvm_exec_lock_array(struct drm_gpuvm_exec *vm_exec,
> > > > +			      struct drm_gem_object **objs,
> > > > +			      unsigned int num_objs,
> > > > +			      unsigned int num_fences,
> > > > +			      bool interruptible);
> > > > +
> > > > +int drm_gpuvm_exec_lock_range(struct drm_gpuvm_exec *vm_exec,
> > > > +			      u64 addr, u64 range,
> > > > +			      unsigned int num_fences,
> > > > +			      bool interruptible);
> > > > +
> > > > +/**
> > > > + * drm_gpuvm_exec_unlock() - unlock all dma-resv of all associated BOs
> > > > + * @vm_exec: the &drm_gpuvm_exec abstraction
> > > > + *
> > > > + * Releases all dma-resv locks of all &drm_gem_objects previously acquired
> > > > + * through drm_gpuvm_exec_lock() or its variants.
> > > > + */
> > > > +static inline void
> > > > +drm_gpuvm_exec_unlock(struct drm_gpuvm_exec *vm_exec)
> > > > +{
> > > > +	drm_exec_fini(&vm_exec->exec);
> > > > +}
> > > > +
> > > > +int drm_gpuvm_validate(struct drm_gpuvm *gpuvm);
> > > > +void drm_gpuvm_resv_add_fence(struct drm_gpuvm *gpuvm,
> > > > +			      struct drm_exec *exec,
> > > > +			      struct dma_fence *fence,
> > > > +			      enum dma_resv_usage private_usage,
> > > > +			      enum dma_resv_usage extobj_usage);
> > > > +
> > > > +/**
> > > > + * drm_gpuvm_exec_resv_add_fence() - add fence to private and all extobj dma-resv
> > > > + * @vm_exec: the &drm_gpuvm_exec abstraction
> > > > + * @fence: fence to add
> > > > + * @private_usage: private dma-resv usage
> > > > + * @extobj_usage: extobj dma-resv usage
> > > > + *
> > > > + * See drm_gpuvm_resv_add_fence().
> > > > + */
> > > > +static inline void
> > > > +drm_gpuvm_exec_resv_add_fence(struct drm_gpuvm_exec *vm_exec,
> > > > +			      struct dma_fence *fence,
> > > > +			      enum dma_resv_usage private_usage,
> > > > +			      enum dma_resv_usage extobj_usage)
> > > > +{
> > > > +	drm_gpuvm_resv_add_fence(vm_exec->vm, &vm_exec->exec, fence,
> > > > +				 private_usage, extobj_usage);
> > > > +}
> > > > +
> > > >    /**
> > > >     * struct drm_gpuvm_bo - structure representing a &drm_gpuvm and
> > > >     * &drm_gem_object combination
> > > > @@ -398,6 +569,18 @@ struct drm_gpuvm_bo {
> > > >    			 * gpuva list.
> > > >    			 */
> > > >    			struct list_head gem;
> > > > +
> > > > +			/**
> > > > +			 * @extobj: List entry to attach to the &drm_gpuvm's
> > > > +			 * extobj list.
> > > > +			 */
> > > > +			struct list_head extobj;
> > > > +
> > > > +			/**
> > > > +			 * @evict: List entry to attach to the &drm_gpuvm's evict
> > > > +			 * list.
> > > > +			 */
> > > > +			struct list_head evict;
> > > >    		} entry;
> > > >    	} list;
> > > >    };
> > > > @@ -432,6 +615,9 @@ struct drm_gpuvm_bo *
> > > >    drm_gpuvm_bo_find(struct drm_gpuvm *gpuvm,
> > > >    		  struct drm_gem_object *obj);
> > > > +void drm_gpuvm_bo_evict(struct drm_gem_object *obj, bool evict);
> > > > +void drm_gpuvm_bo_extobj_add(struct drm_gpuvm_bo *vm_bo);
> > > > +
> > > >    /**
> > > >     * drm_gpuvm_bo_for_each_va() - iterator to walk over a list of &drm_gpuva
> > > >     * @va__: &drm_gpuva structure to assign to in each iteration step
> > > > @@ -837,6 +1023,17 @@ struct drm_gpuvm_ops {
> > > >    	 * used.
> > > >    	 */
> > > >    	int (*sm_step_unmap)(struct drm_gpuva_op *op, void *priv);
> > > > +
> > > > +	/**
> > > > +	 * @bo_validate: called from drm_gpuvm_validate()
> > > > +	 *
> > > > +	 * Drivers receive this callback for every evicted &drm_gem_object being
> > > > +	 * mapped in the corresponding &drm_gpuvm.
> > > > +	 *
> > > > +	 * Typically, drivers would call their driver specific variant of
> > > > +	 * ttm_bo_validate() from within this callback.
> > > > +	 */
> > > > +	int (*bo_validate)(struct drm_gem_object *obj);
> > > >    };
> > > >    int drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm, void *priv,
>
Boris Brezillon Sept. 13, 2023, 7:03 a.m. UTC | #11
On Tue, 12 Sep 2023 18:20:32 +0200
Thomas Hellström <thomas.hellstrom@linux.intel.com> wrote:

> > +/**
> > + * get_next_vm_bo_from_list() - get the next vm_bo element
> > + * @__gpuvm: The GPU VM
> > + * @__list_name: The name of the list we're iterating on
> > + * @__local_list: A pointer to the local list used to store already iterated items
> > + * @__prev_vm_bo: The previous element we got from get_next_vm_bo_from_list()
> > + *
> > + * This helper is here to provide lockless list iteration. Lockless as in, the
> > + * iterator releases the lock immediately after picking the first element from
> > + * the list, so list insertion and deletion can happen concurrently.
> 
> Are the list spinlocks needed for that async state update from within 
> the dma-fence critical section we've discussed previously?

Any driver calling _[un]link() from its drm_gpu_scheduler::run_job()
hook will be in this situation (Panthor at the moment, PowerVR soon). I
get that Xe and Nouveau don't need that because they update the VM
state early (in the ioctl path), but I keep thinking this will hurt us
if we don't think it through from the beginning, because once you've
set this logic to depend only on resv locks, it will be pretty hard to
get back to a solution which lets synchronous VM_BINDs take precedence
over asynchronous requests, and, with vkQueueBindSparse() passing external
deps (plus the fact the VM_BIND queue might be pretty deep), it can
take a long time to get your synchronous VM_BIND executed...
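
To make that concrete, here's a minimal sketch of the situation (the
driver structs and helpers below are hypothetical; only the run_job()
hook and the drm_gpuva_[un]link() calls are real): the hook executes in
the dma-fence signalling path, where no dma-resv lock may be taken,
which is why the vm_bo lists need their own spinlocks in that model.

static struct dma_fence *
driver_vm_bind_run_job(struct drm_sched_job *sched_job)
{
	struct driver_bind_job *job = to_driver_bind_job(sched_job);

	/* Fence signalling path: no dma-resv locks allowed here.
	 * Executing the bind op ends up calling drm_gpuva_[un]link(),
	 * which may add/remove vm_bos from the extobj/evict lists.
	 */
	return driver_vm_exec_bind_op(job->vm, &job->op);
}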

Now, maybe the solution is something different, with early VM state
update for everyone (creation of to-be-[un]mapped drm_gpuva entries,
some of them shadowed by already existing drm_gpuvas encoding the
currently mapped region), and VM state patching when a
synchronous VM_BIND kicks in (we need to patch the previously queued
requests too, so they always have enough resources for the map/unmap
operations to succeed).
Dave Airlie Sept. 13, 2023, 7:05 a.m. UTC | #12
On Wed, 13 Sept 2023 at 17:03, Boris Brezillon
<boris.brezillon@collabora.com> wrote:
>
> On Tue, 12 Sep 2023 18:20:32 +0200
> Thomas Hellström <thomas.hellstrom@linux.intel.com> wrote:
>
> > > +/**
> > > + * get_next_vm_bo_from_list() - get the next vm_bo element
> > > + * @__gpuvm: The GPU VM
> > > + * @__list_name: The name of the list we're iterating on
> > > + * @__local_list: A pointer to the local list used to store already iterated items
> > > + * @__prev_vm_bo: The previous element we got from get_next_vm_bo_from_list()
> > > + *
> > > + * This helper is here to provide lockless list iteration. Lockless as in, the
> > > + * iterator releases the lock immediately after picking the first element from
> > > + * the list, so list insertion and deletion can happen concurrently.
> >
> > Are the list spinlocks needed for that async state update from within
> > the dma-fence critical section we've discussed previously?
>
> Any driver calling _[un]link() from its drm_gpu_scheduler::run_job()
> hook will be in this situation (Panthor at the moment, PowerVR soon). I
> get that Xe and Nouveau don't need that because they update the VM
> state early (in the ioctl path), but I keep thinking this will hurt us
> if we don't think it through from the beginning, because once you've
> set this logic to depend only on resv locks, it will be pretty hard to
> get back to a solution which lets synchronous VM_BINDs take precedence
> on asynchronous request, and, with vkQueueBindSparse() passing external
> deps (plus the fact the VM_BIND queue might be pretty deep), it can
> take a long time to get your synchronous VM_BIND executed...

btw what is the use case for this? do we have actual vulkan
applications we know will have problems here?

it feels like a bit of premature optimisation, but maybe we have use cases.

Dave.
Boris Brezillon Sept. 13, 2023, 7:19 a.m. UTC | #13
On Wed, 13 Sep 2023 17:05:42 +1000
Dave Airlie <airlied@gmail.com> wrote:

> On Wed, 13 Sept 2023 at 17:03, Boris Brezillon
> <boris.brezillon@collabora.com> wrote:
> >
> > On Tue, 12 Sep 2023 18:20:32 +0200
> > Thomas Hellström <thomas.hellstrom@linux.intel.com> wrote:
> >  
> > > > +/**
> > > > + * get_next_vm_bo_from_list() - get the next vm_bo element
> > > > + * @__gpuvm: The GPU VM
> > > > + * @__list_name: The name of the list we're iterating on
> > > > + * @__local_list: A pointer to the local list used to store already iterated items
> > > > + * @__prev_vm_bo: The previous element we got from get_next_vm_bo_from_list()
> > > > + *
> > > > + * This helper is here to provide lockless list iteration. Lockless as in, the
> > > > + * iterator releases the lock immediately after picking the first element from
> > > > + * the list, so list insertion and deletion can happen concurrently.  
> > >
> > > Are the list spinlocks needed for that async state update from within
> > > the dma-fence critical section we've discussed previously?  
> >
> > Any driver calling _[un]link() from its drm_gpu_scheduler::run_job()
> > hook will be in this situation (Panthor at the moment, PowerVR soon). I
> > get that Xe and Nouveau don't need that because they update the VM
> > state early (in the ioctl path), but I keep thinking this will hurt us
> > if we don't think it through from the beginning, because once you've
> > set this logic to depend only on resv locks, it will be pretty hard to
> > get back to a solution which lets synchronous VM_BINDs take precedence
> > over asynchronous requests, and, with vkQueueBindSparse() passing external
> > deps (plus the fact the VM_BIND queue might be pretty deep), it can
> > take a long time to get your synchronous VM_BIND executed...  
> 
> btw what is the use case for this? do we have actual vulkan
> applications we know will have problems here?

I don't, but I think that's a concern Faith raised at some point (dates
back from when I was reading threads describing how VM_BIND on i915
should work, and I was clearly discovering this whole VM_BIND thing at
that time, so maybe I misunderstood).

> 
> it feels like a bit of premature optimisation, but maybe we have use cases.

Might be, but that's the sort of thing that would put us in a corner if
we don't have a plan for when the needs arise. Besides, if we don't
want to support that case because it's too complicated, I'd recommend
dropping all the drm_gpuvm APIs that let people think this mode is
valid/supported (map/remap/unmap hooks in drm_gpuvm_ops,
drm_gpuvm_sm_[un]map helpers, etc). Keeping them around just adds to the
confusion.
Thomas Hellstrom Sept. 13, 2023, 9:14 a.m. UTC | #14
Hi!

On Wed, 2023-09-13 at 01:36 +0200, Danilo Krummrich wrote:
> On Tue, Sep 12, 2023 at 09:23:08PM +0200, Thomas Hellström wrote:
> > 
> > On 9/12/23 18:50, Danilo Krummrich wrote:
> > > On Tue, Sep 12, 2023 at 06:20:32PM +0200, Thomas Hellström wrote:
> > > > Hi, Danilo,
> > > > 
> > > > On 9/9/23 17:31, Danilo Krummrich wrote:
> > > > > So far the DRM GPUVA manager offers common infrastructure to
> > > > > track GPU VA
> > > > > allocations and mappings, generically connect GPU VA mappings
> > > > > to their
> > > > > backing buffers and perform more complex mapping operations
> > > > > on the GPU VA
> > > > > space.
> > > > > 
> > > > > However, there are more design patterns commonly used by
> > > > > drivers, which
> > > > > can potentially be generalized in order to make the DRM GPUVA
> > > > > manager
> > > > > represent a basic GPU-VM implementation. In this context,
> > > > > this patch aims
> > > > > at generalizing the following elements.
> > > > > 
> > > > > 1) Provide a common dma-resv for GEM objects not being used
> > > > > outside of
> > > > >      this GPU-VM.
> > > > > 
> > > > > 2) Provide tracking of external GEM objects (GEM objects
> > > > > which are
> > > > >      shared with other GPU-VMs).
> > > > > 
> > > > > 3) Provide functions to efficiently lock all GEM objects dma-
> > > > > resv the
> > > > >      GPU-VM contains mappings of.
> > > > > 
> > > > > 4) Provide tracking of evicted GEM objects the GPU-VM
> > > > > contains mappings
> > > > >      of, such that validation of evicted GEM objects is
> > > > > accelerated.
> > > > > 
> > > > > 5) Provide some convinience functions for common patterns.
> > > > > 
> > > > > Rather than being designed as a "framework", the target is to
> > > > > make all
> > > > > features appear as a collection of optional helper functions,
> > > > > such that
> > > > > drivers are free to make use of the DRM GPUVA managers basic
> > > > > functionality and opt-in for other features without setting
> > > > > any feature
> > > > > flags, just by making use of the corresponding functions.
> > > > > 
> > > > > Big kudos to Boris Brezillon for his help to figure out
> > > > > locking for drivers
> > > > > updating the GPU VA space within the fence signalling path.
> > > > > 
> > > > > Suggested-by: Matthew Brost <matthew.brost@intel.com>
> > > > > Signed-off-by: Danilo Krummrich <dakr@redhat.com>
> > > > > ---
> > > > >    drivers/gpu/drm/drm_gpuvm.c | 516
> > > > > ++++++++++++++++++++++++++++++++++++
> > > > >    include/drm/drm_gpuvm.h     | 197 ++++++++++++++
> > > > >    2 files changed, 713 insertions(+)
> > > > > 
> > > > > diff --git a/drivers/gpu/drm/drm_gpuvm.c
> > > > > b/drivers/gpu/drm/drm_gpuvm.c
> > > > > index f4411047dbb3..8e62a043f719 100644
> > > > > --- a/drivers/gpu/drm/drm_gpuvm.c
> > > > > +++ b/drivers/gpu/drm/drm_gpuvm.c
> > > > > @@ -73,6 +73,21 @@
> > > > >     * &drm_gem_object list of &drm_gpuvm_bos for an existing
> > > > > instance of this
> > > > >     * particular combination. If not existent a new instance
> > > > > is created and linked
> > > > >     * to the &drm_gem_object.
> > > > > + *
> > > > > + * &drm_gpuvm_bo structures, since unique for a given &drm_gpuvm, are also
> > > > > + * used as entries for the &drm_gpuvm's lists of external and evicted
> > > > > + * objects. Those lists are maintained in order to accelerate locking of
> > > > > + * dma-resv locks and validation of evicted objects bound in a &drm_gpuvm.
> > > > > + * For instance, all &drm_gem_object's &dma_resv of a given &drm_gpuvm can
> > > > > + * be locked by calling drm_gpuvm_exec_lock(). Once locked, drivers can call
> > > > > + * drm_gpuvm_validate() in order to validate all evicted &drm_gem_objects.
> > > > > + * It is also possible to lock additional &drm_gem_objects by providing the
> > > > > + * corresponding parameters to drm_gpuvm_exec_lock() as well as open coding
> > > > > + * the &drm_exec loop while making use of helper functions such as
> > > > > + * drm_gpuvm_prepare_range() or drm_gpuvm_prepare_objects().
> > > > > + *
> > > > > + * Every bound &drm_gem_object is treated as an external object when its
> > > > > + * &dma_resv structure is different from the &drm_gpuvm's common &dma_resv
> > > > > + * structure.
> > > > >     */
> > > > >    /**
> > > > > @@ -420,6 +435,20 @@
> > > > >     * Subsequent calls to drm_gpuvm_bo_obtain() for the same
> > > > > &drm_gpuvm and
> > > > >     * &drm_gem_object must be able to observe previous
> > > > > creations and destructions
> > > > >     * of &drm_gpuvm_bos in order to keep instances unique.
> > > > > + *
> > > > > + * The &drm_gpuvm's lists for keeping track of external and evicted objects
> > > > > + * are protected against concurrent insertion / removal and iteration
> > > > > + * internally.
> > > > > + *
> > > > > + * However, drivers still need to ensure protection of concurrent calls to
> > > > > + * functions iterating those lists, such as drm_gpuvm_validate() and
> > > > > + * drm_gpuvm_prepare_objects(). Every such function contains a particular
> > > > > + * comment and lockdep checks if possible.
> > > > > + *
> > > > > + * Functions adding or removing entries from those lists, such as
> > > > > + * drm_gpuvm_bo_evict() or drm_gpuvm_bo_extobj_add(), may be called with
> > > > > + * external locks being held, e.g. in order to avoid the corresponding list
> > > > > + * to be (safely) modified while potentially being iterated by other API
> > > > > + * functions. However, this is entirely optional.
> > > > >     */
> > > > >    /**
> > > > > @@ -632,6 +661,131 @@
> > > > >     *   }
> > > > >     */
> > > > > +/**
> > > > > + * get_next_vm_bo_from_list() - get the next vm_bo element
> > > > > + * @__gpuvm: The GPU VM
> > > > > + * @__list_name: The name of the list we're iterating on
> > > > > + * @__local_list: A pointer to the local list used to store already iterated items
> > > > > + * @__prev_vm_bo: The previous element we got from get_next_vm_bo_from_list()
> > > > > + *
> > > > > + * This helper is here to provide lockless list iteration. Lockless as in, the
> > > > > + * iterator releases the lock immediately after picking the first element from
> > > > > + * the list, so list insertion and deletion can happen concurrently.
> > > > Are the list spinlocks needed for that async state update from
> > > > within the
> > > > dma-fence critical section we've discussed previously?
> > > Yes, but also for other reasons, see below.
> > > 
> > > > Otherwise it should be sufficient to protect the lists with the
> > > > gpuvm's resv
> > > > (or for the extobj list with an outer lock).
> > > > 
> > > > If those spinlocks are still needed in some situations, perhaps
> > > > could we
> > > > have an option to set them to NULL (Like IIRC the maple tree
> > > > allows for)?
> > > The evict spinlock is needed in any case, since in
> > > drm_gpuvm_bo_evict() we're
> > > holding only the dma-resv lock from the BO this function gets
> > > called for. Hence,
> > > the spinlock protects concurrent drm_gpuvm_bo_evict() calls with
> > > different BOs.
> > No. Only if you try to add external objects to the vm's evict list
> > from
> > within the evict code. That's not necessary since you loop through
> > all
> > external objects anyway when locking them so an "evicted" bool in
> > the vm_bo,
> > protected by the bo resv would be sufficient. The extobj locking
> > loop can
> > then add the bo to the evicted list.
> 
> And validate() can remove it while still holding all dma-resv locks,
> neat!
> However, what if two tasks are trying to lock the VA space
> concurrently? What
> do we do when the drm_gpuvm_bo's refcount drops to zero in
> drm_gpuva_unlink()?
> Are we guaranteed that at this point of time the drm_gpuvm_bo is not
> on the
> evicted list? Because otherwise we would call drm_gpuvm_bo_destroy()
> with the
> dma-resv lock held, which wouldn't be allowed, since
> drm_gpuvm_bo_destroy()
> might drop the last reference to the drm_gem_object and hence we'd
> potentially
> free the dma-resv lock while holding it, at least if it's an external
> object.

Easiest way in this scheme is to think of the lists as being protected
by the vm's resv lock. That means anybody calling unlink() must also
hold the vm's resv lock. (Which is OK from an UAF point of view, but
perhaps not from a locking-inversion POV for an async list update).
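
As a sketch, a synchronous unbind path under that scheme would look
roughly like this (assuming the vm_bo lists are indeed protected by the
VM's resv lock, and with @op being an unmap step produced by the
drm_gpuvm_sm_unmap() machinery):

	dma_resv_lock(gpuvm->resv, NULL);
	drm_gpuva_unmap(&op->unmap);
	/* May drop the last vm_bo reference; that is fine while
	 * holding the VM resv, which would protect the extobj and
	 * evict lists in this scheme.
	 */
	drm_gpuva_unlink(op->unmap.va);
	dma_resv_unlock(gpuvm->resv);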

> 
> > > 
> > > For extobjs an outer lock would be enough in case of Xe, but I
> > > really would not
> > > like to add even more complexity just to get the spinlock out of
> > > the way in case
> > > the driver already has an outer lock protecting this path.
> > 
> > I must disagree here. These spinlocks and atomic operations are
> > pretty
> > costly and as discussed earlier this type of locking was the reason
> > (at
> > least according to the commit message) that made Christian drop the
> > XArray
> > use in drm_exec for the same set of objects: "The locking overhead
> > is
> > unecessary and measurable". IMHO the spinlock is the added
> > complexity and a
> > single wide lock following the drm locking guidelines set out by
> > Daniel and
> > David should really be the default choice with an opt-in for a
> > spinlock if
> > needed for async and pushing out to a wq is not an option.
> 
> For the external object list an outer lock would work as long as it's
> not the
> dma-resv lock of the corresponding GEM object, since here we actually
> need to
> remove the list entry from the external object list on
> drm_gpuvm_bo_destroy().
> It's just a bit weird design wise that drivers would need to take
> this outer
> lock on:
> 
> - drm_gpuvm_bo_extobj_add()
> - drm_gpuvm_bo_destroy()        (and hence also drm_gpuvm_bo_put())
> - drm_gpuva_unlink()            (because it needs to call
> drm_gpuvm_bo_put())
> - drm_gpuvm_exec_lock()
> - drm_gpuvm_exec_lock_array()
> - drm_gpuvm_prepare_range()
> 
> Given that it seems reasonable to do all the required locking
> internally.

From a design POV, there has been a clear direction in Xe to make
things similar to mmap() / munmap(), so this outer lock, which in Xe is
an rwsem, is used in a similar way as the mmap_lock. It's protecting
the page-table structures and vma rb tree, the userptr structures and
the extobj list. Basically it's taken early in the exec IOCTL, the
VM_BIND ioctl, the compute rebind worker and the pagefault handler, so
all of the above are just asserting that it is taken in the correct
mode.

But strictly with this scheme one could also use the vm's dma_resv for
the extobj list since with drm_exec, it's locked before traversing the
list.

The whole point of this scheme is to rely on locks that you already are
supposed to be holding for various reasons and is simple to comprehend.

> 
> In order to at least place lockdep checks, the driver would need to
> supply the
> corresponding lock's lockdep_map, because the GPUVM otherwise doesn't
> know about
> the lock.

Yes, that sounds reasonable. One lockdep map per list.
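
Roughly along these lines, assuming drivers hand in one lockdep map per
list at init time (the lockdep_map field and the assert macro below are
made up for illustration; lockdep_assert() and lock_is_held() are the
stock lockdep primitives):

#define drm_gpuvm_assert_extobj_held(__gpuvm)				\
	lockdep_assert(lock_is_held(&(__gpuvm)->extobj.lockdep_map))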

> 
> Out of curiosity, what is the overhead of a spin_lock() that doesn't
> need to
> spin? 

I guess it's hard to tell exactly, but it is much lower on modern x86
than what it used to be. Not sure about ARM, which is the other
architecture important to us. I figure if there is little cache-line
bouncing the main overhead comes from the implied barriers.

> 
> > 
> > A pretty simple way that would not add much code would be
> > 
> > static void gpuvm_cond_spin_lock(const struct drm_gpuvm *gpuvm,
> > 				 spinlock_t *lock)
> > {
> > 	if (!gpuvm->resv_protected_lists)
> > 		spin_lock(lock);
> > }
> > }
> > 
> > > > For such drivers, that would require anybody calling unlink to
> > > > hold the vm's
> > > > resv, though.
> > > In V4 I want to go back to having a dedicated lock for the GEMs
> > > gpuva list (or
> > > VM_BO list to be more precise). We can't just use the dma-resv
> > > lock for that
> > > with VM_BO abstractions, because on destruction of a VM_BO we
> > > otherwise wouldn't
> > > be allowed to already hold the dma-resv lock. That's the fix I
> > > was referring to
> > > earlier.
> > 
> > Yeah, I can see the need for a dedicated lock for the GEM's gpuva
> > list, but
> > holding the vm's dma-resv lock across the unlink shouldn't be a
> > problem. We
> > may free the object and a pointer to the vm's resv during unlink
> > but we
> > don't free the vm's resv.  It'd be a matter of ensuring that any
> > calls to
> > unlink from *within* drm_gpuvm allow it to be held.
> 
> Drivers calling unlink() from the fence signaling path can't use the
> VM's
> dma-resv lock.

Yes, that made me a bit curious because in the current version the code
required the object's dma_resv for unlink() which can't be grabbed
either from the fence signaling path. So are there any drivers actually
wanting to do that? If so, they will either need to resort to the
current spinlock solution or they will need to call unlink from a
workqueue item. 
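
Such a deferred unlink could look roughly like this (the driver structs
and the stale list are made up for illustration; the point is that the
worker, unlike run_job(), is allowed to take the VM's resv lock):

static void driver_vm_unlink_worker(struct work_struct *work)
{
	struct driver_vm *vm = container_of(work, typeof(*vm), unlink_work);
	struct driver_vma *vma, *next;
	LIST_HEAD(stale);

	/* run_job() only moves VAs to vm->stale_vmas under a spinlock;
	 * the actual unlink happens here, outside the fence signalling
	 * path, where taking the resv lock is allowed.
	 */
	spin_lock(&vm->stale_lock);
	list_splice_init(&vm->stale_vmas, &stale);
	spin_unlock(&vm->stale_lock);

	dma_resv_lock(vm->base.resv, NULL);	/* vm->base: embedded drm_gpuvm */
	list_for_each_entry_safe(vma, next, &stale, stale_entry)
		drm_gpuva_unlink(&vma->va);
	dma_resv_unlock(vm->base.resv);
}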
> 
> Also, what if the object is an external object? We can't use the VM's
> dma-resv
> lock here.

Why? Typically (sync) unlink is only ever called from an unbind-like
operation where it should be trivial to grab the vm's resv. Or, for
that matter, any outer lock protecting the extobj list. The rule would
be that drm_gpuvm_bo::entry::extobj and drm_gpuvm_bo::entry::evict are
protected by either the vm's dma_resv (or possibly an outer lock in
the case of the extobj list).

>  And we can't have the GEM objs dma-resv lock held when calling
> unlink(), since unlink() calls drm_gpuvm_bo_put(), which if the
> refcount drops
> to zero calls drm_gpuvm_bo_destroy() and drm_gpuvm_bo_destroy() might
> drop the
> last reference of the GEM object.

Yes, but this is a different problem, namely what exactly protects
drm_gpuvm_bo::entry::gem. Either as you suggest an internal per bo list
lock, or if we want to keep the bo's dma_resv we need to ensure that
the caller of dma_resv_unlock(obj->resv) actually refcounts its obj
pointer, and doesn't implicitly rely on the gpuvm_bo's refcount (I know
Boris didn't like that, but requiring an explicit refcount for a
pointer you dereference unless you're under a lock that ensures keeping
the object alive is pretty much required?) But anyway for the
drm_gpuvm_bo::entry::gem list protection (bo resv or internal spinlock)
I don't have a strong preference.

>  All those problems go away with a dedicated
> GEM gpuva list lock.

I don't think these are real problems.
With the exception of the eviction list "trick" where we currently have
slightly different approach to collect external bos needing rebinding,
we have this working fine.

TBH I think pretty much the only situation where the spinlock is needed
is for async updates of these lists, unless a wq item can be used for
that, but it doesn't really seem like the current code allows for such
updates anyway? It complicates the code a lot, adds overhead and also
adds the requirement for refcounting during list traversal.

/Thomas

> 
> > 
> > /Thomas
> > 
> > 
> > > > It seems that with that also the refcount could be make non-
> > > > atomic.
> > > > 
> > > > All in the spirit of the drm locking guidelines "use big locks
> > > > when
> > > > possible".
> > > > Lower level locks only when necessary for performance or
> > > > locking inversion?
> > > > 
> > > > /Thomas
> > > > 
> > > > 
> > > > > + *
> > > > > + * Elements popped from the original list are kept in a
> > > > > local list, so removal
> > > > > + * and is_empty checks can still happen while we're
> > > > > iterating the list.
> > > > > + */
> > > > > +#define get_next_vm_bo_from_list(__gpuvm, __list_name, __local_list, __prev_vm_bo)	\
> > > > > +	({										\
> > > > > +		struct drm_gpuvm_bo *__vm_bo;						\
> > > > > +											\
> > > > > +		drm_gpuvm_bo_put(__prev_vm_bo);						\
> > > > > +											\
> > > > > +		spin_lock(&(__gpuvm)->__list_name.lock);				\
> > > > > +		while (!list_empty(&(__gpuvm)->__list_name.list)) {			\
> > > > > +			__vm_bo = list_first_entry(&(__gpuvm)->__list_name.list,	\
> > > > > +						   struct drm_gpuvm_bo,			\
> > > > > +						   list.entry.__list_name);		\
> > > > > +			if (drm_gpuvm_bo_get_unless_zero(__vm_bo)) {			\
> > > > > +				list_move_tail(&(__vm_bo)->list.entry.__list_name,	\
> > > > > +					       __local_list);				\
> > > > > +				break;							\
> > > > > +			} else {							\
> > > > > +				list_del_init(&(__vm_bo)->list.entry.__list_name);	\
> > > > > +				__vm_bo = NULL;						\
> > > > > +			}								\
> > > > > +		}									\
> > > > > +		spin_unlock(&(__gpuvm)->__list_name.lock);				\
> > > > > +											\
> > > > > +		__vm_bo;								\
> > > > > +	})
> > > > > +
> > > > > +/**
> > > > > + * for_each_vm_bo_in_list() - internal vm_bo list iterator
> > > > > + *
> > > > > + * This helper is here to provide lockless list iteration.
> > > > > Lockless as in, the
> > > > > + * iterator releases the lock immediately after picking the
> > > > > first element from the
> > > > > + * list, so list insertion and deletion can happen
> > > > > concurrently.
> > > > > + *
> > > > > + * Typical use:
> > > > > + *
> > > > > + *     struct drm_gpuvm_bo *vm_bo;
> > > > > + *     LIST_HEAD(my_local_list);
> > > > > + *
> > > > > + *     ret = 0;
> > > > > + *     drm_gpuvm_for_each_vm_bo(gpuvm, <list_name>,
> > > > > &my_local_list, vm_bo) {
> > > > > + *             ret = do_something_with_vm_bo(..., vm_bo);
> > > > > + *             if (ret)
> > > > > + *                     break;
> > > > > + *     }
> > > > > + *     drm_gpuvm_bo_put(vm_bo);
> > > > > + *     drm_gpuvm_restore_vm_bo_list(gpuvm, <list_name>,
> > > > > &my_local_list);
> > > > > + *
> > > > > + *
> > > > > + * Only used for internal list iterations, not meant to be
> > > > > exposed to the outside
> > > > > + * world.
> > > > > + */
> > > > > +#define for_each_vm_bo_in_list(__gpuvm, __list_name, __local_list, __vm_bo)	\
> > > > > +	for (__vm_bo = get_next_vm_bo_from_list(__gpuvm, __list_name,		\
> > > > > +						__local_list, NULL);		\
> > > > > +	     __vm_bo;								\
> > > > > +	     __vm_bo = get_next_vm_bo_from_list(__gpuvm, __list_name,		\
> > > > > +						__local_list, __vm_bo))		\
> > > > > +
> > > > > +/**
> > > > > + * restore_vm_bo_list() - move vm_bo elements back to their
> > > > > original list
> > > > > + * @__gpuvm: The GPU VM
> > > > > + * @__list_name: The name of the list we're iterating on
> > > > > + * @__local_list: A pointer to the local list used to store
> > > > > already iterated items
> > > > > + *
> > > > > + * When we're done iterating a vm_bo list, we should call
> > > > > restore_vm_bo_list()
> > > > > + * to restore the original state and let new iterations take
> > > > > place.
> > > > > + */
> > > > > +#define restore_vm_bo_list(__gpuvm, __list_name, __local_list)			\
> > > > > +	do {									\
> > > > > +		/* Merge back the two lists, moving local list elements to the	\
> > > > > +		 * head to preserve previous ordering, in case it matters.	\
> > > > > +		 */								\
> > > > > +		spin_lock(&(__gpuvm)->__list_name.lock);			\
> > > > > +		list_splice(__local_list, &(__gpuvm)->__list_name.list);	\
> > > > > +		spin_unlock(&(__gpuvm)->__list_name.lock);			\
> > > > > +	} while (0)
> > > > > +/**
> > > > > + * drm_gpuvm_bo_list_add() - insert a vm_bo into the given
> > > > > list
> > > > > + * @__vm_bo: the &drm_gpuvm_bo
> > > > > + * @__list_name: the name of the list to insert into
> > > > > + *
> > > > > + * Inserts the given @__vm_bo into the list specified by
> > > > > @__list_name and
> > > > > + * increases the vm_bo's reference count.
> > > > > + */
> > > > > +#define drm_gpuvm_bo_list_add(__vm_bo, __list_name)				\
> > > > > +	do {									\
> > > > > +		spin_lock(&(__vm_bo)->vm->__list_name.lock);			\
> > > > > +		if (list_empty(&(__vm_bo)->list.entry.__list_name))		\
> > > > > +			list_add_tail(&(__vm_bo)->list.entry.__list_name,	\
> > > > > +				      &(__vm_bo)->vm->__list_name.list);	\
> > > > > +		spin_unlock(&(__vm_bo)->vm->__list_name.lock);			\
> > > > > +	} while (0)
> > > > > +
> > > > > +/**
> > > > > + * drm_gpuvm_bo_list_del() - remove a vm_bo from the given
> > > > > list
> > > > > + * @__vm_bo: the &drm_gpuvm_bo
> > > > > + * @__list_name: the name of the list to remove from
> > > > > + *
> > > > > + * Removes the given @__vm_bo from the list specified by
> > > > > @__list_name and
> > > > > + * decreases the vm_bo's reference count.
> > > > > + */
> > > > > +#define drm_gpuvm_bo_list_del(__vm_bo, __list_name)				\
> > > > > +	do {									\
> > > > > +		spin_lock(&(__vm_bo)->vm->__list_name.lock);			\
> > > > > +		if (!list_empty(&(__vm_bo)->list.entry.__list_name))		\
> > > > > +			list_del_init(&(__vm_bo)->list.entry.__list_name);	\
> > > > > +		spin_unlock(&(__vm_bo)->vm->__list_name.lock);			\
> > > > > +	} while (0)
> > > > > +
> > > > > +static int __must_check
> > > > > +drm_gpuvm_bo_get_unless_zero(struct drm_gpuvm_bo *vm_bo);
> > > > > +
> > > > >    #define to_drm_gpuva(__node) container_of((__node), struct
> > > > > drm_gpuva, rb.node)
> > > > >    #define GPUVA_START(node) ((node)->va.addr)
> > > > > @@ -713,6 +867,12 @@ drm_gpuvm_init(struct drm_gpuvm *gpuvm,
> > > > > struct drm_device *drm,
> > > > >         gpuvm->rb.tree = RB_ROOT_CACHED;
> > > > >         INIT_LIST_HEAD(&gpuvm->rb.list);
> > > > > +       INIT_LIST_HEAD(&gpuvm->extobj.list);
> > > > > +       spin_lock_init(&gpuvm->extobj.lock);
> > > > > +
> > > > > +       INIT_LIST_HEAD(&gpuvm->evict.list);
> > > > > +       spin_lock_init(&gpuvm->evict.lock);
> > > > > +
> > > > >         drm_gpuva_check_overflow(start_offset, range);
> > > > >         gpuvm->mm_start = start_offset;
> > > > >         gpuvm->mm_range = range;
> > > > > @@ -754,10 +914,302 @@ drm_gpuvm_destroy(struct drm_gpuvm
> > > > > *gpuvm)
> > > > >         WARN(!RB_EMPTY_ROOT(&gpuvm->rb.tree.rb_root),
> > > > >              "GPUVA tree is not empty, potentially leaking
> > > > > memory.\n");
> > > > > +       WARN(!list_empty(&gpuvm->extobj.list), "Extobj list
> > > > > should be empty.\n");
> > > > > +       WARN(!list_empty(&gpuvm->evict.list), "Evict list
> > > > > should be empty.\n");
> > > > > +
> > > > >         drm_gem_private_object_fini(&gpuvm->d_obj);
> > > > >    }
> > > > >    EXPORT_SYMBOL_GPL(drm_gpuvm_destroy);
> > > > > +/**
> > > > > + * drm_gpuvm_prepare_objects() - prepare all associated BOs
> > > > > + * @gpuvm: the &drm_gpuvm
> > > > > + * @exec: the &drm_exec locking context
> > > > > + * @num_fences: the amount of &dma_fences to reserve
> > > > > + *
> > > > > + * Calls drm_exec_prepare_obj() for all &drm_gem_objects the
> > > > > given
> > > > > + * &drm_gpuvm contains mappings of.
> > > > > + *
> > > > > + * Using this function directly, it is the drivers
> > > > > responsibility to call
> > > > > + * drm_exec_init() and drm_exec_fini() accordingly.
> > > > > + *
> > > > > + * Note: This function is safe against concurrent insertion
> > > > > and removal of
> > > > > + * external objects, however it is not safe against
> > > > > concurrent usage itself.
> > > > > + *
> > > > > + * Drivers need to make sure to protect this case with
> > > > > either an outer VM lock
> > > > > + * or by calling drm_gpuvm_prepare_vm() before this function
> > > > > within the
> > > > > + * drm_exec_until_all_locked() loop, such that the GPUVM's
> > > > > dma-resv lock ensures
> > > > > + * mutual exclusion.
> > > > > + *
> > > > > + * Returns: 0 on success, negative error code on failure.
> > > > > + */
> > > > > +int
> > > > > +drm_gpuvm_prepare_objects(struct drm_gpuvm *gpuvm,
> > > > > +                         struct drm_exec *exec,
> > > > > +                         unsigned int num_fences)
> > > > > +{
> > > > > +       struct drm_gpuvm_bo *vm_bo;
> > > > > +       LIST_HEAD(extobjs);
> > > > > +       int ret = 0;
> > > > > +
> > > > > +       for_each_vm_bo_in_list(gpuvm, extobj, &extobjs,
> > > > > vm_bo) {
> > > > > +               ret = drm_exec_prepare_obj(exec, vm_bo->obj,
> > > > > num_fences);
> > > > > +               if (ret)
> > > > > +                       break;
> > > > > +       }
> > > > > +       /* Drop ref in case we break out of the loop. */
> > > > > +       drm_gpuvm_bo_put(vm_bo);
> > > > > +       restore_vm_bo_list(gpuvm, extobj, &extobjs);
> > > > > +
> > > > > +       return ret;
> > > > > +}
> > > > > +EXPORT_SYMBOL_GPL(drm_gpuvm_prepare_objects);
> > > > > +
> > > > > +/**
> > > > > + * drm_gpuvm_prepare_range() - prepare all BOs mapped within
> > > > > a given range
> > > > > + * @gpuvm: the &drm_gpuvm
> > > > > + * @exec: the &drm_exec locking context
> > > > > + * @addr: the start address within the VA space
> > > > > + * @range: the range to iterate within the VA space
> > > > > + * @num_fences: the amount of &dma_fences to reserve
> > > > > + *
> > > > > + * Calls drm_exec_prepare_obj() for all &drm_gem_objects
> > > > > mapped between @addr
> > > > > + * and @addr + @range.
> > > > > + *
> > > > > + * Returns: 0 on success, negative error code on failure.
> > > > > + */
> > > > > +int
> > > > > +drm_gpuvm_prepare_range(struct drm_gpuvm *gpuvm, struct
> > > > > drm_exec *exec,
> > > > > +                       u64 addr, u64 range, unsigned int
> > > > > num_fences)
> > > > > +{
> > > > > +       struct drm_gpuva *va;
> > > > > +       u64 end = addr + range;
> > > > > +       int ret;
> > > > > +
> > > > > +       drm_gpuvm_for_each_va_range(va, gpuvm, addr, end) {
> > > > > +               struct drm_gem_object *obj = va->gem.obj;
> > > > > +
> > > > > +               ret = drm_exec_prepare_obj(exec, obj,
> > > > > num_fences);
> > > > > +               if (ret)
> > > > > +                       return ret;
> > > > > +       }
> > > > > +
> > > > > +       return 0;
> > > > > +}
> > > > > +EXPORT_SYMBOL_GPL(drm_gpuvm_prepare_range);
> > > > > +
> > > > > +/**
> > > > > + * drm_gpuvm_exec_lock() - lock all dma-resv of all
> > > > > assoiciated BOs
> > > > > + * @vm_exec: the &drm_gpuvm_exec abstraction
> > > > > + * @num_fences: the amount of &dma_fences to reserve
> > > > > + * @interruptible: sleep interruptible if waiting
> > > > > + *
> > > > > + * Acquires all dma-resv locks of all &drm_gem_objects the
> > > > > given
> > > > > + * &drm_gpuvm contains mappings of.
> > > > > + *
> > > > > + * Addionally, when calling this function with struct
> > > > > drm_gpuvm_exec::extra
> > > > > + * being set the driver receives the given @fn callback to
> > > > > lock additional
> > > > > + * dma-resv in the context of the &drm_gpuvm_exec instance.
> > > > > Typically, drivers
> > > > > + * would call drm_exec_prepare_obj() from within this
> > > > > callback.
> > > > > + *
> > > > > + * Returns: 0 on success, negative error code on failure.
> > > > > + */
> > > > > +int
> > > > > +drm_gpuvm_exec_lock(struct drm_gpuvm_exec *vm_exec,
> > > > > +                   unsigned int num_fences,
> > > > > +                   bool interruptible)
> > > > > +{
> > > > > +       struct drm_gpuvm *gpuvm = vm_exec->vm;
> > > > > +       struct drm_exec *exec = &vm_exec->exec;
> > > > > +       uint32_t flags;
> > > > > +       int ret;
> > > > > +
> > > > > +       flags = interruptible ? DRM_EXEC_INTERRUPTIBLE_WAIT :
> > > > > 0 |
> > > > > +               DRM_EXEC_IGNORE_DUPLICATES;
> > > > > +
> > > > > +       drm_exec_init(exec, flags);
> > > > > +
> > > > > +       drm_exec_until_all_locked(exec) {
> > > > > +               ret = drm_gpuvm_prepare_vm(gpuvm, exec,
> > > > > num_fences);
> > > > > +               drm_exec_retry_on_contention(exec);
> > > > > +               if (ret)
> > > > > +                       goto err;
> > > > > +
> > > > > +               ret = drm_gpuvm_prepare_objects(gpuvm, exec,
> > > > > num_fences);
> > > > > +               drm_exec_retry_on_contention(exec);
> > > > > +               if (ret)
> > > > > +                       goto err;
> > > > > +
> > > > > +               if (vm_exec->extra.fn) {
> > > > > +                       ret = vm_exec->extra.fn(vm_exec,
> > > > > num_fences);
> > > > > +                       drm_exec_retry_on_contention(exec);
> > > > > +                       if (ret)
> > > > > +                               goto err;
> > > > > +               }
> > > > > +       }
> > > > > +
> > > > > +       return 0;
> > > > > +
> > > > > +err:
> > > > > +       drm_exec_fini(exec);
> > > > > +       return ret;
> > > > > +}
> > > > > +EXPORT_SYMBOL_GPL(drm_gpuvm_exec_lock);
> > > > > +
> > > > > +static int
> > > > > +fn_lock_array(struct drm_gpuvm_exec *vm_exec, unsigned int
> > > > > num_fences)
> > > > > +{
> > > > > +       struct {
> > > > > +               struct drm_gem_object **objs;
> > > > > +               unsigned int num_objs;
> > > > > +       } *args = vm_exec->extra.priv;
> > > > > +
> > > > > +       return drm_exec_prepare_array(&vm_exec->exec, args-
> > > > > >objs,
> > > > > +                                     args->num_objs,
> > > > > num_fences);
> > > > > +}
> > > > > +
> > > > > +/**
> > > > > + * drm_gpuvm_exec_lock_array() - lock all dma-resv of all
> > > > > assoiciated BOs
> > > > > + * @vm_exec: the &drm_gpuvm_exec abstraction
> > > > > + * @objs: additional &drm_gem_objects to lock
> > > > > + * @num_objs: the number of additional &drm_gem_objects to
> > > > > lock
> > > > > + * @num_fences: the amount of &dma_fences to reserve
> > > > > + * @interruptible: sleep interruptible if waiting
> > > > > + *
> > > > > + * Acquires all dma-resv locks of all &drm_gem_objects the
> > > > > given &drm_gpuvm
> > > > > + * contains mappings of, plus the ones given through @objs.
> > > > > + *
> > > > > + * Returns: 0 on success, negative error code on failure.
> > > > > + */
> > > > > +int
> > > > > +drm_gpuvm_exec_lock_array(struct drm_gpuvm_exec *vm_exec,
> > > > > +                         struct drm_gem_object **objs,
> > > > > +                         unsigned int num_objs,
> > > > > +                         unsigned int num_fences,
> > > > > +                         bool interruptible)
> > > > > +{
> > > > > +       struct {
> > > > > +               struct drm_gem_object **objs;
> > > > > +               unsigned int num_objs;
> > > > > +       } args;
> > > > > +
> > > > > +       args.objs = objs;
> > > > > +       args.num_objs = num_objs;
> > > > > +
> > > > > +       vm_exec->extra.fn = fn_lock_array;
> > > > > +       vm_exec->extra.priv = &args;
> > > > > +
> > > > > +       return drm_gpuvm_exec_lock(vm_exec, num_fences,
> > > > > interruptible);
> > > > > +}
> > > > > +EXPORT_SYMBOL_GPL(drm_gpuvm_exec_lock_array);
> > > > > +
> > > > > +/**
> > > > > + * drm_gpuvm_exec_lock_range() - prepare all BOs mapped
> > > > > within a given range
> > > > > + * @vm_exec: the &drm_gpuvm_exec abstraction
> > > > > + * @addr: the start address within the VA space
> > > > > + * @range: the range to iterate within the VA space
> > > > > + * @num_fences: the amount of &dma_fences to reserve
> > > > > + * @interruptible: sleep interruptible if waiting
> > > > > + *
> > > > > + * Acquires all dma-resv locks of all &drm_gem_objects
> > > > > mapped between @addr and
> > > > > + * @addr + @range.
> > > > > + *
> > > > > + * Returns: 0 on success, negative error code on failure.
> > > > > + */
> > > > > +int
> > > > > +drm_gpuvm_exec_lock_range(struct drm_gpuvm_exec *vm_exec,
> > > > > +                         u64 addr, u64 range,
> > > > > +                         unsigned int num_fences,
> > > > > +                         bool interruptible)
> > > > > +{
> > > > > +       struct drm_gpuvm *gpuvm = vm_exec->vm;
> > > > > +       struct drm_exec *exec = &vm_exec->exec;
> > > > > +       uint32_t flags;
> > > > > +       int ret;
> > > > > +
> > > > > +       flags = interruptible ? DRM_EXEC_INTERRUPTIBLE_WAIT :
> > > > > 0 |
> > > > > +               DRM_EXEC_IGNORE_DUPLICATES;
> > > > > +
> > > > > +       drm_exec_init(exec, flags);
> > > > > +
> > > > > +       drm_exec_until_all_locked(exec) {
> > > > > +               ret = drm_gpuvm_prepare_range(gpuvm, exec,
> > > > > addr, range,
> > > > > +                                             num_fences);
> > > > > +               drm_exec_retry_on_contention(exec);
> > > > > +               if (ret)
> > > > > +                       goto err;
> > > > > +       }
> > > > > +
> > > > > +       return ret;
> > > > > +
> > > > > +err:
> > > > > +       drm_exec_fini(exec);
> > > > > +       return ret;
> > > > > +}
> > > > > +EXPORT_SYMBOL_GPL(drm_gpuvm_exec_lock_range);
> > > > > +
> > > > > +/**
> > > > > + * drm_gpuvm_validate() - validate all BOs marked as evicted
> > > > > + * @gpuvm: the &drm_gpuvm to validate evicted BOs
> > > > > + *
> > > > > + * Calls the &drm_gpuvm_ops.bo_validate callback for all
> > > > > evicted buffer
> > > > > + * objects being mapped in the given &drm_gpuvm.
> > > > > + *
> > > > > + * Returns: 0 on success, negative error code on failure.
> > > > > + */
> > > > > +int
> > > > > +drm_gpuvm_validate(struct drm_gpuvm *gpuvm)
> > > > > +{
> > > > > +       const struct drm_gpuvm_ops *ops = gpuvm->ops;
> > > > > +       struct drm_gpuvm_bo *vm_bo;
> > > > > +       LIST_HEAD(evict);
> > > > > +       int ret = 0;
> > > > > +
> > > > > +       if (unlikely(!ops || !ops->bo_validate))
> > > > > +               return -ENOTSUPP;
> > > > > +
> > > > > +       for_each_vm_bo_in_list(gpuvm, evict, &evict, vm_bo) {
> > > > > +               dma_resv_assert_held(vm_bo->obj->resv);
> > > > > +               ret = ops->bo_validate(vm_bo->obj);
> > > > > +               if (ret)
> > > > > +                       break;
> > > > > +       }
> > > > > +       /* Drop ref in case we break out of the loop. */
> > > > > +       drm_gpuvm_bo_put(vm_bo);
> > > > > +       restore_vm_bo_list(gpuvm, evict, &evict);
> > > > > +
> > > > > +       return ret;
> > > > > +}
> > > > > +EXPORT_SYMBOL_GPL(drm_gpuvm_validate);
> > > > > +
> > > > > +/**
> > > > > + * drm_gpuvm_resv_add_fence - add fence to private and all
> > > > > extobj
> > > > > + * dma-resv
> > > > > + * @gpuvm: the &drm_gpuvm to add a fence to
> > > > > + * @exec: the &drm_exec locking context
> > > > > + * @fence: fence to add
> > > > > + * @private_usage: private dma-resv usage
> > > > > + * @extobj_usage: extobj dma-resv usage
> > > > > + */
> > > > > +void
> > > > > +drm_gpuvm_resv_add_fence(struct drm_gpuvm *gpuvm,
> > > > > +                        struct drm_exec *exec,
> > > > > +                        struct dma_fence *fence,
> > > > > +                        enum dma_resv_usage private_usage,
> > > > > +                        enum dma_resv_usage extobj_usage)
> > > > > +{
> > > > > +       struct drm_gem_object *obj;
> > > > > +       unsigned long index;
> > > > > +
> > > > > +       drm_exec_for_each_locked_object(exec, index, obj) {
> > > > > +               dma_resv_assert_held(obj->resv);
> > > > > +               dma_resv_add_fence(obj->resv, fence,
> > > > > +                                  drm_gpuvm_is_extobj(gpuvm,
> > > > > obj) ?
> > > > > +                                  private_usage :
> > > > > extobj_usage);
> > > > > +       }
> > > > > +}
> > > > > +EXPORT_SYMBOL_GPL(drm_gpuvm_resv_add_fence);
> > > > > +
> > > > >    /**
> > > > >     * drm_gpuvm_bo_create() - create a new instance of struct
> > > > > drm_gpuvm_bo
> > > > >     * @gpuvm: The &drm_gpuvm the @obj is mapped in.
> > > > > @@ -790,6 +1242,9 @@ drm_gpuvm_bo_create(struct drm_gpuvm
> > > > > *gpuvm,
> > > > >         INIT_LIST_HEAD(&vm_bo->list.gpuva);
> > > > >         INIT_LIST_HEAD(&vm_bo->list.entry.gem);
> > > > > +       INIT_LIST_HEAD(&vm_bo->list.entry.extobj);
> > > > > +       INIT_LIST_HEAD(&vm_bo->list.entry.evict);
> > > > > +
> > > > >         drm_gem_object_get(obj);
> > > > >         return vm_bo;
> > > > > @@ -807,6 +1262,14 @@ drm_gpuvm_bo_destroy(struct kref *kref)
> > > > >         drm_gem_gpuva_assert_lock_held(vm_bo->obj);
> > > > > +       spin_lock(&gpuvm->extobj.lock);
> > > > > +       list_del(&vm_bo->list.entry.extobj);
> > > > > +       spin_unlock(&gpuvm->extobj.lock);
> > > > > +
> > > > > +       spin_lock(&gpuvm->evict.lock);
> > > > > +       list_del(&vm_bo->list.entry.evict);
> > > > > +       spin_unlock(&gpuvm->evict.lock);
> > > > > +
> > > > >         list_del(&vm_bo->list.entry.gem);
> > > > >         drm_gem_object_put(obj);
> > > > > @@ -822,6 +1285,11 @@ drm_gpuvm_bo_destroy(struct kref *kref)
> > > > >     * @vm_bo: the &drm_gpuvm_bo to release the reference of
> > > > >     *
> > > > >     * This releases a reference to @vm_bo.
> > > > > + *
> > > > > + * If the reference count drops to zero, the &gpuvm_bo is
> > > > > destroyed, which
> > > > > + * includes removing it from the GEMs gpuva list. Hence, if
> > > > > a call to this
> > > > > + * function can potentially let the reference count to zero
> > > > > the caller must
> > > > > + * hold the dma-resv or driver specific GEM gpuva lock.
> > > > >     */
> > > > >    void
> > > > >    drm_gpuvm_bo_put(struct drm_gpuvm_bo *vm_bo)
> > > > > @@ -831,6 +1299,12 @@ drm_gpuvm_bo_put(struct drm_gpuvm_bo
> > > > > *vm_bo)
> > > > >    }
> > > > >    EXPORT_SYMBOL_GPL(drm_gpuvm_bo_put);
> > > > > +static int __must_check
> > > > > +drm_gpuvm_bo_get_unless_zero(struct drm_gpuvm_bo *vm_bo)
> > > > > +{
> > > > > +       return kref_get_unless_zero(&vm_bo->kref);
> > > > > +}
> > > > > +
> > > > >    static struct drm_gpuvm_bo *
> > > > >    __drm_gpuvm_bo_find(struct drm_gpuvm *gpuvm,
> > > > >                     struct drm_gem_object *obj)
> > > > > @@ -938,6 +1412,48 @@ drm_gpuvm_bo_obtain_prealloc(struct
> > > > > drm_gpuvm_bo *__vm_bo)
> > > > >    }
> > > > >    EXPORT_SYMBOL_GPL(drm_gpuvm_bo_obtain_prealloc);
> > > > > +/**
> > > > > + * drm_gpuvm_bo_extobj_add() - adds the &drm_gpuvm_bo to its
> > > > > &drm_gpuvm's
> > > > > + * extobj list
> > > > > + * @vm_bo: The &drm_gpuvm_bo to add to its &drm_gpuvm's the
> > > > > extobj list.
> > > > > + *
> > > > > + * Adds the given @vm_bo to its &drm_gpuvm's extobj list if
> > > > > not on the list
> > > > > + * already and if the corresponding &drm_gem_object is an
> > > > > external object,
> > > > > + * actually.
> > > > > + */
> > > > > +void
> > > > > +drm_gpuvm_bo_extobj_add(struct drm_gpuvm_bo *vm_bo)
> > > > > +{
> > > > > +       struct drm_gpuvm *gpuvm = vm_bo->vm;
> > > > > +
> > > > > +       if (drm_gpuvm_is_extobj(gpuvm, vm_bo->obj))
> > > > > +               drm_gpuvm_bo_list_add(vm_bo, extobj);
> > > > > +}
> > > > > +EXPORT_SYMBOL_GPL(drm_gpuvm_bo_extobj_add);
> > > > > +
> > > > > +/**
> > > > > + * drm_gpuvm_bo_evict() - add / remove a &drm_gem_object to
> > > > > / from a
> > > > > + * &drm_gpuvms evicted list
> > > > > + * @obj: the &drm_gem_object to add or remove
> > > > > + * @evict: indicates whether the object is evicted
> > > > > + *
> > > > > + * Adds a &drm_gem_object to or removes it from all
> > > > > &drm_gpuvms evicted
> > > > > + * list containing a mapping of this &drm_gem_object.
> > > > > + */
> > > > > +void
> > > > > +drm_gpuvm_bo_evict(struct drm_gem_object *obj, bool evict)
> > > > > +{
> > > > > +       struct drm_gpuvm_bo *vm_bo;
> > > > > +
> > > > > +       drm_gem_for_each_gpuvm_bo(vm_bo, obj) {
> > > > > +               if (evict)
> > > > > +                       drm_gpuvm_bo_list_add(vm_bo, evict);
> > > > > +               else
> > > > > +                       drm_gpuvm_bo_list_del(vm_bo, evict);
> > > > > +       }
> > > > > +}
> > > > > +EXPORT_SYMBOL_GPL(drm_gpuvm_bo_evict);
> > > > > +
> > > > >    static int
> > > > >    __drm_gpuva_insert(struct drm_gpuvm *gpuvm,
> > > > >                    struct drm_gpuva *va)
> > > > > diff --git a/include/drm/drm_gpuvm.h
> > > > > b/include/drm/drm_gpuvm.h
> > > > > index afa50b9059a2..834bb6d6617e 100644
> > > > > --- a/include/drm/drm_gpuvm.h
> > > > > +++ b/include/drm/drm_gpuvm.h
> > > > > @@ -26,10 +26,12 @@
> > > > >     */
> > > > >    #include <linux/list.h>
> > > > > +#include <linux/dma-resv.h>
> > > > >    #include <linux/rbtree.h>
> > > > >    #include <linux/types.h>
> > > > >    #include <drm/drm_gem.h>
> > > > > +#include <drm/drm_exec.h>
> > > > >    struct drm_gpuvm;
> > > > >    struct drm_gpuvm_bo;
> > > > > @@ -259,6 +261,38 @@ struct drm_gpuvm {
> > > > >          * space
> > > > >          */
> > > > >         struct dma_resv *resv;
> > > > > +
> > > > > +       /**
> > > > > +        * @extobj: structure holding the extobj list
> > > > > +        */
> > > > > +       struct {
> > > > > +               /**
> > > > > +                * @list: &list_head storing &drm_gpuvm_bos
> > > > > serving as
> > > > > +                * external object
> > > > > +                */
> > > > > +               struct list_head list;
> > > > > +
> > > > > +               /**
> > > > > +                * @lock: spinlock to protect the extobj list
> > > > > +                */
> > > > > +               spinlock_t lock;
> > > > > +       } extobj;
> > > > > +
> > > > > +       /**
> > > > > +        * @evict: structure holding the evict list and evict
> > > > > list lock
> > > > > +        */
> > > > > +       struct {
> > > > > +               /**
> > > > > +                * @list: &list_head storing &drm_gpuvm_bos
> > > > > currently being
> > > > > +                * evicted
> > > > > +                */
> > > > > +               struct list_head list;
> > > > > +
> > > > > +               /**
> > > > > +                * @lock: spinlock to protect the evict list
> > > > > +                */
> > > > > +               spinlock_t lock;
> > > > > +       } evict;
> > > > >    };
> > > > >    void drm_gpuvm_init(struct drm_gpuvm *gpuvm, struct
> > > > > drm_device *drm,
> > > > > @@ -268,6 +302,21 @@ void drm_gpuvm_init(struct drm_gpuvm
> > > > > *gpuvm, struct drm_device *drm,
> > > > >                     const struct drm_gpuvm_ops *ops);
> > > > >    void drm_gpuvm_destroy(struct drm_gpuvm *gpuvm);
> > > > > +/**
> > > > > + * drm_gpuvm_is_extobj() - indicates whether the given
> > > > > &drm_gem_object is an
> > > > > + * external object
> > > > > + * @gpuvm: the &drm_gpuvm to check
> > > > > + * @obj: the &drm_gem_object to check
> > > > > + *
> > > > > + * Returns: true if the &drm_gem_object &dma_resv differs
> > > > > from the
> > > > > + * &drm_gpuvms &dma_resv, false otherwise
> > > > > + */
> > > > > +static inline bool drm_gpuvm_is_extobj(struct drm_gpuvm
> > > > > *gpuvm,
> > > > > +                                      struct drm_gem_object
> > > > > *obj)
> > > > > +{
> > > > > +       return obj && obj->resv != gpuvm->resv;
> > > > > +}
> > > > > +
> > > > >    static inline struct drm_gpuva *
> > > > >    __drm_gpuva_next(struct drm_gpuva *va)
> > > > >    {
> > > > > @@ -346,6 +395,128 @@ __drm_gpuva_next(struct drm_gpuva *va)
> > > > >    #define drm_gpuvm_for_each_va_safe(va__, next__, gpuvm__)
> > > > > \
> > > > >         list_for_each_entry_safe(va__, next__, &(gpuvm__)-
> > > > > >rb.list, rb.entry)
> > > > > +/**
> > > > > + * struct drm_gpuvm_exec - &drm_gpuvm abstraction of
> > > > > &drm_exec
> > > > > + *
> > > > > + * This structure should be created on the stack as
> > > > > &drm_exec should be.
> > > > > + *
> > > > > + * Optionally, @extra can be set in order to lock additional
> > > > > &drm_gem_objects.
> > > > > + */
> > > > > +struct drm_gpuvm_exec {
> > > > > +       /**
> > > > > +        * @exec: the &drm_exec structure
> > > > > +        */
> > > > > +       struct drm_exec exec;
> > > > > +
> > > > > +       /**
> > > > > +        * @vm: the &drm_gpuvm to lock its DMA reservations
> > > > > +        */
> > > > > +       struct drm_gpuvm *vm;
> > > > > +
> > > > > +       /**
> > > > > +        * @extra: Callback and corresponding private data
> > > > > for the driver to
> > > > > +        * lock arbitrary additional &drm_gem_objects.
> > > > > +        */
> > > > > +       struct {
> > > > > +               /**
> > > > > +                * @fn: The driver callback to lock
> > > > > additional &drm_gem_objects.
> > > > > +                */
> > > > > +               int (*fn)(struct drm_gpuvm_exec *vm_exec,
> > > > > +                         unsigned int num_fences);
> > > > > +
> > > > > +               /**
> > > > > +                * @priv: driver private data for the @fn
> > > > > callback
> > > > > +                */
> > > > > +               void *priv;
> > > > > +       } extra;
> > > > > +};
> > > > > +
> > > > > +/**
> > > > > + * drm_gpuvm_prepare_vm() - prepare the GPUVMs common dma-
> > > > > resv
> > > > > + * @gpuvm: the &drm_gpuvm
> > > > > + * @exec: the &drm_exec context
> > > > > + * @num_fences: the amount of &dma_fences to reserve
> > > > > + *
> > > > > + * Calls drm_exec_prepare_obj() for the GPUVMs dummy
> > > > > &drm_gem_object.
> > > > > + *
> > > > > + * Using this function directly, it is the drivers
> > > > > responsibility to call
> > > > > + * drm_exec_init() and drm_exec_fini() accordingly.
> > > > > + *
> > > > > + * Returns: 0 on success, negative error code on failure.
> > > > > + */
> > > > > +static inline int
> > > > > +drm_gpuvm_prepare_vm(struct drm_gpuvm *gpuvm,
> > > > > +                    struct drm_exec *exec,
> > > > > +                    unsigned int num_fences)
> > > > > +{
> > > > > +       return drm_exec_prepare_obj(exec, &gpuvm->d_obj,
> > > > > num_fences);
> > > > > +}
> > > > > +
> > > > > +int drm_gpuvm_prepare_objects(struct drm_gpuvm *gpuvm,
> > > > > +                             struct drm_exec *exec,
> > > > > +                             unsigned int num_fences);
> > > > > +
> > > > > +int drm_gpuvm_prepare_range(struct drm_gpuvm *gpuvm,
> > > > > +                           struct drm_exec *exec,
> > > > > +                           u64 addr, u64 range,
> > > > > +                           unsigned int num_fences);
> > > > > +
> > > > > +int drm_gpuvm_exec_lock(struct drm_gpuvm_exec *vm_exec,
> > > > > +                       unsigned int num_fences,
> > > > > +                       bool interruptible);
> > > > > +
> > > > > +int drm_gpuvm_exec_lock_array(struct drm_gpuvm_exec
> > > > > *vm_exec,
> > > > > +                             struct drm_gem_object **objs,
> > > > > +                             unsigned int num_objs,
> > > > > +                             unsigned int num_fences,
> > > > > +                             bool interruptible);
> > > > > +
> > > > > +int drm_gpuvm_exec_lock_range(struct drm_gpuvm_exec
> > > > > *vm_exec,
> > > > > +                             u64 addr, u64 range,
> > > > > +                             unsigned int num_fences,
> > > > > +                             bool interruptible);
> > > > > +
> > > > > +/**
> > > > > + * drm_gpuvm_lock() - lock all dma-resv of all assoiciated
> > > > > BOs
> > > > > + * @gpuvm: the &drm_gpuvm
> > > > > + *
> > > > > + * Releases all dma-resv locks of all &drm_gem_objects
> > > > > previously acquired
> > > > > + * through drm_gpuvm_lock() or its variants.
> > > > > + *
> > > > > + * Returns: 0 on success, negative error code on failure.
> > > > > + */
> > > > > +static inline void
> > > > > +drm_gpuvm_exec_unlock(struct drm_gpuvm_exec *vm_exec)
> > > > > +{
> > > > > +       drm_exec_fini(&vm_exec->exec);
> > > > > +}
> > > > > +
> > > > > +int drm_gpuvm_validate(struct drm_gpuvm *gpuvm);
> > > > > +void drm_gpuvm_resv_add_fence(struct drm_gpuvm *gpuvm,
> > > > > +                             struct drm_exec *exec,
> > > > > +                             struct dma_fence *fence,
> > > > > +                             enum dma_resv_usage
> > > > > private_usage,
> > > > > +                             enum dma_resv_usage
> > > > > extobj_usage);
> > > > > +
> > > > > +/**
> > > > > + * drm_gpuvm_exec_resv_add_fence()
> > > > > + * @vm_exec: the &drm_gpuvm_exec abstraction
> > > > > + * @fence: fence to add
> > > > > + * @private_usage: private dma-resv usage
> > > > > + * @extobj_usage: extobj dma-resv usage
> > > > > + *
> > > > > + * See drm_gpuvm_resv_add_fence().
> > > > > + */
> > > > > +static inline void
> > > > > +drm_gpuvm_exec_resv_add_fence(struct drm_gpuvm_exec
> > > > > *vm_exec,
> > > > > +                             struct dma_fence *fence,
> > > > > +                             enum dma_resv_usage
> > > > > private_usage,
> > > > > +                             enum dma_resv_usage
> > > > > extobj_usage)
> > > > > +{
> > > > > +       drm_gpuvm_resv_add_fence(vm_exec->vm, &vm_exec->exec,
> > > > > fence,
> > > > > +                                private_usage,
> > > > > extobj_usage);
> > > > > +}
> > > > > +
> > > > >    /**
> > > > >     * struct drm_gpuvm_bo - structure representing a
> > > > > &drm_gpuvm and
> > > > >     * &drm_gem_object combination
> > > > > @@ -398,6 +569,18 @@ struct drm_gpuvm_bo {
> > > > >                          * gpuva list.
> > > > >                          */
> > > > >                         struct list_head gem;
> > > > > +
> > > > > +                       /**
> > > > > +                        * @evict: List entry to attach to
> > > > > the &drm_gpuvms
> > > > > +                        * extobj list.
> > > > > +                        */
> > > > > +                       struct list_head extobj;
> > > > > +
> > > > > +                       /**
> > > > > +                        * @evict: List entry to attach to
> > > > > the &drm_gpuvms evict
> > > > > +                        * list.
> > > > > +                        */
> > > > > +                       struct list_head evict;
> > > > >                 } entry;
> > > > >         } list;
> > > > >    };
> > > > > @@ -432,6 +615,9 @@ struct drm_gpuvm_bo *
> > > > >    drm_gpuvm_bo_find(struct drm_gpuvm *gpuvm,
> > > > >                   struct drm_gem_object *obj);
> > > > > +void drm_gpuvm_bo_evict(struct drm_gem_object *obj, bool
> > > > > evict);
> > > > > +void drm_gpuvm_bo_extobj_add(struct drm_gpuvm_bo *vm_bo);
> > > > > +
> > > > >    /**
> > > > >     * drm_gpuvm_bo_for_each_va() - iterator to walk over a
> > > > > list of &drm_gpuva
> > > > >     * @va__: &drm_gpuva structure to assign to in each
> > > > > iteration step
> > > > > @@ -837,6 +1023,17 @@ struct drm_gpuvm_ops {
> > > > >          * used.
> > > > >          */
> > > > >         int (*sm_step_unmap)(struct drm_gpuva_op *op, void
> > > > > *priv);
> > > > > +
> > > > > +       /**
> > > > > +        * @bo_validate: called from drm_gpuvm_validate()
> > > > > +        *
> > > > > +        * Drivers receive this callback for every evicted
> > > > > &drm_gem_object being
> > > > > +        * mapped in the corresponding &drm_gpuvm.
> > > > > +        *
> > > > > +        * Typically, drivers would call their driver
> > > > > specific variant of
> > > > > +        * ttm_bo_validate() from within this callback.
> > > > > +        */
> > > > > +       int (*bo_validate)(struct drm_gem_object *obj);
> > > > >    };
> > > > >    int drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm, void *priv,
> > 
>
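
As an aside, to make the intended usage of the helpers quoted above concrete,
here is a minimal sketch of a driver exec path built on them. All my_* names
are made up and error handling is trimmed; this is an illustration, not code
from the patch:

struct my_job {
        struct drm_gpuvm *gpuvm;
        /* ... driver-specific state ... */
};

/* Hypothetical: pushes the job to the HW and returns its done fence. */
struct dma_fence *my_run_job(struct my_job *job);

static int my_exec(struct my_job *job)
{
        struct drm_gpuvm_exec vm_exec = {
                .vm = job->gpuvm,
                /* .extra.fn could be set to lock further driver BOs. */
        };
        struct dma_fence *fence;
        int ret;

        /* Locks the VM's common dma-resv and all extobj dma-resvs. */
        ret = drm_gpuvm_exec_lock(&vm_exec, 1, true);
        if (ret)
                return ret;

        /* Re-validate whatever was evicted since the last submission. */
        ret = drm_gpuvm_validate(job->gpuvm);
        if (ret)
                goto out_unlock;

        fence = my_run_job(job);
        if (IS_ERR(fence)) {
                ret = PTR_ERR(fence);
                goto out_unlock;
        }

        /* One call attaches the fence to the private and all extobj resvs. */
        drm_gpuvm_exec_resv_add_fence(&vm_exec, fence,
                                      DMA_RESV_USAGE_BOOKKEEP,
                                      DMA_RESV_USAGE_BOOKKEEP);
        dma_fence_put(fence);

out_unlock:
        drm_gpuvm_exec_unlock(&vm_exec);
        return ret;
}
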
Thomas Hellstrom Sept. 13, 2023, 10:39 a.m. UTC | #15
Hi,

On 9/13/23 09:19, Boris Brezillon wrote:
> On Wed, 13 Sep 2023 17:05:42 +1000
> Dave Airlie <airlied@gmail.com> wrote:
>
>> On Wed, 13 Sept 2023 at 17:03, Boris Brezillon
>> <boris.brezillon@collabora.com> wrote:
>>> On Tue, 12 Sep 2023 18:20:32 +0200
>>> Thomas Hellström <thomas.hellstrom@linux.intel.com> wrote:
>>>   
>>>>> +/**
>>>>> + * get_next_vm_bo_from_list() - get the next vm_bo element
>>>>> + * @__gpuvm: The GPU VM
>>>>> + * @__list_name: The name of the list we're iterating on
>>>>> + * @__local_list: A pointer to the local list used to store already iterated items
>>>>> + * @__prev_vm_bo: The previous element we got from drm_gpuvm_get_next_cached_vm_bo()
>>>>> + *
>>>>> + * This helper is here to provide lockless list iteration. Lockless as in, the
>>>>> + * iterator releases the lock immediately after picking the first element from
>>>>> + * the list, so list insertion and deletion can happen concurrently.
>>>> Are the list spinlocks needed for that async state update from within
>>>> the dma-fence critical section we've discussed previously?
>>> Any driver calling _[un]link() from its drm_gpu_scheduler::run_job()
>>> hook will be in this situation (Panthor at the moment, PowerVR soon). I
>>> get that Xe and Nouveau don't need that because they update the VM
>>> state early (in the ioctl path), but I keep thinking this will hurt us
>>> if we don't think it through from the beginning, because once you've
>>> set this logic to depend only on resv locks, it will be pretty hard to
>>> get back to a solution which lets synchronous VM_BINDs take precedence
>>> on asynchronous request, and, with vkQueueBindSparse() passing external
>>> deps (plus the fact the VM_BIND queue might be pretty deep), it can
>>> take a long time to get your synchronous VM_BIND executed...

So this would boil down to either (possibly opt-in) keeping the spinlock 
approach or pushing the unlink out to a wq then?
BTW, as also asked in a reply to Danilo, how do you call unlink from 
run_job() when it was requiring the obj->dma_resv lock, or was that a WIP?

>>>   
>> btw what is the use case for this? do we have actual vulkan
>> applications we know will have problems here?
> I don't, but I think that's a concern Faith raised at some point (dates
> back from when I was reading threads describing how VM_BIND on i915
> should work, and I was clearly discovering this whole VM_BIND thing at
> that time, so maybe I misunderstood).
>
>> it feels like a bit of premature optimisation, but maybe we have use cases.
> Might be, but that's the sort of thing that would put us in a corner if
> we don't have a plan for when the needs arise. Besides, if we don't
> want to support that case because it's too complicated, I'd recommend
> dropping all the drm_gpuvm APIs that let people think this mode is
> valid/supported (map/remap/unmap hooks in drm_gpuvm_ops,
> drm_gpuvm_sm_[un]map helpers, etc). Keeping them around just adds to the
> confusion.

Xe allows bypassing the bind-queue with another bind-queue, but to
completely avoid dependencies between queues the operations may not
overlap. (The definition of overlap is currently that page-table
structure updates may not overlap.) No guarantees are made about
priority.

/Thomas
Boris Brezillon Sept. 13, 2023, 11:33 a.m. UTC | #16
On Wed, 13 Sep 2023 12:39:01 +0200
Thomas Hellström <thomas.hellstrom@linux.intel.com> wrote:

> Hi,
> 
> On 9/13/23 09:19, Boris Brezillon wrote:
> > On Wed, 13 Sep 2023 17:05:42 +1000
> > Dave Airlie <airlied@gmail.com> wrote:
> >  
> >> On Wed, 13 Sept 2023 at 17:03, Boris Brezillon
> >> <boris.brezillon@collabora.com> wrote:  
> >>> On Tue, 12 Sep 2023 18:20:32 +0200
> >>> Thomas Hellström <thomas.hellstrom@linux.intel.com> wrote:
> >>>     
> >>>>> +/**
> >>>>> + * get_next_vm_bo_from_list() - get the next vm_bo element
> >>>>> + * @__gpuvm: The GPU VM
> >>>>> + * @__list_name: The name of the list we're iterating on
> >>>>> + * @__local_list: A pointer to the local list used to store already iterated items
> >>>>> + * @__prev_vm_bo: The previous element we got from drm_gpuvm_get_next_cached_vm_bo()
> >>>>> + *
> >>>>> + * This helper is here to provide lockless list iteration. Lockless as in, the
> >>>>> + * iterator releases the lock immediately after picking the first element from
> >>>>> + * the list, so list insertion and deletion can happen concurrently.
> >>>> Are the list spinlocks needed for that async state update from within
> >>>> the dma-fence critical section we've discussed previously?  
> >>> Any driver calling _[un]link() from its drm_gpu_scheduler::run_job()
> >>> hook will be in this situation (Panthor at the moment, PowerVR soon). I
> >>> get that Xe and Nouveau don't need that because they update the VM
> >>> state early (in the ioctl path), but I keep thinking this will hurt us
> >>> if we don't think it through from the beginning, because once you've
> >>> set this logic to depend only on resv locks, it will be pretty hard to
> >>> get back to a solution which lets synchronous VM_BINDs take precedence
> >>> on asynchronous request, and, with vkQueueBindSparse() passing external
> >>> deps (plus the fact the VM_BIND queue might be pretty deep), it can
> >>> take a long time to get your synchronous VM_BIND executed...  
> 
> So this would boil down to either (possibly opt-in) keeping the spinlock 
> approach or pushing the unlink out to a wq then?

Deferred _unlink() would not be an issue, since I already defer the
drm_gpuva destruction to a wq, it would just be a matter of moving the
_unlink() call there as well. But _link() also takes the GEM gpuva list
lock, and that one is a bit tricky, in that sm_map() can trigger 2 more
_link() calls for the prev/next mappings, which we can't guess until we
get to execute the VM update. If we mandate the use of the GEM resv
lock, that simply means async VM updates (AKA calling
drm_gpuvm_sm_[un]map()) are not an option. And if this is what everyone
agrees on, then I'd like the APIs that make this sort of async VM
update possible (drm_gpuvm_sm_[un]map(), the drm_gpuvm_ops::sm_step*
methods, and probably other things) to be dropped, so we don't make it
look like it's something we support.
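
Roughly, the deferred variant would look like the sketch below (my_vma and its
destroy_work are hypothetical driver structures; the work item is assumed to
be INIT_WORK()ed at VMA creation):

struct my_vma {
        struct drm_gpuva va;
        struct work_struct destroy_work;        /* INIT_WORK()ed at creation */
};

/* Runs in process context; taking the GEM gpuva list lock is fine here. */
static void my_vma_destroy_work(struct work_struct *work)
{
        struct my_vma *vma = container_of(work, struct my_vma, destroy_work);

        drm_gpuva_unlink(&vma->va);
        kfree(vma);
}

/* Callable from the dma-fence signalling path: only schedules work. */
static void my_vma_destroy_deferred(struct my_vma *vma)
{
        queue_work(system_wq, &vma->destroy_work);
}
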

> BTW, as also asked in a reply to Danilo, how do you call unlink from 
> run_job() when it was requiring the obj->dma_resv lock, or was that a WIP?

_unlink() makes sure the GEM gpuva list lock is taken, but this can be
a custom lock (see drm_gem_gpuva_set_lock()). In panthor we have
panthor_gem_object::gpuva_list_lock that's dedicated to the gpuva list
protection. We make sure we never take this lock while allocating
memory, to guarantee the dma-signalling path can't deadlock.
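
For reference, a rough sketch of that opt-in (my_gem_object is hypothetical;
drm_gem_gpuva_set_lock() just tells lockdep which lock
drm_gem_gpuva_assert_lock_held() should check):

struct my_gem_object {
        struct drm_gem_object base;
        spinlock_t gpuva_list_lock;     /* protects base.gpuva.list */
};

static void my_gem_init_gpuva_lock(struct my_gem_object *obj)
{
        spin_lock_init(&obj->gpuva_list_lock);
        /* Register the lock drm_gem_gpuva_assert_lock_held() checks. */
        drm_gem_gpuva_set_lock(&obj->base, &obj->gpuva_list_lock);
}
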

> 
> >>>     
> >> btw what is the use case for this? do we have actual vulkan
> >> applications we know will have problems here?  
> > I don't, but I think that's a concern Faith raised at some point (dates
> > back from when I was reading threads describing how VM_BIND on i915
> > should work, and I was clearly discovering this whole VM_BIND thing at
> > that time, so maybe I misunderstood).
> >  
> >> it feels like a bit of premature optimisation, but maybe we have use cases.  
> > Might be, but that's the sort of thing that would put us in a corner if
> > we don't have a plan for when the needs arise. Besides, if we don't
> > want to support that case because it's too complicated, I'd recommend
> > dropping all the drm_gpuvm APIs that let people think this mode is
> > valid/supported (map/remap/unmap hooks in drm_gpuvm_ops,
> > drm_gpuvm_sm_[un]map helpers, etc). Keeping them around just adds to the
> > confusion.  
> 
> > Xe allows bypassing the bind-queue with another bind-queue, but to
> > completely avoid dependencies between queues the operations may not
> > overlap.

So, you check the VM state with some VM lock held (would be the VM resv
in my case), and if the mapping is new (no overlaps with pre-existing
mappings), you queue it to the fast-track/sync-VM_BIND queue. What would
be missing I guess is a way to know if the mapping is active (MMU has
been updated) or pending (MMU update queued to the bind-queue), so I can
fast-track mapping/unmapping of active mappings. This would leave
overlapping sync/async VM updates, which can't happen in practice
unless userspace is doing something wrong (sparse bindings always go
through vkQueueBindSparse).

I'll give it a try.
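
As a first rough sketch of that fast-track check (it reuses
drm_gpuvm_for_each_va_range() from this series; the pending-bind lookup is
omitted and the my_* name is made up):

/* Caller holds the lock protecting the VA space (e.g. the VM resv). */
static bool my_bind_can_fast_track(struct drm_gpuvm *gpuvm,
                                   u64 addr, u64 range)
{
        struct drm_gpuva *va;

        /* Any pre-existing mapping in the range forces the slow path. */
        drm_gpuvm_for_each_va_range(va, gpuvm, addr, addr + range)
                return false;

        /* No overlap: the MMU update can be applied synchronously. */
        return true;
}
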

> (The definition of overlap is currently that page-table structure
> updates may not overlap.) No guarantees are made about priority.
> 
> /Thomas
> 
> 
>
Danilo Krummrich Sept. 13, 2023, 12:01 p.m. UTC | #17
After some more discussion with Boris on IRC, he seems to be willing to drop GPUVM
updates from the async path. If everyone agrees I'm fine to go ahead and drop this
use case for GPUVM.

@Thomas: I will reply to your last mail only considering GPUVM updates from within
the IOCTL.

- Danilo

On 9/13/23 13:33, Boris Brezillon wrote:
> [...]
Danilo Krummrich Sept. 13, 2023, 12:16 p.m. UTC | #18
As mentioned in a different mail thread, the reply is based on the assumption
that we don't support anything other than GPUVM updates from the IOCTL.

On Wed, Sep 13, 2023 at 11:14:46AM +0200, Thomas Hellström wrote:
> Hi!
> 
> On Wed, 2023-09-13 at 01:36 +0200, Danilo Krummrich wrote:
> > On Tue, Sep 12, 2023 at 09:23:08PM +0200, Thomas Hellström wrote:
> > > 
> > > On 9/12/23 18:50, Danilo Krummrich wrote:
> > > > On Tue, Sep 12, 2023 at 06:20:32PM +0200, Thomas Hellström wrote:
> > > > > Hi, Danilo,
> > > > > 
> > > > > On 9/9/23 17:31, Danilo Krummrich wrote:
> > > > > > [...]
> > > > > > 
> > > > > > diff --git a/drivers/gpu/drm/drm_gpuvm.c b/drivers/gpu/drm/drm_gpuvm.c
> > > > > > index f4411047dbb3..8e62a043f719 100644
> > > > > > --- a/drivers/gpu/drm/drm_gpuvm.c
> > > > > > +++ b/drivers/gpu/drm/drm_gpuvm.c
> > > > > > @@ -73,6 +73,21 @@
> > > > > >     * &drm_gem_object list of &drm_gpuvm_bos for an existing instance of this
> > > > > >     * particular combination. If not existent a new instance is created and linked
> > > > > >     * to the &drm_gem_object.
> > > > > > + *
> > > > > > + * &drm_gpuvm_bo structures, since unique for a given &drm_gpuvm, are also used
> > > > > > + * as entry for the &drm_gpuvm's lists of external and evicted objects. Those
> > > > > > + * lists are maintained in order to accelerate locking of dma-resv locks and
> > > > > > + * validation of evicted objects bound in a &drm_gpuvm. For instance, all
> > > > > > + * &drm_gem_object's &dma_resv of a given &drm_gpuvm can be locked by calling
> > > > > > + * drm_gpuvm_exec_lock(). Once locked drivers can call drm_gpuvm_validate() in
> > > > > > + * order to validate all evicted &drm_gem_objects. It is also possible to lock
> > > > > > + * additional &drm_gem_objects by providing the corresponding parameters to
> > > > > > + * drm_gpuvm_exec_lock() as well as open code the &drm_exec loop while making
> > > > > > + * use of helper functions such as drm_gpuvm_prepare_range() or
> > > > > > + * drm_gpuvm_prepare_objects().
> > > > > > + *
> > > > > > + * Every bound &drm_gem_object is treated as external object when its &dma_resv
> > > > > > + * structure is different than the &drm_gpuvm's common &dma_resv structure.
> > > > > >     */
> > > > > >    /**
> > > > > > @@ -420,6 +435,20 @@
> > > > > >     * Subsequent calls to drm_gpuvm_bo_obtain() for the same &drm_gpuvm and
> > > > > >     * &drm_gem_object must be able to observe previous creations and destructions
> > > > > >     * of &drm_gpuvm_bos in order to keep instances unique.
> > > > > > + *
> > > > > > + * The &drm_gpuvm's lists for keeping track of external and evicted objects are
> > > > > > + * protected against concurrent insertion / removal and iteration internally.
> > > > > > + *
> > > > > > + * However, drivers still need to ensure to protect concurrent calls to functions
> > > > > > + * iterating those lists, such as drm_gpuvm_validate() and
> > > > > > + * drm_gpuvm_prepare_objects(). Every such function contains a particular
> > > > > > + * comment and lockdep checks if possible.
> > > > > > + *
> > > > > > + * Functions adding or removing entries from those lists, such as
> > > > > > + * drm_gpuvm_bo_evict() or drm_gpuvm_bo_extobj_add() may be called with external
> > > > > > + * locks being held, e.g. in order to avoid the corresponding list to be
> > > > > > + * (safely) modified while potentially being iterated by other API functions.
> > > > > > + * However, this is entirely optional.
> > > > > >     */
> > > > > >    /**
> > > > > > @@ -632,6 +661,131 @@
> > > > > >     *   }
> > > > > >     */
> > > > > > +/**
> > > > > > + * get_next_vm_bo_from_list() - get the next vm_bo element
> > > > > > + * @__gpuvm: The GPU VM
> > > > > > + * @__list_name: The name of the list we're iterating on
> > > > > > + * @__local_list: A pointer to the local list used to store already iterated items
> > > > > > + * @__prev_vm_bo: The previous element we got from drm_gpuvm_get_next_cached_vm_bo()
> > > > > > + *
> > > > > > + * This helper is here to provide lockless list iteration. Lockless as in, the
> > > > > > + * iterator releases the lock immediately after picking the first element from
> > > > > > + * the list, so list insertion and deletion can happen concurrently.
> > > > > Are the list spinlocks needed for that async state update from within the
> > > > > dma-fence critical section we've discussed previously?
> > > > Yes, but also for other reasons, see below.
> > > > 
> > > > > Otherwise it should be sufficient to protect the lists with the
> > > > > gpuvm's resv (or for the extobj list with an outer lock).
> > > > > 
> > > > > If those spinlocks are still needed in some situations, perhaps could
> > > > > we have an option to set them to NULL (Like IIRC the maple tree
> > > > > allows for)?
> > > > The evict spinlock is needed in any case, since in drm_gpuvm_bo_evict()
> > > > we're holding only the dma-resv lock from the BO this function gets
> > > > called for. Hence, the spinlock protects concurrent drm_gpuvm_bo_evict()
> > > > calls with different BOs.
> > > No. Only if you try to add external objects to the vm's evict list from
> > > within the evict code. That's not necessary since you loop through all
> > > external objects anyway when locking them so an "evicted" bool in the
> > > vm_bo, protected by the bo resv would be sufficient. The extobj locking
> > > loop can then add the bo to the evicted list.
> > 
> > And validate() can remove it while still holding all dma-resv locks,
> > neat!
> > However, what if two tasks are trying to lock the VA space
> > concurrently? What
> > do we do when the drm_gpuvm_bo's refcount drops to zero in
> > drm_gpuva_unlink()?
> > Are we guaranteed that at this point of time the drm_gpuvm_bo is not
> > on the
> > evicted list? Because otherwise we would call drm_gpuvm_bo_destroy()
> > with the
> > dma-resv lock held, which wouldn't be allowed, since
> > drm_gpuvm_bo_destroy()
> > might drop the last reference to the drm_gem_object and hence we'd
> > potentially
> > free the dma-resv lock while holding it, at least if it's an external
> > object.
> 
> Easiest way in this scheme is to think of the lists as being protected
> by the vm's resv lock. That means anybody calling unlink() must also
> hold the vm's resv lock. (Which is OK from an UAF point of view, but
> perhaps not from a locking inversion POV with an async list update.)

This would mean that on unlink() we'd need to hold the VM's resv lock and the
corresponding GEM's resv lock (in case they're not the same anyways), because
the VM's resv lock would protect the external / evicted object lists and the
GEM object's resv lock protects the GEM's list of drm_gpuvm_bos and the
drm_gpuvm_bo's list of drm_gpuvas.
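
A rough sketch of what such an unlink path could look like in the ioctl path,
using drm_exec to take both resv locks (error handling reduced, names made
up):

/* Sketch: unmap while holding both the VM's and the GEM's resv, so the
 * extobj/evict lists and the GEM's gpuva list stay consistent even if
 * drm_gpuva_unlink() drops the last vm_bo reference.
 */
static int my_vma_unmap(struct drm_gpuvm *gpuvm, struct drm_gpuva *va)
{
        struct drm_exec exec;
        int ret;

        drm_exec_init(&exec, 0);
        drm_exec_until_all_locked(&exec) {
                ret = drm_exec_prepare_obj(&exec, &gpuvm->d_obj, 0);
                drm_exec_retry_on_contention(&exec);
                if (ret)
                        goto out;

                if (va->gem.obj) {
                        ret = drm_exec_prepare_obj(&exec, va->gem.obj, 0);
                        drm_exec_retry_on_contention(&exec);
                        if (ret)
                                goto out;
                }
        }

        drm_gpuva_remove(va);
        drm_gpuva_unlink(va);   /* may drop the last vm_bo reference */
        ret = 0;
out:
        drm_exec_fini(&exec);
        return ret;
}
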

> 
> > 
> > > > 
> > > > For extobjs an outer lock would be enough in case of Xe, but I really
> > > > would not like to add even more complexity just to get the spinlock
> > > > out of the way in case the driver already has an outer lock protecting
> > > > this path.
> > > 
> > > I must disagree here. These spinlocks and atomic operations are pretty
> > > costly and as discussed earlier this type of locking was the reason (at
> > > least according to the commit message) that made Christian drop the
> > > XArray use in drm_exec for the same set of objects: "The locking
> > > overhead is unecessary and measurable". IMHO the spinlock is the added
> > > complexity and a single wide lock following the drm locking guidelines
> > > set out by Daniel and David should really be the default choice with an
> > > opt-in for a spinlock if needed for async and pushing out to a wq is
> > > not an option.
> > 
> > For the external object list an outer lock would work as long as it's
> > not the dma-resv lock of the corresponding GEM object, since here we
> > actually need to remove the list entry from the external object list on
> > drm_gpuvm_bo_destroy(). It's just a bit weird design-wise that drivers
> > would need to take this outer lock on:
> > 
> > - drm_gpuvm_bo_extobj_add()
> > - drm_gpuvm_bo_destroy()        (and hence also drm_gpuvm_bo_put())
> > - drm_gpuva_unlink()            (because it needs to call drm_gpuvm_bo_put())
> > - drm_gpuvm_exec_lock()
> > - drm_gpuvm_exec_lock_array()
> > - drm_gpuvm_prepare_range()
> > 
> > Given that, it seems reasonable to do all the required locking internally.
> 
> From a design POV, there has been a clear direction in XE to make
> things similar to mmap() / munmap(), so this outer lock, which in Xe is
> an rwsem, is used in a similar way as the mmap_lock. It's protecting
> the page-table structures and vma rb tree, the userptr structures and
> the extobj list. Basically it's taken early in the exec IOCTL, the
> VM_BIND ioctl, the compute rebind worker and the pagefault handler, so
> all of the above are just asserting that it is taken in the correct
> mode.
> 
> But strictly with this scheme one could also use the vm's dma_resv for
> the extobj list since with drm_exec, it's locked before traversing the
> list.
> 
> The whole point of this scheme is to rely on locks that you already are
> supposed to be holding for various reasons and is simple to comprehend.

I don't agree that we're supposed to hold the VM's resv lock anyways for
functions like drm_gpuvm_bo_put() or drm_gpuva_unlink(), but I'm fine using it
for that purpose nevertheless.

> 
> > 
> > In order to at least place lockdep checks, the driver would need to
> > supply the corresponding lock's lockdep_map, because the GPUVM otherwise
> > doesn't know about the lock.
> 
> Yes, that sounds reasonable. One lockdep map per list.

I'd really like to avoid that, especially now that everything got simpler. We
should define the actual locks to take instead.
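
E.g., if the lists become protected by the VM's resv, the evict-list walk
could reduce to something like the following sketch (hypothetical variant for
illustration, not the code from this patch):

/* Sketch of a resv-protected variant of the evict-list walk: no local
 * list, no refcount juggling, just one assert on the agreed-upon lock.
 */
static int my_gpuvm_validate_resv_protected(struct drm_gpuvm *gpuvm)
{
        const struct drm_gpuvm_ops *ops = gpuvm->ops;
        struct drm_gpuvm_bo *vm_bo, *next;
        int ret;

        dma_resv_assert_held(gpuvm->resv);      /* the one defined lock */

        list_for_each_entry_safe(vm_bo, next, &gpuvm->evict.list,
                                 list.entry.evict) {
                ret = ops->bo_validate(vm_bo->obj);
                if (ret)
                        return ret;
        }

        return 0;
}
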

> 
> > 
> > Out of curiosity, what is the overhead of a spin_lock() that doesn't
> > need to spin?
> 
> I guess it's hard to tell exactly, but it is much lower on modern x86
> than what it used to be. Not sure about ARM, which is the other
> architecture important to us. I figure if there is little cache-line
> bouncing the main overhead comes from the implied barriers.
> 
> > 
> > > 
> > > A pretty simple way that would not add much code would be
> > > 
> > > static void gpuvm_cond_spin_lock(const struct drm_gpuvm *gpuvm,
> > >                                  spinlock_t *lock)
> > > {
> > >         if (!gpuvm->resv_protected_lists)
> > >                 spin_lock(lock);
> > > }
> > > 
> > > > > For such drivers, that would require anybody calling unlink to
> > > > > hold the vm's resv, though.
> > > > In V4 I want to go back to having a dedicated lock for the GEMs
> > > > gpuva list (or VM_BO list to be more precise). We can't just use the
> > > > dma-resv lock for that with VM_BO abstractions, because on
> > > > destruction of a VM_BO we otherwise wouldn't be allowed to already
> > > > hold the dma-resv lock. That's the fix I was referring to earlier.
> > > 
> > > Yeah, I can see the need for a dedicated lock for the GEM's gpuva
> > > list, but holding the vm's dma-resv lock across the unlink shouldn't
> > > be a problem. We may free the object and a pointer to the vm's resv
> > > during unlink but we don't free the vm's resv. It'd be a matter of
> > > ensuring that any calls to unlink from *within* drm_gpuvm allows it
> > > to be held.
> > 
> > Drivers calling unlink() from the fence signaling path can't use the VM's
> > dma-resv lock.
> 
> Yes, that made me a bit curious because in the current version the code
> required the object's dma_resv for unlink() which can't be grabbed
> either from the fence signaling path. So are there any drivers actually
> wanting to do that? If so, they will either need to resort to the
> current spinlock solution or they will need to call unlink from a
> workqueue item.

As Boris already mentioned we have the dma-resv lock by default or a driver
specific GEM gpuva lock as opt-in. Now, we can get rid of the latter.
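
For context, the current opt-in boils down to something like this on the
driver side (sketch, the mutex name is made up):

	/* In the driver's GEM object init: */
	mutex_init(&bo->gpuva_lock);
	drm_gem_gpuva_set_lock(&bo->base, &bo->gpuva_lock);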

> > 
> > Also, what if the object is an external object? We can't use the VM's
> > dma-resv lock here.
> 
> Why? Typically (sync) unlink is only ever called from an unbind-like
> operation where it should be trivial to grab the vm's resv. Or, for
> that matter, any outer lock protecting the extobj list. The rule would be
> that drm_gpuvm_bo::entry::extobj and drm_gpuvm_bo::entry::evict would
> be protected by either the vm's dma_resv (or possibly an outer lock in
> the case of the extobj list).

An outer lock wouldn't have worked for updates in the async path, but that
shouldn't be relevant anymore. We could use the VM's resv for that.
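
I.e. extobj insertion would then simply assert the VM's resv, roughly:

	dma_resv_assert_held(gpuvm->resv);
	if (drm_gpuvm_is_extobj(gpuvm, vm_bo->obj))
		list_add_tail(&vm_bo->list.entry.extobj, &gpuvm->extobj.list);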

> 
> > And we can't have the GEM object's dma-resv lock held when calling
> > unlink(), since unlink() calls drm_gpuvm_bo_put(), which if the refcount
> > drops to zero calls drm_gpuvm_bo_destroy(), and drm_gpuvm_bo_destroy()
> > might drop the last reference of the GEM object.
> 
> Yes, but this is a different problem as to what exactly protects
> drm_gpuvm_bo::entry::gem. Either as you suggest an internal per bo list
> lock, or if we want to keep the bo's dma_resv we need to ensure that
> the caller of dma_resv_unlock(obj->resv) actually refcounts its obj
> pointer, and doesn't implicitly rely on the gpuvm_bo's refcount (I know
> Boris didn't like that, but requiring an explicit refcount for a
> pointer you dereference unless you're under a lock that ensures keeping
> the object alive is pretty much required?) But anyway for the
> drm_gpuvm_bo::entry::gem list protection (bo resv or internal spinlock)
> I don't have a strong preference.

We can keep the GEM object's dma-resv lock, however as mentioned above
drm_gpuva_unlink() and drm_gpuvm_bo_put() then require both the VM's resv lock
and the GEM's resv lock in case they differ.
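
Roughly (sketch; ww acquire context omitted for brevity, and the caller must
hold its own reference to the GEM object, since the vm_bo put may otherwise
free it before we unlock):

	dma_resv_lock(gpuvm->resv, NULL);
	if (drm_gpuvm_is_extobj(gpuvm, obj))
		dma_resv_lock(obj->resv, NULL);

	drm_gpuva_unlink(va);
	drm_gpuvm_bo_put(vm_bo);

	if (drm_gpuvm_is_extobj(gpuvm, obj))
		dma_resv_unlock(obj->resv);
	dma_resv_unlock(gpuvm->resv);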

> 
> >  All those problems go away with a dedicated
> > GEM gpuva list lock.
> 
> I don't think these are real problems.
> With the excepton of the eviction list "trick" where we currently have
> slightly different approach to collect external bos needing rebinding,
> we have this working fine.
> 
> TBH I think pretty much the only situation where the spinlock is needed
> is for async updates of these lists, unless a wq item can be used for
> that, but it doesn't really seem like the current code allows for such
> updates anyway? It complicates the code a lot, adds overhead and also
> adds the requirement for refcounting during list traversal.
> 
> /Thomas
> 
> > 
> > > 
> > > /Thomas
> > > 
> > > 
> > > > > It seems that with that also the refcount could be made non-atomic.
> > > > > 
> > > > > All in the spirit of the drm locking guidelines "use big locks when
> > > > > possible". Lower level locks only when necessary for performance or
> > > > > locking inversion?
> > > > > 
> > > > > /Thomas
> > > > > 
> > > > > 
> > > > > > + *
> > > > > > + * Elements popped from the original list are kept in a local list, so removal
> > > > > > + * and is_empty checks can still happen while we're iterating the list.
> > > > > > + */
> > > > > > +#define get_next_vm_bo_from_list(__gpuvm, __list_name, __local_list, __prev_vm_bo) \
> > > > > > +       ({                                                              \
> > > > > > +               struct drm_gpuvm_bo *__vm_bo;                           \
> > > > > > +                                                                       \
> > > > > > +               drm_gpuvm_bo_put(__prev_vm_bo);                         \
> > > > > > +                                                                       \
> > > > > > +               spin_lock(&(__gpuvm)->__list_name.lock);                \
> > > > > > +               while (!list_empty(&(__gpuvm)->__list_name.list)) {     \
> > > > > > +                       __vm_bo = list_first_entry(&(__gpuvm)->__list_name.list, \
> > > > > > +                                                  struct drm_gpuvm_bo, \
> > > > > > +                                                  list.entry.__list_name); \
> > > > > > +                       if (drm_gpuvm_bo_get_unless_zero(__vm_bo)) {    \
> > > > > > +                               list_move_tail(&(__vm_bo)->list.entry.__list_name, \
> > > > > > +                                              __local_list);           \
> > > > > > +                               break;                                  \
> > > > > > +                       } else {                                        \
> > > > > > +                               list_del_init(&(__vm_bo)->list.entry.__list_name); \
> > > > > > +                               __vm_bo = NULL;                         \
> > > > > > +                       }                                               \
> > > > > > +               }                                                       \
> > > > > > +               spin_unlock(&(__gpuvm)->__list_name.lock);              \
> > > > > > +                                                                       \
> > > > > > +               __vm_bo;                                                \
> > > > > > +       })
> > > > > > +
> > > > > > +/**
> > > > > > + * for_each_vm_bo_in_list() - internal vm_bo list iterator
> > > > > > + *
> > > > > > + * This helper is here to provide lockless list iteration. Lockless as in, the
> > > > > > + * iterator releases the lock immediately after picking the first element from
> > > > > > + * the list, so list insertion and deletion can happen concurrently.
> > > > > > + *
> > > > > > + * Typical use:
> > > > > > + *
> > > > > > + *     struct drm_gpuvm_bo *vm_bo;
> > > > > > + *     LIST_HEAD(my_local_list);
> > > > > > + *
> > > > > > + *     ret = 0;
> > > > > > + *     drm_gpuvm_for_each_vm_bo(gpuvm, <list_name>, &my_local_list, vm_bo) {
> > > > > > + *             ret = do_something_with_vm_bo(..., vm_bo);
> > > > > > + *             if (ret)
> > > > > > + *                     break;
> > > > > > + *     }
> > > > > > + *     drm_gpuvm_bo_put(vm_bo);
> > > > > > + *     drm_gpuvm_restore_vm_bo_list(gpuvm, <list_name>, &my_local_list);
> > > > > > + *
> > > > > > + *
> > > > > > + * Only used for internal list iterations, not meant to be exposed to the outside
> > > > > > + * world.
> > > > > > + */
> > > > > > +#define for_each_vm_bo_in_list(__gpuvm, __list_name, __local_list, __vm_bo) \
> > > > > > +       for (__vm_bo = get_next_vm_bo_from_list(__gpuvm, __list_name,   \
> > > > > > +                                               __local_list, NULL);    \
> > > > > > +            __vm_bo;                                                   \
> > > > > > +            __vm_bo = get_next_vm_bo_from_list(__gpuvm, __list_name,   \
> > > > > > +                                               __local_list, __vm_bo)) \
> > > > > > +
> > > > > > +/**
> > > > > > + * restore_vm_bo_list() - move vm_bo elements back to their original list
> > > > > > + * @__gpuvm: The GPU VM
> > > > > > + * @__list_name: The name of the list we're iterating on
> > > > > > + * @__local_list: A pointer to the local list used to store already iterated items
> > > > > > + *
> > > > > > + * When we're done iterating a vm_bo list, we should call restore_vm_bo_list()
> > > > > > + * to restore the original state and let new iterations take place.
> > > > > > + */
> > > > > > +#define restore_vm_bo_list(__gpuvm, __list_name, __local_list)         \
> > > > > > +       do {                                                            \
> > > > > > +               /* Merge back the two lists, moving local list elements to the \
> > > > > > +                * head to preserve previous ordering, in case it matters. \
> > > > > > +                */                                                      \
> > > > > > +               spin_lock(&(__gpuvm)->__list_name.lock);                \
> > > > > > +               list_splice(__local_list, &(__gpuvm)->__list_name.list); \
> > > > > > +               spin_unlock(&(__gpuvm)->__list_name.lock);              \
> > > > > > +       } while (0)
> > > > > > +/**
> > > > > > + * drm_gpuvm_bo_list_add() - insert a vm_bo into the given list
> > > > > > + * @__vm_bo: the &drm_gpuvm_bo
> > > > > > + * @__list_name: the name of the list to insert into
> > > > > > + *
> > > > > > + * Inserts the given @__vm_bo into the list specified by @__list_name and
> > > > > > + * increases the vm_bo's reference count.
> > > > > > + */
> > > > > > +#define drm_gpuvm_bo_list_add(__vm_bo, __list_name)                    \
> > > > > > +       do {                                                            \
> > > > > > +               spin_lock(&(__vm_bo)->vm->__list_name.lock);            \
> > > > > > +               if (list_empty(&(__vm_bo)->list.entry.__list_name))     \
> > > > > > +                       list_add_tail(&(__vm_bo)->list.entry.__list_name, \
> > > > > > +                                     &(__vm_bo)->vm->__list_name.list); \
> > > > > > +               spin_unlock(&(__vm_bo)->vm->__list_name.lock);          \
> > > > > > +       } while (0)
> > > > > > +
> > > > > > +/**
> > > > > > + * drm_gpuvm_bo_list_del() - remove a vm_bo from the given list
> > > > > > + * @__vm_bo: the &drm_gpuvm_bo
> > > > > > + * @__list_name: the name of the list to remove from
> > > > > > + *
> > > > > > + * Removes the given @__vm_bo from the list specified by @__list_name and
> > > > > > + * decreases the vm_bo's reference count.
> > > > > > + */
> > > > > > +#define drm_gpuvm_bo_list_del(__vm_bo, __list_name)                    \
> > > > > > +       do {                                                            \
> > > > > > +               spin_lock(&(__vm_bo)->vm->__list_name.lock);            \
> > > > > > +               if (!list_empty(&(__vm_bo)->list.entry.__list_name))    \
> > > > > > +                       list_del_init(&(__vm_bo)->list.entry.__list_name); \
> > > > > > +               spin_unlock(&(__vm_bo)->vm->__list_name.lock);          \
> > > > > > +       } while (0)
> > > > > > +
> > > > > > +static int __must_check
> > > > > > +drm_gpuvm_bo_get_unless_zero(struct drm_gpuvm_bo *vm_bo);
> > > > > > +
> > > > > >    #define to_drm_gpuva(__node) container_of((__node), struct drm_gpuva, rb.node)
> > > > > >    #define GPUVA_START(node) ((node)->va.addr)
> > > > > > @@ -713,6 +867,12 @@ drm_gpuvm_init(struct drm_gpuvm *gpuvm, struct drm_device *drm,
> > > > > >         gpuvm->rb.tree = RB_ROOT_CACHED;
> > > > > >         INIT_LIST_HEAD(&gpuvm->rb.list);
> > > > > > +       INIT_LIST_HEAD(&gpuvm->extobj.list);
> > > > > > +       spin_lock_init(&gpuvm->extobj.lock);
> > > > > > +
> > > > > > +       INIT_LIST_HEAD(&gpuvm->evict.list);
> > > > > > +       spin_lock_init(&gpuvm->evict.lock);
> > > > > > +
> > > > > >         drm_gpuva_check_overflow(start_offset, range);
> > > > > >         gpuvm->mm_start = start_offset;
> > > > > >         gpuvm->mm_range = range;
> > > > > > @@ -754,10 +914,302 @@ drm_gpuvm_destroy(struct drm_gpuvm *gpuvm)
> > > > > >         WARN(!RB_EMPTY_ROOT(&gpuvm->rb.tree.rb_root),
> > > > > >              "GPUVA tree is not empty, potentially leaking memory.\n");
> > > > > > +       WARN(!list_empty(&gpuvm->extobj.list), "Extobj list should be empty.\n");
> > > > > > +       WARN(!list_empty(&gpuvm->evict.list), "Evict list should be empty.\n");
> > > > > > +
> > > > > >         drm_gem_private_object_fini(&gpuvm->d_obj);
> > > > > >    }
> > > > > >    EXPORT_SYMBOL_GPL(drm_gpuvm_destroy);
> > > > > > +/**
> > > > > > + * drm_gpuvm_prepare_objects() - prepare all associated BOs
> > > > > > + * @gpuvm: the &drm_gpuvm
> > > > > > + * @exec: the &drm_exec locking context
> > > > > > + * @num_fences: the amount of &dma_fences to reserve
> > > > > > + *
> > > > > > + * Calls drm_exec_prepare_obj() for all &drm_gem_objects the given
> > > > > > + * &drm_gpuvm contains mappings of.
> > > > > > + *
> > > > > > + * Using this function directly, it is the driver's responsibility to call
> > > > > > + * drm_exec_init() and drm_exec_fini() accordingly.
> > > > > > + *
> > > > > > + * Note: This function is safe against concurrent insertion and removal of
> > > > > > + * external objects, however it is not safe against concurrent usage itself.
> > > > > > + *
> > > > > > + * Drivers need to make sure to protect this case with either an outer VM lock
> > > > > > + * or by calling drm_gpuvm_prepare_vm() before this function within the
> > > > > > + * drm_exec_until_all_locked() loop, such that the GPUVM's dma-resv lock ensures
> > > > > > + * mutual exclusion.
> > > > > > + *
> > > > > > + * Returns: 0 on success, negative error code on failure.
> > > > > > + */
> > > > > > +int
> > > > > > +drm_gpuvm_prepare_objects(struct drm_gpuvm *gpuvm,
> > > > > > +                         struct drm_exec *exec,
> > > > > > +                         unsigned int num_fences)
> > > > > > +{
> > > > > > +       struct drm_gpuvm_bo *vm_bo;
> > > > > > +       LIST_HEAD(extobjs);
> > > > > > +       int ret = 0;
> > > > > > +
> > > > > > +       for_each_vm_bo_in_list(gpuvm, extobj, &extobjs, vm_bo) {
> > > > > > +               ret = drm_exec_prepare_obj(exec, vm_bo->obj, num_fences);
> > > > > > +               if (ret)
> > > > > > +                       break;
> > > > > > +       }
> > > > > > +       /* Drop ref in case we break out of the loop. */
> > > > > > +       drm_gpuvm_bo_put(vm_bo);
> > > > > > +       restore_vm_bo_list(gpuvm, extobj, &extobjs);
> > > > > > +
> > > > > > +       return ret;
> > > > > > +}
> > > > > > +EXPORT_SYMBOL_GPL(drm_gpuvm_prepare_objects);
> > > > > > +
> > > > > > +/**
> > > > > > + * drm_gpuvm_prepare_range() - prepare all BOs mapped within a given range
> > > > > > + * @gpuvm: the &drm_gpuvm
> > > > > > + * @exec: the &drm_exec locking context
> > > > > > + * @addr: the start address within the VA space
> > > > > > + * @range: the range to iterate within the VA space
> > > > > > + * @num_fences: the amount of &dma_fences to reserve
> > > > > > + *
> > > > > > + * Calls drm_exec_prepare_obj() for all &drm_gem_objects mapped between @addr
> > > > > > + * and @addr + @range.
> > > > > > + *
> > > > > > + * Returns: 0 on success, negative error code on failure.
> > > > > > + */
> > > > > > +int
> > > > > > +drm_gpuvm_prepare_range(struct drm_gpuvm *gpuvm, struct drm_exec *exec,
> > > > > > +                       u64 addr, u64 range, unsigned int num_fences)
> > > > > > +{
> > > > > > +       struct drm_gpuva *va;
> > > > > > +       u64 end = addr + range;
> > > > > > +       int ret;
> > > > > > +
> > > > > > +       drm_gpuvm_for_each_va_range(va, gpuvm, addr, end) {
> > > > > > +               struct drm_gem_object *obj = va->gem.obj;
> > > > > > +
> > > > > > +               ret = drm_exec_prepare_obj(exec, obj, num_fences);
> > > > > > +               if (ret)
> > > > > > +                       return ret;
> > > > > > +       }
> > > > > > +
> > > > > > +       return 0;
> > > > > > +}
> > > > > > +EXPORT_SYMBOL_GPL(drm_gpuvm_prepare_range);
> > > > > > +
> > > > > > +/**
> > > > > > + * drm_gpuvm_exec_lock() - lock all dma-resv of all associated BOs
> > > > > > + * @vm_exec: the &drm_gpuvm_exec abstraction
> > > > > > + * @num_fences: the amount of &dma_fences to reserve
> > > > > > + * @interruptible: sleep interruptible if waiting
> > > > > > + *
> > > > > > + * Acquires all dma-resv locks of all &drm_gem_objects the given
> > > > > > + * &drm_gpuvm contains mappings of.
> > > > > > + *
> > > > > > + * Additionally, when calling this function with struct drm_gpuvm_exec::extra
> > > > > > + * being set the driver receives the given @fn callback to lock additional
> > > > > > + * dma-resv in the context of the &drm_gpuvm_exec instance. Typically, drivers
> > > > > > + * would call drm_exec_prepare_obj() from within this callback.
> > > > > > + *
> > > > > > + * Returns: 0 on success, negative error code on failure.
> > > > > > + */
> > > > > > +int
> > > > > > +drm_gpuvm_exec_lock(struct drm_gpuvm_exec *vm_exec,
> > > > > > +                   unsigned int num_fences,
> > > > > > +                   bool interruptible)
> > > > > > +{
> > > > > > +       struct drm_gpuvm *gpuvm = vm_exec->vm;
> > > > > > +       struct drm_exec *exec = &vm_exec->exec;
> > > > > > +       uint32_t flags;
> > > > > > +       int ret;
> > > > > > +
> > > > > > +       flags = interruptible ? DRM_EXEC_INTERRUPTIBLE_WAIT : 0 |
> > > > > > +               DRM_EXEC_IGNORE_DUPLICATES;
> > > > > > +
> > > > > > +       drm_exec_init(exec, flags);
> > > > > > +
> > > > > > +       drm_exec_until_all_locked(exec) {
> > > > > > +               ret = drm_gpuvm_prepare_vm(gpuvm, exec, num_fences);
> > > > > > +               drm_exec_retry_on_contention(exec);
> > > > > > +               if (ret)
> > > > > > +                       goto err;
> > > > > > +
> > > > > > +               ret = drm_gpuvm_prepare_objects(gpuvm, exec, num_fences);
> > > > > > +               drm_exec_retry_on_contention(exec);
> > > > > > +               if (ret)
> > > > > > +                       goto err;
> > > > > > +
> > > > > > +               if (vm_exec->extra.fn) {
> > > > > > +                       ret = vm_exec->extra.fn(vm_exec, num_fences);
> > > > > > +                       drm_exec_retry_on_contention(exec);
> > > > > > +                       if (ret)
> > > > > > +                               goto err;
> > > > > > +               }
> > > > > > +       }
> > > > > > +
> > > > > > +       return 0;
> > > > > > +
> > > > > > +err:
> > > > > > +       drm_exec_fini(exec);
> > > > > > +       return ret;
> > > > > > +}
> > > > > > +EXPORT_SYMBOL_GPL(drm_gpuvm_exec_lock);
> > > > > > +
> > > > > > +static int
> > > > > > +fn_lock_array(struct drm_gpuvm_exec *vm_exec, unsigned int num_fences)
> > > > > > +{
> > > > > > +       struct {
> > > > > > +               struct drm_gem_object **objs;
> > > > > > +               unsigned int num_objs;
> > > > > > +       } *args = vm_exec->extra.priv;
> > > > > > +
> > > > > > +       return drm_exec_prepare_array(&vm_exec->exec, args->objs,
> > > > > > +                                     args->num_objs, num_fences);
> > > > > > +}
> > > > > > +
> > > > > > +/**
> > > > > > + * drm_gpuvm_exec_lock_array() - lock all dma-resv of all associated BOs
> > > > > > + * @vm_exec: the &drm_gpuvm_exec abstraction
> > > > > > + * @objs: additional &drm_gem_objects to lock
> > > > > > + * @num_objs: the number of additional &drm_gem_objects to lock
> > > > > > + * @num_fences: the amount of &dma_fences to reserve
> > > > > > + * @interruptible: sleep interruptible if waiting
> > > > > > + *
> > > > > > + * Acquires all dma-resv locks of all &drm_gem_objects the given &drm_gpuvm
> > > > > > + * contains mappings of, plus the ones given through @objs.
> > > > > > + *
> > > > > > + * Returns: 0 on success, negative error code on failure.
> > > > > > + */
> > > > > > +int
> > > > > > +drm_gpuvm_exec_lock_array(struct drm_gpuvm_exec *vm_exec,
> > > > > > +                         struct drm_gem_object **objs,
> > > > > > +                         unsigned int num_objs,
> > > > > > +                         unsigned int num_fences,
> > > > > > +                         bool interruptible)
> > > > > > +{
> > > > > > +       struct {
> > > > > > +               struct drm_gem_object **objs;
> > > > > > +               unsigned int num_objs;
> > > > > > +       } args;
> > > > > > +
> > > > > > +       args.objs = objs;
> > > > > > +       args.num_objs = num_objs;
> > > > > > +
> > > > > > +       vm_exec->extra.fn = fn_lock_array;
> > > > > > +       vm_exec->extra.priv = &args;
> > > > > > +
> > > > > > +       return drm_gpuvm_exec_lock(vm_exec, num_fences,
> > > > > > interruptible);
> > > > > > +}
> > > > > > +EXPORT_SYMBOL_GPL(drm_gpuvm_exec_lock_array);
> > > > > > +
> > > > > > +/**
> > > > > > + * drm_gpuvm_exec_lock_range() - prepare all BOs mapped within a given range
> > > > > > + * @vm_exec: the &drm_gpuvm_exec abstraction
> > > > > > + * @addr: the start address within the VA space
> > > > > > + * @range: the range to iterate within the VA space
> > > > > > + * @num_fences: the amount of &dma_fences to reserve
> > > > > > + * @interruptible: sleep interruptible if waiting
> > > > > > + *
> > > > > > + * Acquires all dma-resv locks of all &drm_gem_objects mapped between @addr and
> > > > > > + * @addr + @range.
> > > > > > + *
> > > > > > + * Returns: 0 on success, negative error code on failure.
> > > > > > + */
> > > > > > +int
> > > > > > +drm_gpuvm_exec_lock_range(struct drm_gpuvm_exec *vm_exec,
> > > > > > +                         u64 addr, u64 range,
> > > > > > +                         unsigned int num_fences,
> > > > > > +                         bool interruptible)
> > > > > > +{
> > > > > > +       struct drm_gpuvm *gpuvm = vm_exec->vm;
> > > > > > +       struct drm_exec *exec = &vm_exec->exec;
> > > > > > +       uint32_t flags;
> > > > > > +       int ret;
> > > > > > +
> > > > > > +       flags = interruptible ? DRM_EXEC_INTERRUPTIBLE_WAIT : 0 |
> > > > > > +               DRM_EXEC_IGNORE_DUPLICATES;
> > > > > > +
> > > > > > +       drm_exec_init(exec, flags);
> > > > > > +
> > > > > > +       drm_exec_until_all_locked(exec) {
> > > > > > +               ret = drm_gpuvm_prepare_range(gpuvm, exec, addr, range,
> > > > > > +                                             num_fences);
> > > > > > +               drm_exec_retry_on_contention(exec);
> > > > > > +               if (ret)
> > > > > > +                       goto err;
> > > > > > +       }
> > > > > > +
> > > > > > +       return ret;
> > > > > > +
> > > > > > +err:
> > > > > > +       drm_exec_fini(exec);
> > > > > > +       return ret;
> > > > > > +}
> > > > > > +EXPORT_SYMBOL_GPL(drm_gpuvm_exec_lock_range);
> > > > > > +
> > > > > > +/**
> > > > > > + * drm_gpuvm_validate() - validate all BOs marked as evicted
> > > > > > + * @gpuvm: the &drm_gpuvm to validate evicted BOs
> > > > > > + *
> > > > > > + * Calls the &drm_gpuvm_ops.bo_validate callback for all evicted buffer
> > > > > > + * objects being mapped in the given &drm_gpuvm.
> > > > > > + *
> > > > > > + * Returns: 0 on success, negative error code on failure.
> > > > > > + */
> > > > > > +int
> > > > > > +drm_gpuvm_validate(struct drm_gpuvm *gpuvm)
> > > > > > +{
> > > > > > +       const struct drm_gpuvm_ops *ops = gpuvm->ops;
> > > > > > +       struct drm_gpuvm_bo *vm_bo;
> > > > > > +       LIST_HEAD(evict);
> > > > > > +       int ret = 0;
> > > > > > +
> > > > > > +       if (unlikely(!ops || !ops->bo_validate))
> > > > > > +               return -ENOTSUPP;
> > > > > > +
> > > > > > +       for_each_vm_bo_in_list(gpuvm, evict, &evict, vm_bo) {
> > > > > > +               dma_resv_assert_held(vm_bo->obj->resv);
> > > > > > +               ret = ops->bo_validate(vm_bo->obj);
> > > > > > +               if (ret)
> > > > > > +                       break;
> > > > > > +       }
> > > > > > +       /* Drop ref in case we break out of the loop. */
> > > > > > +       drm_gpuvm_bo_put(vm_bo);
> > > > > > +       restore_vm_bo_list(gpuvm, evict, &evict);
> > > > > > +
> > > > > > +       return ret;
> > > > > > +}
> > > > > > +EXPORT_SYMBOL_GPL(drm_gpuvm_validate);
> > > > > > +
> > > > > > +/**
> > > > > > + * drm_gpuvm_resv_add_fence - add fence to private and all extobj
> > > > > > + * dma-resv
> > > > > > + * @gpuvm: the &drm_gpuvm to add a fence to
> > > > > > + * @exec: the &drm_exec locking context
> > > > > > + * @fence: fence to add
> > > > > > + * @private_usage: private dma-resv usage
> > > > > > + * @extobj_usage: extobj dma-resv usage
> > > > > > + */
> > > > > > +void
> > > > > > +drm_gpuvm_resv_add_fence(struct drm_gpuvm *gpuvm,
> > > > > > +                        struct drm_exec *exec,
> > > > > > +                        struct dma_fence *fence,
> > > > > > +                        enum dma_resv_usage private_usage,
> > > > > > +                        enum dma_resv_usage extobj_usage)
> > > > > > +{
> > > > > > +       struct drm_gem_object *obj;
> > > > > > +       unsigned long index;
> > > > > > +
> > > > > > +       drm_exec_for_each_locked_object(exec, index, obj) {
> > > > > > +               dma_resv_assert_held(obj->resv);
> > > > > > +               dma_resv_add_fence(obj->resv, fence,
> > > > > > +                                  drm_gpuvm_is_extobj(gpuvm, obj) ?
> > > > > > +                                  private_usage : extobj_usage);
> > > > > > +       }
> > > > > > +}
> > > > > > +EXPORT_SYMBOL_GPL(drm_gpuvm_resv_add_fence);
> > > > > > +
> > > > > >    /**
> > > > > >     * drm_gpuvm_bo_create() - create a new instance of struct drm_gpuvm_bo
> > > > > >     * @gpuvm: The &drm_gpuvm the @obj is mapped in.
> > > > > > @@ -790,6 +1242,9 @@ drm_gpuvm_bo_create(struct drm_gpuvm *gpuvm,
> > > > > >         INIT_LIST_HEAD(&vm_bo->list.gpuva);
> > > > > >         INIT_LIST_HEAD(&vm_bo->list.entry.gem);
> > > > > > +       INIT_LIST_HEAD(&vm_bo->list.entry.extobj);
> > > > > > +       INIT_LIST_HEAD(&vm_bo->list.entry.evict);
> > > > > > +
> > > > > >         drm_gem_object_get(obj);
> > > > > >         return vm_bo;
> > > > > > @@ -807,6 +1262,14 @@ drm_gpuvm_bo_destroy(struct kref *kref)
> > > > > >         drm_gem_gpuva_assert_lock_held(vm_bo->obj);
> > > > > > +       spin_lock(&gpuvm->extobj.lock);
> > > > > > +       list_del(&vm_bo->list.entry.extobj);
> > > > > > +       spin_unlock(&gpuvm->extobj.lock);
> > > > > > +
> > > > > > +       spin_lock(&gpuvm->evict.lock);
> > > > > > +       list_del(&vm_bo->list.entry.evict);
> > > > > > +       spin_unlock(&gpuvm->evict.lock);
> > > > > > +
> > > > > >         list_del(&vm_bo->list.entry.gem);
> > > > > >         drm_gem_object_put(obj);
> > > > > > @@ -822,6 +1285,11 @@ drm_gpuvm_bo_destroy(struct kref *kref)
> > > > > >     * @vm_bo: the &drm_gpuvm_bo to release the reference of
> > > > > >     *
> > > > > >     * This releases a reference to @vm_bo.
> > > > > > + *
> > > > > > + * If the reference count drops to zero, the &gpuvm_bo is destroyed, which
> > > > > > + * includes removing it from the GEM's gpuva list. Hence, if a call to this
> > > > > > + * function can potentially let the reference count drop to zero, the caller
> > > > > > + * must hold the dma-resv or driver specific GEM gpuva lock.
> > > > > >     */
> > > > > >    void
> > > > > >    drm_gpuvm_bo_put(struct drm_gpuvm_bo *vm_bo)
> > > > > > @@ -831,6 +1299,12 @@ drm_gpuvm_bo_put(struct drm_gpuvm_bo *vm_bo)
> > > > > >    }
> > > > > >    EXPORT_SYMBOL_GPL(drm_gpuvm_bo_put);
> > > > > > +static int __must_check
> > > > > > +drm_gpuvm_bo_get_unless_zero(struct drm_gpuvm_bo *vm_bo)
> > > > > > +{
> > > > > > +       return kref_get_unless_zero(&vm_bo->kref);
> > > > > > +}
> > > > > > +
> > > > > >    static struct drm_gpuvm_bo *
> > > > > >    __drm_gpuvm_bo_find(struct drm_gpuvm *gpuvm,
> > > > > >                     struct drm_gem_object *obj)
> > > > > > @@ -938,6 +1412,48 @@ drm_gpuvm_bo_obtain_prealloc(struct drm_gpuvm_bo *__vm_bo)
> > > > > >    }
> > > > > >    EXPORT_SYMBOL_GPL(drm_gpuvm_bo_obtain_prealloc);
> > > > > > +/**
> > > > > > + * drm_gpuvm_bo_extobj_add() - adds the &drm_gpuvm_bo to its &drm_gpuvm's
> > > > > > + * extobj list
> > > > > > + * @vm_bo: The &drm_gpuvm_bo to add to its &drm_gpuvm's extobj list.
> > > > > > + *
> > > > > > + * Adds the given @vm_bo to its &drm_gpuvm's extobj list if not on the list
> > > > > > + * already and if the corresponding &drm_gem_object actually is an external
> > > > > > + * object.
> > > > > > + */
> > > > > > +void
> > > > > > +drm_gpuvm_bo_extobj_add(struct drm_gpuvm_bo *vm_bo)
> > > > > > +{
> > > > > > +       struct drm_gpuvm *gpuvm = vm_bo->vm;
> > > > > > +
> > > > > > +       if (drm_gpuvm_is_extobj(gpuvm, vm_bo->obj))
> > > > > > +               drm_gpuvm_bo_list_add(vm_bo, extobj);
> > > > > > +}
> > > > > > +EXPORT_SYMBOL_GPL(drm_gpuvm_bo_extobj_add);
> > > > > > +
> > > > > > +/**
> > > > > > + * drm_gpuvm_bo_evict() - add / remove a &drm_gem_object to / from a
> > > > > > + * &drm_gpuvms evicted list
> > > > > > + * @obj: the &drm_gem_object to add or remove
> > > > > > + * @evict: indicates whether the object is evicted
> > > > > > + *
> > > > > > + * Adds a &drm_gem_object to or removes it from all &drm_gpuvms evicted
> > > > > > + * list containing a mapping of this &drm_gem_object.
> > > > > > + */
> > > > > > +void
> > > > > > +drm_gpuvm_bo_evict(struct drm_gem_object *obj, bool evict)
> > > > > > +{
> > > > > > +       struct drm_gpuvm_bo *vm_bo;
> > > > > > +
> > > > > > +       drm_gem_for_each_gpuvm_bo(vm_bo, obj) {
> > > > > > +               if (evict)
> > > > > > +                       drm_gpuvm_bo_list_add(vm_bo, evict);
> > > > > > +               else
> > > > > > +                       drm_gpuvm_bo_list_del(vm_bo, evict);
> > > > > > +       }
> > > > > > +}
> > > > > > +EXPORT_SYMBOL_GPL(drm_gpuvm_bo_evict);
> > > > > > +
> > > > > >    static int
> > > > > >    __drm_gpuva_insert(struct drm_gpuvm *gpuvm,
> > > > > >                    struct drm_gpuva *va)
> > > > > > diff --git a/include/drm/drm_gpuvm.h b/include/drm/drm_gpuvm.h
> > > > > > index afa50b9059a2..834bb6d6617e 100644
> > > > > > --- a/include/drm/drm_gpuvm.h
> > > > > > +++ b/include/drm/drm_gpuvm.h
> > > > > > @@ -26,10 +26,12 @@
> > > > > >     */
> > > > > >    #include <linux/list.h>
> > > > > > +#include <linux/dma-resv.h>
> > > > > >    #include <linux/rbtree.h>
> > > > > >    #include <linux/types.h>
> > > > > >    #include <drm/drm_gem.h>
> > > > > > +#include <drm/drm_exec.h>
> > > > > >    struct drm_gpuvm;
> > > > > >    struct drm_gpuvm_bo;
> > > > > > @@ -259,6 +261,38 @@ struct drm_gpuvm {
> > > > > >          * space
> > > > > >          */
> > > > > >         struct dma_resv *resv;
> > > > > > +
> > > > > > +       /**
> > > > > > +        * @extobj: structure holding the extobj list
> > > > > > +        */
> > > > > > +       struct {
> > > > > > +               /**
> > > > > > +                * @list: &list_head storing &drm_gpuvm_bos serving as
> > > > > > +                * external object
> > > > > > +                */
> > > > > > +               struct list_head list;
> > > > > > +
> > > > > > +               /**
> > > > > > +                * @lock: spinlock to protect the extobj list
> > > > > > +                */
> > > > > > +               spinlock_t lock;
> > > > > > +       } extobj;
> > > > > > +
> > > > > > +       /**
> > > > > > +        * @evict: structure holding the evict list and evict list lock
> > > > > > +        */
> > > > > > +       struct {
> > > > > > +               /**
> > > > > > +                * @list: &list_head storing &drm_gpuvm_bos currently being
> > > > > > +                * evicted
> > > > > > +                */
> > > > > > +               struct list_head list;
> > > > > > +
> > > > > > +               /**
> > > > > > +                * @lock: spinlock to protect the evict list
> > > > > > +                */
> > > > > > +               spinlock_t lock;
> > > > > > +       } evict;
> > > > > >    };
> > > > > >    void drm_gpuvm_init(struct drm_gpuvm *gpuvm, struct drm_device *drm,
> > > > > > @@ -268,6 +302,21 @@ void drm_gpuvm_init(struct drm_gpuvm *gpuvm, struct drm_device *drm,
> > > > > >                     const struct drm_gpuvm_ops *ops);
> > > > > >    void drm_gpuvm_destroy(struct drm_gpuvm *gpuvm);
> > > > > > +/**
> > > > > > + * drm_gpuvm_is_extobj() - indicates whether the given &drm_gem_object is an
> > > > > > + * external object
> > > > > > + * @gpuvm: the &drm_gpuvm to check
> > > > > > + * @obj: the &drm_gem_object to check
> > > > > > + *
> > > > > > + * Returns: true if the &drm_gem_object &dma_resv differs from the
> > > > > > + * &drm_gpuvms &dma_resv, false otherwise
> > > > > > + */
> > > > > > +static inline bool drm_gpuvm_is_extobj(struct drm_gpuvm *gpuvm,
> > > > > > +                                      struct drm_gem_object *obj)
> > > > > > +{
> > > > > > +       return obj && obj->resv != gpuvm->resv;
> > > > > > +}
> > > > > > +
> > > > > >    static inline struct drm_gpuva *
> > > > > >    __drm_gpuva_next(struct drm_gpuva *va)
> > > > > >    {
> > > > > > @@ -346,6 +395,128 @@ __drm_gpuva_next(struct drm_gpuva *va)
> > > > > >    #define drm_gpuvm_for_each_va_safe(va__, next__, gpuvm__) \
> > > > > >         list_for_each_entry_safe(va__, next__, &(gpuvm__)->rb.list, rb.entry)
> > > > > > +/**
> > > > > > + * struct drm_gpuvm_exec - &drm_gpuvm abstraction of &drm_exec
> > > > > > + *
> > > > > > + * This structure should be created on the stack as &drm_exec should be.
> > > > > > + *
> > > > > > + * Optionally, @extra can be set in order to lock additional &drm_gem_objects.
> > > > > > + */
> > > > > > +struct drm_gpuvm_exec {
> > > > > > +       /**
> > > > > > +        * @exec: the &drm_exec structure
> > > > > > +        */
> > > > > > +       struct drm_exec exec;
> > > > > > +
> > > > > > +       /**
> > > > > > +        * @vm: the &drm_gpuvm to lock its DMA reservations
> > > > > > +        */
> > > > > > +       struct drm_gpuvm *vm;
> > > > > > +
> > > > > > +       /**
> > > > > > +        * @extra: Callback and corresponding private data for the driver to
> > > > > > +        * lock arbitrary additional &drm_gem_objects.
> > > > > > +        */
> > > > > > +       struct {
> > > > > > +               /**
> > > > > > +                * @fn: The driver callback to lock additional &drm_gem_objects.
> > > > > > +                */
> > > > > > +               int (*fn)(struct drm_gpuvm_exec *vm_exec,
> > > > > > +                         unsigned int num_fences);
> > > > > > +
> > > > > > +               /**
> > > > > > +                * @priv: driver private data for the @fn callback
> > > > > > +                */
> > > > > > +               void *priv;
> > > > > > +       } extra;
> > > > > > +};
> > > > > > +
> > > > > > +/**
> > > > > > + * drm_gpuvm_prepare_vm() - prepare the GPUVMs common dma-resv
> > > > > > + * @gpuvm: the &drm_gpuvm
> > > > > > + * @exec: the &drm_exec context
> > > > > > + * @num_fences: the amount of &dma_fences to reserve
> > > > > > + *
> > > > > > + * Calls drm_exec_prepare_obj() for the GPUVMs dummy &drm_gem_object.
> > > > > > + *
> > > > > > + * Using this function directly, it is the driver's responsibility to call
> > > > > > + * drm_exec_init() and drm_exec_fini() accordingly.
> > > > > > + *
> > > > > > + * Returns: 0 on success, negative error code on failure.
> > > > > > + */
> > > > > > +static inline int
> > > > > > +drm_gpuvm_prepare_vm(struct drm_gpuvm *gpuvm,
> > > > > > +                    struct drm_exec *exec,
> > > > > > +                    unsigned int num_fences)
> > > > > > +{
> > > > > > +       return drm_exec_prepare_obj(exec, &gpuvm->d_obj, num_fences);
> > > > > > +}
> > > > > > +
> > > > > > +int drm_gpuvm_prepare_objects(struct drm_gpuvm *gpuvm,
> > > > > > +                             struct drm_exec *exec,
> > > > > > +                             unsigned int num_fences);
> > > > > > +
> > > > > > +int drm_gpuvm_prepare_range(struct drm_gpuvm *gpuvm,
> > > > > > +                           struct drm_exec *exec,
> > > > > > +                           u64 addr, u64 range,
> > > > > > +                           unsigned int num_fences);
> > > > > > +
> > > > > > +int drm_gpuvm_exec_lock(struct drm_gpuvm_exec *vm_exec,
> > > > > > +                       unsigned int num_fences,
> > > > > > +                       bool interruptible);
> > > > > > +
> > > > > > +int drm_gpuvm_exec_lock_array(struct drm_gpuvm_exec *vm_exec,
> > > > > > +                             struct drm_gem_object **objs,
> > > > > > +                             unsigned int num_objs,
> > > > > > +                             unsigned int num_fences,
> > > > > > +                             bool interruptible);
> > > > > > +
> > > > > > +int drm_gpuvm_exec_lock_range(struct drm_gpuvm_exec *vm_exec,
> > > > > > +                             u64 addr, u64 range,
> > > > > > +                             unsigned int num_fences,
> > > > > > +                             bool interruptible);
> > > > > > +
> > > > > > +/**
> > > > > > + * drm_gpuvm_exec_unlock() - unlock all dma-resv of all associated BOs
> > > > > > + * @vm_exec: the &drm_gpuvm_exec abstraction
> > > > > > + *
> > > > > > + * Releases all dma-resv locks of all &drm_gem_objects previously acquired
> > > > > > + * through drm_gpuvm_exec_lock() or its variants.
> > > > > > + */
> > > > > > +static inline void
> > > > > > +drm_gpuvm_exec_unlock(struct drm_gpuvm_exec *vm_exec)
> > > > > > +{
> > > > > > +       drm_exec_fini(&vm_exec->exec);
> > > > > > +}
> > > > > > +
> > > > > > +int drm_gpuvm_validate(struct drm_gpuvm *gpuvm);
> > > > > > +void drm_gpuvm_resv_add_fence(struct drm_gpuvm *gpuvm,
> > > > > > +                             struct drm_exec *exec,
> > > > > > +                             struct dma_fence *fence,
> > > > > > +                             enum dma_resv_usage private_usage,
> > > > > > +                             enum dma_resv_usage extobj_usage);
> > > > > > +
> > > > > > +/**
> > > > > > + * drm_gpuvm_exec_resv_add_fence()
> > > > > > + * @vm_exec: the &drm_gpuvm_exec abstraction
> > > > > > + * @fence: fence to add
> > > > > > + * @private_usage: private dma-resv usage
> > > > > > + * @extobj_usage: extobj dma-resv usage
> > > > > > + *
> > > > > > + * See drm_gpuvm_resv_add_fence().
> > > > > > + */
> > > > > > +static inline void
> > > > > > +drm_gpuvm_exec_resv_add_fence(struct drm_gpuvm_exec *vm_exec,
> > > > > > +                             struct dma_fence *fence,
> > > > > > +                             enum dma_resv_usage private_usage,
> > > > > > +                             enum dma_resv_usage extobj_usage)
> > > > > > +{
> > > > > > +       drm_gpuvm_resv_add_fence(vm_exec->vm, &vm_exec->exec, fence,
> > > > > > +                                private_usage, extobj_usage);
> > > > > > +}
> > > > > > +
> > > > > >    /**
> > > > > >     * struct drm_gpuvm_bo - structure representing a &drm_gpuvm and
> > > > > >     * &drm_gem_object combination
> > > > > > @@ -398,6 +569,18 @@ struct drm_gpuvm_bo {
> > > > > >                          * gpuva list.
> > > > > >                          */
> > > > > >                         struct list_head gem;
> > > > > > +
> > > > > > +                       /**
> > > > > > +                        * @extobj: List entry to attach to the &drm_gpuvms
> > > > > > +                        * extobj list.
> > > > > > +                        */
> > > > > > +                       struct list_head extobj;
> > > > > > +
> > > > > > +                       /**
> > > > > > +                        * @evict: List entry to attach to the &drm_gpuvms evict
> > > > > > +                        * list.
> > > > > > +                        */
> > > > > > +                       struct list_head evict;
> > > > > >                 } entry;
> > > > > >         } list;
> > > > > >    };
> > > > > > @@ -432,6 +615,9 @@ struct drm_gpuvm_bo *
> > > > > >    drm_gpuvm_bo_find(struct drm_gpuvm *gpuvm,
> > > > > >                   struct drm_gem_object *obj);
> > > > > > +void drm_gpuvm_bo_evict(struct drm_gem_object *obj, bool evict);
> > > > > > +void drm_gpuvm_bo_extobj_add(struct drm_gpuvm_bo *vm_bo);
> > > > > > +
> > > > > >    /**
> > > > > >     * drm_gpuvm_bo_for_each_va() - iterator to walk over a list of &drm_gpuva
> > > > > >     * @va__: &drm_gpuva structure to assign to in each iteration step
> > > > > > @@ -837,6 +1023,17 @@ struct drm_gpuvm_ops {
> > > > > >          * used.
> > > > > >          */
> > > > > >         int (*sm_step_unmap)(struct drm_gpuva_op *op, void
> > > > > > *priv);
> > > > > > +
> > > > > > +       /**
> > > > > > +        * @bo_validate: called from drm_gpuvm_validate()
> > > > > > +        *
> > > > > > +        * Drivers receive this callback for every evicted &drm_gem_object being
> > > > > > +        * mapped in the corresponding &drm_gpuvm.
> > > > > > +        *
> > > > > > +        * Typically, drivers would call their driver specific variant of
> > > > > > +        * ttm_bo_validate() from within this callback.
> > > > > > +        */
> > > > > > +       int (*bo_validate)(struct drm_gem_object *obj);
> > > > > >    };
> > > > > >    int drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm, void *priv,
> > > 
> > 
>
Thomas Hellstrom Sept. 13, 2023, 1:22 p.m. UTC | #19
On 9/13/23 13:33, Boris Brezillon wrote:
> On Wed, 13 Sep 2023 12:39:01 +0200
> Thomas Hellström <thomas.hellstrom@linux.intel.com> wrote:
>
>> Hi,
>>
>> On 9/13/23 09:19, Boris Brezillon wrote:
>>> On Wed, 13 Sep 2023 17:05:42 +1000
>>> Dave Airlie <airlied@gmail.com> wrote:
>>>   
>>>> On Wed, 13 Sept 2023 at 17:03, Boris Brezillon
>>>> <boris.brezillon@collabora.com> wrote:
>>>>> On Tue, 12 Sep 2023 18:20:32 +0200
>>>>> Thomas Hellström <thomas.hellstrom@linux.intel.com> wrote:
>>>>>      
>>>>>>> +/**
>>>>>>> + * get_next_vm_bo_from_list() - get the next vm_bo element
>>>>>>> + * @__gpuvm: The GPU VM
>>>>>>> + * @__list_name: The name of the list we're iterating on
>>>>>>> + * @__local_list: A pointer to the local list used to store already iterated items
>>>>>>> + * @__prev_vm_bo: The previous element we got from drm_gpuvm_get_next_cached_vm_bo()
>>>>>>> + *
>>>>>>> + * This helper is here to provide lockless list iteration. Lockless as in, the
>>>>>>> + * iterator releases the lock immediately after picking the first element from
>>>>>>> + * the list, so list insertion deletion can happen concurrently.
>>>>>> Are the list spinlocks needed for that async state update from within
>>>>>> the dma-fence critical section we've discussed previously?
>>>>> Any driver calling _[un]link() from its drm_gpu_scheduler::run_job()
>>>>> hook will be in this situation (Panthor at the moment, PowerVR soon). I
>>>>> get that Xe and Nouveau don't need that because they update the VM
>>>>> state early (in the ioctl path), but I keep thinking this will hurt us
>>>>> if we don't think it through from the beginning, because once you've
>>>>> set this logic to depend only on resv locks, it will be pretty hard to
>>>>> get back to a solution which lets synchronous VM_BINDs take precedence
>>>>> on asynchronous request, and, with vkQueueBindSparse() passing external
>>>>> deps (plus the fact the VM_BIND queue might be pretty deep), it can
>>>>> take a long time to get your synchronous VM_BIND executed...
>> So this would boil down to either (possibly opt-in) keeping the spinlock
>> approach or pushing the unlink out to a wq then?
> Deferred _unlink() would not be an issue, since I already defer the
> drm_gpuva destruction to a wq, it would just be a matter of moving the
> _unlink() call there as well. But _link() also takes the GEM gpuva list
> lock, and that one is a bit tricky, in that sm_map() can trigger 2 more
> _link() calls for the prev/next mappings, which we can't guess until we
> get to execute the VM update. If we mandate the use of the GEM resv
> lock, that simply means async VM updates (AKA calling
> drm_gpuvm_sm_[un]map()) are not an option. And if this is what everyone
> agrees on, then I'd like the APIs that make this sort of async VM
> update possible (drm_gpuvm_sm_[un]map(), the drm_gpuvm_ops::sm_step*
> methods, and probably other things) to be dropped, so we don't make it
> look like it's something we support.
>
>> BTW, as also asked in a reply to Danilo, how do you call unlink from
>> run_job() when it was requiring the obj->dma_resv lock, or was that a WIP?
> _unlink() makes sure the GEM gpuva list lock is taken, but this can be
> a custom lock (see drm_gem_gpuva_set_lock()). In panthor we have
> panthor_gem_object::gpuva_list_lock that's dedicated to the gpuva list
> protection. We make sure we never take this lock while allocating
> memory to guarantee the dma-signalling path can't deadlock.
>
>>>>>      
>>>> btw what is the use case for this? do we have actual vulkan
>>>> applications we know will have problems here?
>>> I don't, but I think that's a concern Faith raised at some point (dates
>>> back from when I was reading threads describing how VM_BIND on i915
>>> should work, and I was clearly discovering this whole VM_BIND thing at
>>> that time, so maybe I misunderstood).
>>>   
>>>> it feels like a bit of premature optimisation, but maybe we have use cases.
>>> Might be, but that's the sort of thing that would put us in a corner if
>>> we don't have a plan for when the needs arise. Besides, if we don't
>>> want to support that case because it's too complicated, I'd recommend
>>> dropping all the drm_gpuvm APIs that let people think this mode is
>>> valid/supported (map/remap/unmap hooks in drm_gpuvm_ops,
>>> drm_gpuvm_sm_[un]map helpers, etc). Keeping them around just adds to the
>>> confusion.
>> Xe allows bypassing the bind-queue with another bind-queue, but to
>> completely avoid dependencies between queues the Operations may not
>> overlap.
> So, you check the VM state with some VM lock held (would be the VM resv
> in my case), and if the mapping is new (no overlaps with pre-existing
> mappings), you queue it to the fast-track/sync-VM_BIND queue. What would
> be missing I guess is a way to know if the mapping is active (MMU has
> been updated) or pending (MMU update queued to the bind-queue), so I can
> fast-track mapping/unmapping of active mappings. This would leave
> overlapping sync/async VM updates, which can't happen in practice
> unless userspace is doing something wrong (sparse bindings always go
> through vkQueueBindSparse).

User-space is allowed to create new bind queues at will, and they 
execute independently save for range overlaps.

And the overlapping granularity depends very much on the detail of the 
range tracking.
We drafted this fenced range utility

https://gitlab.freedesktop.org/drm/xe/kernel/-/merge_requests/353

That tracks active ranges that remove themselves when the attached fence 
signals. Not sure if we ended up using it, though. A new binding would 
scan this utility for dma-fences it needs to depend upon. Ranges in Xe 
are actually page-table modification ranges, so they can exceed the actual VA 
range in some situations, but if you can build page-table structures 
async the granularity indeed becomes better.
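
The lookup side would be something like this (sketch, hypothetical names;
error handling mostly elided):

	struct fenced_range *r;
	int ret;

	list_for_each_entry(r, &vm->fenced_ranges, entry) {
		if (r->start < end && start < r->end) {
			ret = drm_sched_job_add_dependency(&job->base,
							   dma_fence_get(r->fence));
			if (ret)
				return ret;
		}
	}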

/Thomas



>
> I'll give it a try.
>
>> (And the definition of overlap is currently page-table
>> structure updates may not overlap) but no guarantees are made about
>> priority.
>>
>> /Thomas
>>
>>
>>
Boris Brezillon Sept. 13, 2023, 2:01 p.m. UTC | #20
On Wed, 13 Sep 2023 15:22:56 +0200
Thomas Hellström <thomas.hellstrom@linux.intel.com> wrote:

> On 9/13/23 13:33, Boris Brezillon wrote:
> > On Wed, 13 Sep 2023 12:39:01 +0200
> > Thomas Hellström <thomas.hellstrom@linux.intel.com> wrote:
> >  
> >> Hi,
> >>
> >> On 9/13/23 09:19, Boris Brezillon wrote:  
> >>> On Wed, 13 Sep 2023 17:05:42 +1000
> >>> Dave Airlie <airlied@gmail.com> wrote:
> >>>     
> >>>> On Wed, 13 Sept 2023 at 17:03, Boris Brezillon
> >>>> <boris.brezillon@collabora.com> wrote:  
> >>>>> On Tue, 12 Sep 2023 18:20:32 +0200
> >>>>> Thomas Hellström <thomas.hellstrom@linux.intel.com> wrote:
> >>>>>        
> >>>>>>> +/**
> >>>>>>> + * get_next_vm_bo_from_list() - get the next vm_bo element
> >>>>>>> + * @__gpuvm: The GPU VM
> >>>>>>> + * @__list_name: The name of the list we're iterating on
> >>>>>>> + * @__local_list: A pointer to the local list used to store already iterated items
> >>>>>>> + * @__prev_vm_bo: The previous element we got from drm_gpuvm_get_next_cached_vm_bo()
> >>>>>>> + *
> >>>>>>> + * This helper is here to provide lockless list iteration. Lockless as in, the
> >>>>>>> + * iterator releases the lock immediately after picking the first element from
> >>>>>>> + * the list, so list insertion deletion can happen concurrently.  
> >>>>>> Are the list spinlocks needed for that async state update from within
> >>>>>> the dma-fence critical section we've discussed previously?  
> >>>>> Any driver calling _[un]link() from its drm_gpu_scheduler::run_job()
> >>>>> hook will be in this situation (Panthor at the moment, PowerVR soon). I
> >>>>> get that Xe and Nouveau don't need that because they update the VM
> >>>>> state early (in the ioctl path), but I keep thinking this will hurt us
> >>>>> if we don't think it through from the beginning, because once you've
> >>>>> set this logic to depend only on resv locks, it will be pretty hard to
> >>>>> get back to a solution which lets synchronous VM_BINDs take precedence
> >>>>> on asynchronous request, and, with vkQueueBindSparse() passing external
> >>>>> deps (plus the fact the VM_BIND queue might be pretty deep), it can
> >>>>> take a long time to get your synchronous VM_BIND executed...  
> >> So this would boil down to either (possibly opt-in) keeping the spinlock
> >> approach or pushing the unlink out to a wq then?  
> > Deferred _unlink() would not be an issue, since I already defer the
> > drm_gpuva destruction to a wq, it would just be a matter of moving the
> > _unlink() call there as well. But _link() also takes the GEM gpuva list
> > lock, and that one is a bit tricky, in that sm_map() can trigger 2 more
> > _link() calls for the prev/next mappings, which we can't guess until we
> > get to execute the VM update. If we mandate the use of the GEM resv
> > lock, that simply means async VM updates (AKA calling
> > drm_gpuvm_sm_[un]map()) are not an option. And if this is what everyone
> > agrees on, then I'd like the APIs that make this sort of async VM
> > update possible (drm_gpuvm_sm_[un]map(), the drm_gpuvm_ops::sm_step*
> > methods, and probably other things) to be dropped, so we don't make it
> > look like it's something we support.
> >  
> >> BTW, as also asked in a reply to Danilo, how do you call unlink from
> >> run_job() when it was requiring the obj->dma_resv lock, or was that a WIP?  
> > _unlink() makes sure the GEM gpuva list lock is taken, but this can be
> > a custom lock (see drm_gem_gpuva_set_lock()). In panthor we have
> > panthor_gem_object::gpuva_list_lock that's dedicated to the gpuva list
> > protection. We make sure we never take this lock while allocating
> > memory to guarantee the dma-signalling path can't deadlock.
> >  
> >>>>>        
> >>>> btw what is the use case for this? do we have actual vulkan
> >>>> applications we know will have problems here?  
> >>> I don't, but I think that's a concern Faith raised at some point (dates
> >>> back from when I was reading threads describing how VM_BIND on i915
> >>> should work, and I was clearly discovering this whole VM_BIND thing at
> >>> that time, so maybe I misunderstood).
> >>>     
> >>>> it feels like a bit of premature optimisation, but maybe we have use cases.  
> >>> Might be, but that's the sort of thing that would put us in a corner if
> >>> we don't have a plan for when the needs arise. Besides, if we don't
> >>> want to support that case because it's too complicated, I'd recommend
> >>> dropping all the drm_gpuvm APIs that let people think this mode is
> >>> valid/supported (map/remap/unmap hooks in drm_gpuvm_ops,
> >>> drm_gpuvm_sm_[un]map helpers, etc). Keeping them around just adds to the
> >>> confusion.  
> >> Xe allows bypassing the bind-queue with another bind-queue, but to
> >> completely avoid dependencies between queues the Operations may not
> >> overlap.  
> > So, you check the VM state with some VM lock held (would be the VM resv
> > in my case), and if the mapping is new (no overlaps with pre-existing
> > mappings), you queue it to the fast-track/sync-VM_BIND queue. What would
> > be missing I guess is a way to know if the mapping is active (MMU has
> > been updated) or pending (MMU update queued to the bind-queue), so I can
> > fast-track mapping/unmapping of active mappings. This would leave
> > overlapping sync/async VM updates, which can't happen in practice
> > unless userspace is doing something wrong (sparse bindings always go
> > through vkQueueBindSparse).  
> 
> User-space is allowed to create new bind queues at will, and they 
> execute independently save for range overlaps.

I've limited panthor to just one bind-queue that's automatically
created when the VM is created. I guess letting userspace create more
than one queue is doable, but we'd still be serializing VM
operations anyway and that complicates the whole thing when concurrent
operations to the same VM region happen from different bind queues, so I
figured it'd be simpler to expose just one queue.
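
If we ever want the fast-track you describe, I'd picture it like this on our
end (sketch, all helpers hypothetical):

	mutex_lock(&vm->op_lock);
	/* Only fast-track if no queued async op touches the range. */
	if (!vm_has_pending_op_in_range(vm, addr, range))
		ret = vm_bind_sync(vm, addr, range, bo, bo_offset);
	else
		ret = vm_bind_queue_async(vm, addr, range, bo, bo_offset);
	mutex_unlock(&vm->op_lock);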

> 
> And the overlapping granularity depends very much on the detail of the 
> range tracking.
> We drafted this fenced range utility
> 
> https://gitlab.freedesktop.org/drm/xe/kernel/-/merge_requests/353
> 
> That tracks active ranges that remove themselves when the attached fence 
> signals. Not sure if we ended up using it, though. A new binding would 
> scan this utility for dma-fences it needs to depend upon.

Sounds like implicit deps on VM ranges :D. I'll have a look, thanks
for the pointer! 

> Ranges in Xe 
> are actually page-table modification ranges, so can exceed the actual VA 
> range in some situations, but if you can build page-table structures 
> async the granularity indeed becomes better.

The granularity in Mali is 4k, and we don't build the page table struct
asynchronously, we just update the page table tree from the CPU,
holding a VM lock to serialize such operations (that's done
synchronously in the ::run_job() path, or from the ioctl in case of a
sync-VM_BIND).
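
So the ::run_job() VM update essentially boils down to (simplified sketch,
the lock name is approximate; all memory is allocated before the job is
queued):

	mutex_lock(&vm->op_lock);
	ret = drm_gpuvm_sm_map(&vm->base, &op_ctx, addr, range,
			       bo, bo_offset);
	mutex_unlock(&vm->op_lock);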

> 
> /Thomas
> 
> 
> 
> >
> > I'll give it a try.
> >  
> >> (And the definition of overlap is currently page-table
> >> structure updates may not overlap) but no guarantees are made about
> >> priority.
> >>
> >> /Thomas
> >>
> >>
> >>
Christian König Sept. 13, 2023, 2:26 p.m. UTC | #21
Am 13.09.23 um 14:16 schrieb Danilo Krummrich:
> As mentioned in a different mail thread, the reply is based on the assumption
> that we don't support anything other than GPUVM updates from the IOCTL.

I think that this assumption is incorrect.

Vulkan is just one specific use case, but this should probably be 
able to handle other use cases as well.

Especially with HMM you get the requirement that you need to be able to 
invalidate GPUVM mappings without grabbing a reservation lock.

See what the eviction lock in amdgpu is doing for example.

Regards,
Christian.
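
For readers without the amdgpu source at hand, the gist of such an
eviction lock, heavily simplified and with hypothetical names, is that
the invalidation path flags the mapping under a dedicated lock rather
than the reservation lock:

    static void my_vm_invalidate(struct my_vm *vm, struct my_vm_bo *vm_bo)
    {
            /* Called from an HMM / MMU-notifier path where taking a
             * dma-resv lock is not allowed; a dedicated lock guards the
             * evicted state instead. */
            spin_lock(&vm->eviction_lock);
            vm_bo->evicted = true;
            list_move_tail(&vm_bo->evict_link, &vm->evicted_list);
            spin_unlock(&vm->eviction_lock);
    }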

>
> On Wed, Sep 13, 2023 at 11:14:46AM +0200, Thomas Hellström wrote:
>> Hi!
>>
>> On Wed, 2023-09-13 at 01:36 +0200, Danilo Krummrich wrote:
>>> On Tue, Sep 12, 2023 at 09:23:08PM +0200, Thomas Hellström wrote:
>>>> On 9/12/23 18:50, Danilo Krummrich wrote:
>>>>> On Tue, Sep 12, 2023 at 06:20:32PM +0200, Thomas Hellström wrote:
>>>>>> Hi, Danilo,
>>>>>>
>>>>>> On 9/9/23 17:31, Danilo Krummrich wrote:
>>>>>>> So far the DRM GPUVA manager offers common infrastructure to
>>>>>>> track GPU VA
>>>>>>> allocations and mappings, generically connect GPU VA mappings
>>>>>>> to their
>>>>>>> backing buffers and perform more complex mapping operations
>>>>>>> on the GPU VA
>>>>>>> space.
>>>>>>>
>>>>>>> However, there are more design patterns commonly used by
>>>>>>> drivers, which
>>>>>>> can potentially be generalized in order to make the DRM GPUVA
>>>>>>> manager
>>>>>>> represent a basic GPU-VM implementation. In this context,
>>>>>>> this patch aims
>>>>>>> at generalizing the following elements.
>>>>>>>
>>>>>>> 1) Provide a common dma-resv for GEM objects not being used
>>>>>>> outside of
>>>>>>>       this GPU-VM.
>>>>>>>
>>>>>>> 2) Provide tracking of external GEM objects (GEM objects
>>>>>>> which are
>>>>>>>       shared with other GPU-VMs).
>>>>>>>
>>>>>>> 3) Provide functions to efficiently lock all GEM objects dma-
>>>>>>> resv the
>>>>>>>       GPU-VM contains mappings of.
>>>>>>>
>>>>>>> 4) Provide tracking of evicted GEM objects the GPU-VM
>>>>>>> contains mappings
>>>>>>>       of, such that validation of evicted GEM objects is
>>>>>>> accelerated.
>>>>>>>
> >>>>>>> 5) Provide some convenience functions for common patterns.
>>>>>>>
>>>>>>> Rather than being designed as a "framework", the target is to
>>>>>>> make all
>>>>>>> features appear as a collection of optional helper functions,
>>>>>>> such that
>>>>>>> drivers are free to make use of the DRM GPUVA managers basic
>>>>>>> functionality and opt-in for other features without setting
>>>>>>> any feature
>>>>>>> flags, just by making use of the corresponding functions.
>>>>>>>
>>>>>>> Big kudos to Boris Brezillon for his help to figure out
>>>>>>> locking for drivers
>>>>>>> updating the GPU VA space within the fence signalling path.
>>>>>>>
>>>>>>> Suggested-by: Matthew Brost <matthew.brost@intel.com>
>>>>>>> Signed-off-by: Danilo Krummrich <dakr@redhat.com>
>>>>>>> ---
>>>>>>>     drivers/gpu/drm/drm_gpuvm.c | 516
>>>>>>> ++++++++++++++++++++++++++++++++++++
>>>>>>>     include/drm/drm_gpuvm.h     | 197 ++++++++++++++
>>>>>>>     2 files changed, 713 insertions(+)
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/drm_gpuvm.c
>>>>>>> b/drivers/gpu/drm/drm_gpuvm.c
>>>>>>> index f4411047dbb3..8e62a043f719 100644
>>>>>>> --- a/drivers/gpu/drm/drm_gpuvm.c
>>>>>>> +++ b/drivers/gpu/drm/drm_gpuvm.c
>>>>>>> @@ -73,6 +73,21 @@
>>>>>>>      * &drm_gem_object list of &drm_gpuvm_bos for an existing
>>>>>>> instance of this
>>>>>>>      * particular combination. If not existent a new instance
>>>>>>> is created and linked
>>>>>>>      * to the &drm_gem_object.
>>>>>>> + *
>>>>>>> + * &drm_gpuvm_bo structures, since unique for a given
>>>>>>> &drm_gpuvm, are also used
>>>>>>> + * as entry for the &drm_gpuvm's lists of external and
>>>>>>> evicted objects. Those
> >>>>>>> + * lists are maintained in order to accelerate locking of
> >>>>>>> dma-resv locks and
> >>>>>>> + * validation of evicted objects bound in a &drm_gpuvm. For
> >>>>>>> instance, all the
>>>>>>> + * &drm_gem_object's &dma_resv of a given &drm_gpuvm can be
>>>>>>> locked by calling
>>>>>>> + * drm_gpuvm_exec_lock(). Once locked drivers can call
>>>>>>> drm_gpuvm_validate() in
>>>>>>> + * order to validate all evicted &drm_gem_objects. It is
>>>>>>> also possible to lock
>>>>>>> + * additional &drm_gem_objects by providing the
>>>>>>> corresponding parameters to
>>>>>>> + * drm_gpuvm_exec_lock() as well as open code the &drm_exec
>>>>>>> loop while making
>>>>>>> + * use of helper functions such as drm_gpuvm_prepare_range()
>>>>>>> or
>>>>>>> + * drm_gpuvm_prepare_objects().
>>>>>>> + *
>>>>>>> + * Every bound &drm_gem_object is treated as external object
>>>>>>> when its &dma_resv
>>>>>>> + * structure is different than the &drm_gpuvm's common
>>>>>>> &dma_resv structure.
>>>>>>>      */
>>>>>>>     /**
>>>>>>> @@ -420,6 +435,20 @@
>>>>>>>      * Subsequent calls to drm_gpuvm_bo_obtain() for the same
>>>>>>> &drm_gpuvm and
>>>>>>>      * &drm_gem_object must be able to observe previous
>>>>>>> creations and destructions
>>>>>>>      * of &drm_gpuvm_bos in order to keep instances unique.
>>>>>>> + *
>>>>>>> + * The &drm_gpuvm's lists for keeping track of external and
>>>>>>> evicted objects are
>>>>>>> + * protected against concurrent insertion / removal and
>>>>>>> iteration internally.
>>>>>>> + *
> >>>>>>> + * However, drivers still need to protect concurrent
>>>>>>> calls to functions
>>>>>>> + * iterating those lists, such as drm_gpuvm_validate() and
>>>>>>> + * drm_gpuvm_prepare_objects(). Every such function contains
>>>>>>> a particular
>>>>>>> + * comment and lockdep checks if possible.
>>>>>>> + *
>>>>>>> + * Functions adding or removing entries from those lists,
>>>>>>> such as
>>>>>>> + * drm_gpuvm_bo_evict() or drm_gpuvm_bo_extobj_add() may be
>>>>>>> called with external
>>>>>>> + * locks being held, e.g. in order to avoid the
> >>>>>>> corresponding list being
> >>>>>>> + * (safely) modified while potentially being iterated by
>>>>>>> other API functions.
>>>>>>> + * However, this is entirely optional.
>>>>>>>      */
>>>>>>>     /**
>>>>>>> @@ -632,6 +661,131 @@
>>>>>>>      *   }
>>>>>>>      */
>>>>>>> +/**
>>>>>>> + * get_next_vm_bo_from_list() - get the next vm_bo element
>>>>>>> + * @__gpuvm: The GPU VM
>>>>>>> + * @__list_name: The name of the list we're iterating on
>>>>>>> + * @__local_list: A pointer to the local list used to store
>>>>>>> already iterated items
>>>>>>> + * @__prev_vm_bo: The previous element we got from
> >>>>>>> get_next_vm_bo_from_list()
>>>>>>> + *
>>>>>>> + * This helper is here to provide lockless list iteration.
>>>>>>> Lockless as in, the
>>>>>>> + * iterator releases the lock immediately after picking the
>>>>>>> first element from
> >>>>>>> the list, so list insertion and deletion can happen
>>>>>>> concurrently.
>>>>>> Are the list spinlocks needed for that async state update from
>>>>>> within the
>>>>>> dma-fence critical section we've discussed previously?
>>>>> Yes, but also for other reasons, see below.
>>>>>
>>>>>> Otherwise it should be sufficient to protect the lists with the
>>>>>> gpuvm's resv
>>>>>> (or for the extobj list with an outer lock).
>>>>>>
>>>>>> If those spinlocks are still needed in some situations, perhaps
>>>>>> could we
>>>>>> have an option to set them to NULL (Like IIRC the maple tree
>>>>>> allows for)?
>>>>> The evict spinlock is needed in any case, since in
>>>>> drm_gpuvm_bo_evict() we're
>>>>> holding only the dma-resv lock from the BO this function gets
>>>>> called for. Hence,
>>>>> the spinlock protects concurrent drm_gpuvm_bo_evict() calls with
>>>>> different BOs.
>>>> No. Only if you try to add external objects to the vm's evict list
>>>> from
>>>> within the evict code. That's not necessary since you loop through
>>>> all
>>>> external objects anyway when locking them so an "evicted" bool in
>>>> the vm_bo,
>>>> protected by the bo resv would be sufficient. The extobj locking
>>>> loop can
>>>> then add the bo to the evicted list.
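A sketch of the scheme proposed here (the "evicted" member and the
relink loop are hypothetical adaptations of the quoted helpers):

    /* Called with only the BO's resv held, e.g. from the TTM move path. */
    static void my_gpuvm_bo_evict(struct drm_gpuvm_bo *vm_bo, bool evict)
    {
            dma_resv_assert_held(vm_bo->obj->resv);
            vm_bo->evicted = evict;         /* hypothetical member */
    }

    /* In the exec path, once drm_exec has locked all external objects: */
    list_for_each_entry(vm_bo, &gpuvm->extobj.list, list.entry.extobj) {
            if (vm_bo->evicted)
                    list_move_tail(&vm_bo->list.entry.evict,
                                   &gpuvm->evict.list);
    }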
>>> And validate() can remove it while still holding all dma-resv locks,
>>> neat!
>>> However, what if two tasks are trying to lock the VA space
>>> concurrently? What
>>> do we do when the drm_gpuvm_bo's refcount drops to zero in
>>> drm_gpuva_unlink()?
>>> Are we guaranteed that at this point of time the drm_gpuvm_bo is not
>>> on the
>>> evicted list? Because otherwise we would call drm_gpuvm_bo_destroy()
>>> with the
>>> dma-resv lock held, which wouldn't be allowed, since
>>> drm_gpuvm_bo_destroy()
>>> might drop the last reference to the drm_gem_object and hence we'd
>>> potentially
>>> free the dma-resv lock while holding it, at least if it's an external
>>> object.
>> Easiest way in this scheme is to think of the lists as being protected
>> by the vm's resv lock. That means anybody calling unlink() must also
>> hold the vm's resv lock. (Which is OK from a UAF point of view, but
>> perhaps not from a locking inversion POV from an async list update).
> This would mean that on unlink() we'd need to hold the VM's resv lock and the
> corresponding GEM's resv lock (in case they're not the same anyways) because the
> VM's resv lock would protect the external / evicted object lists and the GEM
> objects resv lock protects the GEM's list of drm_gpuvm_bos and the
> drm_gpuvm_bo's list of drm_gpuvas.
>
>>>>> For extobjs an outer lock would be enough in case of Xe, but I
>>>>> really would not
>>>>> like to add even more complexity just to get the spinlock out of
>>>>> the way in case
>>>>> the driver already has an outer lock protecting this path.
>>>> I must disagree here. These spinlocks and atomic operations are
>>>> pretty
>>>> costly and as discussed earlier this type of locking was the reason
>>>> (at
>>>> least according to the commit message) that made Christian drop the
>>>> XArray
>>>> use in drm_exec for the same set of objects: "The locking overhead
>>>> is
>>>> unecessary and measurable". IMHO the spinlock is the added
>>>> complexity and a
>>>> single wide lock following the drm locking guidelines set out by
>>>> Daniel and
>>>> David should really be the default choice with an opt-in for a
>>>> spinlock if
>>>> needed for async and pushing out to a wq is not an option.
>>> For the external object list an outer lock would work as long as it's
>>> not the
>>> dma-resv lock of the corresponding GEM object, since here we actually
>>> need to
>>> remove the list entry from the external object list on
>>> drm_gpuvm_bo_destroy().
>>> It's just a bit weird design wise that drivers would need to take
>>> this outer
>>> lock on:
>>>
>>> - drm_gpuvm_bo_extobj_add()
>>> - drm_gpuvm_bo_destroy()        (and hence also drm_gpuvm_bo_put())
>>> - drm_gpuva_unlink()            (because it needs to call
>>> drm_gpuvm_bo_put())
>>> - drm_gpuvm_exec_lock()
>>> - drm_gpuvm_exec_lock_array()
>>> - drm_gpuvm_prepare_range()
>>>
>>> Given that it seems reasonable to do all the required locking
>>> internally.
>> From a design POV, there has been a clear direction in Xe to make
>> things similar to mmap() / munmap(), so this outer lock, which in Xe is
>> an rwsem, is used in a similar way as the mmap_lock. It's protecting
>> the page-table structures and vma rb tree, the userptr structures and
>> the extobj list. Basically it's taken early in the exec IOCTL, the
>> VM_BIND ioctl, the compute rebind worker and the pagefault handler, so
>> all of the above are just asserting that it is taken in the correct
>> mode.
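Sketched, that scheme reads like this (hypothetical names, loosely
modelled on the description above):

    /* Caller runs this inside drm_exec_until_all_locked(). */
    static int my_exec_prepare(struct my_vm *vm, struct drm_exec *exec)
    {
            int ret;

            /* Read side: VA tree, page-table structures, userptrs and
             * the extobj list are stable while this is held. */
            down_read(&vm->lock);
            ret = drm_gpuvm_prepare_objects(&vm->gpuvm, exec, 1);
            up_read(&vm->lock);

            return ret;
    }

    static void my_vm_bind_update(struct my_vm *vm)
    {
            /* VM_BIND and the pagefault handler take it exclusively. */
            down_write(&vm->lock);
            /* ... modify the VA space and page tables ... */
            up_write(&vm->lock);
    }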
>>
>> But strictly with this scheme one could also use the vm's dma_resv for
>> the extobj list since with drm_exec, it's locked before traversing the
>> list.
>>
>> The whole point of this scheme is to rely on locks that you already are
>> supposed to be holding for various reasons, and it is simple to comprehend.
> I don't agree that we're supposed to hold the VM's resv lock anyways for
> functions like drm_gpuvm_bo_put() or drm_gpuva_unlink(), but I'm fine using it
> for that purpose nevertheless.
>
>>> In order to at least place lockdep checks, the driver would need to
>>> supply the
>>> corresponding lock's lockdep_map, because the GPUVM otherwise doesn't
>>> know about
>>> the lock.
>> Yes, that sounds reasonable. One lockdep map per list.
> I'd really like to avoid that, especially now that everything got simpler. We
> should define the actual locks to take instead.
>
>>> Out of curiosity, what is the overhead of a spin_lock() that doesn't
>>> need to
>>> spin?
>> I guess it's hard to tell exactly, but it is much lower on modern x86
>> than what it used to be. Not sure about ARM, which is the other
>> architecture important to us. I figure if there is little cache-line
>> bouncing the main overhead comes from the implied barriers.
>>
>>>> A pretty simple way that would not add much code would be
>>>>
>>>> static void gpuvm_cond_spin_lock(const struct drm_gpuvm *gpuvm,
>>>>                                  spinlock_t *lock)
>>>> {
>>>>      if (!gpuvm->resv_protected_lists)
>>>>          spin_lock(lock);
>>>> }
>>>>
>>>>>> For such drivers, that would require anybody calling unlink to
>>>>>> hold the vm's
>>>>>> resv, though.
>>>>> In V4 I want to go back to having a dedicated lock for the GEMs
>>>>> gpuva list (or
>>>>> VM_BO list to be more precise). We can't just use the dma-resv
>>>>> lock for that
>>>>> with VM_BO abstractions, because on destruction of a VM_BO we
>>>>> otherwise wouldn't
>>>>> be allowed to already hold the dma-resv lock. That's the fix I
>>>>> was referring to
>>>>> earlier.
>>>> Yeah, I can see the need for a dedicated lock for the GEM's gpuva
>>>> list, but
>>>> holding the vm's dma-resv lock across the unlink shouldn't be a
>>>> problem. We
>>>> may free the object and a pointer to the vm's resv during unlink
>>>> but we
>>>> don't free the vm's resv.  It'd be a matter of ensuring that any
>>>> calls to
>>>> unlink from *within* drm_gpuvm allows it to be held.
>>> Drivers calling unlink() from the fence signaling path can't use the
>>> VM's
>>> dma-resv lock.
>> Yes, that made me a bit curious because in the current version the code
>> required the object's dma_resv for unlink() which can't be grabbed
>> either from the fence signaling path. So are there any drivers actually
>> wanting to do that? If so, they will either need to resort to the
>> current spinlock solution or they will need to call unlink from a
>> workqueue item.
> As Boris already mentioned we have the dma-resv lock by default or a driver
> specific GEM gpuva lock as opt-in. Now, we can get rid of the latter.
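(For reference, the existing opt-in is a one-liner at GEM object
creation time, along the lines of:

    mutex_init(&bo->gpuva_list_lock);
    drm_gem_gpuva_set_lock(&bo->base, &bo->gpuva_list_lock);

with bo->gpuva_list_lock being whatever lock the driver dedicates to
the gpuva list.)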
>
>>> Also, what if the object is an external object? We can't use the VM's
>>> dma-resv
>>> lock here.
>> Why? Typically (sync) unlink is only ever called from an unbind-like
>> operation where it should be trivial to grab the vm's resv. Or, for
>> that matter any outer lock protecting the extobj list. Rule would be
>> the drm_gpuvm_bo::entry::extobj and drm_gpuvm_bo::entry::evict would
>> be protected by the vm's dma_resv (or possibly an outer lock in
>> the case of the extobj list).
> Outer lock wouldn't have been working for updates in the async path, but
> shouldn't be relevant anymore. We could use the VM's resv for that.
>
>>>   And we can't have the GEM objs dma-resv lock held when calling
>>> unlink(), since unlink() calls drm_gpuvm_bo_put(), which if the
>>> refcount drops
>>> to zero calls drm_gpuvm_bo_destroy() and drm_gpuvm_bo_destroy() might
>>> drop the
>>> last reference of the GEM object.
>> Yes, but this is a different problem as to what exactly protects
>> drm_gpuvm_bo::entry::gem. Either as you suggest an internal per bo list
>> lock, or if we want to keep the bo's dma_resv we need to ensure that
>> the caller of dma_resv_unlock(obj->resv) actually refcounts its obj
>> pointer, and doesn't implicitly rely on the gpuvm_bo's refcount (I know
>> Boris didn't like that, but requiring an explicit refcount for a
>> pointer you dereference unless you're under a lock that ensures keeping
>> the object alive is pretty much required?) But anyway for the
>> drm_gpuvm_bo::entry::gem list protection (bo resv or internal spinlock)
>> I don't have a strong preference.
> We can keep the GEM objects dma-resv lock, however as mentioned above
> drm_gpuva_unlink() and drm_gpuvm_bo_put() then requires both the VM's resv lock
> and the GEM's resv lock in case they differ.
>
>>>   All those problems go away with a dedicated
>>> GEM gpuva list lock.
>> I don't think these are real problems.
>> With the exception of the eviction list "trick" where we currently have
>> slightly different approach to collect external bos needing rebinding,
>> we have this working fine.
>>
>> TBH I think pretty much the only situation where the spinlock is needed
>> is for async updates of these lists, unless a wq item can be used for
>> that, but it doesn't really seem like the current code allows for such
>> updates anyway? It complicates the code a lot, adds overhead and also
>> adds the requirement for refcounting during list traversal.
>>
>> /Thomas
>>
>>>> /Thomas
>>>>
>>>>
>>>>>> It seems that with that also the refcount could be make non-
>>>>>> atomic.
>>>>>>
>>>>>> All in the spirit of the drm locking guidelines "use big locks
>>>>>> when
>>>>>> possible".
>>>>>> Lower level locks only when necessary for performance or
>>>>>> locking inversion?
>>>>>>
>>>>>> /Thomas
>>>>>>
>>>>>>
>>>>>>> + *
>>>>>>> + * Elements popped from the original list are kept in a
>>>>>>> local list, so removal
>>>>>>> + * and is_empty checks can still happen while we're
>>>>>>> iterating the list.
>>>>>>> + */
>>>>>>> +#define get_next_vm_bo_from_list(__gpuvm, __list_name, __local_list, __prev_vm_bo) \
>>>>>>> +       ({                                                              \
>>>>>>> +               struct drm_gpuvm_bo *__vm_bo;                           \
>>>>>>> +                                                                       \
>>>>>>> +               drm_gpuvm_bo_put(__prev_vm_bo);                         \
>>>>>>> +                                                                       \
>>>>>>> +               spin_lock(&(__gpuvm)->__list_name.lock);                \
>>>>>>> +               while (!list_empty(&(__gpuvm)->__list_name.list)) {     \
>>>>>>> +                       __vm_bo = list_first_entry(&(__gpuvm)->__list_name.list, \
>>>>>>> +                                                  struct drm_gpuvm_bo, \
>>>>>>> +                                                  list.entry.__list_name); \
>>>>>>> +                       if (drm_gpuvm_bo_get_unless_zero(__vm_bo)) {    \
>>>>>>> +                               list_move_tail(&(__vm_bo)->list.entry.__list_name, \
>>>>>>> +                                              __local_list);           \
>>>>>>> +                               break;                                  \
>>>>>>> +                       } else {                                        \
>>>>>>> +                               list_del_init(&(__vm_bo)->list.entry.__list_name); \
>>>>>>> +                               __vm_bo = NULL;                         \
>>>>>>> +                       }                                               \
>>>>>>> +               }                                                       \
>>>>>>> +               spin_unlock(&(__gpuvm)->__list_name.lock);              \
>>>>>>> +                                                                       \
>>>>>>> +               __vm_bo;                                                \
>>>>>>> +       })
>>>>>>> +
>>>>>>> +/**
>>>>>>> + * for_each_vm_bo_in_list() - internal vm_bo list iterator
>>>>>>> + *
>>>>>>> + * This helper is here to provide lockless list iteration.
>>>>>>> Lockless as in, the
>>>>>>> + * iterator releases the lock immediately after picking the
>>>>>>> first element from the
>>>>>>> + * list, so list insertion and deletion can happen
>>>>>>> concurrently.
>>>>>>> + *
>>>>>>> + * Typical use:
>>>>>>> + *
>>>>>>> + *     struct drm_gpuvm_bo *vm_bo;
>>>>>>> + *     LIST_HEAD(my_local_list);
>>>>>>> + *
>>>>>>> + *     ret = 0;
>>>>>>> + *     drm_gpuvm_for_each_vm_bo(gpuvm, <list_name>,
>>>>>>> &my_local_list, vm_bo) {
>>>>>>> + *             ret = do_something_with_vm_bo(..., vm_bo);
>>>>>>> + *             if (ret)
>>>>>>> + *                     break;
>>>>>>> + *     }
>>>>>>> + *     drm_gpuvm_bo_put(vm_bo);
>>>>>>> + *     drm_gpuvm_restore_vm_bo_list(gpuvm, <list_name>,
>>>>>>> &my_local_list);
>>>>>>> + *
>>>>>>> + *
>>>>>>> + * Only used for internal list iterations, not meant to be
>>>>>>> exposed to the outside
>>>>>>> + * world.
>>>>>>> + */
>>>>>>> +#define for_each_vm_bo_in_list(__gpuvm, __list_name, __local_list, __vm_bo) \
>>>>>>> +       for (__vm_bo = get_next_vm_bo_from_list(__gpuvm, __list_name,   \
>>>>>>> +                                               __local_list, NULL);    \
>>>>>>> +            __vm_bo;                                                   \
>>>>>>> +            __vm_bo = get_next_vm_bo_from_list(__gpuvm, __list_name,   \
>>>>>>> +                                               __local_list, __vm_bo))
>>>>>>> +
>>>>>>> +/**
>>>>>>> + * restore_vm_bo_list() - move vm_bo elements back to their
>>>>>>> original list
>>>>>>> + * @__gpuvm: The GPU VM
>>>>>>> + * @__list_name: The name of the list we're iterating on
>>>>>>> + * @__local_list: A pointer to the local list used to store
>>>>>>> already iterated items
>>>>>>> + *
>>>>>>> + * When we're done iterating a vm_bo list, we should call
>>>>>>> restore_vm_bo_list()
>>>>>>> + * to restore the original state and let new iterations take
>>>>>>> place.
>>>>>>> + */
>>>>>>> +#define restore_vm_bo_list(__gpuvm, __list_name, __local_list)         \
>>>>>>> +       do {                                                            \
>>>>>>> +               /* Merge back the two lists, moving local list elements \
>>>>>>> +                * to the head to preserve previous ordering, in case   \
>>>>>>> +                * it matters.                                          \
>>>>>>> +                */                                                     \
>>>>>>> +               spin_lock(&(__gpuvm)->__list_name.lock);                \
>>>>>>> +               list_splice(__local_list, &(__gpuvm)->__list_name.list); \
>>>>>>> +               spin_unlock(&(__gpuvm)->__list_name.lock);              \
>>>>>>> +       } while (0)
>>>>>>> +/**
>>>>>>> + * drm_gpuvm_bo_list_add() - insert a vm_bo into the given
>>>>>>> list
>>>>>>> + * @__vm_bo: the &drm_gpuvm_bo
>>>>>>> + * @__list_name: the name of the list to insert into
>>>>>>> + *
>>>>>>> + * Inserts the given @__vm_bo into the list specified by
>>>>>>> @__list_name and
>>>>>>> + * increases the vm_bo's reference count.
>>>>>>> + */
>>>>>>> +#define drm_gpuvm_bo_list_add(__vm_bo, __list_name)                    \
>>>>>>> +       do {                                                            \
>>>>>>> +               spin_lock(&(__vm_bo)->vm->__list_name.lock);            \
>>>>>>> +               if (list_empty(&(__vm_bo)->list.entry.__list_name))     \
>>>>>>> +                       list_add_tail(&(__vm_bo)->list.entry.__list_name, \
>>>>>>> +                                     &(__vm_bo)->vm->__list_name.list); \
>>>>>>> +               spin_unlock(&(__vm_bo)->vm->__list_name.lock);          \
>>>>>>> +       } while (0)
>>>>>>> +
>>>>>>> +/**
>>>>>>> + * drm_gpuvm_bo_list_del() - remove a vm_bo from the given
>>>>>>> list
>>>>>>> + * @__vm_bo: the &drm_gpuvm_bo
>>>>>>> + * @__list_name: the name of the list to insert into
>>>>>>> + *
>>>>>>> + * Removes the given @__vm_bo from the list specified by
>>>>>>> @__list_name and
>>>>>>> + * decreases the vm_bo's reference count.
>>>>>>> + */
>>>>>>> +#define drm_gpuvm_bo_list_del(__vm_bo, __list_name)                    \
>>>>>>> +       do {                                                            \
>>>>>>> +               spin_lock(&(__vm_bo)->vm->__list_name.lock);            \
>>>>>>> +               if (!list_empty(&(__vm_bo)->list.entry.__list_name))    \
>>>>>>> +                       list_del_init(&(__vm_bo)->list.entry.__list_name); \
>>>>>>> +               spin_unlock(&(__vm_bo)->vm->__list_name.lock);          \
>>>>>>> +       } while (0)
>>>>>>> +
>>>>>>> +static int __must_check
>>>>>>> +drm_gpuvm_bo_get_unless_zero(struct drm_gpuvm_bo *vm_bo);
>>>>>>> +
>>>>>>>     #define to_drm_gpuva(__node) container_of((__node), struct
>>>>>>> drm_gpuva, rb.node)
>>>>>>>     #define GPUVA_START(node) ((node)->va.addr)
>>>>>>> @@ -713,6 +867,12 @@ drm_gpuvm_init(struct drm_gpuvm *gpuvm,
>>>>>>> struct drm_device *drm,
>>>>>>>          gpuvm->rb.tree = RB_ROOT_CACHED;
>>>>>>>          INIT_LIST_HEAD(&gpuvm->rb.list);
>>>>>>> +       INIT_LIST_HEAD(&gpuvm->extobj.list);
>>>>>>> +       spin_lock_init(&gpuvm->extobj.lock);
>>>>>>> +
>>>>>>> +       INIT_LIST_HEAD(&gpuvm->evict.list);
>>>>>>> +       spin_lock_init(&gpuvm->evict.lock);
>>>>>>> +
>>>>>>>          drm_gpuva_check_overflow(start_offset, range);
>>>>>>>          gpuvm->mm_start = start_offset;
>>>>>>>          gpuvm->mm_range = range;
>>>>>>> @@ -754,10 +914,302 @@ drm_gpuvm_destroy(struct drm_gpuvm
>>>>>>> *gpuvm)
>>>>>>>          WARN(!RB_EMPTY_ROOT(&gpuvm->rb.tree.rb_root),
>>>>>>>               "GPUVA tree is not empty, potentially leaking
>>>>>>> memory.\n");
>>>>>>> +       WARN(!list_empty(&gpuvm->extobj.list),
>>>>>>> +            "Extobj list should be empty.\n");
>>>>>>> +       WARN(!list_empty(&gpuvm->evict.list),
>>>>>>> +            "Evict list should be empty.\n");
>>>>>>> +
>>>>>>>          drm_gem_private_object_fini(&gpuvm->d_obj);
>>>>>>>     }
>>>>>>>     EXPORT_SYMBOL_GPL(drm_gpuvm_destroy);
>>>>>>> +/**
>>>>>>> + * drm_gpuvm_prepare_objects() - prepare all associated BOs
>>>>>>> + * @gpuvm: the &drm_gpuvm
>>>>>>> + * @exec: the &drm_exec locking context
>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>> + *
>>>>>>> + * Calls drm_exec_prepare_obj() for all &drm_gem_objects the
>>>>>>> given
>>>>>>> + * &drm_gpuvm contains mappings of.
>>>>>>> + *
>>>>>>> + * Using this function directly, it is the driver's
>>>>>>> responsibility to call
>>>>>>> + * drm_exec_init() and drm_exec_fini() accordingly.
>>>>>>> + *
>>>>>>> + * Note: This function is safe against concurrent insertion
>>>>>>> and removal of
>>>>>>> + * external objects, however it is not safe against
>>>>>>> concurrent usage itself.
>>>>>>> + *
>>>>>>> + * Drivers need to make sure to protect this case with
>>>>>>> either an outer VM lock
>>>>>>> + * or by calling drm_gpuvm_prepare_vm() before this function
>>>>>>> within the
>>>>>>> + * drm_exec_until_all_locked() loop, such that the GPUVM's
>>>>>>> dma-resv lock ensures
>>>>>>> + * mutual exclusion.
>>>>>>> + *
>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>> + */
>>>>>>> +int
>>>>>>> +drm_gpuvm_prepare_objects(struct drm_gpuvm *gpuvm,
>>>>>>> +                         struct drm_exec *exec,
>>>>>>> +                         unsigned int num_fences)
>>>>>>> +{
>>>>>>> +       struct drm_gpuvm_bo *vm_bo;
>>>>>>> +       LIST_HEAD(extobjs);
>>>>>>> +       int ret = 0;
>>>>>>> +
>>>>>>> +       for_each_vm_bo_in_list(gpuvm, extobj, &extobjs,
>>>>>>> vm_bo) {
>>>>>>> +               ret = drm_exec_prepare_obj(exec, vm_bo->obj,
>>>>>>> num_fences);
>>>>>>> +               if (ret)
>>>>>>> +                       break;
>>>>>>> +       }
>>>>>>> +       /* Drop ref in case we break out of the loop. */
>>>>>>> +       drm_gpuvm_bo_put(vm_bo);
>>>>>>> +       restore_vm_bo_list(gpuvm, extobj, &extobjs);
>>>>>>> +
>>>>>>> +       return ret;
>>>>>>> +}
>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_prepare_objects);
>>>>>>> +
>>>>>>> +/**
>>>>>>> + * drm_gpuvm_prepare_range() - prepare all BOs mapped within
>>>>>>> a given range
>>>>>>> + * @gpuvm: the &drm_gpuvm
>>>>>>> + * @exec: the &drm_exec locking context
>>>>>>> + * @addr: the start address within the VA space
>>>>>>> + * @range: the range to iterate within the VA space
>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>> + *
>>>>>>> + * Calls drm_exec_prepare_obj() for all &drm_gem_objects
>>>>>>> mapped between @addr
>>>>>>> + * and @addr + @range.
>>>>>>> + *
>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>> + */
>>>>>>> +int
>>>>>>> +drm_gpuvm_prepare_range(struct drm_gpuvm *gpuvm, struct
>>>>>>> drm_exec *exec,
>>>>>>> +                       u64 addr, u64 range, unsigned int
>>>>>>> num_fences)
>>>>>>> +{
>>>>>>> +       struct drm_gpuva *va;
>>>>>>> +       u64 end = addr + range;
>>>>>>> +       int ret;
>>>>>>> +
>>>>>>> +       drm_gpuvm_for_each_va_range(va, gpuvm, addr, end) {
>>>>>>> +               struct drm_gem_object *obj = va->gem.obj;
>>>>>>> +
>>>>>>> +               ret = drm_exec_prepare_obj(exec, obj,
>>>>>>> num_fences);
>>>>>>> +               if (ret)
>>>>>>> +                       return ret;
>>>>>>> +       }
>>>>>>> +
>>>>>>> +       return 0;
>>>>>>> +}
>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_prepare_range);
>>>>>>> +
>>>>>>> +/**
>>>>>>> + * drm_gpuvm_exec_lock() - lock all dma-resv of all
>>>>>>> associated BOs
>>>>>>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>> + * @interruptible: sleep interruptible if waiting
>>>>>>> + *
>>>>>>> + * Acquires all dma-resv locks of all &drm_gem_objects the
>>>>>>> given
>>>>>>> + * &drm_gpuvm contains mappings of.
>>>>>>> + *
>>>>>>> + * Additionally, when calling this function with struct
>>>>>>> drm_gpuvm_exec::extra
>>>>>>> + * being set the driver receives the given @fn callback to
>>>>>>> lock additional
>>>>>>> + * dma-resv in the context of the &drm_gpuvm_exec instance.
>>>>>>> Typically, drivers
>>>>>>> + * would call drm_exec_prepare_obj() from within this
>>>>>>> callback.
>>>>>>> + *
>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>> + */
>>>>>>> +int
>>>>>>> +drm_gpuvm_exec_lock(struct drm_gpuvm_exec *vm_exec,
>>>>>>> +                   unsigned int num_fences,
>>>>>>> +                   bool interruptible)
>>>>>>> +{
>>>>>>> +       struct drm_gpuvm *gpuvm = vm_exec->vm;
>>>>>>> +       struct drm_exec *exec = &vm_exec->exec;
>>>>>>> +       uint32_t flags;
>>>>>>> +       int ret;
>>>>>>> +
>>>>>>> +       flags = (interruptible ? DRM_EXEC_INTERRUPTIBLE_WAIT : 0) |
>>>>>>> +               DRM_EXEC_IGNORE_DUPLICATES;
>>>>>>> +
>>>>>>> +       drm_exec_init(exec, flags);
>>>>>>> +
>>>>>>> +       drm_exec_until_all_locked(exec) {
>>>>>>> +               ret = drm_gpuvm_prepare_vm(gpuvm, exec,
>>>>>>> num_fences);
>>>>>>> +               drm_exec_retry_on_contention(exec);
>>>>>>> +               if (ret)
>>>>>>> +                       goto err;
>>>>>>> +
>>>>>>> +               ret = drm_gpuvm_prepare_objects(gpuvm, exec,
>>>>>>> num_fences);
>>>>>>> +               drm_exec_retry_on_contention(exec);
>>>>>>> +               if (ret)
>>>>>>> +                       goto err;
>>>>>>> +
>>>>>>> +               if (vm_exec->extra.fn) {
>>>>>>> +                       ret = vm_exec->extra.fn(vm_exec,
>>>>>>> num_fences);
>>>>>>> +                       drm_exec_retry_on_contention(exec);
>>>>>>> +                       if (ret)
>>>>>>> +                               goto err;
>>>>>>> +               }
>>>>>>> +       }
>>>>>>> +
>>>>>>> +       return 0;
>>>>>>> +
>>>>>>> +err:
>>>>>>> +       drm_exec_fini(exec);
>>>>>>> +       return ret;
>>>>>>> +}
>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_exec_lock);
>>>>>>> +
>>>>>>> +static int
>>>>>>> +fn_lock_array(struct drm_gpuvm_exec *vm_exec, unsigned int num_fences)
>>>>>>> +{
>>>>>>> +       struct {
>>>>>>> +               struct drm_gem_object **objs;
>>>>>>> +               unsigned int num_objs;
>>>>>>> +       } *args = vm_exec->extra.priv;
>>>>>>> +
>>>>>>> +       return drm_exec_prepare_array(&vm_exec->exec, args->objs,
>>>>>>> +                                     args->num_objs, num_fences);
>>>>>>> +}
>>>>>>> +
>>>>>>> +/**
>>>>>>> + * drm_gpuvm_exec_lock_array() - lock all dma-resv of all
>>>>>>> associated BOs
>>>>>>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>>>>>>> + * @objs: additional &drm_gem_objects to lock
>>>>>>> + * @num_objs: the number of additional &drm_gem_objects to
>>>>>>> lock
>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>> + * @interruptible: sleep interruptible if waiting
>>>>>>> + *
>>>>>>> + * Acquires all dma-resv locks of all &drm_gem_objects the
>>>>>>> given &drm_gpuvm
>>>>>>> + * contains mappings of, plus the ones given through @objs.
>>>>>>> + *
>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>> + */
>>>>>>> +int
>>>>>>> +drm_gpuvm_exec_lock_array(struct drm_gpuvm_exec *vm_exec,
>>>>>>> +                         struct drm_gem_object **objs,
>>>>>>> +                         unsigned int num_objs,
>>>>>>> +                         unsigned int num_fences,
>>>>>>> +                         bool interruptible)
>>>>>>> +{
>>>>>>> +       struct {
>>>>>>> +               struct drm_gem_object **objs;
>>>>>>> +               unsigned int num_objs;
>>>>>>> +       } args;
>>>>>>> +
>>>>>>> +       args.objs = objs;
>>>>>>> +       args.num_objs = num_objs;
>>>>>>> +
>>>>>>> +       vm_exec->extra.fn = fn_lock_array;
>>>>>>> +       vm_exec->extra.priv = &args;
>>>>>>> +
>>>>>>> +       return drm_gpuvm_exec_lock(vm_exec, num_fences,
>>>>>>> interruptible);
>>>>>>> +}
>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_exec_lock_array);
>>>>>>> +
>>>>>>> +/**
>>>>>>> + * drm_gpuvm_exec_lock_range() - prepare all BOs mapped
>>>>>>> within a given range
>>>>>>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>>>>>>> + * @addr: the start address within the VA space
>>>>>>> + * @range: the range to iterate within the VA space
>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>> + * @interruptible: sleep interruptible if waiting
>>>>>>> + *
>>>>>>> + * Acquires all dma-resv locks of all &drm_gem_objects
>>>>>>> mapped between @addr and
>>>>>>> + * @addr + @range.
>>>>>>> + *
>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>> + */
>>>>>>> +int
>>>>>>> +drm_gpuvm_exec_lock_range(struct drm_gpuvm_exec *vm_exec,
>>>>>>> +                         u64 addr, u64 range,
>>>>>>> +                         unsigned int num_fences,
>>>>>>> +                         bool interruptible)
>>>>>>> +{
>>>>>>> +       struct drm_gpuvm *gpuvm = vm_exec->vm;
>>>>>>> +       struct drm_exec *exec = &vm_exec->exec;
>>>>>>> +       uint32_t flags;
>>>>>>> +       int ret;
>>>>>>> +
>>>>>>> +       flags = (interruptible ? DRM_EXEC_INTERRUPTIBLE_WAIT : 0) |
>>>>>>> +               DRM_EXEC_IGNORE_DUPLICATES;
>>>>>>> +
>>>>>>> +       drm_exec_init(exec, flags);
>>>>>>> +
>>>>>>> +       drm_exec_until_all_locked(exec) {
>>>>>>> +               ret = drm_gpuvm_prepare_range(gpuvm, exec,
>>>>>>> addr, range,
>>>>>>> +                                             num_fences);
>>>>>>> +               drm_exec_retry_on_contention(exec);
>>>>>>> +               if (ret)
>>>>>>> +                       goto err;
>>>>>>> +       }
>>>>>>> +
>>>>>>> +       return ret;
>>>>>>> +
>>>>>>> +err:
>>>>>>> +       drm_exec_fini(exec);
>>>>>>> +       return ret;
>>>>>>> +}
>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_exec_lock_range);
>>>>>>> +
>>>>>>> +/**
>>>>>>> + * drm_gpuvm_validate() - validate all BOs marked as evicted
>>>>>>> + * @gpuvm: the &drm_gpuvm to validate evicted BOs
>>>>>>> + *
>>>>>>> + * Calls the &drm_gpuvm_ops.bo_validate callback for all
>>>>>>> evicted buffer
>>>>>>> + * objects being mapped in the given &drm_gpuvm.
>>>>>>> + *
>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>> + */
>>>>>>> +int
>>>>>>> +drm_gpuvm_validate(struct drm_gpuvm *gpuvm)
>>>>>>> +{
>>>>>>> +       const struct drm_gpuvm_ops *ops = gpuvm->ops;
>>>>>>> +       struct drm_gpuvm_bo *vm_bo;
>>>>>>> +       LIST_HEAD(evict);
>>>>>>> +       int ret = 0;
>>>>>>> +
>>>>>>> +       if (unlikely(!ops || !ops->bo_validate))
>>>>>>> +               return -ENOTSUPP;
>>>>>>> +
>>>>>>> +       for_each_vm_bo_in_list(gpuvm, evict, &evict, vm_bo) {
>>>>>>> +               dma_resv_assert_held(vm_bo->obj->resv);
>>>>>>> +               ret = ops->bo_validate(vm_bo->obj);
>>>>>>> +               if (ret)
>>>>>>> +                       break;
>>>>>>> +       }
>>>>>>> +       /* Drop ref in case we break out of the loop. */
>>>>>>> +       drm_gpuvm_bo_put(vm_bo);
>>>>>>> +       restore_vm_bo_list(gpuvm, evict, &evict);
>>>>>>> +
>>>>>>> +       return ret;
>>>>>>> +}
>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_validate);
>>>>>>> +
>>>>>>> +/**
>>>>>>> + * drm_gpuvm_resv_add_fence - add fence to private and all
>>>>>>> extobj
>>>>>>> + * dma-resv
>>>>>>> + * @gpuvm: the &drm_gpuvm to add a fence to
>>>>>>> + * @exec: the &drm_exec locking context
>>>>>>> + * @fence: fence to add
>>>>>>> + * @private_usage: private dma-resv usage
>>>>>>> + * @extobj_usage: extobj dma-resv usage
>>>>>>> + */
>>>>>>> +void
>>>>>>> +drm_gpuvm_resv_add_fence(struct drm_gpuvm *gpuvm,
>>>>>>> +                        struct drm_exec *exec,
>>>>>>> +                        struct dma_fence *fence,
>>>>>>> +                        enum dma_resv_usage private_usage,
>>>>>>> +                        enum dma_resv_usage extobj_usage)
>>>>>>> +{
>>>>>>> +       struct drm_gem_object *obj;
>>>>>>> +       unsigned long index;
>>>>>>> +
>>>>>>> +       drm_exec_for_each_locked_object(exec, index, obj) {
>>>>>>> +               dma_resv_assert_held(obj->resv);
>>>>>>> +               dma_resv_add_fence(obj->resv, fence,
>>>>>>> +                                  drm_gpuvm_is_extobj(gpuvm, obj) ?
>>>>>>> +                                  extobj_usage : private_usage);
>>>>>>> +       }
>>>>>>> +}
>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_resv_add_fence);
>>>>>>> +
>>>>>>>     /**
>>>>>>>      * drm_gpuvm_bo_create() - create a new instance of struct
>>>>>>> drm_gpuvm_bo
>>>>>>>      * @gpuvm: The &drm_gpuvm the @obj is mapped in.
>>>>>>> @@ -790,6 +1242,9 @@ drm_gpuvm_bo_create(struct drm_gpuvm
>>>>>>> *gpuvm,
>>>>>>>          INIT_LIST_HEAD(&vm_bo->list.gpuva);
>>>>>>>          INIT_LIST_HEAD(&vm_bo->list.entry.gem);
>>>>>>> +       INIT_LIST_HEAD(&vm_bo->list.entry.extobj);
>>>>>>> +       INIT_LIST_HEAD(&vm_bo->list.entry.evict);
>>>>>>> +
>>>>>>>          drm_gem_object_get(obj);
>>>>>>>          return vm_bo;
>>>>>>> @@ -807,6 +1262,14 @@ drm_gpuvm_bo_destroy(struct kref *kref)
>>>>>>>          drm_gem_gpuva_assert_lock_held(vm_bo->obj);
>>>>>>> +       spin_lock(&gpuvm->extobj.lock);
>>>>>>> +       list_del(&vm_bo->list.entry.extobj);
>>>>>>> +       spin_unlock(&gpuvm->extobj.lock);
>>>>>>> +
>>>>>>> +       spin_lock(&gpuvm->evict.lock);
>>>>>>> +       list_del(&vm_bo->list.entry.evict);
>>>>>>> +       spin_unlock(&gpuvm->evict.lock);
>>>>>>> +
>>>>>>>          list_del(&vm_bo->list.entry.gem);
>>>>>>>          drm_gem_object_put(obj);
>>>>>>> @@ -822,6 +1285,11 @@ drm_gpuvm_bo_destroy(struct kref *kref)
>>>>>>>      * @vm_bo: the &drm_gpuvm_bo to release the reference of
>>>>>>>      *
>>>>>>>      * This releases a reference to @vm_bo.
>>>>>>> + *
>>>>>>> + * If the reference count drops to zero, the &gpuvm_bo is
>>>>>>> destroyed, which
>>>>>>> + * includes removing it from the GEMs gpuva list. Hence, if
>>>>>>> a call to this
>>>>>>> + * function can potentially let the reference count drop to zero,
>>>>>>> the caller must
>>>>>>> + * hold the dma-resv or driver specific GEM gpuva lock.
>>>>>>>      */
>>>>>>>     void
>>>>>>>     drm_gpuvm_bo_put(struct drm_gpuvm_bo *vm_bo)
>>>>>>> @@ -831,6 +1299,12 @@ drm_gpuvm_bo_put(struct drm_gpuvm_bo
>>>>>>> *vm_bo)
>>>>>>>     }
>>>>>>>     EXPORT_SYMBOL_GPL(drm_gpuvm_bo_put);
>>>>>>> +static int __must_check
>>>>>>> +drm_gpuvm_bo_get_unless_zero(struct drm_gpuvm_bo *vm_bo)
>>>>>>> +{
>>>>>>> +       return kref_get_unless_zero(&vm_bo->kref);
>>>>>>> +}
>>>>>>> +
>>>>>>>     static struct drm_gpuvm_bo *
>>>>>>>     __drm_gpuvm_bo_find(struct drm_gpuvm *gpuvm,
>>>>>>>                      struct drm_gem_object *obj)
>>>>>>> @@ -938,6 +1412,48 @@ drm_gpuvm_bo_obtain_prealloc(struct
>>>>>>> drm_gpuvm_bo *__vm_bo)
>>>>>>>     }
>>>>>>>     EXPORT_SYMBOL_GPL(drm_gpuvm_bo_obtain_prealloc);
>>>>>>> +/**
>>>>>>> + * drm_gpuvm_bo_extobj_add() - adds the &drm_gpuvm_bo to its
>>>>>>> &drm_gpuvm's
>>>>>>> + * extobj list
>>>>>>> + * @vm_bo: The &drm_gpuvm_bo to add to its &drm_gpuvm's the
>>>>>>> extobj list.
>>>>>>> + *
>>>>>>> + * Adds the given @vm_bo to its &drm_gpuvm's extobj list if
>>>>>>> not on the list
>>>>>>> + * already and if the corresponding &drm_gem_object actually is
>>>>>>> an external object.
>>>>>>> + */
>>>>>>> +void
>>>>>>> +drm_gpuvm_bo_extobj_add(struct drm_gpuvm_bo *vm_bo)
>>>>>>> +{
>>>>>>> +       struct drm_gpuvm *gpuvm = vm_bo->vm;
>>>>>>> +
>>>>>>> +       if (drm_gpuvm_is_extobj(gpuvm, vm_bo->obj))
>>>>>>> +               drm_gpuvm_bo_list_add(vm_bo, extobj);
>>>>>>> +}
>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_bo_extobj_add);
>>>>>>> +
>>>>>>> +/**
>>>>>>> + * drm_gpuvm_bo_evict() - add / remove a &drm_gem_object to
>>>>>>> / from a
>>>>>>> + * &drm_gpuvms evicted list
>>>>>>> + * @obj: the &drm_gem_object to add or remove
>>>>>>> + * @evict: indicates whether the object is evicted
>>>>>>> + *
>>>>>>> + * Adds a &drm_gem_object to or removes it from all
>>>>>>> &drm_gpuvms evicted
>>>>>>> + * list containing a mapping of this &drm_gem_object.
>>>>>>> + */
>>>>>>> +void
>>>>>>> +drm_gpuvm_bo_evict(struct drm_gem_object *obj, bool evict)
>>>>>>> +{
>>>>>>> +       struct drm_gpuvm_bo *vm_bo;
>>>>>>> +
>>>>>>> +       drm_gem_for_each_gpuvm_bo(vm_bo, obj) {
>>>>>>> +               if (evict)
>>>>>>> +                       drm_gpuvm_bo_list_add(vm_bo, evict);
>>>>>>> +               else
>>>>>>> +                       drm_gpuvm_bo_list_del(vm_bo, evict);
>>>>>>> +       }
>>>>>>> +}
>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_bo_evict);
>>>>>>> +
>>>>>>>     static int
>>>>>>>     __drm_gpuva_insert(struct drm_gpuvm *gpuvm,
>>>>>>>                     struct drm_gpuva *va)
>>>>>>> diff --git a/include/drm/drm_gpuvm.h
>>>>>>> b/include/drm/drm_gpuvm.h
>>>>>>> index afa50b9059a2..834bb6d6617e 100644
>>>>>>> --- a/include/drm/drm_gpuvm.h
>>>>>>> +++ b/include/drm/drm_gpuvm.h
>>>>>>> @@ -26,10 +26,12 @@
>>>>>>>      */
>>>>>>>     #include <linux/list.h>
>>>>>>> +#include <linux/dma-resv.h>
>>>>>>>     #include <linux/rbtree.h>
>>>>>>>     #include <linux/types.h>
>>>>>>>     #include <drm/drm_gem.h>
>>>>>>> +#include <drm/drm_exec.h>
>>>>>>>     struct drm_gpuvm;
>>>>>>>     struct drm_gpuvm_bo;
>>>>>>> @@ -259,6 +261,38 @@ struct drm_gpuvm {
>>>>>>>           * space
>>>>>>>           */
>>>>>>>          struct dma_resv *resv;
>>>>>>> +
>>>>>>> +       /**
>>>>>>> +        * @extobj: structure holding the extobj list
>>>>>>> +        */
>>>>>>> +       struct {
>>>>>>> +               /**
>>>>>>> +                * @list: &list_head storing &drm_gpuvm_bos
>>>>>>> serving as
>>>>>>> +                * external object
>>>>>>> +                */
>>>>>>> +               struct list_head list;
>>>>>>> +
>>>>>>> +               /**
>>>>>>> +                * @lock: spinlock to protect the extobj list
>>>>>>> +                */
>>>>>>> +               spinlock_t lock;
>>>>>>> +       } extobj;
>>>>>>> +
>>>>>>> +       /**
>>>>>>> +        * @evict: structure holding the evict list and evict
>>>>>>> list lock
>>>>>>> +        */
>>>>>>> +       struct {
>>>>>>> +               /**
>>>>>>> +                * @list: &list_head storing &drm_gpuvm_bos
>>>>>>> currently being
>>>>>>> +                * evicted
>>>>>>> +                */
>>>>>>> +               struct list_head list;
>>>>>>> +
>>>>>>> +               /**
>>>>>>> +                * @lock: spinlock to protect the evict list
>>>>>>> +                */
>>>>>>> +               spinlock_t lock;
>>>>>>> +       } evict;
>>>>>>>     };
>>>>>>>     void drm_gpuvm_init(struct drm_gpuvm *gpuvm, struct
>>>>>>> drm_device *drm,
>>>>>>> @@ -268,6 +302,21 @@ void drm_gpuvm_init(struct drm_gpuvm
>>>>>>> *gpuvm, struct drm_device *drm,
>>>>>>>                      const struct drm_gpuvm_ops *ops);
>>>>>>>     void drm_gpuvm_destroy(struct drm_gpuvm *gpuvm);
>>>>>>> +/**
>>>>>>> + * drm_gpuvm_is_extobj() - indicates whether the given
>>>>>>> &drm_gem_object is an
>>>>>>> + * external object
>>>>>>> + * @gpuvm: the &drm_gpuvm to check
>>>>>>> + * @obj: the &drm_gem_object to check
>>>>>>> + *
>>>>>>> + * Returns: true if the &drm_gem_object &dma_resv differs
>>>>>>> from the
>>>>>>> + * &drm_gpuvms &dma_resv, false otherwise
>>>>>>> + */
>>>>>>> +static inline bool drm_gpuvm_is_extobj(struct drm_gpuvm
>>>>>>> *gpuvm,
>>>>>>> +                                      struct drm_gem_object
>>>>>>> *obj)
>>>>>>> +{
>>>>>>> +       return obj && obj->resv != gpuvm->resv;
>>>>>>> +}
>>>>>>> +
>>>>>>>     static inline struct drm_gpuva *
>>>>>>>     __drm_gpuva_next(struct drm_gpuva *va)
>>>>>>>     {
>>>>>>> @@ -346,6 +395,128 @@ __drm_gpuva_next(struct drm_gpuva *va)
>>>>>>>     #define drm_gpuvm_for_each_va_safe(va__, next__, gpuvm__)
>>>>>>> \
>>>>>>>          list_for_each_entry_safe(va__, next__, &(gpuvm__)->rb.list, rb.entry)
>>>>>>> +/**
>>>>>>> + * struct drm_gpuvm_exec - &drm_gpuvm abstraction of
>>>>>>> &drm_exec
>>>>>>> + *
>>>>>>> + * This structure should be created on the stack as
>>>>>>> &drm_exec should be.
>>>>>>> + *
>>>>>>> + * Optionally, @extra can be set in order to lock additional
>>>>>>> &drm_gem_objects.
>>>>>>> + */
>>>>>>> +struct drm_gpuvm_exec {
>>>>>>> +       /**
>>>>>>> +        * @exec: the &drm_exec structure
>>>>>>> +        */
>>>>>>> +       struct drm_exec exec;
>>>>>>> +
>>>>>>> +       /**
>>>>>>> +        * @vm: the &drm_gpuvm to lock its DMA reservations
>>>>>>> +        */
>>>>>>> +       struct drm_gpuvm *vm;
>>>>>>> +
>>>>>>> +       /**
>>>>>>> +        * @extra: Callback and corresponding private data
>>>>>>> for the driver to
>>>>>>> +        * lock arbitrary additional &drm_gem_objects.
>>>>>>> +        */
>>>>>>> +       struct {
>>>>>>> +               /**
>>>>>>> +                * @fn: The driver callback to lock
>>>>>>> additional &drm_gem_objects.
>>>>>>> +                */
>>>>>>> +               int (*fn)(struct drm_gpuvm_exec *vm_exec,
>>>>>>> +                         unsigned int num_fences);
>>>>>>> +
>>>>>>> +               /**
>>>>>>> +                * @priv: driver private data for the @fn
>>>>>>> callback
>>>>>>> +                */
>>>>>>> +               void *priv;
>>>>>>> +       } extra;
>>>>>>> +};
>>>>>>> +
>>>>>>> +/**
>>>>>>> + * drm_gpuvm_prepare_vm() - prepare the GPUVMs common dma-
>>>>>>> resv
>>>>>>> + * @gpuvm: the &drm_gpuvm
>>>>>>> + * @exec: the &drm_exec context
>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>> + *
>>>>>>> + * Calls drm_exec_prepare_obj() for the GPUVMs dummy
>>>>>>> &drm_gem_object.
>>>>>>> + *
>>>>>>> + * Using this function directly, it is the driver's
>>>>>>> responsibility to call
>>>>>>> + * drm_exec_init() and drm_exec_fini() accordingly.
>>>>>>> + *
>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>> + */
>>>>>>> +static inline int
>>>>>>> +drm_gpuvm_prepare_vm(struct drm_gpuvm *gpuvm,
>>>>>>> +                    struct drm_exec *exec,
>>>>>>> +                    unsigned int num_fences)
>>>>>>> +{
>>>>>>> +       return drm_exec_prepare_obj(exec, &gpuvm->d_obj,
>>>>>>> num_fences);
>>>>>>> +}
>>>>>>> +
>>>>>>> +int drm_gpuvm_prepare_objects(struct drm_gpuvm *gpuvm,
>>>>>>> +                             struct drm_exec *exec,
>>>>>>> +                             unsigned int num_fences);
>>>>>>> +
>>>>>>> +int drm_gpuvm_prepare_range(struct drm_gpuvm *gpuvm,
>>>>>>> +                           struct drm_exec *exec,
>>>>>>> +                           u64 addr, u64 range,
>>>>>>> +                           unsigned int num_fences);
>>>>>>> +
>>>>>>> +int drm_gpuvm_exec_lock(struct drm_gpuvm_exec *vm_exec,
>>>>>>> +                       unsigned int num_fences,
>>>>>>> +                       bool interruptible);
>>>>>>> +
>>>>>>> +int drm_gpuvm_exec_lock_array(struct drm_gpuvm_exec
>>>>>>> *vm_exec,
>>>>>>> +                             struct drm_gem_object **objs,
>>>>>>> +                             unsigned int num_objs,
>>>>>>> +                             unsigned int num_fences,
>>>>>>> +                             bool interruptible);
>>>>>>> +
>>>>>>> +int drm_gpuvm_exec_lock_range(struct drm_gpuvm_exec
>>>>>>> *vm_exec,
>>>>>>> +                             u64 addr, u64 range,
>>>>>>> +                             unsigned int num_fences,
>>>>>>> +                             bool interruptible);
>>>>>>> +
>>>>>>> +/**
>>>>>>> + * drm_gpuvm_exec_unlock() - unlock all dma-resv of all
>>>>>>> associated BOs
>>>>>>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>>>>>>> + *
>>>>>>> + * Releases all dma-resv locks of all &drm_gem_objects
>>>>>>> previously acquired
>>>>>>> + * through drm_gpuvm_exec_lock() or its variants.
>>>>>>> + */
>>>>>>> +static inline void
>>>>>>> +drm_gpuvm_exec_unlock(struct drm_gpuvm_exec *vm_exec)
>>>>>>> +{
>>>>>>> +       drm_exec_fini(&vm_exec->exec);
>>>>>>> +}
>>>>>>> +
>>>>>>> +int drm_gpuvm_validate(struct drm_gpuvm *gpuvm);
>>>>>>> +void drm_gpuvm_resv_add_fence(struct drm_gpuvm *gpuvm,
>>>>>>> +                             struct drm_exec *exec,
>>>>>>> +                             struct dma_fence *fence,
>>>>>>> +                             enum dma_resv_usage
>>>>>>> private_usage,
>>>>>>> +                             enum dma_resv_usage
>>>>>>> extobj_usage);
>>>>>>> +
>>>>>>> +/**
>>>>>>> + * drm_gpuvm_exec_resv_add_fence()
>>>>>>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>>>>>>> + * @fence: fence to add
>>>>>>> + * @private_usage: private dma-resv usage
>>>>>>> + * @extobj_usage: extobj dma-resv usage
>>>>>>> + *
>>>>>>> + * See drm_gpuvm_resv_add_fence().
>>>>>>> + */
>>>>>>> +static inline void
>>>>>>> +drm_gpuvm_exec_resv_add_fence(struct drm_gpuvm_exec
>>>>>>> *vm_exec,
>>>>>>> +                             struct dma_fence *fence,
>>>>>>> +                             enum dma_resv_usage
>>>>>>> private_usage,
>>>>>>> +                             enum dma_resv_usage
>>>>>>> extobj_usage)
>>>>>>> +{
>>>>>>> +       drm_gpuvm_resv_add_fence(vm_exec->vm, &vm_exec->exec,
>>>>>>> fence,
>>>>>>> +                                private_usage,
>>>>>>> extobj_usage);
>>>>>>> +}
>>>>>>> +
>>>>>>>     /**
>>>>>>>      * struct drm_gpuvm_bo - structure representing a
>>>>>>> &drm_gpuvm and
>>>>>>>      * &drm_gem_object combination
>>>>>>> @@ -398,6 +569,18 @@ struct drm_gpuvm_bo {
>>>>>>>                           * gpuva list.
>>>>>>>                           */
>>>>>>>                          struct list_head gem;
>>>>>>> +
>>>>>>> +                       /**
>>>>>>> +                        * @extobj: List entry to attach to
>>>>>>> the &drm_gpuvms
>>>>>>> +                        * extobj list.
>>>>>>> +                        */
>>>>>>> +                       struct list_head extobj;
>>>>>>> +
>>>>>>> +                       /**
>>>>>>> +                        * @evict: List entry to attach to
>>>>>>> the &drm_gpuvms evict
>>>>>>> +                        * list.
>>>>>>> +                        */
>>>>>>> +                       struct list_head evict;
>>>>>>>                  } entry;
>>>>>>>          } list;
>>>>>>>     };
>>>>>>> @@ -432,6 +615,9 @@ struct drm_gpuvm_bo *
>>>>>>>     drm_gpuvm_bo_find(struct drm_gpuvm *gpuvm,
>>>>>>>                    struct drm_gem_object *obj);
>>>>>>> +void drm_gpuvm_bo_evict(struct drm_gem_object *obj, bool
>>>>>>> evict);
>>>>>>> +void drm_gpuvm_bo_extobj_add(struct drm_gpuvm_bo *vm_bo);
>>>>>>> +
>>>>>>>     /**
>>>>>>>      * drm_gpuvm_bo_for_each_va() - iterator to walk over a
>>>>>>> list of &drm_gpuva
>>>>>>>      * @va__: &drm_gpuva structure to assign to in each
>>>>>>> iteration step
>>>>>>> @@ -837,6 +1023,17 @@ struct drm_gpuvm_ops {
>>>>>>>           * used.
>>>>>>>           */
>>>>>>>          int (*sm_step_unmap)(struct drm_gpuva_op *op, void
>>>>>>> *priv);
>>>>>>> +
>>>>>>> +       /**
>>>>>>> +        * @bo_validate: called from drm_gpuvm_validate()
>>>>>>> +        *
>>>>>>> +        * Drivers receive this callback for every evicted
>>>>>>> &drm_gem_object being
>>>>>>> +        * mapped in the corresponding &drm_gpuvm.
>>>>>>> +        *
>>>>>>> +        * Typically, drivers would call their driver
>>>>>>> specific variant of
>>>>>>> +        * ttm_bo_validate() from within this callback.
>>>>>>> +        */
>>>>>>> +       int (*bo_validate)(struct drm_gem_object *obj);
>>>>>>>     };
>>>>>>>     int drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm, void *priv,
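Putting the quoted helpers together, a driver's exec path would look
roughly like this (a sketch against the API as posted in this
revision; my_submit_job() is a hypothetical stand-in for the actual
submission):

    static int my_exec(struct drm_gpuvm *gpuvm)
    {
            struct drm_gpuvm_exec vm_exec = { .vm = gpuvm };
            struct dma_fence *fence;
            int ret;

            ret = drm_gpuvm_exec_lock(&vm_exec, 1, true);
            if (ret)
                    return ret;

            ret = drm_gpuvm_validate(gpuvm); /* bo_validate() on evicted BOs */
            if (ret)
                    goto out_unlock;

            fence = my_submit_job(gpuvm);
            drm_gpuvm_exec_resv_add_fence(&vm_exec, fence,
                                          DMA_RESV_USAGE_BOOKKEEP,
                                          DMA_RESV_USAGE_BOOKKEEP);
            dma_fence_put(fence);

    out_unlock:
            drm_gpuvm_exec_unlock(&vm_exec);
            return ret;
    }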
Thomas Hellstrom Sept. 13, 2023, 2:29 p.m. UTC | #22
On 9/13/23 16:01, Boris Brezillon wrote:
> On Wed, 13 Sep 2023 15:22:56 +0200
> Thomas Hellström <thomas.hellstrom@linux.intel.com> wrote:
>
>> On 9/13/23 13:33, Boris Brezillon wrote:
>>> On Wed, 13 Sep 2023 12:39:01 +0200
>>> Thomas Hellström <thomas.hellstrom@linux.intel.com> wrote:
>>>   
>>>> Hi,
>>>>
>>>> On 9/13/23 09:19, Boris Brezillon wrote:
>>>>> On Wed, 13 Sep 2023 17:05:42 +1000
>>>>> Dave Airlie <airlied@gmail.com> wrote:
>>>>>      
>>>>>> On Wed, 13 Sept 2023 at 17:03, Boris Brezillon
>>>>>> <boris.brezillon@collabora.com> wrote:
>>>>>>> On Tue, 12 Sep 2023 18:20:32 +0200
>>>>>>> Thomas Hellström <thomas.hellstrom@linux.intel.com> wrote:
>>>>>>>         
>>>>>>>>> +/**
>>>>>>>>> + * get_next_vm_bo_from_list() - get the next vm_bo element
>>>>>>>>> + * @__gpuvm: The GPU VM
>>>>>>>>> + * @__list_name: The name of the list we're iterating on
>>>>>>>>> + * @__local_list: A pointer to the local list used to store already iterated items
>>>>>>>>> + * @__prev_vm_bo: The previous element we got from get_next_vm_bo_from_list()
>>>>>>>>> + *
>>>>>>>>> + * This helper is here to provide lockless list iteration. Lockless as in, the
>>>>>>>>> + * iterator releases the lock immediately after picking the first element from
>>>>>>>>> + * the list, so list insertion and deletion can happen concurrently.
>>>>>>>> Are the list spinlocks needed for that async state update from within
>>>>>>>> the dma-fence critical section we've discussed previously?
>>>>>>> Any driver calling _[un]link() from its drm_gpu_scheduler::run_job()
>>>>>>> hook will be in this situation (Panthor at the moment, PowerVR soon). I
>>>>>>> get that Xe and Nouveau don't need that because they update the VM
>>>>>>> state early (in the ioctl path), but I keep thinking this will hurt us
>>>>>>> if we don't think it through from the beginning, because once you've
>>>>>>> set this logic to depend only on resv locks, it will be pretty hard to
>>>>>>> get back to a solution which lets synchronous VM_BINDs take precedence
>>>>>>> on asynchronous request, and, with vkQueueBindSparse() passing external
>>>>>>> deps (plus the fact the VM_BIND queue might be pretty deep), it can
>>>>>>> take a long time to get your synchronous VM_BIND executed...
>>>> So this would boil down to either (possibly opt-in) keeping the spinlock
>>>> approach or pushing the unlink out to a wq then?
>>> Deferred _unlink() would not be an issue, since I already defer the
>>> drm_gpuva destruction to a wq, it would just be a matter of moving the
>>> _unlink() call there as well. But _link() also takes the GEM gpuva list
>>> lock, and that one is bit tricky, in that sm_map() can trigger 2 more
>>> _link() calls for the prev/next mappings, which we can't guess until we
>>> get to execute the VM update. If we mandate the use of the GEM resv
>>> lock, that simply means async VM updates (AKA calling
>>> drm_gpuvm_sm_[un]map()) are not an option. And if this is what everyone
>>> agrees on, then I'd like the APIs that make this sort of async VM
>>> update possible (drm_gpuvm_sm_[un]map(), the drm_gpuvm_ops::sm_step*
>>> methods, and probably other things) to be dropped, so we don't make it
>>> look like it's something we support.
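A sketch of that deferred variant (gc_link and the other names are
hypothetical; gc_link would be a member embedded in the driver's gpuva):

    static void vm_gc_work(struct work_struct *work)
    {
            struct my_vm *vm = container_of(work, struct my_vm, gc_work);
            struct drm_gpuva *va, *tmp;
            LIST_HEAD(gc);

            spin_lock(&vm->gc_lock);
            list_splice_init(&vm->gc_list, &gc);
            spin_unlock(&vm->gc_lock);

            /* Safe here: we're no longer in the signalling path, so the
             * gpuva list lock (or resv) may be taken again. */
            list_for_each_entry_safe(va, tmp, &gc, gc_link) {
                    drm_gpuva_unlink(va);
                    kfree(va);
            }
    }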
>>>   
>>>> BTW, as also asked in a reply to Danilo, how do you call unlink from
>>>> run_job() when it was requiring the obj->dma_resv lock, or was that a WIP?
>>> _unlink() makes sure the GEM gpuva list lock is taken, but this can be
>>> a custom lock (see drm_gem_gpuva_set_lock()). In panthor we have
>>> panthor_gem_object::gpuva_list_lock that's dedicated the gpuva list
>>> protection. We make sure we never take this lock while allocating
>>> memory to guarantee the dma-signalling path can't deadlock.
>>>   
>>>>>>>         
>>>>>> btw what is the use case for this? do we have actual vulkan
>>>>>> applications we know will have problems here?
>>>>> I don't, but I think that's a concern Faith raised at some point (dates
>>>>> back from when I was reading threads describing how VM_BIND on i915
>>>>> should work, and I was clearly discovering this whole VM_BIND thing at
>>>>> that time, so maybe I misunderstood).
>>>>>      
>>>>>> it feels like a bit of premature optimisation, but maybe we have use cases.
>>>>> Might be, but that's the sort of thing that would put us in a corner if
>>>>> we don't have a plan for when the needs arise. Besides, if we don't
>>>>> want to support that case because it's too complicated, I'd recommend
>>>>> dropping all the drm_gpuvm APIs that let people think this mode is
>>>>> valid/supported (map/remap/unmap hooks in drm_gpuvm_ops,
>>>>> drm_gpuvm_sm_[un]map helpers, etc). Keeping them around just adds to the
>>>>> confusion.
>>>> Xe allows bypassing the bind-queue with another bind-queue, but to
>>>> completely avoid dependencies between queues the Operations may not
>>>> overlap.
>>> So, you check the VM state with some VM lock held (would be the VM resv
>>> in my case), and if the mapping is new (no overlaps with pre-existing
>>> mappings), you queue it to the fast-track/sync-VM_BIND queue. What would
>>> be missing I guess is a way to know if the mapping is active (MMU has
>>> been updated) or pending (MMU update queued to the bind-queue), so I can
>>> fast-track mapping/unmapping of active mappings. This would leave
>>> overlapping sync/async VM updates, which can't happen in practice
>>> unless userspace is doing something wrong (sparse bindings always go
>>> through vkQueueBindSparse).
>> User-space is allowed to create new bind queues at will, and they
>> execute independently save for range overlaps.
> I've limited panthor to just one bind-queue that's automatically
> created when the VM is created. I guess letting userspace create more
> than one queue is doable, but we'd still be serializing VM
> operations anyway and that complicates the whole thing when concurrent
> operations to the same VM region happen from different bind queues, so I
> figured it'd be simpler to expose just one queue.
>
>> And the overlapping granularity depends very much on the detail of the
>> range tracking.
>> We drafted this fenced range utility
>>
>> https://gitlab.freedesktop.org/drm/xe/kernel/-/merge_requests/353
>>
>> That tracks active ranges that remove themselves when the attached fence
>> signals. Not sure if we ended up using it, though. A new binding would
>> scan this utility for dma-fences it needs to depend upon.
> Sounds like implicit deps on VM ranges :D. I'll have a look, thanks
> for the pointer!
>
>> Ranges in Xe
>> are actually page-table modification ranges, so can exceed the actual VA
>> range in some situations, but if you can build page-table structures
>> async the granularity indeed becomes better.
> The granularity in Mali is 4k, and we don't build the page table struct
> asynchronously, we just update the page table tree from the CPU,
> holding a VM lock to serialize such operations (that's done
> synchronously in the ::run_job() path, or from the ioctl in case of a
> sync-VM_BIND).

OK, yeah we have something similar although we build the page-table tree 
in the IOCTL and update entries using the GPU unless there are no 
dependencies, in which case we do it sync in the ioctl as well.

The drawback here is that if one op adds a pagetable tree node near the 
root (spanning say 1G) and the next op adds an entry to that node, the 
granularity can become pretty large...
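
To illustrate what such range tracking boils down to, here is a rough
sketch only (this is not the actual utility from the MR above; it just
assumes a list of in-flight page-table update ranges, each carrying the
fence of the job performing the update):

struct pt_update_range {
	u64 start, end;			/* page-table structure update span */
	struct dma_fence *fence;	/* signals once the update has landed */
	struct list_head entry;
};

/* A new bind job waits for every overlapping in-flight update. */
static int pt_ranges_add_deps(struct drm_sched_job *job,
			      struct list_head *ranges,
			      u64 start, u64 end)
{
	struct pt_update_range *r;
	int ret;

	list_for_each_entry(r, ranges, entry) {
		/* An op that inserted a node near the root may span e.g.
		 * 1G here, which is where the coarse granularity bites.
		 */
		if (r->start < end && start < r->end) {
			ret = drm_sched_job_add_dependency(job,
						dma_fence_get(r->fence));
			if (ret)
				return ret;
		}
	}

	return 0;
}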

/Thomas


>
>> /Thomas
>>
>>
>>
>>> I'll give it a try.
>>>   
>>>> (And the definition of overlap is currently page-table
>>>> structure updates may not overlap) but no guarantees are made about
>>>> priority.
>>>>
>>>> /Thomas
>>>>
>>>>
>>>>
Thomas Hellstrom Sept. 13, 2023, 3:13 p.m. UTC | #23
Hi Christian

On 9/13/23 16:26, Christian König wrote:
> Am 13.09.23 um 14:16 schrieb Danilo Krummrich:
>> As mentioned in a different mail thread, the reply is based on the 
>> assumption
>> that we don't support anything else than GPUVM updates from the IOCTL.
>
> I think that this assumption is incorrect.
>
> Vulkan is just one specific use case, but this here should probably 
> be able to handle other use cases as well.
>
> Especially with HMM you get the requirement that you need to be able 
> to invalidate GPUVM mappings without grabbing a reservation lock.

Are you referring to the MMU range invalidation notifiers here?

>
> See what the eviction lock in amdgpu is doing for example.

IMO the statement regarding GPUVM updates from the IOCTL mostly refers 
to the need to protect the evicted- and extobj lists with additional 
spinlocks. Supporting userptr and faulting will ofc require additional 
locks / locking mechanisms. But this code doesn't do that yet. Is your 
concern that these particular spinlocks for these lists are indeed needed?
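
For completeness, the resv-protected alternative I keep referring to
would look roughly like this (hypothetical sketch; "evicted" would be a
new flag in struct drm_gpuvm_bo, protected by the object's resv, and
the extobj list would be protected by an outer lock or the VM's resv
instead of a spinlock):

static int vm_prepare_and_collect_evicted(struct drm_gpuvm *gpuvm,
					  struct drm_exec *exec,
					  unsigned int num_fences)
{
	struct drm_gpuvm_bo *vm_bo;
	int ret;

	/* No spinlock and no local list needed; the list is stable here. */
	list_for_each_entry(vm_bo, &gpuvm->extobj.list, list.entry.extobj) {
		ret = drm_exec_prepare_obj(exec, vm_bo->obj, num_fences);
		if (ret)
			return ret;

		/* vm_bo->obj->resv is locked now, so the flag is stable. */
		if (vm_bo->evicted)
			list_move_tail(&vm_bo->list.entry.evict,
				       &gpuvm->evict.list);
	}

	return 0;
}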

/Thomas


>
> Regards,
> Christian.
>
>>
>> On Wed, Sep 13, 2023 at 11:14:46AM +0200, Thomas Hellström wrote:
>>> Hi!
>>>
>>> On Wed, 2023-09-13 at 01:36 +0200, Danilo Krummrich wrote:
>>>> On Tue, Sep 12, 2023 at 09:23:08PM +0200, Thomas Hellström wrote:
>>>>> On 9/12/23 18:50, Danilo Krummrich wrote:
>>>>>> On Tue, Sep 12, 2023 at 06:20:32PM +0200, Thomas Hellström wrote:
>>>>>>> Hi, Danilo,
>>>>>>>
>>>>>>> On 9/9/23 17:31, Danilo Krummrich wrote:
>>>>>>>> So far the DRM GPUVA manager offers common infrastructure to
>>>>>>>> track GPU VA
>>>>>>>> allocations and mappings, generically connect GPU VA mappings
>>>>>>>> to their
>>>>>>>> backing buffers and perform more complex mapping operations
>>>>>>>> on the GPU VA
>>>>>>>> space.
>>>>>>>>
>>>>>>>> However, there are more design patterns commonly used by
>>>>>>>> drivers, which
>>>>>>>> can potentially be generalized in order to make the DRM GPUVA
>>>>>>>> manager
>>>>>>>> represent a basic GPU-VM implementation. In this context,
>>>>>>>> this patch aims
>>>>>>>> at generalizing the following elements.
>>>>>>>>
>>>>>>>> 1) Provide a common dma-resv for GEM objects not being used
>>>>>>>> outside of
>>>>>>>>       this GPU-VM.
>>>>>>>>
>>>>>>>> 2) Provide tracking of external GEM objects (GEM objects
>>>>>>>> which are
>>>>>>>>       shared with other GPU-VMs).
>>>>>>>>
>>>>>>>> 3) Provide functions to efficiently lock all GEM objects dma-
>>>>>>>> resv the
>>>>>>>>       GPU-VM contains mappings of.
>>>>>>>>
>>>>>>>> 4) Provide tracking of evicted GEM objects the GPU-VM
>>>>>>>> contains mappings
>>>>>>>>       of, such that validation of evicted GEM objects is
>>>>>>>> accelerated.
>>>>>>>>
>>>>>>>> 5) Provide some convinience functions for common patterns.
>>>>>>>>
>>>>>>>> Rather than being designed as a "framework", the target is to
>>>>>>>> make all
>>>>>>>> features appear as a collection of optional helper functions,
>>>>>>>> such that
>>>>>>>> drivers are free to make use of the DRM GPUVA managers basic
>>>>>>>> functionality and opt-in for other features without setting
>>>>>>>> any feature
>>>>>>>> flags, just by making use of the corresponding functions.
>>>>>>>>
>>>>>>>> Big kudos to Boris Brezillon for his help to figure out
>>>>>>>> locking for drivers
>>>>>>>> updating the GPU VA space within the fence signalling path.
>>>>>>>>
>>>>>>>> Suggested-by: Matthew Brost <matthew.brost@intel.com>
>>>>>>>> Signed-off-by: Danilo Krummrich <dakr@redhat.com>
>>>>>>>> ---
>>>>>>>>     drivers/gpu/drm/drm_gpuvm.c | 516
>>>>>>>> ++++++++++++++++++++++++++++++++++++
>>>>>>>>     include/drm/drm_gpuvm.h     | 197 ++++++++++++++
>>>>>>>>     2 files changed, 713 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/drivers/gpu/drm/drm_gpuvm.c
>>>>>>>> b/drivers/gpu/drm/drm_gpuvm.c
>>>>>>>> index f4411047dbb3..8e62a043f719 100644
>>>>>>>> --- a/drivers/gpu/drm/drm_gpuvm.c
>>>>>>>> +++ b/drivers/gpu/drm/drm_gpuvm.c
>>>>>>>> @@ -73,6 +73,21 @@
>>>>>>>>      * &drm_gem_object list of &drm_gpuvm_bos for an existing
>>>>>>>> instance of this
>>>>>>>>      * particular combination. If not existent a new instance
>>>>>>>> is created and linked
>>>>>>>>      * to the &drm_gem_object.
>>>>>>>> + *
>>>>>>>> + * &drm_gpuvm_bo structures, since unique for a given
>>>>>>>> &drm_gpuvm, are also used
>>>>>>>> + * as entry for the &drm_gpuvm's lists of external and
>>>>>>>> evicted objects. Those
>>>>>>>> + * lists are maintained in order to accelerate locking of
>>>>>>>> dma-resv locks and
>>>>>>>> + * validation of evicted objects bound in a &drm_gpuvm. For
>>>>>>>> instance all the
>>>>>>>> + * &drm_gem_object's &dma_resv of a given &drm_gpuvm can be
>>>>>>>> locked by calling
>>>>>>>> + * drm_gpuvm_exec_lock(). Once locked drivers can call
>>>>>>>> drm_gpuvm_validate() in
>>>>>>>> + * order to validate all evicted &drm_gem_objects. It is
>>>>>>>> also possible to lock
>>>>>>>> + * additional &drm_gem_objects by providing the
>>>>>>>> corresponding parameters to
>>>>>>>> + * drm_gpuvm_exec_lock() as well as open code the &drm_exec
>>>>>>>> loop while making
>>>>>>>> + * use of helper functions such as drm_gpuvm_prepare_range()
>>>>>>>> or
>>>>>>>> + * drm_gpuvm_prepare_objects().
>>>>>>>> + *
>>>>>>>> + * Every bound &drm_gem_object is treated as external object
>>>>>>>> when its &dma_resv
>>>>>>>> + * structure is different than the &drm_gpuvm's common
>>>>>>>> &dma_resv structure.
>>>>>>>>      */
>>>>>>>>     /**
>>>>>>>> @@ -420,6 +435,20 @@
>>>>>>>>      * Subsequent calls to drm_gpuvm_bo_obtain() for the same
>>>>>>>> &drm_gpuvm and
>>>>>>>>      * &drm_gem_object must be able to observe previous
>>>>>>>> creations and destructions
>>>>>>>>      * of &drm_gpuvm_bos in order to keep instances unique.
>>>>>>>> + *
>>>>>>>> + * The &drm_gpuvm's lists for keeping track of external and
>>>>>>>> evicted objects are
>>>>>>>> + * protected against concurrent insertion / removal and
>>>>>>>> iteration internally.
>>>>>>>> + *
>>>>>>>> + * However, drivers still need to protect concurrent
>>>>>>>> calls to functions
>>>>>>>> + * iterating those lists, such as drm_gpuvm_validate() and
>>>>>>>> + * drm_gpuvm_prepare_objects(). Every such function contains
>>>>>>>> a particular
>>>>>>>> + * comment and lockdep checks if possible.
>>>>>>>> + *
>>>>>>>> + * Functions adding or removing entries from those lists,
>>>>>>>> such as
>>>>>>>> + * drm_gpuvm_bo_evict() or drm_gpuvm_bo_extobj_add() may be
>>>>>>>> called with external
>>>>>>>> + * locks being held, e.g. in order to prevent the
>>>>>>>> corresponding list from being
>>>>>>>> + * (safely) modified while potentially being iterated by
>>>>>>>> other API functions.
>>>>>>>> + * However, this is entirely optional.
>>>>>>>>      */
>>>>>>>>     /**
>>>>>>>> @@ -632,6 +661,131 @@
>>>>>>>>      *   }
>>>>>>>>      */
>>>>>>>> +/**
>>>>>>>> + * get_next_vm_bo_from_list() - get the next vm_bo element
>>>>>>>> + * @__gpuvm: The GPU VM
>>>>>>>> + * @__list_name: The name of the list we're iterating on
>>>>>>>> + * @__local_list: A pointer to the local list used to store
>>>>>>>> already iterated items
>>>>>>>> + * @__prev_vm_bo: The previous element we got from
>>>>>>>> get_next_vm_bo_from_list()
>>>>>>>> + *
>>>>>>>> + * This helper is here to provide lockless list iteration.
>>>>>>>> Lockless as in, the
>>>>>>>> + * iterator releases the lock immediately after picking the
>>>>>>>> first element from
>>>>>>>> + * the list, so list insertion deletion can happen
>>>>>>>> concurrently.
>>>>>>> Are the list spinlocks needed for that async state update from
>>>>>>> within the
>>>>>>> dma-fence critical section we've discussed previously?
>>>>>> Yes, but also for other reasons, see below.
>>>>>>
>>>>>>> Otherwise it should be sufficient to protect the lists with the
>>>>>>> gpuvm's resv
>>>>>>> (or for the extobj list with an outer lock).
>>>>>>>
>>>>>>> If those spinlocks are still needed in some situations, perhaps
>>>>>>> could we
>>>>>>> have an option to set them to NULL (Like IIRC the maple tree
>>>>>>> allows for)?
>>>>>> The evict spinlock is needed in any case, since in
>>>>>> drm_gpuvm_bo_evict() we're
>>>>>> holding only the dma-resv lock from the BO this function gets
>>>>>> called for. Hence,
>>>>>> the spinlock protects concurrent drm_gpuvm_bo_evict() calls with
>>>>>> different BOs.
>>>>> No. Only if you try to add external objects to the vm's evict list
>>>>> from
>>>>> within the evict code. That's not necessary since you loop through
>>>>> all
>>>>> external objects anyway when locking them so an "evicted" bool in
>>>>> the vm_bo,
>>>>> protected by the bo resv would be sufficient. The extobj locking
>>>>> loop can
>>>>> then add the bo to the evicted list.
>>>> And validate() can remove it while still holding all dma-resv locks,
>>>> neat!
>>>> However, what if two tasks are trying to lock the VA space
>>>> concurrently? What
>>>> do we do when the drm_gpuvm_bo's refcount drops to zero in
>>>> drm_gpuva_unlink()?
>>>> Are we guaranteed that at this point of time the drm_gpuvm_bo is not
>>>> on the
>>>> evicted list? Because otherwise we would call drm_gpuvm_bo_destroy()
>>>> with the
>>>> dma-resv lock held, which wouldn't be allowed, since
>>>> drm_gpuvm_bo_destroy()
>>>> might drop the last reference to the drm_gem_object and hence we'd
>>>> potentially
>>>> free the dma-resv lock while holding it, at least if it's an external
>>>> object.
>>> Easiest way in this scheme is to think of the lists as being protected
>>> by the vm's resv lock. That means anybody calling unlink() must also
>>> hold the vm's resv lock. (Which is OK from a UAF point of view, but
>>> perhaps not from a locking inversion POV for an async list update).
>> This would mean that on unlink() we'd need to hold the VM's resv lock 
>> and the
>> corresponding GEM's resv lock (in case they're not the same anyways) 
>> because the
>> VM's resv lock would protect the external / evicted object lists and 
>> the GEM
>> objects resv lock protects the GEM's list of drm_gpuvm_bos and the
>> drm_gpuvm_bo's list of drm_gpuvas.
>>
>>>>>> For extobjs an outer lock would be enough in case of Xe, but I
>>>>>> really would not
>>>>>> like to add even more complexity just to get the spinlock out of
>>>>>> the way in case
>>>>>> the driver already has an outer lock protecting this path.
>>>>> I must disagree here. These spinlocks and atomic operations are
>>>>> pretty
>>>>> costly and as discussed earlier this type of locking was the reason
>>>>> (at
>>>>> least according to the commit message) that made Christian drop the
>>>>> XArray
>>>>> use in drm_exec for the same set of objects: "The locking overhead
>>>>> is
>>>>> unecessary and measurable". IMHO the spinlock is the added
>>>>> complexity and a
>>>>> single wide lock following the drm locking guidelines set out by
>>>>> Daniel and
>>>>> David should really be the default choice with an opt-in for a
>>>>> spinlock if
>>>>> needed for async and pushing out to a wq is not an option.
>>>> For the external object list an outer lock would work as long as it's
>>>> not the
>>>> dma-resv lock of the corresponding GEM object, since here we actually
>>>> need to
>>>> remove the list entry from the external object list on
>>>> drm_gpuvm_bo_destroy().
>>>> It's just a bit weird design wise that drivers would need to take
>>>> this outer
>>>> lock on:
>>>>
>>>> - drm_gpuvm_bo_extobj_add()
>>>> - drm_gpuvm_bo_destroy()        (and hence also drm_gpuvm_bo_put())
>>>> - drm_gpuva_unlink()            (because it needs to call
>>>> drm_gpuvm_bo_put())
>>>> - drm_gpuvm_exec_lock()
>>>> - drm_gpuvm_exec_lock_array()
>>>> - drm_gpuvm_prepare_range()
>>>>
>>>> Given that it seems reasonable to do all the required locking
>>>> internally.
>>>  From a design POV, there has been a clear direction in XE to make
>>> things similar to mmap() / munmap(), so this outer lock, which in Xe is
>>> an rwsem, is used in a similar way as the mmap_lock. It's protecting
>>> the page-table structures and vma rb tree, the userptr structures and
>>> the extobj list. Basically it's taken early in the exec IOCTL, the
>>> VM_BIND ioctl, the compute rebind worker and the pagefault handler, so
>>> all of the above are just asserting that it is taken in the correct
>>> mode.
>>>
>>> But strictly with this scheme one could also use the vm's dma_resv for
>>> the extobj list since with drm_exec, it's locked before traversing the
>>> list.
>>>
>>> The whole point of this scheme is to rely on locks that you already are
>>> supposed to be holding for various reasons and is simple to comprehend.
>> I don't agree that we're supposed to hold the VM's resv lock anyways for
>> functions like drm_gpuvm_bo_put() or drm_gpuva_unlink(), but I'm fine 
>> using it
>> for that purpose nevertheless.
>>
>>>> In order to at least place lockdep checks, the driver would need to
>>>> supply the
>>>> corresponding lock's lockdep_map, because the GPUVM otherwise doesn't
>>>> know about
>>>> the lock.
>>> Yes, that sounds reasonable. One lockdep map per list.
>> I'd really like to avoid that, especially now that everything got 
>> simpler. We
>> should define the actual locks to take instead.
>>
>>>> Out of curiosity, what is the overhead of a spin_lock() that doesn't
>>>> need to
>>>> spin?
>>> I guess it's hard to tell exactly, but it is much lower on modern x86
>>> than what it used to be. Not sure about ARM, which is the other
>>> architecture important to us. I figure if there is little cache-line
>>> bouncing the main overhead comes from the implied barriers.
>>>
>>>>> A pretty simple way that would not add much code would be
>>>>>
>>>>> static void gpuvm_cond_spin_lock(const struct drm_gpuvm *gpuvm,
>>>>>                                  spinlock_t *lock)
>>>>> {
>>>>>      if (!gpuvm->resv_protected_lists)
>>>>>          spin_lock(lock);
>>>>> }
>>>>>
>>>>>>> For such drivers, that would require anybody calling unlink to
>>>>>>> hold the vm's
>>>>>>> resv, though.
>>>>>> In V4 I want to go back to having a dedicated lock for the GEMs
>>>>>> gpuva list (or
>>>>>> VM_BO list to be more precise). We can't just use the dma-resv
>>>>>> lock for that
>>>>>> with VM_BO abstractions, because on destruction of a VM_BO we
>>>>>> otherwise wouldn't
>>>>>> be allowed to already hold the dma-resv lock. That's the fix I
>>>>>> was referring to
>>>>>> earlier.
>>>>> Yeah, I can see the need for a dedicated lock for the GEM's gpuva
>>>>> list, but
>>>>> holding the vm's dma-resv lock across the unlink shouldn't be a
>>>>> problem. We
>>>>> may free the object and a pointer to the vm's resv during unlink
>>>>> but we
>>>>> don't free the vm's resv.  It'd be a matter of ensuring that any
>>>>> calls to
>>>>> unlink from *within* drm_gpuvm allows it to be held.
>>>> Drivers calling unlink() from the fence signaling path can't use the
>>>> VM's
>>>> dma-resv lock.
>>> Yes, that made me a bit curious because in the current version the code
>>> required the object's dma_resv for unlink() which can't be grabbed
>>> either from the fence signaling path. So are there any drivers actually
>>> wanting to do that? If so, they will either need to resort to the
>>> current spinlock solution or they will need to call unlink from a
>>> workqueue item.
>> As Boris already mentioned we have the dma-resv lock by default or a 
>> driver
>> specific GEM gpuva lock as opt-in. Now, we can get rid of the latter.
>>
>>>> Also, what if the object is an external object? We can't use the VM's
>>>> dma-resv
>>>> lock here.
>>> Why? Typically (sync) unlink is only ever called from an unbind-like
>>> operation where it should be trivial to grab the vm's resv. Or, for
>>> that matter any outer lock protecting the extobj list. Rule would be
>>> the drm_gpuvm_bo::entry::extobj  and drm_gpuvm_bo::entry::evict would
>>> be protected by either the vm's dma_resv (or possibly an outer lock in
>>> the case of the extobj list).
>> Outer lock wouldn't have been working for updates in the async path, but
>> shouldn't be relevant anymore. We could use the VM's resv for that.
>>
>>>>   And we can't have the GEM objs dma-resv lock held when calling
>>>> unlink(), since unlink() calls drm_gpuvm_bo_put(), which if the
>>>> refcount drops
>>>> to zero calls drm_gpuvm_bo_destroy() and drm_gpuvm_bo_destroy() might
>>>> drop the
>>>> last reference of the GEM object.
>>> Yes, but this is a different problem as to what exactly protects
>>> drm_gpuvm_bo::entry::gem. Either as you suggest an internal per bo list
>>> lock, or if we want to keep the bo's dma_resv we need to ensure that
>>> the caller of dma_resv_unlock(obj->resv) actually refcounts its obj
>>> pointer, and doesn't implicitly rely on the gpuvm_bo's refcount (I know
>>> Boris didn't like that, but requiring an explicit refcount for a
>>> pointer you dereference unless you're under a lock that ensures keeping
>>> the object alive is pretty much required?) But anyway for the
>>> drm_gpuvm_bo::entry::gem list protection (bo resv or internal spinlock)
>>> I don't have a strong preference.
>> We can keep the GEM objects dma-resv lock, however as mentioned above
>> drm_gpuva_unlink() and drm_gpuvm_bo_put() then requires both the VM's 
>> resv lock
>> and the GEM's resv lock in case they differ.
>>
>>>>   All those problems go away with a dedicated
>>>> GEM gpuva list lock.
>>> I don't think these are real problems.
>>> With the exception of the eviction list "trick" where we currently have
>>> slightly different approach to collect external bos needing rebinding,
>>> we have this working fine.
>>>
>>> TBH I think pretty much the only situation where the spinlock is needed
>>> is for async updates of these lists, unless a wq item can be used for
>>> that, but it doesn't really seem like the current code allows for such
>>> updates anyway? It complicates the code a lot, adds overhead and also
>>> adds the requirement for refcounting during list traversal.
>>>
>>> /Thomas
>>>
>>>>> /Thomas
>>>>>
>>>>>
>>>>>>> It seems that with that also the refcount could be make non-
>>>>>>> atomic.
>>>>>>>
>>>>>>> All in the spirit of the drm locking guidelines "use big locks
>>>>>>> when
>>>>>>> possible".
>>>>>>> Lower level locks only when necessary for performance or
>>>>>>> locking inversion?
>>>>>>>
>>>>>>> /Thomas
>>>>>>>
>>>>>>>
>>>>>>>> + *
>>>>>>>> + * Elements popped from the original list are kept in a
>>>>>>>> local list, so removal
>>>>>>>> + * and is_empty checks can still happen while we're
>>>>>>>> iterating the list.
>>>>>>>> + */
>>>>>>>> +#define get_next_vm_bo_from_list(__gpuvm, __list_name, __local_list, __prev_vm_bo) \
>>>>>>>> +       ({ \
>>>>>>>> +               struct drm_gpuvm_bo *__vm_bo; \
>>>>>>>> + \
>>>>>>>> +               drm_gpuvm_bo_put(__prev_vm_bo); \
>>>>>>>> + \
>>>>>>>> +               spin_lock(&(__gpuvm)->__list_name.lock); \
>>>>>>>> +               while (!list_empty(&(__gpuvm)->__list_name.list)) { \
>>>>>>>> +                       __vm_bo = list_first_entry(&(__gpuvm)->__list_name.list, \
>>>>>>>> +                                                  struct drm_gpuvm_bo, \
>>>>>>>> +                                                  list.entry.__list_name); \
>>>>>>>> +                       if (drm_gpuvm_bo_get_unless_zero(__vm_bo)) { \
>>>>>>>> +                               list_move_tail(&(__vm_bo)->list.entry.__list_name, \
>>>>>>>> +                                              __local_list); \
>>>>>>>> +                               break; \
>>>>>>>> +                       } else { \
>>>>>>>> +                               list_del_init(&(__vm_bo)->list.entry.__list_name); \
>>>>>>>> +                               __vm_bo = NULL; \
>>>>>>>> +                       } \
>>>>>>>> +               } \
>>>>>>>> +               spin_unlock(&(__gpuvm)->__list_name.lock); \
>>>>>>>> + \
>>>>>>>> +               __vm_bo; \
>>>>>>>> +       })
>>>>>>>> +
>>>>>>>> +/**
>>>>>>>> + * for_each_vm_bo_in_list() - internal vm_bo list iterator
>>>>>>>> + *
>>>>>>>> + * This helper is here to provide lockless list iteration.
>>>>>>>> Lockless as in, the
>>>>>>>> + * iterator releases the lock immediately after picking the
>>>>>>>> first element from the
>>>>>>>> + * list, so list insertion and deletion can happen
>>>>>>>> concurrently.
>>>>>>>> + *
>>>>>>>> + * Typical use:
>>>>>>>> + *
>>>>>>>> + *     struct drm_gpuvm_bo *vm_bo;
>>>>>>>> + *     LIST_HEAD(my_local_list);
>>>>>>>> + *
>>>>>>>> + *     ret = 0;
>>>>>>>> + *     for_each_vm_bo_in_list(gpuvm, <list_name>,
>>>>>>>> &my_local_list, vm_bo) {
>>>>>>>> + *             ret = do_something_with_vm_bo(..., vm_bo);
>>>>>>>> + *             if (ret)
>>>>>>>> + *                     break;
>>>>>>>> + *     }
>>>>>>>> + *     drm_gpuvm_bo_put(vm_bo);
>>>>>>>> + *     restore_vm_bo_list(gpuvm, <list_name>,
>>>>>>>> &my_local_list);
>>>>>>>> + *
>>>>>>>> + *
>>>>>>>> + * Only used for internal list iterations, not meant to be
>>>>>>>> exposed to the outside
>>>>>>>> + * world.
>>>>>>>> + */
>>>>>>>> +#define for_each_vm_bo_in_list(__gpuvm, __list_name, __local_list, __vm_bo) \
>>>>>>>> +       for (__vm_bo = get_next_vm_bo_from_list(__gpuvm, __list_name, \
>>>>>>>> +                                               __local_list, NULL); \
>>>>>>>> +            __vm_bo; \
>>>>>>>> +            __vm_bo = get_next_vm_bo_from_list(__gpuvm, __list_name, \
>>>>>>>> +                                               __local_list, __vm_bo)) \
>>>>>>>> +
>>>>>>>> +/**
>>>>>>>> + * restore_vm_bo_list() - move vm_bo elements back to their
>>>>>>>> original list
>>>>>>>> + * @__gpuvm: The GPU VM
>>>>>>>> + * @__list_name: The name of the list we're iterating on
>>>>>>>> + * @__local_list: A pointer to the local list used to store
>>>>>>>> already iterated items
>>>>>>>> + *
>>>>>>>> + * When we're done iterating a vm_bo list, we should call
>>>>>>>> restore_vm_bo_list()
>>>>>>>> + * to restore the original state and let new iterations take
>>>>>>>> place.
>>>>>>>> + */
>>>>>>>> +#define restore_vm_bo_list(__gpuvm, __list_name, __local_list) \
>>>>>>>> +       do { \
>>>>>>>> +               /* Merge back the two lists, moving local list elements to the \
>>>>>>>> +                * head to preserve previous ordering, in case it matters. \
>>>>>>>> +                */ \
>>>>>>>> +               spin_lock(&(__gpuvm)->__list_name.lock); \
>>>>>>>> +               list_splice(__local_list, &(__gpuvm)->__list_name.list); \
>>>>>>>> +               spin_unlock(&(__gpuvm)->__list_name.lock); \
>>>>>>>> +       } while (0)
>>>>>>>> +/**
>>>>>>>> + * drm_gpuvm_bo_list_add() - insert a vm_bo into the given
>>>>>>>> list
>>>>>>>> + * @__vm_bo: the &drm_gpuvm_bo
>>>>>>>> + * @__list_name: the name of the list to insert into
>>>>>>>> + *
>>>>>>>> + * Inserts the given @__vm_bo into the list specified by
>>>>>>>> @__list_name and
>>>>>>>> + * increases the vm_bo's reference count.
>>>>>>>> + */
>>>>>>>> +#define drm_gpuvm_bo_list_add(__vm_bo, __list_name) \
>>>>>>>> +       do { \
>>>>>>>> +               spin_lock(&(__vm_bo)->vm->__list_name.lock); \
>>>>>>>> +               if (list_empty(&(__vm_bo)->list.entry.__list_name)) \
>>>>>>>> +                       list_add_tail(&(__vm_bo)->list.entry.__list_name, \
>>>>>>>> +                                     &(__vm_bo)->vm->__list_name.list); \
>>>>>>>> +               spin_unlock(&(__vm_bo)->vm->__list_name.lock); \
>>>>>>>> +       } while (0)
>>>>>>>> +
>>>>>>>> +/**
>>>>>>>> + * drm_gpuvm_bo_list_del() - remove a vm_bo from the given
>>>>>>>> list
>>>>>>>> + * @__vm_bo: the &drm_gpuvm_bo
>>>>>>>> + * @__list_name: the name of the list to remove from
>>>>>>>> + *
>>>>>>>> + * Removes the given @__vm_bo from the list specified by
>>>>>>>> @__list_name and
>>>>>>>> + * decreases the vm_bo's reference count.
>>>>>>>> + */
>>>>>>>> +#define drm_gpuvm_bo_list_del(__vm_bo, __list_name) \
>>>>>>>> +       do { \
>>>>>>>> +               spin_lock(&(__vm_bo)->vm->__list_name.lock); \
>>>>>>>> +               if (!list_empty(&(__vm_bo)->list.entry.__list_name)) \
>>>>>>>> +                       list_del_init(&(__vm_bo)->list.entry.__list_name); \
>>>>>>>> +               spin_unlock(&(__vm_bo)->vm->__list_name.lock); \
>>>>>>>> +       } while (0)
>>>>>>>> +
>>>>>>>> +static int __must_check
>>>>>>>> +drm_gpuvm_bo_get_unless_zero(struct drm_gpuvm_bo *vm_bo);
>>>>>>>> +
>>>>>>>>     #define to_drm_gpuva(__node) container_of((__node), struct
>>>>>>>> drm_gpuva, rb.node)
>>>>>>>>     #define GPUVA_START(node) ((node)->va.addr)
>>>>>>>> @@ -713,6 +867,12 @@ drm_gpuvm_init(struct drm_gpuvm *gpuvm,
>>>>>>>> struct drm_device *drm,
>>>>>>>>          gpuvm->rb.tree = RB_ROOT_CACHED;
>>>>>>>>          INIT_LIST_HEAD(&gpuvm->rb.list);
>>>>>>>> +       INIT_LIST_HEAD(&gpuvm->extobj.list);
>>>>>>>> +       spin_lock_init(&gpuvm->extobj.lock);
>>>>>>>> +
>>>>>>>> +       INIT_LIST_HEAD(&gpuvm->evict.list);
>>>>>>>> +       spin_lock_init(&gpuvm->evict.lock);
>>>>>>>> +
>>>>>>>>          drm_gpuva_check_overflow(start_offset, range);
>>>>>>>>          gpuvm->mm_start = start_offset;
>>>>>>>>          gpuvm->mm_range = range;
>>>>>>>> @@ -754,10 +914,302 @@ drm_gpuvm_destroy(struct drm_gpuvm
>>>>>>>> *gpuvm)
>>>>>>>>          WARN(!RB_EMPTY_ROOT(&gpuvm->rb.tree.rb_root),
>>>>>>>>               "GPUVA tree is not empty, potentially leaking
>>>>>>>> memory.\n");
>>>>>>>> +       WARN(!list_empty(&gpuvm->extobj.list), "Extobj list should be empty.\n");
>>>>>>>> +       WARN(!list_empty(&gpuvm->evict.list), "Evict list should be empty.\n");
>>>>>>>> +
>>>>>>>>          drm_gem_private_object_fini(&gpuvm->d_obj);
>>>>>>>>     }
>>>>>>>>     EXPORT_SYMBOL_GPL(drm_gpuvm_destroy);
>>>>>>>> +/**
>>>>>>>> + * drm_gpuvm_prepare_objects() - prepare all associated BOs
>>>>>>>> + * @gpuvm: the &drm_gpuvm
>>>>>>>> + * @exec: the &drm_exec locking context
>>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>>> + *
>>>>>>>> + * Calls drm_exec_prepare_obj() for all &drm_gem_objects the
>>>>>>>> given
>>>>>>>> + * &drm_gpuvm contains mappings of.
>>>>>>>> + *
>>>>>>>> + * Using this function directly, it is the driver's
>>>>>>>> responsibility to call
>>>>>>>> + * drm_exec_init() and drm_exec_fini() accordingly.
>>>>>>>> + *
>>>>>>>> + * Note: This function is safe against concurrent insertion
>>>>>>>> and removal of
>>>>>>>> + * external objects, however it is not safe against
>>>>>>>> concurrent usage itself.
>>>>>>>> + *
>>>>>>>> + * Drivers need to make sure to protect this case with
>>>>>>>> either an outer VM lock
>>>>>>>> + * or by calling drm_gpuvm_prepare_vm() before this function
>>>>>>>> within the
>>>>>>>> + * drm_exec_until_all_locked() loop, such that the GPUVM's
>>>>>>>> dma-resv lock ensures
>>>>>>>> + * mutual exclusion.
>>>>>>>> + *
>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>> + */
>>>>>>>> +int
>>>>>>>> +drm_gpuvm_prepare_objects(struct drm_gpuvm *gpuvm,
>>>>>>>> +                         struct drm_exec *exec,
>>>>>>>> +                         unsigned int num_fences)
>>>>>>>> +{
>>>>>>>> +       struct drm_gpuvm_bo *vm_bo;
>>>>>>>> +       LIST_HEAD(extobjs);
>>>>>>>> +       int ret = 0;
>>>>>>>> +
>>>>>>>> +       for_each_vm_bo_in_list(gpuvm, extobj, &extobjs,
>>>>>>>> vm_bo) {
>>>>>>>> +               ret = drm_exec_prepare_obj(exec, vm_bo->obj,
>>>>>>>> num_fences);
>>>>>>>> +               if (ret)
>>>>>>>> +                       break;
>>>>>>>> +       }
>>>>>>>> +       /* Drop ref in case we break out of the loop. */
>>>>>>>> +       drm_gpuvm_bo_put(vm_bo);
>>>>>>>> +       restore_vm_bo_list(gpuvm, extobj, &extobjs);
>>>>>>>> +
>>>>>>>> +       return ret;
>>>>>>>> +}
>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_prepare_objects);
>>>>>>>> +
>>>>>>>> +/**
>>>>>>>> + * drm_gpuvm_prepare_range() - prepare all BOs mapped within
>>>>>>>> a given range
>>>>>>>> + * @gpuvm: the &drm_gpuvm
>>>>>>>> + * @exec: the &drm_exec locking context
>>>>>>>> + * @addr: the start address within the VA space
>>>>>>>> + * @range: the range to iterate within the VA space
>>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>>> + *
>>>>>>>> + * Calls drm_exec_prepare_obj() for all &drm_gem_objects
>>>>>>>> mapped between @addr
>>>>>>>> + * and @addr + @range.
>>>>>>>> + *
>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>> + */
>>>>>>>> +int
>>>>>>>> +drm_gpuvm_prepare_range(struct drm_gpuvm *gpuvm, struct
>>>>>>>> drm_exec *exec,
>>>>>>>> +                       u64 addr, u64 range, unsigned int
>>>>>>>> num_fences)
>>>>>>>> +{
>>>>>>>> +       struct drm_gpuva *va;
>>>>>>>> +       u64 end = addr + range;
>>>>>>>> +       int ret;
>>>>>>>> +
>>>>>>>> +       drm_gpuvm_for_each_va_range(va, gpuvm, addr, end) {
>>>>>>>> +               struct drm_gem_object *obj = va->gem.obj;
>>>>>>>> +
>>>>>>>> +               ret = drm_exec_prepare_obj(exec, obj,
>>>>>>>> num_fences);
>>>>>>>> +               if (ret)
>>>>>>>> +                       return ret;
>>>>>>>> +       }
>>>>>>>> +
>>>>>>>> +       return 0;
>>>>>>>> +}
>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_prepare_range);
>>>>>>>> +
>>>>>>>> +/**
>>>>>>>> + * drm_gpuvm_exec_lock() - lock all dma-resv of all
>>>>>>>> associated BOs
>>>>>>>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>>> + * @interruptible: sleep interruptible if waiting
>>>>>>>> + *
>>>>>>>> + * Acquires all dma-resv locks of all &drm_gem_objects the
>>>>>>>> given
>>>>>>>> + * &drm_gpuvm contains mappings of.
>>>>>>>> + *
>>>>>>>> + * Additionally, when calling this function with struct
>>>>>>>> drm_gpuvm_exec::extra
>>>>>>>> + * being set the driver receives the given @fn callback to
>>>>>>>> lock additional
>>>>>>>> + * dma-resv in the context of the &drm_gpuvm_exec instance.
>>>>>>>> Typically, drivers
>>>>>>>> + * would call drm_exec_prepare_obj() from within this
>>>>>>>> callback.
>>>>>>>> + *
>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>> + */
>>>>>>>> +int
>>>>>>>> +drm_gpuvm_exec_lock(struct drm_gpuvm_exec *vm_exec,
>>>>>>>> +                   unsigned int num_fences,
>>>>>>>> +                   bool interruptible)
>>>>>>>> +{
>>>>>>>> +       struct drm_gpuvm *gpuvm = vm_exec->vm;
>>>>>>>> +       struct drm_exec *exec = &vm_exec->exec;
>>>>>>>> +       uint32_t flags;
>>>>>>>> +       int ret;
>>>>>>>> +
>>>>>>>> +       flags = (interruptible ? DRM_EXEC_INTERRUPTIBLE_WAIT : 0) |
>>>>>>>> +               DRM_EXEC_IGNORE_DUPLICATES;
>>>>>>>> +
>>>>>>>> +       drm_exec_init(exec, flags);
>>>>>>>> +
>>>>>>>> +       drm_exec_until_all_locked(exec) {
>>>>>>>> +               ret = drm_gpuvm_prepare_vm(gpuvm, exec,
>>>>>>>> num_fences);
>>>>>>>> +               drm_exec_retry_on_contention(exec);
>>>>>>>> +               if (ret)
>>>>>>>> +                       goto err;
>>>>>>>> +
>>>>>>>> +               ret = drm_gpuvm_prepare_objects(gpuvm, exec,
>>>>>>>> num_fences);
>>>>>>>> +               drm_exec_retry_on_contention(exec);
>>>>>>>> +               if (ret)
>>>>>>>> +                       goto err;
>>>>>>>> +
>>>>>>>> +               if (vm_exec->extra.fn) {
>>>>>>>> +                       ret = vm_exec->extra.fn(vm_exec,
>>>>>>>> num_fences);
>>>>>>>> +                       drm_exec_retry_on_contention(exec);
>>>>>>>> +                       if (ret)
>>>>>>>> +                               goto err;
>>>>>>>> +               }
>>>>>>>> +       }
>>>>>>>> +
>>>>>>>> +       return 0;
>>>>>>>> +
>>>>>>>> +err:
>>>>>>>> +       drm_exec_fini(exec);
>>>>>>>> +       return ret;
>>>>>>>> +}
>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_exec_lock);
>>>>>>>> +
>>>>>>>> +static int
>>>>>>>> +fn_lock_array(struct drm_gpuvm_exec *vm_exec, unsigned int num_fences)
>>>>>>>> +{
>>>>>>>> +       struct {
>>>>>>>> +               struct drm_gem_object **objs;
>>>>>>>> +               unsigned int num_objs;
>>>>>>>> +       } *args = vm_exec->extra.priv;
>>>>>>>> +
>>>>>>>> +       return drm_exec_prepare_array(&vm_exec->exec, args->objs,
>>>>>>>> +                                     args->num_objs, num_fences);
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +/**
>>>>>>>> + * drm_gpuvm_exec_lock_array() - lock all dma-resv of all
>>>>>>>> associated BOs
>>>>>>>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>>>>>>>> + * @objs: additional &drm_gem_objects to lock
>>>>>>>> + * @num_objs: the number of additional &drm_gem_objects to
>>>>>>>> lock
>>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>>> + * @interruptible: sleep interruptible if waiting
>>>>>>>> + *
>>>>>>>> + * Acquires all dma-resv locks of all &drm_gem_objects the
>>>>>>>> given &drm_gpuvm
>>>>>>>> + * contains mappings of, plus the ones given through @objs.
>>>>>>>> + *
>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>> + */
>>>>>>>> +int
>>>>>>>> +drm_gpuvm_exec_lock_array(struct drm_gpuvm_exec *vm_exec,
>>>>>>>> +                         struct drm_gem_object **objs,
>>>>>>>> +                         unsigned int num_objs,
>>>>>>>> +                         unsigned int num_fences,
>>>>>>>> +                         bool interruptible)
>>>>>>>> +{
>>>>>>>> +       struct {
>>>>>>>> +               struct drm_gem_object **objs;
>>>>>>>> +               unsigned int num_objs;
>>>>>>>> +       } args;
>>>>>>>> +
>>>>>>>> +       args.objs = objs;
>>>>>>>> +       args.num_objs = num_objs;
>>>>>>>> +
>>>>>>>> +       vm_exec->extra.fn = fn_lock_array;
>>>>>>>> +       vm_exec->extra.priv = &args;
>>>>>>>> +
>>>>>>>> +       return drm_gpuvm_exec_lock(vm_exec, num_fences,
>>>>>>>> interruptible);
>>>>>>>> +}
>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_exec_lock_array);
>>>>>>>> +
>>>>>>>> +/**
>>>>>>>> + * drm_gpuvm_exec_lock_range() - prepare all BOs mapped
>>>>>>>> within a given range
>>>>>>>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>>>>>>>> + * @addr: the start address within the VA space
>>>>>>>> + * @range: the range to iterate within the VA space
>>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>>> + * @interruptible: sleep interruptible if waiting
>>>>>>>> + *
>>>>>>>> + * Acquires all dma-resv locks of all &drm_gem_objects
>>>>>>>> mapped between @addr and
>>>>>>>> + * @addr + @range.
>>>>>>>> + *
>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>> + */
>>>>>>>> +int
>>>>>>>> +drm_gpuvm_exec_lock_range(struct drm_gpuvm_exec *vm_exec,
>>>>>>>> +                         u64 addr, u64 range,
>>>>>>>> +                         unsigned int num_fences,
>>>>>>>> +                         bool interruptible)
>>>>>>>> +{
>>>>>>>> +       struct drm_gpuvm *gpuvm = vm_exec->vm;
>>>>>>>> +       struct drm_exec *exec = &vm_exec->exec;
>>>>>>>> +       uint32_t flags;
>>>>>>>> +       int ret;
>>>>>>>> +
>>>>>>>> +       flags = (interruptible ? DRM_EXEC_INTERRUPTIBLE_WAIT : 0) |
>>>>>>>> +               DRM_EXEC_IGNORE_DUPLICATES;
>>>>>>>> +
>>>>>>>> +       drm_exec_init(exec, flags);
>>>>>>>> +
>>>>>>>> +       drm_exec_until_all_locked(exec) {
>>>>>>>> +               ret = drm_gpuvm_prepare_range(gpuvm, exec,
>>>>>>>> addr, range,
>>>>>>>> + num_fences);
>>>>>>>> +               drm_exec_retry_on_contention(exec);
>>>>>>>> +               if (ret)
>>>>>>>> +                       goto err;
>>>>>>>> +       }
>>>>>>>> +
>>>>>>>> +       return ret;
>>>>>>>> +
>>>>>>>> +err:
>>>>>>>> +       drm_exec_fini(exec);
>>>>>>>> +       return ret;
>>>>>>>> +}
>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_exec_lock_range);
>>>>>>>> +
>>>>>>>> +/**
>>>>>>>> + * drm_gpuvm_validate() - validate all BOs marked as evicted
>>>>>>>> + * @gpuvm: the &drm_gpuvm to validate evicted BOs
>>>>>>>> + *
>>>>>>>> + * Calls the &drm_gpuvm_ops.bo_validate callback for all
>>>>>>>> evicted buffer
>>>>>>>> + * objects being mapped in the given &drm_gpuvm.
>>>>>>>> + *
>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>> + */
>>>>>>>> +int
>>>>>>>> +drm_gpuvm_validate(struct drm_gpuvm *gpuvm)
>>>>>>>> +{
>>>>>>>> +       const struct drm_gpuvm_ops *ops = gpuvm->ops;
>>>>>>>> +       struct drm_gpuvm_bo *vm_bo;
>>>>>>>> +       LIST_HEAD(evict);
>>>>>>>> +       int ret = 0;
>>>>>>>> +
>>>>>>>> +       if (unlikely(!ops || !ops->bo_validate))
>>>>>>>> +               return -ENOTSUPP;
>>>>>>>> +
>>>>>>>> +       for_each_vm_bo_in_list(gpuvm, evict, &evict, vm_bo) {
>>>>>>>> +               dma_resv_assert_held(vm_bo->obj->resv);
>>>>>>>> +               ret = ops->bo_validate(vm_bo->obj);
>>>>>>>> +               if (ret)
>>>>>>>> +                       break;
>>>>>>>> +       }
>>>>>>>> +       /* Drop ref in case we break out of the loop. */
>>>>>>>> +       drm_gpuvm_bo_put(vm_bo);
>>>>>>>> +       restore_vm_bo_list(gpuvm, evict, &evict);
>>>>>>>> +
>>>>>>>> +       return ret;
>>>>>>>> +}
>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_validate);
>>>>>>>> +
>>>>>>>> +/**
>>>>>>>> + * drm_gpuvm_resv_add_fence - add fence to private and all
>>>>>>>> extobj
>>>>>>>> + * dma-resv
>>>>>>>> + * @gpuvm: the &drm_gpuvm to add a fence to
>>>>>>>> + * @exec: the &drm_exec locking context
>>>>>>>> + * @fence: fence to add
>>>>>>>> + * @private_usage: private dma-resv usage
>>>>>>>> + * @extobj_usage: extobj dma-resv usage
>>>>>>>> + */
>>>>>>>> +void
>>>>>>>> +drm_gpuvm_resv_add_fence(struct drm_gpuvm *gpuvm,
>>>>>>>> +                        struct drm_exec *exec,
>>>>>>>> +                        struct dma_fence *fence,
>>>>>>>> +                        enum dma_resv_usage private_usage,
>>>>>>>> +                        enum dma_resv_usage extobj_usage)
>>>>>>>> +{
>>>>>>>> +       struct drm_gem_object *obj;
>>>>>>>> +       unsigned long index;
>>>>>>>> +
>>>>>>>> +       drm_exec_for_each_locked_object(exec, index, obj) {
>>>>>>>> +               dma_resv_assert_held(obj->resv);
>>>>>>>> +               dma_resv_add_fence(obj->resv, fence,
>>>>>>>> +                                  drm_gpuvm_is_extobj(gpuvm, obj) ?
>>>>>>>> +                                  extobj_usage : private_usage);
>>>>>>>> +       }
>>>>>>>> +}
>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_resv_add_fence);
>>>>>>>> +
>>>>>>>>     /**
>>>>>>>>      * drm_gpuvm_bo_create() - create a new instance of struct
>>>>>>>> drm_gpuvm_bo
>>>>>>>>      * @gpuvm: The &drm_gpuvm the @obj is mapped in.
>>>>>>>> @@ -790,6 +1242,9 @@ drm_gpuvm_bo_create(struct drm_gpuvm
>>>>>>>> *gpuvm,
>>>>>>>>          INIT_LIST_HEAD(&vm_bo->list.gpuva);
>>>>>>>>          INIT_LIST_HEAD(&vm_bo->list.entry.gem);
>>>>>>>> +       INIT_LIST_HEAD(&vm_bo->list.entry.extobj);
>>>>>>>> +       INIT_LIST_HEAD(&vm_bo->list.entry.evict);
>>>>>>>> +
>>>>>>>>          drm_gem_object_get(obj);
>>>>>>>>          return vm_bo;
>>>>>>>> @@ -807,6 +1262,14 @@ drm_gpuvm_bo_destroy(struct kref *kref)
>>>>>>>>          drm_gem_gpuva_assert_lock_held(vm_bo->obj);
>>>>>>>> +       spin_lock(&gpuvm->extobj.lock);
>>>>>>>> +       list_del(&vm_bo->list.entry.extobj);
>>>>>>>> +       spin_unlock(&gpuvm->extobj.lock);
>>>>>>>> +
>>>>>>>> +       spin_lock(&gpuvm->evict.lock);
>>>>>>>> +       list_del(&vm_bo->list.entry.evict);
>>>>>>>> +       spin_unlock(&gpuvm->evict.lock);
>>>>>>>> +
>>>>>>>>          list_del(&vm_bo->list.entry.gem);
>>>>>>>>          drm_gem_object_put(obj);
>>>>>>>> @@ -822,6 +1285,11 @@ drm_gpuvm_bo_destroy(struct kref *kref)
>>>>>>>>      * @vm_bo: the &drm_gpuvm_bo to release the reference of
>>>>>>>>      *
>>>>>>>>      * This releases a reference to @vm_bo.
>>>>>>>> + *
>>>>>>>> + * If the reference count drops to zero, the &gpuvm_bo is
>>>>>>>> destroyed, which
>>>>>>>> + * includes removing it from the GEM's gpuva list. Hence, if
>>>>>>>> a call to this
>>>>>>>> + * function can potentially let the reference count drop to zero,
>>>>>>>> the caller must
>>>>>>>> + * hold the dma-resv or driver-specific GEM gpuva lock.
>>>>>>>>      */
>>>>>>>>     void
>>>>>>>>     drm_gpuvm_bo_put(struct drm_gpuvm_bo *vm_bo)
>>>>>>>> @@ -831,6 +1299,12 @@ drm_gpuvm_bo_put(struct drm_gpuvm_bo
>>>>>>>> *vm_bo)
>>>>>>>>     }
>>>>>>>>     EXPORT_SYMBOL_GPL(drm_gpuvm_bo_put);
>>>>>>>> +static int __must_check
>>>>>>>> +drm_gpuvm_bo_get_unless_zero(struct drm_gpuvm_bo *vm_bo)
>>>>>>>> +{
>>>>>>>> +       return kref_get_unless_zero(&vm_bo->kref);
>>>>>>>> +}
>>>>>>>> +
>>>>>>>>     static struct drm_gpuvm_bo *
>>>>>>>>     __drm_gpuvm_bo_find(struct drm_gpuvm *gpuvm,
>>>>>>>>                      struct drm_gem_object *obj)
>>>>>>>> @@ -938,6 +1412,48 @@ drm_gpuvm_bo_obtain_prealloc(struct
>>>>>>>> drm_gpuvm_bo *__vm_bo)
>>>>>>>>     }
>>>>>>>>     EXPORT_SYMBOL_GPL(drm_gpuvm_bo_obtain_prealloc);
>>>>>>>> +/**
>>>>>>>> + * drm_gpuvm_bo_extobj_add() - adds the &drm_gpuvm_bo to its
>>>>>>>> &drm_gpuvm's
>>>>>>>> + * extobj list
>>>>>>>> + * @vm_bo: The &drm_gpuvm_bo to add to its &drm_gpuvm's
>>>>>>>> extobj list.
>>>>>>>> + *
>>>>>>>> + * Adds the given @vm_bo to its &drm_gpuvm's extobj list if
>>>>>>>> not on the list
>>>>>>>> + * already and if the corresponding &drm_gem_object actually
>>>>>>>> + * is an external object.
>>>>>>>> + */
>>>>>>>> +void
>>>>>>>> +drm_gpuvm_bo_extobj_add(struct drm_gpuvm_bo *vm_bo)
>>>>>>>> +{
>>>>>>>> +       struct drm_gpuvm *gpuvm = vm_bo->vm;
>>>>>>>> +
>>>>>>>> +       if (drm_gpuvm_is_extobj(gpuvm, vm_bo->obj))
>>>>>>>> +               drm_gpuvm_bo_list_add(vm_bo, extobj);
>>>>>>>> +}
>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_bo_extobj_add);
>>>>>>>> +
>>>>>>>> +/**
>>>>>>>> + * drm_gpuvm_bo_evict() - add / remove a &drm_gem_object to
>>>>>>>> / from a
>>>>>>>> + * &drm_gpuvms evicted list
>>>>>>>> + * @obj: the &drm_gem_object to add or remove
>>>>>>>> + * @evict: indicates whether the object is evicted
>>>>>>>> + *
>>>>>>>> + * Adds a &drm_gem_object to or removes it from all
>>>>>>>> &drm_gpuvms evicted
>>>>>>>> + * list containing a mapping of this &drm_gem_object.
>>>>>>>> + */
>>>>>>>> +void
>>>>>>>> +drm_gpuvm_bo_evict(struct drm_gem_object *obj, bool evict)
>>>>>>>> +{
>>>>>>>> +       struct drm_gpuvm_bo *vm_bo;
>>>>>>>> +
>>>>>>>> +       drm_gem_for_each_gpuvm_bo(vm_bo, obj) {
>>>>>>>> +               if (evict)
>>>>>>>> +                       drm_gpuvm_bo_list_add(vm_bo, evict);
>>>>>>>> +               else
>>>>>>>> +                       drm_gpuvm_bo_list_del(vm_bo, evict);
>>>>>>>> +       }
>>>>>>>> +}
>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_bo_evict);
>>>>>>>> +
>>>>>>>>     static int
>>>>>>>>     __drm_gpuva_insert(struct drm_gpuvm *gpuvm,
>>>>>>>>                     struct drm_gpuva *va)
>>>>>>>> diff --git a/include/drm/drm_gpuvm.h
>>>>>>>> b/include/drm/drm_gpuvm.h
>>>>>>>> index afa50b9059a2..834bb6d6617e 100644
>>>>>>>> --- a/include/drm/drm_gpuvm.h
>>>>>>>> +++ b/include/drm/drm_gpuvm.h
>>>>>>>> @@ -26,10 +26,12 @@
>>>>>>>>      */
>>>>>>>>     #include <linux/list.h>
>>>>>>>> +#include <linux/dma-resv.h>
>>>>>>>>     #include <linux/rbtree.h>
>>>>>>>>     #include <linux/types.h>
>>>>>>>>     #include <drm/drm_gem.h>
>>>>>>>> +#include <drm/drm_exec.h>
>>>>>>>>     struct drm_gpuvm;
>>>>>>>>     struct drm_gpuvm_bo;
>>>>>>>> @@ -259,6 +261,38 @@ struct drm_gpuvm {
>>>>>>>>           * space
>>>>>>>>           */
>>>>>>>>          struct dma_resv *resv;
>>>>>>>> +
>>>>>>>> +       /**
>>>>>>>> +        * @extobj: structure holding the extobj list
>>>>>>>> +        */
>>>>>>>> +       struct {
>>>>>>>> +               /**
>>>>>>>> +                * @list: &list_head storing &drm_gpuvm_bos
>>>>>>>> serving as
>>>>>>>> +                * external objects
>>>>>>>> +                */
>>>>>>>> +               struct list_head list;
>>>>>>>> +
>>>>>>>> +               /**
>>>>>>>> +                * @lock: spinlock to protect the extobj list
>>>>>>>> +                */
>>>>>>>> +               spinlock_t lock;
>>>>>>>> +       } extobj;
>>>>>>>> +
>>>>>>>> +       /**
>>>>>>>> +        * @evict: structure holding the evict list and evict
>>>>>>>> list lock
>>>>>>>> +        */
>>>>>>>> +       struct {
>>>>>>>> +               /**
>>>>>>>> +                * @list: &list_head storing &drm_gpuvm_bos
>>>>>>>> currently being
>>>>>>>> +                * evicted
>>>>>>>> +                */
>>>>>>>> +               struct list_head list;
>>>>>>>> +
>>>>>>>> +               /**
>>>>>>>> +                * @lock: spinlock to protect the evict list
>>>>>>>> +                */
>>>>>>>> +               spinlock_t lock;
>>>>>>>> +       } evict;
>>>>>>>>     };
>>>>>>>>     void drm_gpuvm_init(struct drm_gpuvm *gpuvm, struct
>>>>>>>> drm_device *drm,
>>>>>>>> @@ -268,6 +302,21 @@ void drm_gpuvm_init(struct drm_gpuvm
>>>>>>>> *gpuvm, struct drm_device *drm,
>>>>>>>>                      const struct drm_gpuvm_ops *ops);
>>>>>>>>     void drm_gpuvm_destroy(struct drm_gpuvm *gpuvm);
>>>>>>>> +/**
>>>>>>>> + * drm_gpuvm_is_extobj() - indicates whether the given
>>>>>>>> &drm_gem_object is an
>>>>>>>> + * external object
>>>>>>>> + * @gpuvm: the &drm_gpuvm to check
>>>>>>>> + * @obj: the &drm_gem_object to check
>>>>>>>> + *
>>>>>>>> + * Returns: true if the &drm_gem_object &dma_resv differs
>>>>>>>> from the
>>>>>>>> + * &drm_gpuvm's &dma_resv, false otherwise
>>>>>>>> + */
>>>>>>>> +static inline bool drm_gpuvm_is_extobj(struct drm_gpuvm
>>>>>>>> *gpuvm,
>>>>>>>> +                                      struct drm_gem_object
>>>>>>>> *obj)
>>>>>>>> +{
>>>>>>>> +       return obj && obj->resv != gpuvm->resv;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>>     static inline struct drm_gpuva *
>>>>>>>>     __drm_gpuva_next(struct drm_gpuva *va)
>>>>>>>>     {
>>>>>>>> @@ -346,6 +395,128 @@ __drm_gpuva_next(struct drm_gpuva *va)
>>>>>>>>     #define drm_gpuvm_for_each_va_safe(va__, next__, gpuvm__) \
>>>>>>>>          list_for_each_entry_safe(va__, next__, &(gpuvm__)->rb.list, rb.entry)
>>>>>>>> +/**
>>>>>>>> + * struct drm_gpuvm_exec - &drm_gpuvm abstraction of
>>>>>>>> &drm_exec
>>>>>>>> + *
>>>>>>>> + * This structure should be created on the stack as
>>>>>>>> &drm_exec should be.
>>>>>>>> + *
>>>>>>>> + * Optionally, @extra can be set in order to lock additional
>>>>>>>> &drm_gem_objects.
>>>>>>>> + */
>>>>>>>> +struct drm_gpuvm_exec {
>>>>>>>> +       /**
>>>>>>>> +        * @exec: the &drm_exec structure
>>>>>>>> +        */
>>>>>>>> +       struct drm_exec exec;
>>>>>>>> +
>>>>>>>> +       /**
>>>>>>>> +        * @vm: the &drm_gpuvm to lock its DMA reservations
>>>>>>>> +        */
>>>>>>>> +       struct drm_gpuvm *vm;
>>>>>>>> +
>>>>>>>> +       /**
>>>>>>>> +        * @extra: Callback and corresponding private data
>>>>>>>> for the driver to
>>>>>>>> +        * lock arbitrary additional &drm_gem_objects.
>>>>>>>> +        */
>>>>>>>> +       struct {
>>>>>>>> +               /**
>>>>>>>> +                * @fn: The driver callback to lock
>>>>>>>> additional &drm_gem_objects.
>>>>>>>> +                */
>>>>>>>> +               int (*fn)(struct drm_gpuvm_exec *vm_exec,
>>>>>>>> +                         unsigned int num_fences);
>>>>>>>> +
>>>>>>>> +               /**
>>>>>>>> +                * @priv: driver private data for the @fn
>>>>>>>> callback
>>>>>>>> +                */
>>>>>>>> +               void *priv;
>>>>>>>> +       } extra;
>>>>>>>> +};
>>>>>>>> +
>>>>>>>> +/**
>>>>>>>> + * drm_gpuvm_prepare_vm() - prepare the GPUVM's common dma-resv
>>>>>>>> + * @gpuvm: the &drm_gpuvm
>>>>>>>> + * @exec: the &drm_exec context
>>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>>> + *
>>>>>>>> + * Calls drm_exec_prepare_obj() for the GPUVM's dummy
>>>>>>>> &drm_gem_object.
>>>>>>>> + *
>>>>>>>> + * Using this function directly, it is the driver's
>>>>>>>> responsibility to call
>>>>>>>> + * drm_exec_init() and drm_exec_fini() accordingly.
>>>>>>>> + *
>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>> + */
>>>>>>>> +static inline int
>>>>>>>> +drm_gpuvm_prepare_vm(struct drm_gpuvm *gpuvm,
>>>>>>>> +                    struct drm_exec *exec,
>>>>>>>> +                    unsigned int num_fences)
>>>>>>>> +{
>>>>>>>> +       return drm_exec_prepare_obj(exec, &gpuvm->d_obj,
>>>>>>>> num_fences);
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +int drm_gpuvm_prepare_objects(struct drm_gpuvm *gpuvm,
>>>>>>>> +                             struct drm_exec *exec,
>>>>>>>> +                             unsigned int num_fences);
>>>>>>>> +
>>>>>>>> +int drm_gpuvm_prepare_range(struct drm_gpuvm *gpuvm,
>>>>>>>> +                           struct drm_exec *exec,
>>>>>>>> +                           u64 addr, u64 range,
>>>>>>>> +                           unsigned int num_fences);
>>>>>>>> +
>>>>>>>> +int drm_gpuvm_exec_lock(struct drm_gpuvm_exec *vm_exec,
>>>>>>>> +                       unsigned int num_fences,
>>>>>>>> +                       bool interruptible);
>>>>>>>> +
>>>>>>>> +int drm_gpuvm_exec_lock_array(struct drm_gpuvm_exec
>>>>>>>> *vm_exec,
>>>>>>>> +                             struct drm_gem_object **objs,
>>>>>>>> +                             unsigned int num_objs,
>>>>>>>> +                             unsigned int num_fences,
>>>>>>>> +                             bool interruptible);
>>>>>>>> +
>>>>>>>> +int drm_gpuvm_exec_lock_range(struct drm_gpuvm_exec
>>>>>>>> *vm_exec,
>>>>>>>> +                             u64 addr, u64 range,
>>>>>>>> +                             unsigned int num_fences,
>>>>>>>> +                             bool interruptible);
>>>>>>>> +
>>>>>>>> +/**
>>>>>>>> + * drm_gpuvm_exec_unlock() - unlock all dma-resv of all associated
>>>>>>>> BOs
>>>>>>>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>>>>>>>> + *
>>>>>>>> + * Releases all dma-resv locks of all &drm_gem_objects
>>>>>>>> previously acquired
>>>>>>>> + * through drm_gpuvm_exec_lock() or its variants.
>>>>>>>> + */
>>>>>>>> +static inline void
>>>>>>>> +drm_gpuvm_exec_unlock(struct drm_gpuvm_exec *vm_exec)
>>>>>>>> +{
>>>>>>>> +       drm_exec_fini(&vm_exec->exec);
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +int drm_gpuvm_validate(struct drm_gpuvm *gpuvm);
>>>>>>>> +void drm_gpuvm_resv_add_fence(struct drm_gpuvm *gpuvm,
>>>>>>>> +                             struct drm_exec *exec,
>>>>>>>> +                             struct dma_fence *fence,
>>>>>>>> +                             enum dma_resv_usage
>>>>>>>> private_usage,
>>>>>>>> +                             enum dma_resv_usage
>>>>>>>> extobj_usage);
>>>>>>>> +
>>>>>>>> +/**
>>>>>>>> + * drm_gpuvm_exec_resv_add_fence()
>>>>>>>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>>>>>>>> + * @fence: fence to add
>>>>>>>> + * @private_usage: private dma-resv usage
>>>>>>>> + * @extobj_usage: extobj dma-resv usage
>>>>>>>> + *
>>>>>>>> + * See drm_gpuvm_resv_add_fence().
>>>>>>>> + */
>>>>>>>> +static inline void
>>>>>>>> +drm_gpuvm_exec_resv_add_fence(struct drm_gpuvm_exec
>>>>>>>> *vm_exec,
>>>>>>>> +                             struct dma_fence *fence,
>>>>>>>> +                             enum dma_resv_usage
>>>>>>>> private_usage,
>>>>>>>> +                             enum dma_resv_usage
>>>>>>>> extobj_usage)
>>>>>>>> +{
>>>>>>>> +       drm_gpuvm_resv_add_fence(vm_exec->vm, &vm_exec->exec,
>>>>>>>> fence,
>>>>>>>> +                                private_usage,
>>>>>>>> extobj_usage);
>>>>>>>> +}
>>>>>>>> +
>>>>>>>>     /**
>>>>>>>>      * struct drm_gpuvm_bo - structure representing a
>>>>>>>> &drm_gpuvm and
>>>>>>>>      * &drm_gem_object combination
>>>>>>>> @@ -398,6 +569,18 @@ struct drm_gpuvm_bo {
>>>>>>>>                           * gpuva list.
>>>>>>>>                           */
>>>>>>>>                          struct list_head gem;
>>>>>>>> +
>>>>>>>> +                       /**
>>>>>>>> +                        * @extobj: List entry to attach to
>>>>>>>> the &drm_gpuvms
>>>>>>>> +                        * extobj list.
>>>>>>>> +                        */
>>>>>>>> +                       struct list_head extobj;
>>>>>>>> +
>>>>>>>> +                       /**
>>>>>>>> +                        * @evict: List entry to attach to
>>>>>>>> the &drm_gpuvms evict
>>>>>>>> +                        * list.
>>>>>>>> +                        */
>>>>>>>> +                       struct list_head evict;
>>>>>>>>                  } entry;
>>>>>>>>          } list;
>>>>>>>>     };
>>>>>>>> @@ -432,6 +615,9 @@ struct drm_gpuvm_bo *
>>>>>>>>     drm_gpuvm_bo_find(struct drm_gpuvm *gpuvm,
>>>>>>>>                    struct drm_gem_object *obj);
>>>>>>>> +void drm_gpuvm_bo_evict(struct drm_gem_object *obj, bool
>>>>>>>> evict);
>>>>>>>> +void drm_gpuvm_bo_extobj_add(struct drm_gpuvm_bo *vm_bo);
>>>>>>>> +
>>>>>>>>     /**
>>>>>>>>      * drm_gpuvm_bo_for_each_va() - iterator to walk over a
>>>>>>>> list of &drm_gpuva
>>>>>>>>      * @va__: &drm_gpuva structure to assign to in each
>>>>>>>> iteration step
>>>>>>>> @@ -837,6 +1023,17 @@ struct drm_gpuvm_ops {
>>>>>>>>           * used.
>>>>>>>>           */
>>>>>>>>          int (*sm_step_unmap)(struct drm_gpuva_op *op, void
>>>>>>>> *priv);
>>>>>>>> +
>>>>>>>> +       /**
>>>>>>>> +        * @bo_validate: called from drm_gpuvm_validate()
>>>>>>>> +        *
>>>>>>>> +        * Drivers receive this callback for every evicted
>>>>>>>> &drm_gem_object being
>>>>>>>> +        * mapped in the corresponding &drm_gpuvm.
>>>>>>>> +        *
>>>>>>>> +        * Typically, drivers would call their driver
>>>>>>>> specific variant of
>>>>>>>> +        * ttm_bo_validate() from within this callback.
>>>>>>>> +        */
>>>>>>>> +       int (*bo_validate)(struct drm_gem_object *obj);
>>>>>>>>     };
>>>>>>>>     int drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm, void *priv,
>
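
For TTM-based drivers, the bo_validate hook documented above might boil
down to something like the following sketch (the my_* names are
hypothetical, not part of this series):

	static int my_bo_validate(struct drm_gem_object *obj)
	{
		struct ttm_buffer_object *bo =
			container_of(obj, typeof(*bo), base);
		struct ttm_operation_ctx ctx = { .interruptible = true };

		/* Re-validate the evicted BO into a driver-chosen placement. */
		return ttm_bo_validate(bo, my_placement(bo), &ctx);
	}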
Danilo Krummrich Sept. 13, 2023, 3:15 p.m. UTC | #24
On 9/13/23 16:26, Christian König wrote:
> Am 13.09.23 um 14:16 schrieb Danilo Krummrich:
>> As mentioned in a different mail thread, the reply is based on the assumption
>> that we don't support anything else than GPUVM updates from the IOCTL.
> 
> I think that this assumption is incorrect.

Well, more precisely I should have said "don't support GPUVM updates from within
fence signaling critical sections". And looking at the code, that doesn't seem to
be what you're doing there.

> 
> Vulkan is just one specific use case, but this here should probably be able to handle other use cases as well.
> 
> Especially with HMM you get the requirement that you need to be able to invalidate GPUVM mappings without grabbing a reservation lock.

What do you mean with "invalidate GPUVM mappings" in this context? drm_gpuvm_bo_evict()
should only be called from a ttm_device_funcs::move callback, we should hold the dma-resv
lock there.
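
As a minimal sketch of that call site, assuming a TTM-based driver (the
my_* names are hypothetical):

	static int my_bo_move(struct ttm_buffer_object *bo, bool evict,
			      struct ttm_operation_ctx *ctx,
			      struct ttm_resource *new_mem,
			      struct ttm_place *hop)
	{
		/* TTM calls this with bo->base.resv held, which is the
		 * locking context drm_gpuvm_bo_evict() relies on. */
		drm_gpuvm_bo_evict(&bo->base, evict);

		/* hypothetical driver-specific copy/move helper */
		return my_move_and_copy(bo, ctx, new_mem);
	}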

> 
> See what the eviction lock in amdgpu is doing for example.

The eviction_lock seems to protect a VM state "evicting", i.e. whether any BO
that is associated with the VM is currently being evicted. At the same time
amdgpu protects the evicted list of the VM with a different lock. So this seems
to be entirely unrelated. Tracking a "currently evicting" state is not part of
the GPUVM implementation currently and hence nothing would change for amdgpu
there.

> 
> Regards,
> Christian.
> 
>>
>> On Wed, Sep 13, 2023 at 11:14:46AM +0200, Thomas Hellström wrote:
>>> Hi!
>>>
>>> On Wed, 2023-09-13 at 01:36 +0200, Danilo Krummrich wrote:
>>>> On Tue, Sep 12, 2023 at 09:23:08PM +0200, Thomas Hellström wrote:
>>>>> On 9/12/23 18:50, Danilo Krummrich wrote:
>>>>>> On Tue, Sep 12, 2023 at 06:20:32PM +0200, Thomas Hellström wrote:
>>>>>>> Hi, Danilo,
>>>>>>>
>>>>>>> On 9/9/23 17:31, Danilo Krummrich wrote:
>>>>>>>> So far the DRM GPUVA manager offers common infrastructure to
>>>>>>>> track GPU VA
>>>>>>>> allocations and mappings, generically connect GPU VA mappings
>>>>>>>> to their
>>>>>>>> backing buffers and perform more complex mapping operations
>>>>>>>> on the GPU VA
>>>>>>>> space.
>>>>>>>>
>>>>>>>> However, there are more design patterns commonly used by
>>>>>>>> drivers, which
>>>>>>>> can potentially be generalized in order to make the DRM GPUVA
>>>>>>>> manager
>>>>>>>> represent a basic GPU-VM implementation. In this context,
>>>>>>>> this patch aims
>>>>>>>> at generalizing the following elements.
>>>>>>>>
>>>>>>>> 1) Provide a common dma-resv for GEM objects not being used
>>>>>>>> outside of
>>>>>>>>       this GPU-VM.
>>>>>>>>
>>>>>>>> 2) Provide tracking of external GEM objects (GEM objects
>>>>>>>> which are
>>>>>>>>       shared with other GPU-VMs).
>>>>>>>>
>>>>>>>> 3) Provide functions to efficiently lock all GEM objects dma-
>>>>>>>> resv the
>>>>>>>>       GPU-VM contains mappings of.
>>>>>>>>
>>>>>>>> 4) Provide tracking of evicted GEM objects the GPU-VM
>>>>>>>> contains mappings
>>>>>>>>       of, such that validation of evicted GEM objects is
>>>>>>>> accelerated.
>>>>>>>>
>>>>>>>> 5) Provide some convenience functions for common patterns.
>>>>>>>>
>>>>>>>> Rather than being designed as a "framework", the target is to
>>>>>>>> make all
>>>>>>>> features appear as a collection of optional helper functions,
>>>>>>>> such that
>>>>>>>> drivers are free to make use of the DRM GPUVA managers basic
>>>>>>>> functionality and opt-in for other features without setting
>>>>>>>> any feature
>>>>>>>> flags, just by making use of the corresponding functions.
>>>>>>>>
>>>>>>>> Big kudos to Boris Brezillon for his help to figure out
>>>>>>>> locking for drivers
>>>>>>>> updating the GPU VA space within the fence signalling path.
>>>>>>>>
>>>>>>>> Suggested-by: Matthew Brost <matthew.brost@intel.com>
>>>>>>>> Signed-off-by: Danilo Krummrich <dakr@redhat.com>
>>>>>>>> ---
>>>>>>>>     drivers/gpu/drm/drm_gpuvm.c | 516
>>>>>>>> ++++++++++++++++++++++++++++++++++++
>>>>>>>>     include/drm/drm_gpuvm.h     | 197 ++++++++++++++
>>>>>>>>     2 files changed, 713 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/drivers/gpu/drm/drm_gpuvm.c
>>>>>>>> b/drivers/gpu/drm/drm_gpuvm.c
>>>>>>>> index f4411047dbb3..8e62a043f719 100644
>>>>>>>> --- a/drivers/gpu/drm/drm_gpuvm.c
>>>>>>>> +++ b/drivers/gpu/drm/drm_gpuvm.c
>>>>>>>> @@ -73,6 +73,21 @@
>>>>>>>>      * &drm_gem_object list of &drm_gpuvm_bos for an existing
>>>>>>>> instance of this
>>>>>>>>      * particular combination. If not existent a new instance
>>>>>>>> is created and linked
>>>>>>>>      * to the &drm_gem_object.
>>>>>>>> + *
>>>>>>>> + * &drm_gpuvm_bo structures, since unique for a given
>>>>>>>> &drm_gpuvm, are also used
>>>>>>>> + * as entry for the &drm_gpuvm's lists of external and
>>>>>>>> evicted objects. Those
>>>>>>>> + * lists are maintained in order to accelerate locking of
>>>>>>>> dma-resv locks and
>>>>>>>> + * validation of evicted objects bound in a &drm_gpuvm. For
>>>>>>>> instance, all
>>>>>>>> + * &drm_gem_objects' &dma_resv of a given &drm_gpuvm can be
>>>>>>>> locked by calling
>>>>>>>> + * drm_gpuvm_exec_lock(). Once locked drivers can call
>>>>>>>> drm_gpuvm_validate() in
>>>>>>>> + * order to validate all evicted &drm_gem_objects. It is
>>>>>>>> also possible to lock
>>>>>>>> + * additional &drm_gem_objects by providing the
>>>>>>>> corresponding parameters to
>>>>>>>> + * drm_gpuvm_exec_lock() as well as open code the &drm_exec
>>>>>>>> loop while making
>>>>>>>> + * use of helper functions such as drm_gpuvm_prepare_range()
>>>>>>>> or
>>>>>>>> + * drm_gpuvm_prepare_objects().
>>>>>>>> + *
>>>>>>>> + * Every bound &drm_gem_object is treated as an external
>>>>>>>> object when its &dma_resv
>>>>>>>> + * structure is different from the &drm_gpuvm's common
>>>>>>>> &dma_resv structure.
>>>>>>>>      */
>>>>>>>>     /**
>>>>>>>> @@ -420,6 +435,20 @@
>>>>>>>>      * Subsequent calls to drm_gpuvm_bo_obtain() for the same
>>>>>>>> &drm_gpuvm and
>>>>>>>>      * &drm_gem_object must be able to observe previous
>>>>>>>> creations and destructions
>>>>>>>>      * of &drm_gpuvm_bos in order to keep instances unique.
>>>>>>>> + *
>>>>>>>> + * The &drm_gpuvm's lists for keeping track of external and
>>>>>>>> evicted objects are
>>>>>>>> + * protected against concurrent insertion / removal and
>>>>>>>> iteration internally.
>>>>>>>> + *
>>>>>>>> + * However, drivers still need to protect concurrent
>>>>>>>> calls to functions
>>>>>>>> + * iterating those lists, such as drm_gpuvm_validate() and
>>>>>>>> + * drm_gpuvm_prepare_objects(). Every such function contains
>>>>>>>> a particular
>>>>>>>> + * comment and lockdep checks if possible.
>>>>>>>> + *
>>>>>>>> + * Functions adding or removing entries from those lists,
>>>>>>>> such as
>>>>>>>> + * drm_gpuvm_bo_evict() or drm_gpuvm_bo_extobj_add() may be
>>>>>>>> called with external
>>>>>>>> + * locks being held, e.g. in order to avoid the
>>>>>>>> corresponding list being
>>>>>>>> + * (safely) modified while potentially being iterated by
>>>>>>>> other API functions.
>>>>>>>> + * However, this is entirely optional.
>>>>>>>>      */
>>>>>>>>     /**
>>>>>>>> @@ -632,6 +661,131 @@
>>>>>>>>      *   }
>>>>>>>>      */
>>>>>>>> +/**
>>>>>>>> + * get_next_vm_bo_from_list() - get the next vm_bo element
>>>>>>>> + * @__gpuvm: The GPU VM
>>>>>>>> + * @__list_name: The name of the list we're iterating on
>>>>>>>> + * @__local_list: A pointer to the local list used to store
>>>>>>>> already iterated items
>>>>>>>> + * @__prev_vm_bo: The previous element we got from
>>>>>>>> get_next_vm_bo_from_list()
>>>>>>>> + *
>>>>>>>> + * This helper is here to provide lockless list iteration.
>>>>>>>> Lockless as in, the
>>>>>>>> + * iterator releases the lock immediately after picking the
>>>>>>>> first element from
>>>>>>>> + * the list, so list insertion and deletion can happen
>>>>>>>> concurrently.
>>>>>>> Are the list spinlocks needed for that async state update from
>>>>>>> within the
>>>>>>> dma-fence critical section we've discussed previously?
>>>>>> Yes, but also for other reasons, see below.
>>>>>>
>>>>>>> Otherwise it should be sufficient to protect the lists with the
>>>>>>> gpuvm's resv
>>>>>>> (or for the extobj list with an outer lock).
>>>>>>>
>>>>>>> If those spinlocks are still needed in some situations, perhaps
>>>>>>> could we
>>>>>>> have an option to set them to NULL (Like IIRC the maple tree
>>>>>>> allows for)?
>>>>>> The evict spinlock is needed in any case, since in
>>>>>> drm_gpuvm_bo_evict() we're
>>>>>> holding only the dma-resv lock from the BO this function gets
>>>>>> called for. Hence,
>>>>>> the spinlock protects concurrent drm_gpuvm_bo_evict() calls with
>>>>>> different BOs.
>>>>> No. Only if you try to add external objects to the vm's evict list
>>>>> from
>>>>> within the evict code. That's not necessary since you loop through
>>>>> all
>>>>> external objects anyway when locking them so an "evicted" bool in
>>>>> the vm_bo,
>>>>> protected by the bo resv would be sufficient. The extobj locking
>>>>> loop can
>>>>> then add the bo to the evicted list.
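
A rough sketch of that alternative (the evicted flag and the extobj
iterator shown here are purely illustrative, neither exists in this
patch):

	/* In the BO's eviction path, under the BO's dma-resv: */
	vm_bo->evicted = true;

	/* In the extobj locking loop, once the resv is locked: */
	for_each_locked_extobj(gpuvm, vm_bo) {		/* hypothetical */
		if (vm_bo->evicted)
			list_move_tail(&vm_bo->list.entry.evict,
				       &gpuvm->evict.list);
	}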
>>>> And validate() can remove it while still holding all dma-resv locks,
>>>> neat!
>>>> However, what if two tasks are trying to lock the VA space
>>>> concurrently? What
>>>> do we do when the drm_gpuvm_bo's refcount drops to zero in
>>>> drm_gpuva_unlink()?
>>>> Are we guaranteed that at this point of time the drm_gpuvm_bo is not
>>>> on the
>>>> evicted list? Because otherwise we would call drm_gpuvm_bo_destroy()
>>>> with the
>>>> dma-resv lock held, which wouldn't be allowed, since
>>>> drm_gpuvm_bo_destroy()
>>>> might drop the last reference to the drm_gem_object and hence we'd
>>>> potentially
>>>> free the dma-resv lock while holding it, at least if it's an external
>>>> object.
>>> Easiest way in this scheme is to think of the lists as being protected
>>> by the vm's resv lock. That means anybody calling unlink() must also
>>> hold the vm's resv lock. (Which is OK from a UAF point of view, but
>>> perhaps not from a locking inversion POV for an async list update).
>> This would mean that on unlink() we'd need to hold the VM's resv lock and the
>> corresponding GEM's resv lock (in case they're not the same anyways) because the
>> VM's resv lock would protect the external / evicted object lists and the GEM
>> objects resv lock protects the GEM's list of drm_gpuvm_bos and the
>> drm_gpuvm_bo's list of drm_gpuvas.
>>
>>>>>> For extobjs an outer lock would be enough in case of Xe, but I
>>>>>> really would not
>>>>>> like to add even more complexity just to get the spinlock out of
>>>>>> the way in case
>>>>>> the driver already has an outer lock protecting this path.
>>>>> I must disagree here. These spinlocks and atomic operations are
>>>>> pretty
>>>>> costly and as discussed earlier this type of locking was the reason
>>>>> (at
>>>>> least according to the commit message) that made Christian drop the
>>>>> XArray
>>>>> use in drm_exec for the same set of objects: "The locking overhead
>>>>> is
>>>>> unecessary and measurable". IMHO the spinlock is the added
>>>>> complexity and a
>>>>> single wide lock following the drm locking guidelines set out by
>>>>> Daniel and
>>>>> David should really be the default choice with an opt-in for a
>>>>> spinlock if
>>>>> needed for async and pushing out to a wq is not an option.
>>>> For the external object list an outer lock would work as long as it's
>>>> not the
>>>> dma-resv lock of the corresponding GEM object, since here we actually
>>>> need to
>>>> remove the list entry from the external object list on
>>>> drm_gpuvm_bo_destroy().
>>>> It's just a bit weird design wise that drivers would need to take
>>>> this outer
>>>> lock on:
>>>>
>>>> - drm_gpuvm_bo_extobj_add()
>>>> - drm_gpuvm_bo_destroy()        (and hence also drm_gpuvm_bo_put())
>>>> - drm_gpuva_unlink()            (because it needs to call
>>>> drm_gpuvm_bo_put())
>>>> - drm_gpuvm_exec_lock()
>>>> - drm_gpuvm_exec_lock_array()
>>>> - drm_gpuvm_prepare_range()
>>>>
>>>> Given that it seems reasonable to do all the required locking
>>>> internally.
>>>  From a design POV, there has been a clear direction in XE to make
>>> things similar to mmap() / munmap(), so this outer lock, which in Xe is
>>> an rwsem, is used in a similar way as the mmap_lock. It's protecting
>>> the page-table structures and vma rb tree, the userptr structures and
>>> the extobj list. Basically it's taken early in the exec IOCTL, the
>>> VM_BIND ioctl, the compute rebind worker and the pagefault handler, so
>>> all of the above are just asserting that it is taken in the correct
>>> mode.
>>>
>>> But strictly with this scheme one could also use the vm's dma_resv for
>>> the extobj list since with drm_exec, it's locked before traversing the
>>> list.
>>>
>>> The whole point of this scheme is to rely on locks that you already are
>>> supposed to be holding for various reasons and is simple to comprehend.
>> I don't agree that we're supposed to hold the VM's resv lock anyways for
>> functions like drm_gpuvm_bo_put() or drm_gpuva_unlink(), but I'm fine using it
>> for that purpose nevertheless.
>>
>>>> In order to at least place lockdep checks, the driver would need to
>>>> supply the
>>>> corresponding lock's lockdep_map, because the GPUVM otherwise doesn't
>>>> know about
>>>> the lock.
>>> Yes, that sounds reasonable. One lockdep map per list.
>> I'd really like to avoid that, especially now that everything got simpler. We
>> should define the actual locks to take instead.
>>
>>>> Out of curiosity, what is the overhead of a spin_lock() that doesn't
>>>> need to
>>>> spin?
>>> I guess it's hard to tell exactly, but it is much lower on modern x86
>>> than what it used to be. Not sure about ARM, which is the other
>>> architecture important to us. I figure if there is little cache-line
>>> bouncing the main overhead comes from the implied barriers.
>>>
>>>>> A pretty simple way that would not add much code would be
>>>>>
>>>>> static void gpuvm_cond_spin_lock(const struct drm_gpuvm *gpuvm,
>>>>>                                  spinlock_t *lock)
>>>>> {
>>>>>      if (!gpuvm->resv_protected_lists)
>>>>>          spin_lock(lock);
>>>>> }
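
Call sites would then look roughly like this (gpuvm_cond_spin_unlock
being the obvious hypothetical counterpart):

	gpuvm_cond_spin_lock(gpuvm, &gpuvm->extobj.lock);
	if (list_empty(&vm_bo->list.entry.extobj))
		list_add_tail(&vm_bo->list.entry.extobj,
			      &gpuvm->extobj.list);
	gpuvm_cond_spin_unlock(gpuvm, &gpuvm->extobj.lock);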
>>>>>
>>>>>>> For such drivers, that would require anybody calling unlink to
>>>>>>> hold the vm's
>>>>>>> resv, though.
>>>>>> In V4 I want to go back to having a dedicated lock for the GEMs
>>>>>> gpuva list (or
>>>>>> VM_BO list to be more precise). We can't just use the dma-resv
>>>>>> lock for that
>>>>>> with VM_BO abstractions, because on destruction of a VM_BO we
>>>>>> otherwise wouldn't
>>>>>> be allowed to already hold the dma-resv lock. That's the fix I
>>>>>> was referring to
>>>>>> earlier.
>>>>> Yeah, I can see the need for a dedicated lock for the GEM's gpuva
>>>>> list, but
>>>>> holding the vm's dma-resv lock across the unlink shouldn't be a
>>>>> problem. We
>>>>> may free the object and a pointer to the vm's resv during unlink
>>>>> but we
>>>>> don't free the vm's resv.  It'd be a matter of ensuring that any
>>>>> calls to
>>>>> unlink from *within* drm_gpuvm allows it to be held.
>>>> Drivers calling unlink() from the fence signaling path can't use the
>>>> VM's
>>>> dma-resv lock.
>>> Yes, that made me a bit curious because in the current version the code
>>> required the object's dma_resv for unlink() which can't be grabbed
>>> either from the fence signaling path. So are there any drivers actually
>>> wanting to do that? If so, they will either need to resort to the
>>> current spinlock solution or they will need to call unlink from a
>>> workqueue item.
>> As Boris already mentioned we have the dma-resv lock by default or a driver
>> specific GEM gpuva lock as opt-in. Now, we can get rid of the latter.
>>
>>>> Also, what if the object is an external object? We can't use the VM's
>>>> dma-resv
>>>> lock here.
>>> Why? Typically (sync) unlink is only ever called from an unbind-like
>>> operation where it should be trivial to grab the vm's resv. Or, for
>>> that matter any outer lock protecting the extobj list. Rule would be
>>> the drm_gpuvm_bo::entry::extobj  and drm_gpuvm_bo::entry::evict would
>>> be protected by either the vm's dma_resv (or possibly an outer lock in
>>> the case of the extobj list).
>> Outer lock wouldn't have been working for updates in the async path, but
>> shouldn't be relevant anymore. We could use the VM's resv for that.
>>
>>>>   And we can't have the GEM objs dma-resv lock held when calling
>>>> unlink(), since unlink() calls drm_gpuvm_bo_put(), which if the
>>>> refcount drops
>>>> to zero calls drm_gpuvm_bo_destroy() and drm_gpuvm_bo_destroy() might
>>>> drop the
>>>> last reference of the GEM object.
>>> Yes, but this is a different problem as to what exactly protects
>>> drm_gpuvm_bo::entry::gem. Either as you suggest an internal per bo list
>>> lock, or if we want to keep the bo's dma_resv we need to ensure that
>>> the caller of dma_resv_unlock(obj->resv) actually refcounts its obj
>>> pointer, and doesn't implicitly rely on the gpuvm_bo's refcount (I know
>>> Boris didn't like that, but requiring an explicit refcount for a
>>> pointer you dereference unless you're under a lock that ensures keeping
>>> the object alive is pretty much required?) But anyway for the
>>> drm_gpuvm_bo::entry::gem list protection (bo resv or internal spinlock)
>>> I don't have a strong preference.
>> We can keep the GEM objects dma-resv lock, however as mentioned above
>> drm_gpuva_unlink() and drm_gpuvm_bo_put() then requires both the VM's resv lock
>> and the GEM's resv lock in case they differ.
>>
>>>>   All those problems go away with a dedicated
>>>> GEM gpuva list lock.
>>> I don't think these are real problems.
>>> With the excepton of the eviction list "trick" where we currently have
>>> slightly different approach to collect external bos needing rebinding,
>>> we have this working fine.
>>>
>>> TBH I think pretty much the only situation where the spinlock is needed
>>> is for async updates of these lists, unless a wq item can be used for
>>> that, but it doesn't really seem like the current code allows for such
>>> updates anyway? It complicates the code a lot, adds overhead and also
>>> adds the requirement for refcounting during list traversal.
>>>
>>> /Thomas
>>>
>>>>> /Thomas
>>>>>
>>>>>
>>>>>>> It seems that with that also the refcount could be make non-
>>>>>>> atomic.
>>>>>>>
>>>>>>> All in the spirit of the drm locking guidelines "use big locks
>>>>>>> when
>>>>>>> possible".
>>>>>>> Lower level locks only when necessary for performance or
>>>>>>> locking inversion?
>>>>>>>
>>>>>>> /Thomas
>>>>>>>
>>>>>>>
>>>>>>>> + *
>>>>>>>> + * Elements popped from the original list are kept in a
>>>>>>>> local list, so removal
>>>>>>>> + * and is_empty checks can still happen while we're
>>>>>>>> iterating the list.
>>>>>>>> + */
>>>>>>>> +#define get_next_vm_bo_from_list(__gpuvm, __list_name, __local_list, __prev_vm_bo)	\
>>>>>>>> +	({										\
>>>>>>>> +		struct drm_gpuvm_bo *__vm_bo;						\
>>>>>>>> +											\
>>>>>>>> +		drm_gpuvm_bo_put(__prev_vm_bo);						\
>>>>>>>> +											\
>>>>>>>> +		spin_lock(&(__gpuvm)->__list_name.lock);				\
>>>>>>>> +		while (!list_empty(&(__gpuvm)->__list_name.list)) {			\
>>>>>>>> +			__vm_bo = list_first_entry(&(__gpuvm)->__list_name.list,	\
>>>>>>>> +						   struct drm_gpuvm_bo,			\
>>>>>>>> +						   list.entry.__list_name);		\
>>>>>>>> +			if (drm_gpuvm_bo_get_unless_zero(__vm_bo)) {			\
>>>>>>>> +				list_move_tail(&(__vm_bo)->list.entry.__list_name,	\
>>>>>>>> +					       __local_list);				\
>>>>>>>> +				break;							\
>>>>>>>> +			} else {							\
>>>>>>>> +				list_del_init(&(__vm_bo)->list.entry.__list_name);	\
>>>>>>>> +				__vm_bo = NULL;						\
>>>>>>>> +			}								\
>>>>>>>> +		}									\
>>>>>>>> +		spin_unlock(&(__gpuvm)->__list_name.lock);				\
>>>>>>>> +											\
>>>>>>>> +		__vm_bo;								\
>>>>>>>> +	})
>>>>>>>> +
>>>>>>>> +/**
>>>>>>>> + * for_each_vm_bo_in_list() - internal vm_bo list iterator
>>>>>>>> + * @__gpuvm: The GPU VM
>>>>>>>> + * @__list_name: The name of the list we're iterating on
>>>>>>>> + * @__local_list: A pointer to the local list used to store
>>>>>>>> already iterated items
>>>>>>>> + * @__vm_bo: The &drm_gpuvm_bo to assign in each iteration
>>>>>>>> step
>>>>>>>> + *
>>>>>>>> + * This helper is here to provide lockless list iteration.
>>>>>>>> Lockless as in, the
>>>>>>>> + * iterator releases the lock immediately after picking the
>>>>>>>> first element from the
>>>>>>>> + * list, so list insertion and deletion can happen
>>>>>>>> concurrently.
>>>>>>>> + *
>>>>>>>> + * Typical use:
>>>>>>>> + *
>>>>>>>> + *     struct drm_gpuvm_bo *vm_bo;
>>>>>>>> + *     LIST_HEAD(my_local_list);
>>>>>>>> + *
>>>>>>>> + *     ret = 0;
>>>>>>>> + *     for_each_vm_bo_in_list(gpuvm, <list_name>,
>>>>>>>> &my_local_list, vm_bo) {
>>>>>>>> + *             ret = do_something_with_vm_bo(..., vm_bo);
>>>>>>>> + *             if (ret)
>>>>>>>> + *                     break;
>>>>>>>> + *     }
>>>>>>>> + *     drm_gpuvm_bo_put(vm_bo);
>>>>>>>> + *     restore_vm_bo_list(gpuvm, <list_name>,
>>>>>>>> &my_local_list);
>>>>>>>> + *
>>>>>>>> + *
>>>>>>>> + * Only used for internal list iterations, not meant to be
>>>>>>>> exposed to the outside
>>>>>>>> + * world.
>>>>>>>> + */
>>>>>>>> +#define for_each_vm_bo_in_list(__gpuvm, __list_name, __local_list, __vm_bo)	\
>>>>>>>> +	for (__vm_bo = get_next_vm_bo_from_list(__gpuvm, __list_name,		\
>>>>>>>> +						__local_list, NULL);		\
>>>>>>>> +	     __vm_bo;								\
>>>>>>>> +	     __vm_bo = get_next_vm_bo_from_list(__gpuvm, __list_name,		\
>>>>>>>> +						__local_list, __vm_bo))
>>>>>>>> +
>>>>>>>> +/**
>>>>>>>> + * restore_vm_bo_list() - move vm_bo elements back to their
>>>>>>>> original list
>>>>>>>> + * @__gpuvm: The GPU VM
>>>>>>>> + * @__list_name: The name of the list we're iterating on
>>>>>>>> + * @__local_list: A pointer to the local list used to store
>>>>>>>> already iterated items
>>>>>>>> + *
>>>>>>>> + * When we're done iterating a vm_bo list, we should call
>>>>>>>> restore_vm_bo_list()
>>>>>>>> + * to restore the original state and let new iterations take
>>>>>>>> place.
>>>>>>>> + */
>>>>>>>> +#define restore_vm_bo_list(__gpuvm, __list_name, __local_list)			\
>>>>>>>> +	do {									\
>>>>>>>> +		/* Merge back the two lists, moving local list elements to the	\
>>>>>>>> +		 * head to preserve previous ordering, in case it matters.	\
>>>>>>>> +		 */								\
>>>>>>>> +		spin_lock(&(__gpuvm)->__list_name.lock);			\
>>>>>>>> +		list_splice(__local_list, &(__gpuvm)->__list_name.list);	\
>>>>>>>> +		spin_unlock(&(__gpuvm)->__list_name.lock);			\
>>>>>>>> +	} while (0)
>>>>>>>> +
>>>>>>>> +/**
>>>>>>>> + * drm_gpuvm_bo_list_add() - insert a vm_bo into the given
>>>>>>>> list
>>>>>>>> + * @__vm_bo: the &drm_gpuvm_bo
>>>>>>>> + * @__list_name: the name of the list to insert into
>>>>>>>> + *
>>>>>>>> + * Inserts the given @__vm_bo into the list specified by
>>>>>>>> @__list_name,
>>>>>>>> + * unless it is already on that list.
>>>>>>>> + */
>>>>>>>> +#define drm_gpuvm_bo_list_add(__vm_bo, __list_name)				\
>>>>>>>> +	do {									\
>>>>>>>> +		spin_lock(&(__vm_bo)->vm->__list_name.lock);			\
>>>>>>>> +		if (list_empty(&(__vm_bo)->list.entry.__list_name))		\
>>>>>>>> +			list_add_tail(&(__vm_bo)->list.entry.__list_name,	\
>>>>>>>> +				      &(__vm_bo)->vm->__list_name.list);	\
>>>>>>>> +		spin_unlock(&(__vm_bo)->vm->__list_name.lock);			\
>>>>>>>> +	} while (0)
>>>>>>>> +
>>>>>>>> +/**
>>>>>>>> + * drm_gpuvm_bo_list_del() - remove a vm_bo from the given
>>>>>>>> list
>>>>>>>> + * @__vm_bo: the &drm_gpuvm_bo
>>>>>>>> + * @__list_name: the name of the list to remove from
>>>>>>>> + *
>>>>>>>> + * Removes the given @__vm_bo from the list specified by
>>>>>>>> @__list_name,
>>>>>>>> + * if it is on that list.
>>>>>>>> + */
>>>>>>>> +#define drm_gpuvm_bo_list_del(__vm_bo, __list_name)				\
>>>>>>>> +	do {									\
>>>>>>>> +		spin_lock(&(__vm_bo)->vm->__list_name.lock);			\
>>>>>>>> +		if (!list_empty(&(__vm_bo)->list.entry.__list_name))		\
>>>>>>>> +			list_del_init(&(__vm_bo)->list.entry.__list_name);	\
>>>>>>>> +		spin_unlock(&(__vm_bo)->vm->__list_name.lock);			\
>>>>>>>> +	} while (0)
>>>>>>>> +
>>>>>>>> +static int __must_check
>>>>>>>> +drm_gpuvm_bo_get_unless_zero(struct drm_gpuvm_bo *vm_bo);
>>>>>>>> +
>>>>>>>>     #define to_drm_gpuva(__node) container_of((__node), struct
>>>>>>>> drm_gpuva, rb.node)
>>>>>>>>     #define GPUVA_START(node) ((node)->va.addr)
>>>>>>>> @@ -713,6 +867,12 @@ drm_gpuvm_init(struct drm_gpuvm *gpuvm,
>>>>>>>> struct drm_device *drm,
>>>>>>>>          gpuvm->rb.tree = RB_ROOT_CACHED;
>>>>>>>>          INIT_LIST_HEAD(&gpuvm->rb.list);
>>>>>>>> +       INIT_LIST_HEAD(&gpuvm->extobj.list);
>>>>>>>> +       spin_lock_init(&gpuvm->extobj.lock);
>>>>>>>> +
>>>>>>>> +       INIT_LIST_HEAD(&gpuvm->evict.list);
>>>>>>>> +       spin_lock_init(&gpuvm->evict.lock);
>>>>>>>> +
>>>>>>>>          drm_gpuva_check_overflow(start_offset, range);
>>>>>>>>          gpuvm->mm_start = start_offset;
>>>>>>>>          gpuvm->mm_range = range;
>>>>>>>> @@ -754,10 +914,302 @@ drm_gpuvm_destroy(struct drm_gpuvm
>>>>>>>> *gpuvm)
>>>>>>>>          WARN(!RB_EMPTY_ROOT(&gpuvm->rb.tree.rb_root),
>>>>>>>>               "GPUVA tree is not empty, potentially leaking
>>>>>>>> memory.\n");
>>>>>>>> +       WARN(!list_empty(&gpuvm->extobj.list), "Extobj list
>>>>>>>> should be empty.\n");
>>>>>>>> +       WARN(!list_empty(&gpuvm->evict.list), "Evict list
>>>>>>>> should be empty.\n");
>>>>>>>> +
>>>>>>>>          drm_gem_private_object_fini(&gpuvm->d_obj);
>>>>>>>>     }
>>>>>>>>     EXPORT_SYMBOL_GPL(drm_gpuvm_destroy);
>>>>>>>> +/**
>>>>>>>> + * drm_gpuvm_prepare_objects() - prepare all associated BOs
>>>>>>>> + * @gpuvm: the &drm_gpuvm
>>>>>>>> + * @exec: the &drm_exec locking context
>>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>>> + *
>>>>>>>> + * Calls drm_exec_prepare_obj() for all &drm_gem_objects the
>>>>>>>> given
>>>>>>>> + * &drm_gpuvm contains mappings of.
>>>>>>>> + *
>>>>>>>> + * Using this function directly, it is the drivers
>>>>>>>> responsibility to call
>>>>>>>> + * drm_exec_init() and drm_exec_fini() accordingly.
>>>>>>>> + *
>>>>>>>> + * Note: This function is safe against concurrent insertion
>>>>>>>> and removal of
>>>>>>>> + * external objects, however it is not safe against
>>>>>>>> concurrent usage itself.
>>>>>>>> + *
>>>>>>>> + * Drivers need to make sure to protect this case with
>>>>>>>> either an outer VM lock
>>>>>>>> + * or by calling drm_gpuvm_prepare_vm() before this function
>>>>>>>> within the
>>>>>>>> + * drm_exec_until_all_locked() loop, such that the GPUVM's
>>>>>>>> dma-resv lock ensures
>>>>>>>> + * mutual exclusion.
>>>>>>>> + *
>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>> + */
>>>>>>>> +int
>>>>>>>> +drm_gpuvm_prepare_objects(struct drm_gpuvm *gpuvm,
>>>>>>>> +                         struct drm_exec *exec,
>>>>>>>> +                         unsigned int num_fences)
>>>>>>>> +{
>>>>>>>> +       struct drm_gpuvm_bo *vm_bo;
>>>>>>>> +       LIST_HEAD(extobjs);
>>>>>>>> +       int ret = 0;
>>>>>>>> +
>>>>>>>> +       for_each_vm_bo_in_list(gpuvm, extobj, &extobjs,
>>>>>>>> vm_bo) {
>>>>>>>> +               ret = drm_exec_prepare_obj(exec, vm_bo->obj,
>>>>>>>> num_fences);
>>>>>>>> +               if (ret)
>>>>>>>> +                       break;
>>>>>>>> +       }
>>>>>>>> +       /* Drop ref in case we break out of the loop. */
>>>>>>>> +       drm_gpuvm_bo_put(vm_bo);
>>>>>>>> +       restore_vm_bo_list(gpuvm, extobj, &extobjs);
>>>>>>>> +
>>>>>>>> +       return ret;
>>>>>>>> +}
>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_prepare_objects);
>>>>>>>> +
>>>>>>>> +/**
>>>>>>>> + * drm_gpuvm_prepare_range() - prepare all BOs mapped within
>>>>>>>> a given range
>>>>>>>> + * @gpuvm: the &drm_gpuvm
>>>>>>>> + * @exec: the &drm_exec locking context
>>>>>>>> + * @addr: the start address within the VA space
>>>>>>>> + * @range: the range to iterate within the VA space
>>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>>> + *
>>>>>>>> + * Calls drm_exec_prepare_obj() for all &drm_gem_objects
>>>>>>>> mapped between @addr
>>>>>>>> + * and @addr + @range.
>>>>>>>> + *
>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>> + */
>>>>>>>> +int
>>>>>>>> +drm_gpuvm_prepare_range(struct drm_gpuvm *gpuvm, struct
>>>>>>>> drm_exec *exec,
>>>>>>>> +                       u64 addr, u64 range, unsigned int
>>>>>>>> num_fences)
>>>>>>>> +{
>>>>>>>> +       struct drm_gpuva *va;
>>>>>>>> +       u64 end = addr + range;
>>>>>>>> +       int ret;
>>>>>>>> +
>>>>>>>> +       drm_gpuvm_for_each_va_range(va, gpuvm, addr, end) {
>>>>>>>> +               struct drm_gem_object *obj = va->gem.obj;
>>>>>>>> +
>>>>>>>> +               ret = drm_exec_prepare_obj(exec, obj,
>>>>>>>> num_fences);
>>>>>>>> +               if (ret)
>>>>>>>> +                       return ret;
>>>>>>>> +       }
>>>>>>>> +
>>>>>>>> +       return 0;
>>>>>>>> +}
>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_prepare_range);
>>>>>>>> +
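
An open-coded drm_exec loop around this helper, as the documentation at
the top of the patch suggests, could look like the following sketch:

	struct drm_exec exec;
	int ret;

	drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT);
	drm_exec_until_all_locked(&exec) {
		ret = drm_gpuvm_prepare_range(gpuvm, &exec, addr, range, 1);
		drm_exec_retry_on_contention(&exec);
		if (ret)
			goto err;
	}

	/* ... submit work, add fences ... */
err:
	drm_exec_fini(&exec);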
>>>>>>>> +/**
>>>>>>>> + * drm_gpuvm_exec_lock() - lock all dma-resv of all
>>>>>>>> associated BOs
>>>>>>>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>>> + * @interruptible: sleep interruptible if waiting
>>>>>>>> + *
>>>>>>>> + * Acquires all dma-resv locks of all &drm_gem_objects the
>>>>>>>> given
>>>>>>>> + * &drm_gpuvm contains mappings of.
>>>>>>>> + *
>>>>>>>> + * Additionally, when calling this function with struct
>>>>>>>> drm_gpuvm_exec::extra
>>>>>>>> + * being set the driver receives the given @fn callback to
>>>>>>>> lock additional
>>>>>>>> + * dma-resv in the context of the &drm_gpuvm_exec instance.
>>>>>>>> Typically, drivers
>>>>>>>> + * would call drm_exec_prepare_obj() from within this
>>>>>>>> callback.
>>>>>>>> + *
>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>> + */
>>>>>>>> +int
>>>>>>>> +drm_gpuvm_exec_lock(struct drm_gpuvm_exec *vm_exec,
>>>>>>>> +                   unsigned int num_fences,
>>>>>>>> +                   bool interruptible)
>>>>>>>> +{
>>>>>>>> +       struct drm_gpuvm *gpuvm = vm_exec->vm;
>>>>>>>> +       struct drm_exec *exec = &vm_exec->exec;
>>>>>>>> +       uint32_t flags;
>>>>>>>> +       int ret;
>>>>>>>> +
>>>>>>>> +       flags = (interruptible ? DRM_EXEC_INTERRUPTIBLE_WAIT : 0) |
>>>>>>>> +               DRM_EXEC_IGNORE_DUPLICATES;
>>>>>>>> +
>>>>>>>> +       drm_exec_init(exec, flags);
>>>>>>>> +
>>>>>>>> +       drm_exec_until_all_locked(exec) {
>>>>>>>> +               ret = drm_gpuvm_prepare_vm(gpuvm, exec,
>>>>>>>> num_fences);
>>>>>>>> +               drm_exec_retry_on_contention(exec);
>>>>>>>> +               if (ret)
>>>>>>>> +                       goto err;
>>>>>>>> +
>>>>>>>> +               ret = drm_gpuvm_prepare_objects(gpuvm, exec,
>>>>>>>> num_fences);
>>>>>>>> +               drm_exec_retry_on_contention(exec);
>>>>>>>> +               if (ret)
>>>>>>>> +                       goto err;
>>>>>>>> +
>>>>>>>> +               if (vm_exec->extra.fn) {
>>>>>>>> +                       ret = vm_exec->extra.fn(vm_exec,
>>>>>>>> num_fences);
>>>>>>>> +                       drm_exec_retry_on_contention(exec);
>>>>>>>> +                       if (ret)
>>>>>>>> +                               goto err;
>>>>>>>> +               }
>>>>>>>> +       }
>>>>>>>> +
>>>>>>>> +       return 0;
>>>>>>>> +
>>>>>>>> +err:
>>>>>>>> +       drm_exec_fini(exec);
>>>>>>>> +       return ret;
>>>>>>>> +}
>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_exec_lock);
>>>>>>>> +
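
To illustrate the extra callback from the driver side, a sketch of a
driver wiring it up in its exec path (the my_* names are hypothetical):

	static int my_lock_extra_objs(struct drm_gpuvm_exec *vm_exec,
				      unsigned int num_fences)
	{
		struct my_job *job = vm_exec->extra.priv;

		return drm_exec_prepare_obj(&vm_exec->exec, job->extra_obj,
					    num_fences);
	}

	struct drm_gpuvm_exec vm_exec = {
		.vm = gpuvm,
		.extra = {
			.fn = my_lock_extra_objs,
			.priv = job,
		},
	};

	ret = drm_gpuvm_exec_lock(&vm_exec, 1, true);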
>>>>>>>> +static int
>>>>>>>> +fn_lock_array(struct drm_gpuvm_exec *vm_exec, unsigned int
>>>>>>>> num_fences)
>>>>>>>> +{
>>>>>>>> +       struct {
>>>>>>>> +               struct drm_gem_object **objs;
>>>>>>>> +               unsigned int num_objs;
>>>>>>>> +       } *args = vm_exec->extra.priv;
>>>>>>>> +
>>>>>>>> +       return drm_exec_prepare_array(&vm_exec->exec, args->objs,
>>>>>>>> +                                     args->num_objs, num_fences);
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +/**
>>>>>>>> + * drm_gpuvm_exec_lock_array() - lock all dma-resv of all
>>>>>>>> associated BOs
>>>>>>>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>>>>>>>> + * @objs: additional &drm_gem_objects to lock
>>>>>>>> + * @num_objs: the number of additional &drm_gem_objects to
>>>>>>>> lock
>>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>>> + * @interruptible: sleep interruptible if waiting
>>>>>>>> + *
>>>>>>>> + * Acquires all dma-resv locks of all &drm_gem_objects the
>>>>>>>> given &drm_gpuvm
>>>>>>>> + * contains mappings of, plus the ones given through @objs.
>>>>>>>> + *
>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>> + */
>>>>>>>> +int
>>>>>>>> +drm_gpuvm_exec_lock_array(struct drm_gpuvm_exec *vm_exec,
>>>>>>>> +                         struct drm_gem_object **objs,
>>>>>>>> +                         unsigned int num_objs,
>>>>>>>> +                         unsigned int num_fences,
>>>>>>>> +                         bool interruptible)
>>>>>>>> +{
>>>>>>>> +       struct {
>>>>>>>> +               struct drm_gem_object **objs;
>>>>>>>> +               unsigned int num_objs;
>>>>>>>> +       } args;
>>>>>>>> +
>>>>>>>> +       args.objs = objs;
>>>>>>>> +       args.num_objs = num_objs;
>>>>>>>> +
>>>>>>>> +       vm_exec->extra.fn = fn_lock_array;
>>>>>>>> +       vm_exec->extra.priv = &args;
>>>>>>>> +
>>>>>>>> +       return drm_gpuvm_exec_lock(vm_exec, num_fences,
>>>>>>>> interruptible);
>>>>>>>> +}
>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_exec_lock_array);
>>>>>>>> +
>>>>>>>> +/**
>>>>>>>> + * drm_gpuvm_exec_lock_range() - prepare all BOs mapped
>>>>>>>> within a given range
>>>>>>>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>>>>>>>> + * @addr: the start address within the VA space
>>>>>>>> + * @range: the range to iterate within the VA space
>>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>>> + * @interruptible: sleep interruptible if waiting
>>>>>>>> + *
>>>>>>>> + * Acquires all dma-resv locks of all &drm_gem_objects
>>>>>>>> mapped between @addr and
>>>>>>>> + * @addr + @range.
>>>>>>>> + *
>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>> + */
>>>>>>>> +int
>>>>>>>> +drm_gpuvm_exec_lock_range(struct drm_gpuvm_exec *vm_exec,
>>>>>>>> +                         u64 addr, u64 range,
>>>>>>>> +                         unsigned int num_fences,
>>>>>>>> +                         bool interruptible)
>>>>>>>> +{
>>>>>>>> +       struct drm_gpuvm *gpuvm = vm_exec->vm;
>>>>>>>> +       struct drm_exec *exec = &vm_exec->exec;
>>>>>>>> +       uint32_t flags;
>>>>>>>> +       int ret;
>>>>>>>> +
>>>>>>>> +       flags = (interruptible ? DRM_EXEC_INTERRUPTIBLE_WAIT : 0) |
>>>>>>>> +               DRM_EXEC_IGNORE_DUPLICATES;
>>>>>>>> +
>>>>>>>> +       drm_exec_init(exec, flags);
>>>>>>>> +
>>>>>>>> +       drm_exec_until_all_locked(exec) {
>>>>>>>> +               ret = drm_gpuvm_prepare_range(gpuvm, exec,
>>>>>>>> addr, range,
>>>>>>>> +                                             num_fences);
>>>>>>>> +               drm_exec_retry_on_contention(exec);
>>>>>>>> +               if (ret)
>>>>>>>> +                       goto err;
>>>>>>>> +       }
>>>>>>>> +
>>>>>>>> +       return ret;
>>>>>>>> +
>>>>>>>> +err:
>>>>>>>> +       drm_exec_fini(exec);
>>>>>>>> +       return ret;
>>>>>>>> +}
>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_exec_lock_range);
>>>>>>>> +
>>>>>>>> +/**
>>>>>>>> + * drm_gpuvm_validate() - validate all BOs marked as evicted
>>>>>>>> + * @gpuvm: the &drm_gpuvm to validate evicted BOs
>>>>>>>> + *
>>>>>>>> + * Calls the &drm_gpuvm_ops.bo_validate callback for all
>>>>>>>> evicted buffer
>>>>>>>> + * objects being mapped in the given &drm_gpuvm.
>>>>>>>> + *
>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>> + */
>>>>>>>> +int
>>>>>>>> +drm_gpuvm_validate(struct drm_gpuvm *gpuvm)
>>>>>>>> +{
>>>>>>>> +       const struct drm_gpuvm_ops *ops = gpuvm->ops;
>>>>>>>> +       struct drm_gpuvm_bo *vm_bo;
>>>>>>>> +       LIST_HEAD(evict);
>>>>>>>> +       int ret = 0;
>>>>>>>> +
>>>>>>>> +       if (unlikely(!ops || !ops->bo_validate))
>>>>>>>> +               return -ENOTSUPP;
>>>>>>>> +
>>>>>>>> +       for_each_vm_bo_in_list(gpuvm, evict, &evict, vm_bo) {
>>>>>>>> +               dma_resv_assert_held(vm_bo->obj->resv);
>>>>>>>> +               ret = ops->bo_validate(vm_bo->obj);
>>>>>>>> +               if (ret)
>>>>>>>> +                       break;
>>>>>>>> +       }
>>>>>>>> +       /* Drop ref in case we break out of the loop. */
>>>>>>>> +       drm_gpuvm_bo_put(vm_bo);
>>>>>>>> +       restore_vm_bo_list(gpuvm, evict, &evict);
>>>>>>>> +
>>>>>>>> +       return ret;
>>>>>>>> +}
>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_validate);
>>>>>>>> +
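
Combined with the locking helpers above, the expected call pattern
appears to be something like this sketch:

	drm_exec_until_all_locked(&vm_exec.exec) {
		ret = drm_gpuvm_prepare_vm(gpuvm, &vm_exec.exec, 1);
		drm_exec_retry_on_contention(&vm_exec.exec);
		if (ret)
			goto err;

		ret = drm_gpuvm_prepare_objects(gpuvm, &vm_exec.exec, 1);
		drm_exec_retry_on_contention(&vm_exec.exec);
		if (ret)
			goto err;
	}

	/* All relevant resv locks are held at this point. */
	ret = drm_gpuvm_validate(gpuvm);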
>>>>>>>> +/**
>>>>>>>> + * drm_gpuvm_resv_add_fence - add fence to private and all
>>>>>>>> extobj
>>>>>>>> + * dma-resv
>>>>>>>> + * @gpuvm: the &drm_gpuvm to add a fence to
>>>>>>>> + * @exec: the &drm_exec locking context
>>>>>>>> + * @fence: fence to add
>>>>>>>> + * @private_usage: private dma-resv usage
>>>>>>>> + * @extobj_usage: extobj dma-resv usage
>>>>>>>> + */
>>>>>>>> +void
>>>>>>>> +drm_gpuvm_resv_add_fence(struct drm_gpuvm *gpuvm,
>>>>>>>> +                        struct drm_exec *exec,
>>>>>>>> +                        struct dma_fence *fence,
>>>>>>>> +                        enum dma_resv_usage private_usage,
>>>>>>>> +                        enum dma_resv_usage extobj_usage)
>>>>>>>> +{
>>>>>>>> +       struct drm_gem_object *obj;
>>>>>>>> +       unsigned long index;
>>>>>>>> +
>>>>>>>> +       drm_exec_for_each_locked_object(exec, index, obj) {
>>>>>>>> +               dma_resv_assert_held(obj->resv);
>>>>>>>> +               dma_resv_add_fence(obj->resv, fence,
>>>>>>>> +                                  drm_gpuvm_is_extobj(gpuvm, obj) ?
>>>>>>>> +                                  extobj_usage : private_usage);
>>>>>>>> +       }
>>>>>>>> +}
>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_resv_add_fence);
>>>>>>>> +
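
With all resv locks still held after submission, a caller would then do
something like the following (the usage values are just an assumption):

	drm_gpuvm_resv_add_fence(gpuvm, exec, job_fence,
				 DMA_RESV_USAGE_BOOKKEEP, /* VM-private BOs */
				 DMA_RESV_USAGE_WRITE);   /* external BOs */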
>>>>>>>>     /**
>>>>>>>>      * drm_gpuvm_bo_create() - create a new instance of struct
>>>>>>>> drm_gpuvm_bo
>>>>>>>>      * @gpuvm: The &drm_gpuvm the @obj is mapped in.
>>>>>>>> @@ -790,6 +1242,9 @@ drm_gpuvm_bo_create(struct drm_gpuvm
>>>>>>>> *gpuvm,
>>>>>>>>          INIT_LIST_HEAD(&vm_bo->list.gpuva);
>>>>>>>>          INIT_LIST_HEAD(&vm_bo->list.entry.gem);
>>>>>>>> +       INIT_LIST_HEAD(&vm_bo->list.entry.extobj);
>>>>>>>> +       INIT_LIST_HEAD(&vm_bo->list.entry.evict);
>>>>>>>> +
>>>>>>>>          drm_gem_object_get(obj);
>>>>>>>>          return vm_bo;
>>>>>>>> @@ -807,6 +1262,14 @@ drm_gpuvm_bo_destroy(struct kref *kref)
>>>>>>>>          drm_gem_gpuva_assert_lock_held(vm_bo->obj);
>>>>>>>> +       spin_lock(&gpuvm->extobj.lock);
>>>>>>>> +       list_del(&vm_bo->list.entry.extobj);
>>>>>>>> +       spin_unlock(&gpuvm->extobj.lock);
>>>>>>>> +
>>>>>>>> +       spin_lock(&gpuvm->evict.lock);
>>>>>>>> +       list_del(&vm_bo->list.entry.evict);
>>>>>>>> +       spin_unlock(&gpuvm->evict.lock);
>>>>>>>> +
>>>>>>>>          list_del(&vm_bo->list.entry.gem);
>>>>>>>>          drm_gem_object_put(obj);
>>>>>>>> @@ -822,6 +1285,11 @@ drm_gpuvm_bo_destroy(struct kref *kref)
>>>>>>>>      * @vm_bo: the &drm_gpuvm_bo to release the reference of
>>>>>>>>      *
>>>>>>>>      * This releases a reference to @vm_bo.
>>>>>>>> + *
>>>>>>>> + * If the reference count drops to zero, the &gpuvm_bo is
>>>>>>>> destroyed, which
>>>>>>>> + * includes removing it from the GEM's gpuva list. Hence, if
>>>>>>>> a call to this
>>>>>>>> + * function can potentially let the reference count drop to
>>>>>>>> zero, the caller must
>>>>>>>> + * hold the dma-resv or driver specific GEM gpuva lock.
>>>>>>>>      */
>>>>>>>>     void
>>>>>>>>     drm_gpuvm_bo_put(struct drm_gpuvm_bo *vm_bo)
>>>>>>>> @@ -831,6 +1299,12 @@ drm_gpuvm_bo_put(struct drm_gpuvm_bo
>>>>>>>> *vm_bo)
>>>>>>>>     }
>>>>>>>>     EXPORT_SYMBOL_GPL(drm_gpuvm_bo_put);
>>>>>>>> +static int __must_check
>>>>>>>> +drm_gpuvm_bo_get_unless_zero(struct drm_gpuvm_bo *vm_bo)
>>>>>>>> +{
>>>>>>>> +       return kref_get_unless_zero(&vm_bo->kref);
>>>>>>>> +}
>>>>>>>> +
>>>>>>>>     static struct drm_gpuvm_bo *
>>>>>>>>     __drm_gpuvm_bo_find(struct drm_gpuvm *gpuvm,
>>>>>>>>                      struct drm_gem_object *obj)
>>>>>>>> @@ -938,6 +1412,48 @@ drm_gpuvm_bo_obtain_prealloc(struct
>>>>>>>> drm_gpuvm_bo *__vm_bo)
>>>>>>>>     }
>>>>>>>>     EXPORT_SYMBOL_GPL(drm_gpuvm_bo_obtain_prealloc);
>>>>>>>> +/**
>>>>>>>> + * drm_gpuvm_bo_extobj_add() - adds the &drm_gpuvm_bo to its
>>>>>>>> &drm_gpuvm's
>>>>>>>> + * extobj list
>>>>>>>> + * @vm_bo: The &drm_gpuvm_bo to add to its &drm_gpuvm's the
>>>>>>>> extobj list.
>>>>>>>> + *
>>>>>>>> + * Adds the given @vm_bo to its &drm_gpuvm's extobj list if
>>>>>>>> it is not on the
>>>>>>>> + * list already and the corresponding &drm_gem_object
>>>>>>>> actually is an
>>>>>>>> + * external object.
>>>>>>>> + */
>>>>>>>> +void
>>>>>>>> +drm_gpuvm_bo_extobj_add(struct drm_gpuvm_bo *vm_bo)
>>>>>>>> +{
>>>>>>>> +       struct drm_gpuvm *gpuvm = vm_bo->vm;
>>>>>>>> +
>>>>>>>> +       if (drm_gpuvm_is_extobj(gpuvm, vm_bo->obj))
>>>>>>>> +               drm_gpuvm_bo_list_add(vm_bo, extobj);
>>>>>>>> +}
>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_bo_extobj_add);
>>>>>>>> +
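
Assuming drivers call this right after obtaining the vm_bo in their
bind path, usage would look roughly like this sketch:

	struct drm_gpuvm_bo *vm_bo = drm_gpuvm_bo_obtain(gpuvm, obj);

	if (IS_ERR(vm_bo))
		return PTR_ERR(vm_bo);

	/* No-op for VM-private BOs sharing the VM's dma-resv. */
	drm_gpuvm_bo_extobj_add(vm_bo);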
>>>>>>>> +/**
>>>>>>>> + * drm_gpuvm_bo_evict() - add / remove a &drm_gem_object to
>>>>>>>> / from a
>>>>>>>> + * &drm_gpuvms evicted list
>>>>>>>> + * @obj: the &drm_gem_object to add or remove
>>>>>>>> + * @evict: indicates whether the object is evicted
>>>>>>>> + *
>>>>>>>> + * Adds a &drm_gem_object to or removes it from the evicted
>>>>>>>> list of each
>>>>>>>> + * &drm_gpuvm containing a mapping of this &drm_gem_object.
>>>>>>>> + */
>>>>>>>> +void
>>>>>>>> +drm_gpuvm_bo_evict(struct drm_gem_object *obj, bool evict)
>>>>>>>> +{
>>>>>>>> +       struct drm_gpuvm_bo *vm_bo;
>>>>>>>> +
>>>>>>>> +       drm_gem_for_each_gpuvm_bo(vm_bo, obj) {
>>>>>>>> +               if (evict)
>>>>>>>> +                       drm_gpuvm_bo_list_add(vm_bo, evict);
>>>>>>>> +               else
>>>>>>>> +                       drm_gpuvm_bo_list_del(vm_bo, evict);
>>>>>>>> +       }
>>>>>>>> +}
>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_bo_evict);
>>>>>>>> +
>>>>>>>>     static int
>>>>>>>>     __drm_gpuva_insert(struct drm_gpuvm *gpuvm,
>>>>>>>>                     struct drm_gpuva *va)
>>>>>>>> diff --git a/include/drm/drm_gpuvm.h
>>>>>>>> b/include/drm/drm_gpuvm.h
>>>>>>>> index afa50b9059a2..834bb6d6617e 100644
>>>>>>>> --- a/include/drm/drm_gpuvm.h
>>>>>>>> +++ b/include/drm/drm_gpuvm.h
>>>>>>>> @@ -26,10 +26,12 @@
>>>>>>>>      */
>>>>>>>>     #include <linux/list.h>
>>>>>>>> +#include <linux/dma-resv.h>
>>>>>>>>     #include <linux/rbtree.h>
>>>>>>>>     #include <linux/types.h>
>>>>>>>>     #include <drm/drm_gem.h>
>>>>>>>> +#include <drm/drm_exec.h>
>>>>>>>>     struct drm_gpuvm;
>>>>>>>>     struct drm_gpuvm_bo;
>>>>>>>> @@ -259,6 +261,38 @@ struct drm_gpuvm {
>>>>>>>>           * space
>>>>>>>>           */
>>>>>>>>          struct dma_resv *resv;
>>>>>>>> +
>>>>>>>> +       /**
>>>>>>>> +        * @extobj: structure holding the extobj list
>>>>>>>> +        */
>>>>>>>> +       struct {
>>>>>>>> +               /**
>>>>>>>> +                * @list: &list_head storing &drm_gpuvm_bos
>>>>>>>> serving as
>>>>>>>> +                * external object
>>>>>>>> +                */
>>>>>>>> +               struct list_head list;
>>>>>>>> +
>>>>>>>> +               /**
>>>>>>>> +                * @lock: spinlock to protect the extobj list
>>>>>>>> +                */
>>>>>>>> +               spinlock_t lock;
>>>>>>>> +       } extobj;
>>>>>>>> +
>>>>>>>> +       /**
>>>>>>>> +        * @evict: structure holding the evict list and evict
>>>>>>>> list lock
>>>>>>>> +        */
>>>>>>>> +       struct {
>>>>>>>> +               /**
>>>>>>>> +                * @list: &list_head storing &drm_gpuvm_bos
>>>>>>>> currently being
>>>>>>>> +                * evicted
>>>>>>>> +                */
>>>>>>>> +               struct list_head list;
>>>>>>>> +
>>>>>>>> +               /**
>>>>>>>> +                * @lock: spinlock to protect the evict list
>>>>>>>> +                */
>>>>>>>> +               spinlock_t lock;
>>>>>>>> +       } evict;
>>>>>>>>     };
>>>>>>>>     void drm_gpuvm_init(struct drm_gpuvm *gpuvm, struct
>>>>>>>> drm_device *drm,
>>>>>>>> @@ -268,6 +302,21 @@ void drm_gpuvm_init(struct drm_gpuvm
>>>>>>>> *gpuvm, struct drm_device *drm,
>>>>>>>>                      const struct drm_gpuvm_ops *ops);
>>>>>>>>     void drm_gpuvm_destroy(struct drm_gpuvm *gpuvm);
>>>>>>>> +/**
>>>>>>>> + * drm_gpuvm_is_extobj() - indicates whether the given
>>>>>>>> &drm_gem_object is an
>>>>>>>> + * external object
>>>>>>>> + * @gpuvm: the &drm_gpuvm to check
>>>>>>>> + * @obj: the &drm_gem_object to check
>>>>>>>> + *
>>>>>>>> + * Returns: true if the &drm_gem_object's &dma_resv differs
>>>>>>>> from the
>>>>>>>> + * &drm_gpuvm's &dma_resv, false otherwise
>>>>>>>> + */
>>>>>>>> +static inline bool drm_gpuvm_is_extobj(struct drm_gpuvm
>>>>>>>> *gpuvm,
>>>>>>>> +                                      struct drm_gem_object
>>>>>>>> *obj)
>>>>>>>> +{
>>>>>>>> +       return obj && obj->resv != gpuvm->resv;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>>     static inline struct drm_gpuva *
>>>>>>>>     __drm_gpuva_next(struct drm_gpuva *va)
>>>>>>>>     {
>>>>>>>> @@ -346,6 +395,128 @@ __drm_gpuva_next(struct drm_gpuva *va)
>>>>>>>>     #define drm_gpuvm_for_each_va_safe(va__, next__, gpuvm__)
>>>>>>>> \
>>>>>>>>          list_for_each_entry_safe(va__, next__, &(gpuvm__)->rb.list, rb.entry)
>>>>>>>> +/**
>>>>>>>> + * struct drm_gpuvm_exec - &drm_gpuvm abstraction of
>>>>>>>> &drm_exec
>>>>>>>> + *
>>>>>>>> + * This structure should be created on the stack as
>>>>>>>> &drm_exec should be.
>>>>>>>> + *
>>>>>>>> + * Optionally, @extra can be set in order to lock additional
>>>>>>>> &drm_gem_objects.
>>>>>>>> + */
>>>>>>>> +struct drm_gpuvm_exec {
>>>>>>>> +       /**
>>>>>>>> +        * @exec: the &drm_exec structure
>>>>>>>> +        */
>>>>>>>> +       struct drm_exec exec;
>>>>>>>> +
>>>>>>>> +       /**
>>>>>>>> +        * @vm: the &drm_gpuvm to lock its DMA reservations
>>>>>>>> +        */
>>>>>>>> +       struct drm_gpuvm *vm;
>>>>>>>> +
>>>>>>>> +       /**
>>>>>>>> +        * @extra: Callback and corresponding private data
>>>>>>>> for the driver to
>>>>>>>> +        * lock arbitrary additional &drm_gem_objects.
>>>>>>>> +        */
>>>>>>>> +       struct {
>>>>>>>> +               /**
>>>>>>>> +                * @fn: The driver callback to lock
>>>>>>>> additional &drm_gem_objects.
>>>>>>>> +                */
>>>>>>>> +               int (*fn)(struct drm_gpuvm_exec *vm_exec,
>>>>>>>> +                         unsigned int num_fences);
>>>>>>>> +
>>>>>>>> +               /**
>>>>>>>> +                * @priv: driver private data for the @fn
>>>>>>>> callback
>>>>>>>> +                */
>>>>>>>> +               void *priv;
>>>>>>>> +       } extra;
>>>>>>>> +};
>>>>>>>> +
>>>>>>>> +/**
>>>>>>>> + * drm_gpuvm_prepare_vm() - prepare the GPUVMs common dma-
>>>>>>>> resv
>>>>>>>> + * @gpuvm: the &drm_gpuvm
>>>>>>>> + * @exec: the &drm_exec context
>>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>>> + *
>>>>>>>> + * Calls drm_exec_prepare_obj() for the GPUVMs dummy
>>>>>>>> &drm_gem_object.
>>>>>>>> + *
>>>>>>>> + * Using this function directly, it is the drivers
>>>>>>>> responsibility to call
>>>>>>>> + * drm_exec_init() and drm_exec_fini() accordingly.
>>>>>>>> + *
>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>> + */
>>>>>>>> +static inline int
>>>>>>>> +drm_gpuvm_prepare_vm(struct drm_gpuvm *gpuvm,
>>>>>>>> +                    struct drm_exec *exec,
>>>>>>>> +                    unsigned int num_fences)
>>>>>>>> +{
>>>>>>>> +       return drm_exec_prepare_obj(exec, &gpuvm->d_obj,
>>>>>>>> num_fences);
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +int drm_gpuvm_prepare_objects(struct drm_gpuvm *gpuvm,
>>>>>>>> +                             struct drm_exec *exec,
>>>>>>>> +                             unsigned int num_fences);
>>>>>>>> +
>>>>>>>> +int drm_gpuvm_prepare_range(struct drm_gpuvm *gpuvm,
>>>>>>>> +                           struct drm_exec *exec,
>>>>>>>> +                           u64 addr, u64 range,
>>>>>>>> +                           unsigned int num_fences);
>>>>>>>> +
>>>>>>>> +int drm_gpuvm_exec_lock(struct drm_gpuvm_exec *vm_exec,
>>>>>>>> +                       unsigned int num_fences,
>>>>>>>> +                       bool interruptible);
>>>>>>>> +
>>>>>>>> +int drm_gpuvm_exec_lock_array(struct drm_gpuvm_exec
>>>>>>>> *vm_exec,
>>>>>>>> +                             struct drm_gem_object **objs,
>>>>>>>> +                             unsigned int num_objs,
>>>>>>>> +                             unsigned int num_fences,
>>>>>>>> +                             bool interruptible);
>>>>>>>> +
>>>>>>>> +int drm_gpuvm_exec_lock_range(struct drm_gpuvm_exec
>>>>>>>> *vm_exec,
>>>>>>>> +                             u64 addr, u64 range,
>>>>>>>> +                             unsigned int num_fences,
>>>>>>>> +                             bool interruptible);
>>>>>>>> +
>>>>>>>> +/**
>>>>>>>> + * drm_gpuvm_exec_unlock() - unlock all dma-resv of all
>>>>>>>> associated BOs
>>>>>>>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>>>>>>>> + *
>>>>>>>> + * Releases all dma-resv locks of all &drm_gem_objects
>>>>>>>> previously acquired
>>>>>>>> + * through drm_gpuvm_exec_lock() or its variants.
>>>>>>>> + */
>>>>>>>> +static inline void
>>>>>>>> +drm_gpuvm_exec_unlock(struct drm_gpuvm_exec *vm_exec)
>>>>>>>> +{
>>>>>>>> +       drm_exec_fini(&vm_exec->exec);
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +int drm_gpuvm_validate(struct drm_gpuvm *gpuvm);
>>>>>>>> +void drm_gpuvm_resv_add_fence(struct drm_gpuvm *gpuvm,
>>>>>>>> +                             struct drm_exec *exec,
>>>>>>>> +                             struct dma_fence *fence,
>>>>>>>> +                             enum dma_resv_usage
>>>>>>>> private_usage,
>>>>>>>> +                             enum dma_resv_usage
>>>>>>>> extobj_usage);
>>>>>>>> +
>>>>>>>> +/**
>>>>>>>> + * drm_gpuvm_exec_resv_add_fence()
>>>>>>>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>>>>>>>> + * @fence: fence to add
>>>>>>>> + * @private_usage: private dma-resv usage
>>>>>>>> + * @extobj_usage: extobj dma-resv usage
>>>>>>>> + *
>>>>>>>> + * See drm_gpuvm_resv_add_fence().
>>>>>>>> + */
>>>>>>>> +static inline void
>>>>>>>> +drm_gpuvm_exec_resv_add_fence(struct drm_gpuvm_exec
>>>>>>>> *vm_exec,
>>>>>>>> +                             struct dma_fence *fence,
>>>>>>>> +                             enum dma_resv_usage
>>>>>>>> private_usage,
>>>>>>>> +                             enum dma_resv_usage
>>>>>>>> extobj_usage)
>>>>>>>> +{
>>>>>>>> +       drm_gpuvm_resv_add_fence(vm_exec->vm, &vm_exec->exec,
>>>>>>>> fence,
>>>>>>>> +                                private_usage,
>>>>>>>> extobj_usage);
>>>>>>>> +}
>>>>>>>> +
>>>>>>>>     /**
>>>>>>>>      * struct drm_gpuvm_bo - structure representing a
>>>>>>>> &drm_gpuvm and
>>>>>>>>      * &drm_gem_object combination
>>>>>>>> @@ -398,6 +569,18 @@ struct drm_gpuvm_bo {
>>>>>>>>                           * gpuva list.
>>>>>>>>                           */
>>>>>>>>                          struct list_head gem;
>>>>>>>> +
>>>>>>>> +                       /**
>>>>>>>> +                        * @extobj: List entry to attach to
>>>>>>>> the &drm_gpuvms
>>>>>>>> +                        * extobj list.
>>>>>>>> +                        */
>>>>>>>> +                       struct list_head extobj;
>>>>>>>> +
>>>>>>>> +                       /**
>>>>>>>> +                        * @evict: List entry to attach to
>>>>>>>> the &drm_gpuvms evict
>>>>>>>> +                        * list.
>>>>>>>> +                        */
>>>>>>>> +                       struct list_head evict;
>>>>>>>>                  } entry;
>>>>>>>>          } list;
>>>>>>>>     };
>>>>>>>> @@ -432,6 +615,9 @@ struct drm_gpuvm_bo *
>>>>>>>>     drm_gpuvm_bo_find(struct drm_gpuvm *gpuvm,
>>>>>>>>                    struct drm_gem_object *obj);
>>>>>>>> +void drm_gpuvm_bo_evict(struct drm_gem_object *obj, bool
>>>>>>>> evict);
>>>>>>>> +void drm_gpuvm_bo_extobj_add(struct drm_gpuvm_bo *vm_bo);
>>>>>>>> +
>>>>>>>>     /**
>>>>>>>>      * drm_gpuvm_bo_for_each_va() - iterator to walk over a
>>>>>>>> list of &drm_gpuva
>>>>>>>>      * @va__: &drm_gpuva structure to assign to in each
>>>>>>>> iteration step
>>>>>>>> @@ -837,6 +1023,17 @@ struct drm_gpuvm_ops {
>>>>>>>>           * used.
>>>>>>>>           */
>>>>>>>>          int (*sm_step_unmap)(struct drm_gpuva_op *op, void *priv);
>>>>>>>> +
>>>>>>>> +       /**
>>>>>>>> +        * @bo_validate: called from drm_gpuvm_validate()
>>>>>>>> +        *
>>>>>>>> +        * Drivers receive this callback for every evicted &drm_gem_object
>>>>>>>> +        * being mapped in the corresponding &drm_gpuvm.
>>>>>>>> +        *
>>>>>>>> +        * Typically, drivers would call their driver specific variant of
>>>>>>>> +        * ttm_bo_validate() from within this callback.
>>>>>>>> +        */
>>>>>>>> +       int (*bo_validate)(struct drm_gem_object *obj);
>>>>>>>>     };
>>>>>>>>     int drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm, void *priv,
>
Boris Brezillon Sept. 13, 2023, 3:17 p.m. UTC | #25
On Wed, 13 Sep 2023 16:29:30 +0200
Thomas Hellström <thomas.hellstrom@linux.intel.com> wrote:

> On 9/13/23 16:01, Boris Brezillon wrote:
> > On Wed, 13 Sep 2023 15:22:56 +0200
> > Thomas Hellström <thomas.hellstrom@linux.intel.com> wrote:
> >  
> >> On 9/13/23 13:33, Boris Brezillon wrote:  
> >>> On Wed, 13 Sep 2023 12:39:01 +0200
> >>> Thomas Hellström <thomas.hellstrom@linux.intel.com> wrote:
> >>>     
> >>>> Hi,
> >>>>
> >>>> On 9/13/23 09:19, Boris Brezillon wrote:  
> >>>>> On Wed, 13 Sep 2023 17:05:42 +1000
> >>>>> Dave Airlie <airlied@gmail.com> wrote:
> >>>>>        
> >>>>>> On Wed, 13 Sept 2023 at 17:03, Boris Brezillon
> >>>>>> <boris.brezillon@collabora.com> wrote:  
> >>>>>>> On Tue, 12 Sep 2023 18:20:32 +0200
> >>>>>>> Thomas Hellström <thomas.hellstrom@linux.intel.com> wrote:
> >>>>>>>           
> >>>>>>>>> +/**
> >>>>>>>>> + * get_next_vm_bo_from_list() - get the next vm_bo element
> >>>>>>>>> + * @__gpuvm: The GPU VM
> >>>>>>>>> + * @__list_name: The name of the list we're iterating on
> >>>>>>>>> + * @__local_list: A pointer to the local list used to store already iterated items
> >>>>>>>>> + * @__prev_vm_bo: The previous element we got from get_next_vm_bo_from_list()
> >>>>>>>>> + *
> >>>>>>>>> + * This helper is here to provide lockless list iteration. Lockless as in, the
> >>>>>>>>> + * iterator releases the lock immediately after picking the first element from
> >>>>>>>>> + * the list, so list insertion and deletion can happen concurrently.  
> >>>>>>>> Are the list spinlocks needed for that async state update from within
> >>>>>>>> the dma-fence critical section we've discussed previously?  
> >>>>>>> Any driver calling _[un]link() from its drm_gpu_scheduler::run_job()
> >>>>>>> hook will be in this situation (Panthor at the moment, PowerVR soon). I
> >>>>>>> get that Xe and Nouveau don't need that because they update the VM
> >>>>>>> state early (in the ioctl path), but I keep thinking this will hurt us
> >>>>>>> if we don't think it through from the beginning, because once you've
> >>>>>>> set this logic to depend only on resv locks, it will be pretty hard to
> >>>>>>> get back to a solution which lets synchronous VM_BINDs take precedence
> >>>>>>> on asynchronous request, and, with vkQueueBindSparse() passing external
> >>>>>>> deps (plus the fact the VM_BIND queue might be pretty deep), it can
> >>>>>>> take a long time to get your synchronous VM_BIND executed...  
> >>>> So this would boil down to either (possibly opt-in) keeping the spinlock
> >>>> approach or pushing the unlink out to a wq then?  
> >>> Deferred _unlink() would not be an issue, since I already defer the
> >>> drm_gpuva destruction to a wq, it would just be a matter of moving the
> >>> _unlink() call there as well. But _link() also takes the GEM gpuva list
> >>> lock, and that one is bit tricky, in that sm_map() can trigger 2 more
> >>> _link() calls for the prev/next mappings, which we can't guess until we
> >>> get to execute the VM update. If we mandate the use of the GEM resv
> >>> lock, that simply means async VM updates (AKA calling
> >>> drm_gpuvm_sm_[un]map()) are not an option. And if this is what everyone
> >>> agrees on, then I'd like the APIs that make this sort of async VM
> >>> update possible (drm_gpuvm_sm_[un]map(), the drm_gpuvm_ops::sm_step*
> >>> methods, and probably other things) to be dropped, so we don't make it
> >>> look like it's something we support.
> >>>     
> >>>> BTW, as also asked in a reply to Danilo, how do you call unlink from
> >>>> run_job() when it was requiring the obj->dma_resv lock, or was that a WIP?  
> >>> _unlink() makes sure the GEM gpuva list lock is taken, but this can be
> >>> a custom lock (see drm_gem_gpuva_set_lock()). In panthor we have
> >>> panthor_gem_object::gpuva_list_lock that's dedicated the gpuva list
> >>> protection. We make sure we never take this lock while allocating
> >>> memory to guarantee the dma-signalling path can't deadlock.
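> >>> 
> >>> Very rough sketch of what that looks like in practice (names are
> >>> mine, not actual Panthor code): the GEM's gpuva list gets a dedicated
> >>> spinlock registered via drm_gem_gpuva_set_lock(), and the unlink
> >>> itself is pushed to a workqueue so the run_job() path neither sleeps
> >>> nor frees memory:
> >>> 
> >>> struct my_vma {
> >>> 	struct drm_gpuva va;
> >>> 	struct work_struct unlink_work; /* runs outside fence signalling */
> >>> };
> >>> 
> >>> static void my_vma_unlink_work(struct work_struct *work)
> >>> {
> >>> 	struct my_vma *vma = container_of(work, struct my_vma, unlink_work);
> >>> 
> >>> 	/* Safe here: we may take the gpuva list lock and drop refs. */
> >>> 	drm_gpuva_unlink(&vma->va);
> >>> 	kfree(vma);
> >>> }
> >>> 
> >>> static void my_vma_defer_unlink(struct my_vma *vma)
> >>> {
> >>> 	/* Called from the VM update in ::run_job(): only queue work. */
> >>> 	INIT_WORK(&vma->unlink_work, my_vma_unlink_work);
> >>> 	queue_work(system_wq, &vma->unlink_work);
> >>> }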
> >>>     
> >>>>>>>           
> >>>>>> btw what is the use case for this? do we have actual vulkan
> >>>>>> applications we know will have problems here?  
> >>>>> I don't, but I think that's a concern Faith raised at some point (dates
> >>>>> back from when I was reading threads describing how VM_BIND on i915
> >>>>> should work, and I was clearly discovering this whole VM_BIND thing at
> >>>>> that time, so maybe I misunderstood).
> >>>>>        
> >>>>>> it feels like a bit of premature optimisation, but maybe we have use cases.  
> >>>>> Might be, but that's the sort of thing that would put us in a corner if
> >>>>> we don't have a plan for when the needs arise. Besides, if we don't
> >>>>> want to support that case because it's too complicated, I'd recommend
> >>>>> dropping all the drm_gpuvm APIs that let people think this mode is
> >>>>> valid/supported (map/remap/unmap hooks in drm_gpuvm_ops,
> >>>>> drm_gpuvm_sm_[un]map helpers, etc). Keeping them around just adds to the
> >>>>> confusion.  
> >>>> Xe allows bypassing the bind-queue with another bind-queue, but to
> >>>> completely avoid dependencies between queues the Operations may not
> >>>> overlap.  
> >>> So, you check the VM state with some VM lock held (would be the VM resv
> >>> in my case), and if the mapping is new (no overlaps with pre-existing
> >>> mappings), you queue it to the fast-track/sync-VM_BIND queue. What would
> >>> be missing I guess is a way to know if the mapping is active (MMU has
> >>> been updated) or pending (MMU update queued to the bind-queue), so I can
> >>> fast-track mapping/unmapping of active mappings. This would leave
> >>> overlapping sync/async VM updates, which can't happen in practice
> >>> unless userspace is doing something wrong (sparse bindings always go
> >>> through vkQueueBindSparse).  
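> >>> 
> >>> For illustration only, the overlap check could be as simple as this
> >>> (assuming whatever VM lock serializes VA space updates is held; note
> >>> it only sees *active* mappings, which is exactly why the
> >>> active-vs-pending distinction above is the missing piece):
> >>> 
> >>> static bool my_vm_range_has_mappings(struct drm_gpuvm *gpuvm,
> >>> 				     u64 addr, u64 range)
> >>> {
> >>> 	struct drm_gpuva *va;
> >>> 
> >>> 	/* Any iteration at all means the range overlaps something. */
> >>> 	drm_gpuvm_for_each_va_range(va, gpuvm, addr, addr + range)
> >>> 		return true;
> >>> 
> >>> 	return false;
> >>> }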
> >> User-space is allowed to create new bind queues at will, and they
> >> execute independently save for range overlaps.  
> > I've limited panthor to just one bind-queue that's automatically
> > created when the VM is created. I guess letting userspace create more
> > than one queue is doable, but we'd still be serializing VM
> > operations anyway and that complicates the whole thing when concurrent
> > operations to the same VM region happen from different bind queues, so I
> > figured it'd be simpler to expose just one queue.
> >  
> >> And the overlapping granularity depends very much on the detail of the
> >> range tracking.
> >> We drafted this fenced range utility
> >>
> >> https://gitlab.freedesktop.org/drm/xe/kernel/-/merge_requests/353
> >>
> >> That tracks active ranges that remove themselves when the attached fence
> >> signals. Not sure if we ended up using it, though. A new binding would
> >> scan this utility for dma-fences it needs to depend upon.  
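> >> 
> >> In spirit it boils down to something like this (hand-wavy sketch, not
> >> the actual MR code; a real version needs irq-safe locking since fence
> >> callbacks may run in interrupt context):
> >> 
> >> struct fenced_range {
> >> 	struct list_head entry;		/* on the tracker's range list */
> >> 	u64 start, last;
> >> 	struct dma_fence *fence;	/* completion of the binding */
> >> 	struct dma_fence_cb cb;
> >> 	spinlock_t *lock;		/* the tracker's list lock */
> >> };
> >> 
> >> static void fenced_range_signaled(struct dma_fence *fence,
> >> 				  struct dma_fence_cb *cb)
> >> {
> >> 	struct fenced_range *r = container_of(cb, struct fenced_range, cb);
> >> 
> >> 	/* Fence signaled: the range no longer blocks new bindings. */
> >> 	spin_lock(r->lock);
> >> 	list_del(&r->entry);
> >> 	spin_unlock(r->lock);
> >> 	dma_fence_put(r->fence);
> >> 	kfree(r);
> >> }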
> > Sounds like implicit deps on VM ranges :D. I'll have a look, thanks
> > for the pointer!
> >  
> >> Ranges in Xe
> >> are actually page-table modification ranges, so can exceed the actual VA
> >> range in some situations, but if you can build page-table structures
> >> async the granularity indeed becomes better.  
> > The granularity in Mali is 4k, and we don't build the page table struct
> > asynchronously, we just update the page table tree from the CPU,
> > holding a VM lock to serialize such operations (that's done
> > synchronously in the ::run_job() path, or from the ioctl in case of a
> > sync-VM_BIND).  
> 
> OK, yeah we have something similar although we build the page-table tree 
> in the IOCTL and update entries using the GPU unless there are no 
> dependencies,

We can't do that since we defer pgtable updates to the io-pgtable
framework, which handles the pgtable tree update (we can't pass a
pre-built version of the tree). What we did though, is extend the
framework so we're in control of the page table allocations. In order
to avoid allocations in the dma-signalling path, we pre-allocate page
tables for the range we want to map (or unmap), and then pick from these
pre-allocated pages when the iopgtable frameworks asks us to allocate a
page table. Until now, we were provisioning for the worst case scenario
(all levels of the page table tree are missing, except for the root
level, which is allocated when the io-pgtable is instantiated). With
per-range operation tracking, we could potentially avoid this
over-provisioning by checking the queue of operations touching a
specific range, and making sure unmaps don't tear down page tables if we
know they're going to be needed later on (we just want to update PTEs
in that case).
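
To make the over-provisioning point concrete, here's a simplified
sketch of the scheme (illustrative names; it assumes the custom
io-pgtable allocator hooks mentioned above, where the framework calls
back into the driver for page-table allocations):

struct pt_cache {
	struct list_head pages; /* pre-allocated page-table pages */
};

/* Runs at VM_BIND ioctl time, where sleeping allocations are fine. */
static int pt_cache_provision(struct pt_cache *cache, unsigned int count)
{
	while (count--) {
		struct page *p = alloc_page(GFP_KERNEL | __GFP_ZERO);

		if (!p)
			return -ENOMEM;
		list_add(&p->lru, &cache->pages);
	}
	return 0;
}

/* Allocator hook called in the dma-signalling path: it must not
 * allocate, so it only pops from the pre-filled cache. */
static void *pt_cache_alloc(void *cookie, size_t size, gfp_t gfp)
{
	struct pt_cache *cache = cookie;
	struct page *p;

	p = list_first_entry_or_null(&cache->pages, struct page, lru);
	if (WARN_ON(!p)) /* provisioning was insufficient */
		return NULL;
	list_del(&p->lru);
	return page_address(p);
}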

> in which case we do it sync in the ioctl as well.
> 
> The drawback here is that if one op adds a pagetable tree node near the 
> root (spanning say 1G) and the next op adds an entry to that node, the 
> granularity can become pretty large...

I know nothing about the intel GPU MMU page table format, but I guess
you're talking about adding one or more levels to the pgtable tree
because some big physically contiguous mapping is split, which indeed
might require allocating page tables and filling a lot of PTEs. This
change of granularity indeed has a cost, and avoiding repeated changes
would indeed be preferable, but it's not the end of the world for
Panthor, where we only use 4k and 2M granules (only the last level is
optional in our implementation).
Christian König Sept. 13, 2023, 3:26 p.m. UTC | #26
Am 13.09.23 um 17:13 schrieb Thomas Hellström:
> Hi Christian
>
> On 9/13/23 16:26, Christian König wrote:
>> Am 13.09.23 um 14:16 schrieb Danilo Krummrich:
>>> As mentioned in a different mail thread, the reply is based on the 
>>> assumption
>>> that we don't support anything else than GPUVM updates from the IOCTL.
>>
>> I think that this assumption is incorrect.
>>
>> Vulkan is just one specific use case, but this here should probably 
>> be able to handle other use cases as well.
>>
>> Especially with HMM you get the requirement that you need to be able 
>> to invalidate GPUVM mappings without grabbing a reservation lock.
>
> Are you referring to the MMU range invalidation notifiers here?

Yes, but you need to ping Felix and Philip for the details.

>
>>
>> See what the eviction lock in amdgpu is doing for example.
>
> IMO the statement regarding GPUVM updates from the IOCTL mostly refers 
> to the need to protect the evicted- and extobj lists with additional 
> spinlocks. Supporting userptr and faulting will ofc require additional 
> locks / locking mechanisms. But this code doesn't do that yet. Is your 
> concern that these particular spinlocks for these lists are indeed 
> needed?

More or less yes. My main concern is that both Dave and Danilo mentioned 
that they work with the assumption that they only need to handle 
Vulkan/IOCTL based use cases.

Regards,
Christian.

>
> /Thomas
>
>
>>
>> Regards,
>> Christian.
>>
>>>
>>> On Wed, Sep 13, 2023 at 11:14:46AM +0200, Thomas Hellström wrote:
>>>> Hi!
>>>>
>>>> On Wed, 2023-09-13 at 01:36 +0200, Danilo Krummrich wrote:
>>>>> On Tue, Sep 12, 2023 at 09:23:08PM +0200, Thomas Hellström wrote:
>>>>>> On 9/12/23 18:50, Danilo Krummrich wrote:
>>>>>>> On Tue, Sep 12, 2023 at 06:20:32PM +0200, Thomas Hellström wrote:
>>>>>>>> Hi, Danilo,
>>>>>>>>
>>>>>>>> On 9/9/23 17:31, Danilo Krummrich wrote:
>>>>>>>>> So far the DRM GPUVA manager offers common infrastructure to
>>>>>>>>> track GPU VA
>>>>>>>>> allocations and mappings, generically connect GPU VA mappings
>>>>>>>>> to their
>>>>>>>>> backing buffers and perform more complex mapping operations
>>>>>>>>> on the GPU VA
>>>>>>>>> space.
>>>>>>>>>
>>>>>>>>> However, there are more design patterns commonly used by
>>>>>>>>> drivers, which
>>>>>>>>> can potentially be generalized in order to make the DRM GPUVA
>>>>>>>>> manager
>>>>>>>>> represent a basic GPU-VM implementation. In this context,
>>>>>>>>> this patch aims
>>>>>>>>> at generalizing the following elements.
>>>>>>>>>
>>>>>>>>> 1) Provide a common dma-resv for GEM objects not being used
>>>>>>>>> outside of
>>>>>>>>>       this GPU-VM.
>>>>>>>>>
>>>>>>>>> 2) Provide tracking of external GEM objects (GEM objects
>>>>>>>>> which are
>>>>>>>>>       shared with other GPU-VMs).
>>>>>>>>>
>>>>>>>>> 3) Provide functions to efficiently lock all GEM objects dma-
>>>>>>>>> resv the
>>>>>>>>>       GPU-VM contains mappings of.
>>>>>>>>>
>>>>>>>>> 4) Provide tracking of evicted GEM objects the GPU-VM
>>>>>>>>> contains mappings
>>>>>>>>>       of, such that validation of evicted GEM objects is
>>>>>>>>> accelerated.
>>>>>>>>>
>>>>>>>>> 5) Provide some convenience functions for common patterns.
>>>>>>>>>
>>>>>>>>> Rather than being designed as a "framework", the target is to
>>>>>>>>> make all
>>>>>>>>> features appear as a collection of optional helper functions,
>>>>>>>>> such that
>>>>>>>>> drivers are free to make use of the DRM GPUVA managers basic
>>>>>>>>> functionality and opt-in for other features without setting
>>>>>>>>> any feature
>>>>>>>>> flags, just by making use of the corresponding functions.
>>>>>>>>>
>>>>>>>>> Big kudos to Boris Brezillon for his help to figure out
>>>>>>>>> locking for drivers
>>>>>>>>> updating the GPU VA space within the fence signalling path.
>>>>>>>>>
>>>>>>>>> Suggested-by: Matthew Brost <matthew.brost@intel.com>
>>>>>>>>> Signed-off-by: Danilo Krummrich <dakr@redhat.com>
>>>>>>>>> ---
>>>>>>>>>     drivers/gpu/drm/drm_gpuvm.c | 516
>>>>>>>>> ++++++++++++++++++++++++++++++++++++
>>>>>>>>>     include/drm/drm_gpuvm.h     | 197 ++++++++++++++
>>>>>>>>>     2 files changed, 713 insertions(+)
>>>>>>>>>
>>>>>>>>> diff --git a/drivers/gpu/drm/drm_gpuvm.c
>>>>>>>>> b/drivers/gpu/drm/drm_gpuvm.c
>>>>>>>>> index f4411047dbb3..8e62a043f719 100644
>>>>>>>>> --- a/drivers/gpu/drm/drm_gpuvm.c
>>>>>>>>> +++ b/drivers/gpu/drm/drm_gpuvm.c
>>>>>>>>> @@ -73,6 +73,21 @@
>>>>>>>>>      * &drm_gem_object list of &drm_gpuvm_bos for an existing
>>>>>>>>> instance of this
>>>>>>>>>      * particular combination. If not existent a new instance
>>>>>>>>> is created and linked
>>>>>>>>>      * to the &drm_gem_object.
>>>>>>>>> + *
>>>>>>>>> + * &drm_gpuvm_bo structures, since unique for a given
>>>>>>>>> &drm_gpuvm, are also used
>>>>>>>>> + * as entry for the &drm_gpuvm's lists of external and
>>>>>>>>> evicted objects. Those
>>>>>>>>> + * lists are maintained in order to accelerate locking of
>>>>>>>>> dma-resv locks and
>>>>>>>>> + * validation of evicted objects bound in a &drm_gpuvm. For instance, all
>>>>>>>>> + * &drm_gem_object's &dma_resv of a given &drm_gpuvm can be
>>>>>>>>> locked by calling
>>>>>>>>> + * drm_gpuvm_exec_lock(). Once locked drivers can call
>>>>>>>>> drm_gpuvm_validate() in
>>>>>>>>> + * order to validate all evicted &drm_gem_objects. It is
>>>>>>>>> also possible to lock
>>>>>>>>> + * additional &drm_gem_objects by providing the
>>>>>>>>> corresponding parameters to
>>>>>>>>> + * drm_gpuvm_exec_lock() as well as open code the &drm_exec
>>>>>>>>> loop while making
>>>>>>>>> + * use of helper functions such as drm_gpuvm_prepare_range()
>>>>>>>>> or
>>>>>>>>> + * drm_gpuvm_prepare_objects().
>>>>>>>>> + *
>>>>>>>>> + * Every bound &drm_gem_object is treated as external object
>>>>>>>>> when its &dma_resv
>>>>>>>>> + * structure is different than the &drm_gpuvm's common
>>>>>>>>> &dma_resv structure.
>>>>>>>>>      */
>>>>>>>>>     /**
>>>>>>>>> @@ -420,6 +435,20 @@
>>>>>>>>>      * Subsequent calls to drm_gpuvm_bo_obtain() for the same
>>>>>>>>> &drm_gpuvm and
>>>>>>>>>      * &drm_gem_object must be able to observe previous
>>>>>>>>> creations and destructions
>>>>>>>>>      * of &drm_gpuvm_bos in order to keep instances unique.
>>>>>>>>> + *
>>>>>>>>> + * The &drm_gpuvm's lists for keeping track of external and
>>>>>>>>> evicted objects are
>>>>>>>>> + * protected against concurrent insertion / removal and
>>>>>>>>> iteration internally.
>>>>>>>>> + *
>>>>>>>>> + * However, drivers still need to protect concurrent calls to functions
>>>>>>>>> + * iterating those lists, such as drm_gpuvm_validate() and
>>>>>>>>> + * drm_gpuvm_prepare_objects(). Every such function contains
>>>>>>>>> a particular
>>>>>>>>> + * comment and lockdep checks if possible.
>>>>>>>>> + *
>>>>>>>>> + * Functions adding or removing entries from those lists,
>>>>>>>>> such as
>>>>>>>>> + * drm_gpuvm_bo_evict() or drm_gpuvm_bo_extobj_add() may be
>>>>>>>>> called with external
>>>>>>>>> + * locks being held, e.g. in order to avoid the
>>>>>>>>> corresponding list to be
>>>>>>>>> + * (safely) modified while potentially being iterated by other API functions.
>>>>>>>>> + * However, this is entirely optional.
>>>>>>>>>      */
>>>>>>>>>     /**
>>>>>>>>> @@ -632,6 +661,131 @@
>>>>>>>>>      *   }
>>>>>>>>>      */
>>>>>>>>> +/**
>>>>>>>>> + * get_next_vm_bo_from_list() - get the next vm_bo element
>>>>>>>>> + * @__gpuvm: The GPU VM
>>>>>>>>> + * @__list_name: The name of the list we're iterating on
>>>>>>>>> + * @__local_list: A pointer to the local list used to store
>>>>>>>>> already iterated items
>>>>>>>>> + * @__prev_vm_bo: The previous element we got from get_next_vm_bo_from_list()
>>>>>>>>> + *
>>>>>>>>> + * This helper is here to provide lockless list iteration.
>>>>>>>>> Lockless as in, the
>>>>>>>>> + * iterator releases the lock immediately after picking the
>>>>>>>>> first element from
>>>>>>>>> + * the list, so list insertion and deletion can happen concurrently.
>>>>>>>> Are the list spinlocks needed for that async state update from
>>>>>>>> within the
>>>>>>>> dma-fence critical section we've discussed previously?
>>>>>>> Yes, but also for other reasons, see below.
>>>>>>>
>>>>>>>> Otherwise it should be sufficient to protect the lists with the
>>>>>>>> gpuvm's resv
>>>>>>>> (or for the extobj list with an outer lock).
>>>>>>>>
>>>>>>>> If those spinlocks are still needed in some situations, perhaps
>>>>>>>> could we
>>>>>>>> have an option to set them to NULL (Like IIRC the maple tree
>>>>>>>> allows for)?
>>>>>>> The evict spinlock is needed in any case, since in
>>>>>>> drm_gpuvm_bo_evict() we're
>>>>>>> holding only the dma-resv lock from the BO this function gets
>>>>>>> called for. Hence,
>>>>>>> the spinlock protects concurrent drm_gpuvm_bo_evict() calls with
>>>>>>> different BOs.
>>>>>> No. Only if you try to add external objects to the vm's evict list
>>>>>> from
>>>>>> within the evict code. That's not necessary since you loop through
>>>>>> all
>>>>>> external objects anyway when locking them so an "evicted" bool in
>>>>>> the vm_bo,
>>>>>> protected by the bo resv would be sufficient. The extobj locking
>>>>>> loop can
>>>>>> then add the bo to the evicted list.
>>>>> And validate() can remove it while still holding all dma-resv locks,
>>>>> neat!
>>>>> However, what if two tasks are trying to lock the VA space
>>>>> concurrently? What
>>>>> do we do when the drm_gpuvm_bo's refcount drops to zero in
>>>>> drm_gpuva_unlink()?
>>>>> Are we guaranteed that at this point of time the drm_gpuvm_bo is not
>>>>> on the
>>>>> evicted list? Because otherwise we would call drm_gpuvm_bo_destroy()
>>>>> with the
>>>>> dma-resv lock held, which wouldn't be allowed, since
>>>>> drm_gpuvm_bo_destroy()
>>>>> might drop the last reference to the drm_gem_object and hence we'd
>>>>> potentially
>>>>> free the dma-resv lock while holding it, at least if it's an external
>>>>> object.
>>>> Easiest way in this scheme is to think of the lists as being protected
>>>> by the vm's resv lock. That means anybody calling unlink() must also
>>>> hold the vm's resv lock. (Which is OK from an UAF point of view, but
>>>> perhaps not from a locking inversion POV from an async list update).
>>> This would mean that on unlink() we'd need to hold the VM's resv 
>>> lock and the
>>> corresponding GEM's resv lock (in case they're not the same anyways) 
>>> because the
>>> VM's resv lock would protect the external / evicted object lists and 
>>> the GEM
>>> objects resv lock protects the GEM's list of drm_gpuvm_bos and the
>>> drm_gpuvm_bo's list of drm_gpuvas.
>>>
>>>>>>> For extobjs an outer lock would be enough in case of Xe, but I
>>>>>>> really would not
>>>>>>> like to add even more complexity just to get the spinlock out of
>>>>>>> the way in case
>>>>>>> the driver already has an outer lock protecting this path.
>>>>>> I must disagree here. These spinlocks and atomic operations are
>>>>>> pretty
>>>>>> costly and as discussed earlier this type of locking was the reason
>>>>>> (at
>>>>>> least according to the commit message) that made Christian drop the
>>>>>> XArray
>>>>>> use in drm_exec for the same set of objects: "The locking overhead
>>>>>> is
>>>>>> unecessary and measurable". IMHO the spinlock is the added
>>>>>> complexity and a
>>>>>> single wide lock following the drm locking guidelines set out by
>>>>>> Daniel and
>>>>>> David should really be the default choice with an opt-in for a
>>>>>> spinlock if
>>>>>> needed for async and pushing out to a wq is not an option.
>>>>> For the external object list an outer lock would work as long as it's
>>>>> not the
>>>>> dma-resv lock of the corresponding GEM object, since here we actually
>>>>> need to
>>>>> remove the list entry from the external object list on
>>>>> drm_gpuvm_bo_destroy().
>>>>> It's just a bit weird design wise that drivers would need to take
>>>>> this outer
>>>>> lock on:
>>>>>
>>>>> - drm_gpuvm_bo_extobj_add()
>>>>> - drm_gpuvm_bo_destroy()        (and hence also drm_gpuvm_bo_put())
>>>>> - drm_gpuva_unlink()            (because it needs to call
>>>>> drm_gpuvm_bo_put())
>>>>> - drm_gpuvm_exec_lock()
>>>>> - drm_gpuvm_exec_lock_array()
>>>>> - drm_gpuvm_prepare_range()
>>>>>
>>>>> Given that it seems reasonable to do all the required locking
>>>>> internally.
>>>>  From a design POV, there has been a clear direction in XE to make
>>>> things similar to mmap() / munmap(), so this outer lock, which in 
>>>> Xe is
>>>> an rwsem, is used in a similar way as the mmap_lock. It's protecting
>>>> the page-table structures and vma rb tree, the userptr structures and
>>>> the extobj list. Basically it's taken early in the exec IOCTL, the
>>>> VM_BIND ioctl, the compute rebind worker and the pagefault handler, so
>>>> all of the above are just asserting that it is taken in the correct
>>>> mode.
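>>>> 
>>>> Roughly (illustrative, not Xe's actual code):
>>>> 
>>>> struct my_vm {
>>>> 	struct rw_semaphore lock; /* VA tree, userptr, extobj list */
>>>> };
>>>> 
>>>> /* exec / rebind / pagefault paths only need shared access... */
>>>> static void my_vm_lock_shared(struct my_vm *vm)
>>>> {
>>>> 	down_read(&vm->lock);
>>>> }
>>>> 
>>>> /* ...while VM_BIND takes it exclusively to update the VA space,
>>>>  * and the list / tree helpers merely assert it is held. */
>>>> static void my_vm_assert_held(struct my_vm *vm)
>>>> {
>>>> 	lockdep_assert_held(&vm->lock);
>>>> }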
>>>>
>>>> But strictly with this scheme one could also use the vm's dma_resv for
>>>> the extobj list since with drm_exec, it's locked before traversing the
>>>> list.
>>>>
>>>> The whole point of this scheme is to rely on locks that you already 
>>>> are
>>>> supposed to be holding for various reasons and is simple to 
>>>> comprehend.
>>> I don't agree that we're supposed to hold the VM's resv lock anyways 
>>> for
>>> functions like drm_gpuvm_bo_put() or drm_gpuva_unlink(), but I'm 
>>> fine using it
>>> for that purpose nevertheless.
>>>
>>>>> In order to at least place lockdep checks, the driver would need to
>>>>> supply the
>>>>> corresponding lock's lockdep_map, because the GPUVM otherwise doesn't
>>>>> know about
>>>>> the lock.
>>>> Yes, that sounds reasonable. One lockdep map per list.
>>> I'd really like to avoid that, especially now that everything got 
>>> simpler. We
>>> should define the actual locks to take instead.
>>>
>>>>> Out of curiosity, what is the overhead of a spin_lock() that doesn't
>>>>> need to
>>>>> spin?
>>>> I guess it's hard to tell exactly, but it is much lower on modern x86
>>>> than what it used to be. Not sure about ARM, which is the other
>>>> architecture important to us. I figure if there is little cache-line
>>>> bouncing the main overhead comes from the implied barriers.
>>>>
>>>>>> A pretty simple way that would not add much code would be
>>>>>>
>>>>>> static void gpuvm_cond_spin_lock(const struct drm_gpuvm *gpuvm,
>>>>>> spinlock_t
>>>>>> *lock)
>>>>>>
>>>>>> {
>>>>>>
>>>>>>      if (!gpuvm->resv_protected_lists)
>>>>>>          spin_lock(lock);
>>>>>>
>>>>>> }
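>>>>>> 
>>>>>> (with the obvious counterpart for unlock, resv_protected_lists
>>>>>> being the suggested opt-in flag):
>>>>>> 
>>>>>> static void gpuvm_cond_spin_unlock(const struct drm_gpuvm *gpuvm,
>>>>>> 				   spinlock_t *lock)
>>>>>> {
>>>>>> 	if (!gpuvm->resv_protected_lists)
>>>>>> 		spin_unlock(lock);
>>>>>> }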
>>>>>>
>>>>>>>> For such drivers, that would require anybody calling unlink to
>>>>>>>> hold the vm's
>>>>>>>> resv, though.
>>>>>>> In V4 I want to go back to having a dedicated lock for the GEMs
>>>>>>> gpuva list (or
>>>>>>> VM_BO list to be more precise). We can't just use the dma-resv
>>>>>>> lock for that
>>>>>>> with VM_BO abstractions, because on destruction of a VM_BO we
>>>>>>> otherwise wouldn't
>>>>>>> be allowed to already hold the dma-resv lock. That's the fix I
>>>>>>> was referring to
>>>>>>> earlier.
>>>>>> Yeah, I can see the need for a dedicated lock for the GEM's gpuva
>>>>>> list, but
>>>>>> holding the vm's dma-resv lock across the unlink shouldn't be a
>>>>>> problem. We
>>>>>> may free the object and a pointer to the vm's resv during unlink
>>>>>> but we
>>>>>> don't free the vm's resv.  It'd be a matter of ensuring that any
>>>>>> calls to
>>>>>> unlink from *within* drm_gpuvm allows it to be held.
>>>>> Drivers calling unlink() from the fence signaling path can't use the
>>>>> VM's
>>>>> dma-resv lock.
>>>> Yes, that made me a bit curious because in the current version the 
>>>> code
>>>> required the object's dma_resv for unlink() which can't be grabbed
>>>> either from the fence signaling path. So are there any drivers 
>>>> actually
>>>> wanting to do that? If so, they will either need to resort to the
>>>> current spinlock solution or they will need to call unlink from a
>>>> workqueue item.
>>> As Boris already mentioned we have the dma-resv lock by default or a 
>>> driver
>>> specific GEM gpuva lock as opt-in. Now, we can get rid of the latter.
>>>
>>>>> Also, what if the object is an external object? We can't use the VM's
>>>>> dma-resv
>>>>> lock here.
>>>> Why? Typically (sync) unlink is only ever called from an unbind-like
>>>> operation where it should be trivial to grab the vm's resv. Or, for
>>>> that matter any outer lock protecting the extobj list. Rule would be
>>>> the drm_gpuvm_bo::entry::extobj  and drm_gpuvm_bo::entry::evict would
>>>> be protected by either the vm's dma_resv (or possibly an outer lock in
>>>> the case of the extobj list).
>>> Outer lock wouldn't have been working for updates in the async path, 
>>> but
>>> shouldn't be relevant anymore. We could use the VM's resv for that.
>>>
>>>>>   And we can't have the GEM objs dma-resv lock held when calling
>>>>> unlink(), since unlink() calls drm_gpuvm_bo_put(), which if the
>>>>> refcount drops
>>>>> to zero calls drm_gpuvm_bo_destroy() and drm_gpuvm_bo_destroy() might
>>>>> drop the
>>>>> last reference of the GEM object.
>>>> Yes, but this is a different problem as to what exactly protects
>>>> drm_gpuvm_bo::entry::gem. Either as you suggest an internal per bo 
>>>> list
>>>> lock, or if we want to keep the bo's dma_resv we need to ensure that
>>>> the caller of dma_resv_unlock(obj->resv) actually refcounts its obj
>>>> pointer, and doesn't implicitly rely on the gpuvm_bo's refcount (I 
>>>> know
>>>> Boris didn't like that, but requiring an explicit refcount for a
>>>> pointer you dereference unless you're under a lock that ensures 
>>>> keeping
>>>> the object alive is pretty much required?) But anyway for the
>>>> drm_gpuvm_bo::entry::gem list protection (bo resv or internal 
>>>> spinlock)
>>>> I don't have a strong preference.
>>> We can keep the GEM objects dma-resv lock, however as mentioned above
>>> drm_gpuva_unlink() and drm_gpuvm_bo_put() then requires both the 
>>> VM's resv lock
>>> and the GEM's resv lock in case they differ.
>>>
>>>>>   All those problems go away with a dedicated
>>>>> GEM gpuva list lock.
>>>> I don't think these are real problems.
>>>> With the exception of the eviction list "trick" where we currently have
>>>> slightly different approach to collect external bos needing rebinding,
>>>> we have this working fine.
>>>>
>>>> TBH I think pretty much the only situation where the spinlock is 
>>>> needed
>>>> is for async updates of these lists, unless a wq item can be used for
>>>> that, but it doesn't really seem like the current code allows for such
>>>> updates anyway? It complicates the code a lot, adds overhead and also
>>>> adds the requirement for refcounting during list traversal.
>>>>
>>>> /Thomas
>>>>
>>>>>> /Thomas
>>>>>>
>>>>>>
>>>>>>>> It seems that with that also the refcount could be make non-
>>>>>>>> atomic.
>>>>>>>>
>>>>>>>> All in the spirit of the drm locking guidelines "use big locks
>>>>>>>> when
>>>>>>>> possible".
>>>>>>>> Lower level locks only when necessary for performance or
>>>>>>>> locking inversion?
>>>>>>>>
>>>>>>>> /Thomas
>>>>>>>>
>>>>>>>>
>>>>>>>>> + *
>>>>>>>>> + * Elements popped from the original list are kept in a
>>>>>>>>> local list, so removal
>>>>>>>>> + * and is_empty checks can still happen while we're
>>>>>>>>> iterating the list.
>>>>>>>>> + */
>>>>>>>>> +#define get_next_vm_bo_from_list(__gpuvm, __list_name, __local_list, __prev_vm_bo) \
>>>>>>>>> +       ({                                                                      \
>>>>>>>>> +               struct drm_gpuvm_bo *__vm_bo;                                   \
>>>>>>>>> +                                                                               \
>>>>>>>>> +               drm_gpuvm_bo_put(__prev_vm_bo);                                 \
>>>>>>>>> +                                                                               \
>>>>>>>>> +               spin_lock(&(__gpuvm)->__list_name.lock);                        \
>>>>>>>>> +               while (!list_empty(&(__gpuvm)->__list_name.list)) {             \
>>>>>>>>> +                       __vm_bo = list_first_entry(&(__gpuvm)->__list_name.list, \
>>>>>>>>> +                                                  struct drm_gpuvm_bo,         \
>>>>>>>>> +                                                  list.entry.__list_name);     \
>>>>>>>>> +                       if (drm_gpuvm_bo_get_unless_zero(__vm_bo)) {            \
>>>>>>>>> +                               list_move_tail(&(__vm_bo)->list.entry.__list_name, \
>>>>>>>>> +                                              __local_list);                   \
>>>>>>>>> +                               break;                                          \
>>>>>>>>> +                       } else {                                                \
>>>>>>>>> +                               list_del_init(&(__vm_bo)->list.entry.__list_name); \
>>>>>>>>> +                               __vm_bo = NULL;                                 \
>>>>>>>>> +                       }                                                       \
>>>>>>>>> +               }                                                               \
>>>>>>>>> +               spin_unlock(&(__gpuvm)->__list_name.lock);                      \
>>>>>>>>> +                                                                               \
>>>>>>>>> +               __vm_bo;                                                        \
>>>>>>>>> +       })
>>>>>>>>> +
>>>>>>>>> +/**
>>>>>>>>> + * for_each_vm_bo_in_list() - internal vm_bo list iterator
>>>>>>>>> + * @__gpuvm: The GPU VM
>>>>>>>>> + * @__list_name: The name of the list we're iterating on
>>>>>>>>> + * @__local_list: A pointer to the local list used to store already
>>>>>>>>> + * iterated items
>>>>>>>>> + * @__vm_bo: The current &drm_gpuvm_bo
>>>>>>>>> + *
>>>>>>>>> + * This helper is here to provide lockless list iteration.
>>>>>>>>> Lockless as in, the
>>>>>>>>> + * iterator releases the lock immediately after picking the
>>>>>>>>> first element from the
>>>>>>>>> + * list, so list insertion and deletion can happen
>>>>>>>>> concurrently.
>>>>>>>>> + *
>>>>>>>>> + * Typical use:
>>>>>>>>> + *
>>>>>>>>> + *     struct drm_gpuvm_bo *vm_bo;
>>>>>>>>> + *     LIST_HEAD(my_local_list);
>>>>>>>>> + *
>>>>>>>>> + *     ret = 0;
>>>>>>>>> + *     for_each_vm_bo_in_list(gpuvm, <list_name>, &my_local_list, vm_bo) {
>>>>>>>>> + *             ret = do_something_with_vm_bo(..., vm_bo);
>>>>>>>>> + *             if (ret)
>>>>>>>>> + *                     break;
>>>>>>>>> + *     }
>>>>>>>>> + *     drm_gpuvm_bo_put(vm_bo);
>>>>>>>>> + *     restore_vm_bo_list(gpuvm, <list_name>, &my_local_list);
>>>>>>>>> + *
>>>>>>>>> + *
>>>>>>>>> + * Only used for internal list iterations, not meant to be
>>>>>>>>> exposed to the outside
>>>>>>>>> + * world.
>>>>>>>>> + */
>>>>>>>>> +#define for_each_vm_bo_in_list(__gpuvm, __list_name, __local_list, __vm_bo) \
>>>>>>>>> +       for (__vm_bo = get_next_vm_bo_from_list(__gpuvm, __list_name,   \
>>>>>>>>> +                                               __local_list, NULL);    \
>>>>>>>>> +            __vm_bo;                                                   \
>>>>>>>>> +            __vm_bo = get_next_vm_bo_from_list(__gpuvm, __list_name,   \
>>>>>>>>> +                                               __local_list, __vm_bo))
>>>>>>>>> +
>>>>>>>>> +/**
>>>>>>>>> + * restore_vm_bo_list() - move vm_bo elements back to their
>>>>>>>>> original list
>>>>>>>>> + * @__gpuvm: The GPU VM
>>>>>>>>> + * @__list_name: The name of the list we're iterating on
>>>>>>>>> + * @__local_list: A pointer to the local list used to store
>>>>>>>>> already iterated items
>>>>>>>>> + *
>>>>>>>>> + * When we're done iterating a vm_bo list, we should call
>>>>>>>>> restore_vm_bo_list()
>>>>>>>>> + * to restore the original state and let new iterations take
>>>>>>>>> place.
>>>>>>>>> + */
>>>>>>>>> +#define restore_vm_bo_list(__gpuvm, __list_name, __local_list)         \
>>>>>>>>> +       do {                                                            \
>>>>>>>>> +               /* Merge back the two lists, moving local list elements \
>>>>>>>>> +                * to the head to preserve previous ordering, in case   \
>>>>>>>>> +                * it matters.                                          \
>>>>>>>>> +                */                                                     \
>>>>>>>>> +               spin_lock(&(__gpuvm)->__list_name.lock);                \
>>>>>>>>> +               list_splice(__local_list, &(__gpuvm)->__list_name.list); \
>>>>>>>>> +               spin_unlock(&(__gpuvm)->__list_name.lock);              \
>>>>>>>>> +       } while (0)
>>>>>>>>> +/**
>>>>>>>>> + * drm_gpuvm_bo_list_add() - insert a vm_bo into the given
>>>>>>>>> list
>>>>>>>>> + * @__vm_bo: the &drm_gpuvm_bo
>>>>>>>>> + * @__list_name: the name of the list to insert into
>>>>>>>>> + *
>>>>>>>>> + * Inserts the given @__vm_bo into the list specified by
>>>>>>>>> @__list_name and
>>>>>>>>> + * increases the vm_bo's reference count.
>>>>>>>>> + */
>>>>>>>>> +#define drm_gpuvm_bo_list_add(__vm_bo, __list_name)                    \
>>>>>>>>> +       do {                                                            \
>>>>>>>>> +               spin_lock(&(__vm_bo)->vm->__list_name.lock);            \
>>>>>>>>> +               if (list_empty(&(__vm_bo)->list.entry.__list_name))     \
>>>>>>>>> +                       list_add_tail(&(__vm_bo)->list.entry.__list_name, \
>>>>>>>>> +                                     &(__vm_bo)->vm->__list_name.list); \
>>>>>>>>> +               spin_unlock(&(__vm_bo)->vm->__list_name.lock);          \
>>>>>>>>> +       } while (0)
>>>>>>>>> +
>>>>>>>>> +/**
>>>>>>>>> + * drm_gpuvm_bo_list_del() - remove a vm_bo from the given
>>>>>>>>> list
>>>>>>>>> + * @__vm_bo: the &drm_gpuvm_bo
>>>>>>>>> + * @__list_name: the name of the list to remove from
>>>>>>>>> + *
>>>>>>>>> + * Removes the given @__vm_bo from the list specified by
>>>>>>>>> @__list_name and
>>>>>>>>> + * decreases the vm_bo's reference count.
>>>>>>>>> + */
>>>>>>>>> +#define drm_gpuvm_bo_list_del(__vm_bo, __list_name)                    \
>>>>>>>>> +       do {                                                            \
>>>>>>>>> +               spin_lock(&(__vm_bo)->vm->__list_name.lock);            \
>>>>>>>>> +               if (!list_empty(&(__vm_bo)->list.entry.__list_name))    \
>>>>>>>>> +                       list_del_init(&(__vm_bo)->list.entry.__list_name); \
>>>>>>>>> +               spin_unlock(&(__vm_bo)->vm->__list_name.lock);          \
>>>>>>>>> +       } while (0)
>>>>>>>>> +
>>>>>>>>> +static int __must_check
>>>>>>>>> +drm_gpuvm_bo_get_unless_zero(struct drm_gpuvm_bo *vm_bo);
>>>>>>>>> +
>>>>>>>>>     #define to_drm_gpuva(__node) container_of((__node), struct drm_gpuva, rb.node)
>>>>>>>>>     #define GPUVA_START(node) ((node)->va.addr)
>>>>>>>>> @@ -713,6 +867,12 @@ drm_gpuvm_init(struct drm_gpuvm *gpuvm,
>>>>>>>>> struct drm_device *drm,
>>>>>>>>>          gpuvm->rb.tree = RB_ROOT_CACHED;
>>>>>>>>>          INIT_LIST_HEAD(&gpuvm->rb.list);
>>>>>>>>> +       INIT_LIST_HEAD(&gpuvm->extobj.list);
>>>>>>>>> +       spin_lock_init(&gpuvm->extobj.lock);
>>>>>>>>> +
>>>>>>>>> +       INIT_LIST_HEAD(&gpuvm->evict.list);
>>>>>>>>> +       spin_lock_init(&gpuvm->evict.lock);
>>>>>>>>> +
>>>>>>>>>          drm_gpuva_check_overflow(start_offset, range);
>>>>>>>>>          gpuvm->mm_start = start_offset;
>>>>>>>>>          gpuvm->mm_range = range;
>>>>>>>>> @@ -754,10 +914,302 @@ drm_gpuvm_destroy(struct drm_gpuvm
>>>>>>>>> *gpuvm)
>>>>>>>>>          WARN(!RB_EMPTY_ROOT(&gpuvm->rb.tree.rb_root),
>>>>>>>>>               "GPUVA tree is not empty, potentially leaking
>>>>>>>>> memory.\n");
>>>>>>>>> +       WARN(!list_empty(&gpuvm->extobj.list), "Extobj list
>>>>>>>>> should be empty.\n");
>>>>>>>>> +       WARN(!list_empty(&gpuvm->evict.list), "Evict list
>>>>>>>>> should be empty.\n");
>>>>>>>>> +
>>>>>>>>>          drm_gem_private_object_fini(&gpuvm->d_obj);
>>>>>>>>>     }
>>>>>>>>>     EXPORT_SYMBOL_GPL(drm_gpuvm_destroy);
>>>>>>>>> +/**
>>>>>>>>> + * drm_gpuvm_prepare_objects() - prepare all associated BOs
>>>>>>>>> + * @gpuvm: the &drm_gpuvm
>>>>>>>>> + * @exec: the &drm_exec locking context
>>>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>>>> + *
>>>>>>>>> + * Calls drm_exec_prepare_obj() for all &drm_gem_objects the
>>>>>>>>> given
>>>>>>>>> + * &drm_gpuvm contains mappings of.
>>>>>>>>> + *
>>>>>>>>> + * Using this function directly, it is the driver's responsibility to call
>>>>>>>>> + * drm_exec_init() and drm_exec_fini() accordingly.
>>>>>>>>> + *
>>>>>>>>> + * Note: This function is safe against concurrent insertion
>>>>>>>>> and removal of
>>>>>>>>> + * external objects, however it is not safe against
>>>>>>>>> concurrent usage itself.
>>>>>>>>> + *
>>>>>>>>> + * Drivers need to make sure to protect this case with
>>>>>>>>> either an outer VM lock
>>>>>>>>> + * or by calling drm_gpuvm_prepare_vm() before this function
>>>>>>>>> within the
>>>>>>>>> + * drm_exec_until_all_locked() loop, such that the GPUVM's
>>>>>>>>> dma-resv lock ensures
>>>>>>>>> + * mutual exclusion.
>>>>>>>>> + *
>>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>>> + */
>>>>>>>>> +int
>>>>>>>>> +drm_gpuvm_prepare_objects(struct drm_gpuvm *gpuvm,
>>>>>>>>> +                         struct drm_exec *exec,
>>>>>>>>> +                         unsigned int num_fences)
>>>>>>>>> +{
>>>>>>>>> +       struct drm_gpuvm_bo *vm_bo;
>>>>>>>>> +       LIST_HEAD(extobjs);
>>>>>>>>> +       int ret = 0;
>>>>>>>>> +
>>>>>>>>> +       for_each_vm_bo_in_list(gpuvm, extobj, &extobjs, vm_bo) {
>>>>>>>>> +               ret = drm_exec_prepare_obj(exec, vm_bo->obj, num_fences);
>>>>>>>>> +               if (ret)
>>>>>>>>> +                       break;
>>>>>>>>> +       }
>>>>>>>>> +       /* Drop ref in case we break out of the loop. */
>>>>>>>>> +       drm_gpuvm_bo_put(vm_bo);
>>>>>>>>> +       restore_vm_bo_list(gpuvm, extobj, &extobjs);
>>>>>>>>> +
>>>>>>>>> +       return ret;
>>>>>>>>> +}
>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_prepare_objects);
>>>>>>>>> +
>>>>>>>>> +/**
>>>>>>>>> + * drm_gpuvm_prepare_range() - prepare all BOs mapped within
>>>>>>>>> a given range
>>>>>>>>> + * @gpuvm: the &drm_gpuvm
>>>>>>>>> + * @exec: the &drm_exec locking context
>>>>>>>>> + * @addr: the start address within the VA space
>>>>>>>>> + * @range: the range to iterate within the VA space
>>>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>>>> + *
>>>>>>>>> + * Calls drm_exec_prepare_obj() for all &drm_gem_objects
>>>>>>>>> mapped between @addr
>>>>>>>>> + * and @addr + @range.
>>>>>>>>> + *
>>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>>> + */
>>>>>>>>> +int
>>>>>>>>> +drm_gpuvm_prepare_range(struct drm_gpuvm *gpuvm, struct
>>>>>>>>> drm_exec *exec,
>>>>>>>>> +                       u64 addr, u64 range, unsigned int
>>>>>>>>> num_fences)
>>>>>>>>> +{
>>>>>>>>> +       struct drm_gpuva *va;
>>>>>>>>> +       u64 end = addr + range;
>>>>>>>>> +       int ret;
>>>>>>>>> +
>>>>>>>>> +       drm_gpuvm_for_each_va_range(va, gpuvm, addr, end) {
>>>>>>>>> +               struct drm_gem_object *obj = va->gem.obj;
>>>>>>>>> +
>>>>>>>>> +               ret = drm_exec_prepare_obj(exec, obj,
>>>>>>>>> num_fences);
>>>>>>>>> +               if (ret)
>>>>>>>>> +                       return ret;
>>>>>>>>> +       }
>>>>>>>>> +
>>>>>>>>> +       return 0;
>>>>>>>>> +}
>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_prepare_range);
>>>>>>>>> +
>>>>>>>>> +/**
>>>>>>>>> + * drm_gpuvm_exec_lock() - lock all dma-resv of all associated BOs
>>>>>>>>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>>>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>>>> + * @interruptible: sleep interruptible if waiting
>>>>>>>>> + *
>>>>>>>>> + * Acquires all dma-resv locks of all &drm_gem_objects the
>>>>>>>>> given
>>>>>>>>> + * &drm_gpuvm contains mappings of.
>>>>>>>>> + *
>>>>>>>>> + * Additionally, when calling this function with struct drm_gpuvm_exec::extra
>>>>>>>>> + * being set the driver receives the given @fn callback to
>>>>>>>>> lock additional
>>>>>>>>> + * dma-resv in the context of the &drm_gpuvm_exec instance.
>>>>>>>>> Typically, drivers
>>>>>>>>> + * would call drm_exec_prepare_obj() from within this
>>>>>>>>> callback.
>>>>>>>>> + *
>>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>>> + */
>>>>>>>>> +int
>>>>>>>>> +drm_gpuvm_exec_lock(struct drm_gpuvm_exec *vm_exec,
>>>>>>>>> +                   unsigned int num_fences,
>>>>>>>>> +                   bool interruptible)
>>>>>>>>> +{
>>>>>>>>> +       struct drm_gpuvm *gpuvm = vm_exec->vm;
>>>>>>>>> +       struct drm_exec *exec = &vm_exec->exec;
>>>>>>>>> +       uint32_t flags;
>>>>>>>>> +       int ret;
>>>>>>>>> +
>>>>>>>>> +       flags = (interruptible ? DRM_EXEC_INTERRUPTIBLE_WAIT : 0) |
>>>>>>>>> +               DRM_EXEC_IGNORE_DUPLICATES;
>>>>>>>>> +
>>>>>>>>> +       drm_exec_init(exec, flags);
>>>>>>>>> +
>>>>>>>>> +       drm_exec_until_all_locked(exec) {
>>>>>>>>> +               ret = drm_gpuvm_prepare_vm(gpuvm, exec,
>>>>>>>>> num_fences);
>>>>>>>>> +               drm_exec_retry_on_contention(exec);
>>>>>>>>> +               if (ret)
>>>>>>>>> +                       goto err;
>>>>>>>>> +
>>>>>>>>> +               ret = drm_gpuvm_prepare_objects(gpuvm, exec,
>>>>>>>>> num_fences);
>>>>>>>>> +               drm_exec_retry_on_contention(exec);
>>>>>>>>> +               if (ret)
>>>>>>>>> +                       goto err;
>>>>>>>>> +
>>>>>>>>> +               if (vm_exec->extra.fn) {
>>>>>>>>> +                       ret = vm_exec->extra.fn(vm_exec,
>>>>>>>>> num_fences);
>>>>>>>>> +                       drm_exec_retry_on_contention(exec);
>>>>>>>>> +                       if (ret)
>>>>>>>>> +                               goto err;
>>>>>>>>> +               }
>>>>>>>>> +       }
>>>>>>>>> +
>>>>>>>>> +       return 0;
>>>>>>>>> +
>>>>>>>>> +err:
>>>>>>>>> +       drm_exec_fini(exec);
>>>>>>>>> +       return ret;
>>>>>>>>> +}
>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_exec_lock);
>>>>>>>>> +
>>>>>>>>> +static int
>>>>>>>>> +fn_lock_array(struct drm_gpuvm_exec *vm_exec, unsigned int
>>>>>>>>> num_fences)
>>>>>>>>> +{
>>>>>>>>> +       struct {
>>>>>>>>> +               struct drm_gem_object **objs;
>>>>>>>>> +               unsigned int num_objs;
>>>>>>>>> +       } *args = vm_exec->extra.priv;
>>>>>>>>> +
>>>>>>>>> +       return drm_exec_prepare_array(&vm_exec->exec, args->objs,
>>>>>>>>> +                                     args->num_objs, num_fences);
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> +/**
>>>>>>>>> + * drm_gpuvm_exec_lock_array() - lock all dma-resv of all associated BOs
>>>>>>>>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>>>>>>>>> + * @objs: additional &drm_gem_objects to lock
>>>>>>>>> + * @num_objs: the number of additional &drm_gem_objects to
>>>>>>>>> lock
>>>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>>>> + * @interruptible: sleep interruptible if waiting
>>>>>>>>> + *
>>>>>>>>> + * Acquires all dma-resv locks of all &drm_gem_objects the
>>>>>>>>> given &drm_gpuvm
>>>>>>>>> + * contains mappings of, plus the ones given through @objs.
>>>>>>>>> + *
>>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>>> + */
>>>>>>>>> +int
>>>>>>>>> +drm_gpuvm_exec_lock_array(struct drm_gpuvm_exec *vm_exec,
>>>>>>>>> +                         struct drm_gem_object **objs,
>>>>>>>>> +                         unsigned int num_objs,
>>>>>>>>> +                         unsigned int num_fences,
>>>>>>>>> +                         bool interruptible)
>>>>>>>>> +{
>>>>>>>>> +       struct {
>>>>>>>>> +               struct drm_gem_object **objs;
>>>>>>>>> +               unsigned int num_objs;
>>>>>>>>> +       } args;
>>>>>>>>> +
>>>>>>>>> +       args.objs = objs;
>>>>>>>>> +       args.num_objs = num_objs;
>>>>>>>>> +
>>>>>>>>> +       vm_exec->extra.fn = fn_lock_array;
>>>>>>>>> +       vm_exec->extra.priv = &args;
>>>>>>>>> +
>>>>>>>>> +       return drm_gpuvm_exec_lock(vm_exec, num_fences,
>>>>>>>>> interruptible);
>>>>>>>>> +}
>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_exec_lock_array);
>>>>>>>>> +
>>>>>>>>> +/**
>>>>>>>>> + * drm_gpuvm_exec_lock_range() - prepare all BOs mapped
>>>>>>>>> within a given range
>>>>>>>>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>>>>>>>>> + * @addr: the start address within the VA space
>>>>>>>>> + * @range: the range to iterate within the VA space
>>>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>>>> + * @interruptible: sleep interruptible if waiting
>>>>>>>>> + *
>>>>>>>>> + * Acquires all dma-resv locks of all &drm_gem_objects
>>>>>>>>> mapped between @addr and
>>>>>>>>> + * @addr + @range.
>>>>>>>>> + *
>>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>>> + */
>>>>>>>>> +int
>>>>>>>>> +drm_gpuvm_exec_lock_range(struct drm_gpuvm_exec *vm_exec,
>>>>>>>>> +                         u64 addr, u64 range,
>>>>>>>>> +                         unsigned int num_fences,
>>>>>>>>> +                         bool interruptible)
>>>>>>>>> +{
>>>>>>>>> +       struct drm_gpuvm *gpuvm = vm_exec->vm;
>>>>>>>>> +       struct drm_exec *exec = &vm_exec->exec;
>>>>>>>>> +       uint32_t flags;
>>>>>>>>> +       int ret;
>>>>>>>>> +
>>>>>>>>> +       flags = (interruptible ? DRM_EXEC_INTERRUPTIBLE_WAIT : 0) |
>>>>>>>>> +               DRM_EXEC_IGNORE_DUPLICATES;
>>>>>>>>> +
>>>>>>>>> +       drm_exec_init(exec, flags);
>>>>>>>>> +
>>>>>>>>> +       drm_exec_until_all_locked(exec) {
>>>>>>>>> +               ret = drm_gpuvm_prepare_range(gpuvm, exec, addr,
>>>>>>>>> +                                             range, num_fences);
>>>>>>>>> +               drm_exec_retry_on_contention(exec);
>>>>>>>>> +               if (ret)
>>>>>>>>> +                       goto err;
>>>>>>>>> +       }
>>>>>>>>> +
>>>>>>>>> +       return ret;
>>>>>>>>> +
>>>>>>>>> +err:
>>>>>>>>> +       drm_exec_fini(exec);
>>>>>>>>> +       return ret;
>>>>>>>>> +}
>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_exec_lock_range);
>>>>>>>>> +
>>>>>>>>> +/**
>>>>>>>>> + * drm_gpuvm_validate() - validate all BOs marked as evicted
>>>>>>>>> + * @gpuvm: the &drm_gpuvm to validate evicted BOs
>>>>>>>>> + *
>>>>>>>>> + * Calls the &drm_gpuvm_ops.bo_validate callback for all
>>>>>>>>> evicted buffer
>>>>>>>>> + * objects being mapped in the given &drm_gpuvm.
>>>>>>>>> + *
>>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>>> + */
>>>>>>>>> +int
>>>>>>>>> +drm_gpuvm_validate(struct drm_gpuvm *gpuvm)
>>>>>>>>> +{
>>>>>>>>> +       const struct drm_gpuvm_ops *ops = gpuvm->ops;
>>>>>>>>> +       struct drm_gpuvm_bo *vm_bo;
>>>>>>>>> +       LIST_HEAD(evict);
>>>>>>>>> +       int ret = 0;
>>>>>>>>> +
>>>>>>>>> +       if (unlikely(!ops || !ops->bo_validate))
>>>>>>>>> +               return -ENOTSUPP;
>>>>>>>>> +
>>>>>>>>> +       for_each_vm_bo_in_list(gpuvm, evict, &evict, vm_bo) {
>>>>>>>>> +               dma_resv_assert_held(vm_bo->obj->resv);
>>>>>>>>> +               ret = ops->bo_validate(vm_bo->obj);
>>>>>>>>> +               if (ret)
>>>>>>>>> +                       break;
>>>>>>>>> +       }
>>>>>>>>> +       /* Drop ref in case we break out of the loop. */
>>>>>>>>> +       drm_gpuvm_bo_put(vm_bo);
>>>>>>>>> +       restore_vm_bo_list(gpuvm, evict, &evict);
>>>>>>>>> +
>>>>>>>>> +       return ret;
>>>>>>>>> +}
>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_validate);
>>>>>>>>> +
>>>>>>>>> +/**
>>>>>>>>> + * drm_gpuvm_resv_add_fence() - add fence to private and all extobj dma-resv
>>>>>>>>> + * @gpuvm: the &drm_gpuvm to add a fence to
>>>>>>>>> + * @exec: the &drm_exec locking context
>>>>>>>>> + * @fence: fence to add
>>>>>>>>> + * @private_usage: private dma-resv usage
>>>>>>>>> + * @extobj_usage: extobj dma-resv usage
>>>>>>>>> + */
>>>>>>>>> +void
>>>>>>>>> +drm_gpuvm_resv_add_fence(struct drm_gpuvm *gpuvm,
>>>>>>>>> +                        struct drm_exec *exec,
>>>>>>>>> +                        struct dma_fence *fence,
>>>>>>>>> +                        enum dma_resv_usage private_usage,
>>>>>>>>> +                        enum dma_resv_usage extobj_usage)
>>>>>>>>> +{
>>>>>>>>> +       struct drm_gem_object *obj;
>>>>>>>>> +       unsigned long index;
>>>>>>>>> +
>>>>>>>>> +       drm_exec_for_each_locked_object(exec, index, obj) {
>>>>>>>>> +               dma_resv_assert_held(obj->resv);
>>>>>>>>> +               dma_resv_add_fence(obj->resv, fence,
>>>>>>>>> +                                  drm_gpuvm_is_extobj(gpuvm, obj) ?
>>>>>>>>> +                                  extobj_usage : private_usage);
>>>>>>>>> +       }
>>>>>>>>> +}
>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_resv_add_fence);
>>>>>>>>> +
>>>>>>>>>     /**
>>>>>>>>>      * drm_gpuvm_bo_create() - create a new instance of struct
>>>>>>>>> drm_gpuvm_bo
>>>>>>>>>      * @gpuvm: The &drm_gpuvm the @obj is mapped in.
>>>>>>>>> @@ -790,6 +1242,9 @@ drm_gpuvm_bo_create(struct drm_gpuvm
>>>>>>>>> *gpuvm,
>>>>>>>>>          INIT_LIST_HEAD(&vm_bo->list.gpuva);
>>>>>>>>>          INIT_LIST_HEAD(&vm_bo->list.entry.gem);
>>>>>>>>> +       INIT_LIST_HEAD(&vm_bo->list.entry.extobj);
>>>>>>>>> +       INIT_LIST_HEAD(&vm_bo->list.entry.evict);
>>>>>>>>> +
>>>>>>>>>          drm_gem_object_get(obj);
>>>>>>>>>          return vm_bo;
>>>>>>>>> @@ -807,6 +1262,14 @@ drm_gpuvm_bo_destroy(struct kref *kref)
>>>>>>>>>          drm_gem_gpuva_assert_lock_held(vm_bo->obj);
>>>>>>>>> +       spin_lock(&gpuvm->extobj.lock);
>>>>>>>>> +       list_del(&vm_bo->list.entry.extobj);
>>>>>>>>> +       spin_unlock(&gpuvm->extobj.lock);
>>>>>>>>> +
>>>>>>>>> +       spin_lock(&gpuvm->evict.lock);
>>>>>>>>> +       list_del(&vm_bo->list.entry.evict);
>>>>>>>>> +       spin_unlock(&gpuvm->evict.lock);
>>>>>>>>> +
>>>>>>>>>          list_del(&vm_bo->list.entry.gem);
>>>>>>>>>          drm_gem_object_put(obj);
>>>>>>>>> @@ -822,6 +1285,11 @@ drm_gpuvm_bo_destroy(struct kref *kref)
>>>>>>>>>      * @vm_bo: the &drm_gpuvm_bo to release the reference of
>>>>>>>>>      *
>>>>>>>>>      * This releases a reference to @vm_bo.
>>>>>>>>> + *
>>>>>>>>> + * If the reference count drops to zero, the &gpuvm_bo is
>>>>>>>>> destroyed, which
>>>>>>>>> + * includes removing it from the GEMs gpuva list. Hence, if
>>>>>>>>> a call to this
>>>>>>>>> + * function can potentially let the reference count to zero
>>>>>>>>> the caller must
>>>>>>>>> + * hold the dma-resv or driver specific GEM gpuva lock.
>>>>>>>>>      */
>>>>>>>>>     void
>>>>>>>>>     drm_gpuvm_bo_put(struct drm_gpuvm_bo *vm_bo)
>>>>>>>>> @@ -831,6 +1299,12 @@ drm_gpuvm_bo_put(struct drm_gpuvm_bo
>>>>>>>>> *vm_bo)
>>>>>>>>>     }
>>>>>>>>>     EXPORT_SYMBOL_GPL(drm_gpuvm_bo_put);
>>>>>>>>> +static int __must_check
>>>>>>>>> +drm_gpuvm_bo_get_unless_zero(struct drm_gpuvm_bo *vm_bo)
>>>>>>>>> +{
>>>>>>>>> +       return kref_get_unless_zero(&vm_bo->kref);
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>>     static struct drm_gpuvm_bo *
>>>>>>>>>     __drm_gpuvm_bo_find(struct drm_gpuvm *gpuvm,
>>>>>>>>>                      struct drm_gem_object *obj)
>>>>>>>>> @@ -938,6 +1412,48 @@ drm_gpuvm_bo_obtain_prealloc(struct
>>>>>>>>> drm_gpuvm_bo *__vm_bo)
>>>>>>>>>     }
>>>>>>>>> EXPORT_SYMBOL_GPL(drm_gpuvm_bo_obtain_prealloc);
>>>>>>>>> +/**
>>>>>>>>> + * drm_gpuvm_bo_extobj_add() - adds the &drm_gpuvm_bo to its &drm_gpuvm's
>>>>>>>>> + * extobj list
>>>>>>>>> + * @vm_bo: The &drm_gpuvm_bo to add to its &drm_gpuvm's extobj list.
>>>>>>>>> + *
>>>>>>>>> + * Adds the given @vm_bo to its &drm_gpuvm's extobj list if it is not on the
>>>>>>>>> + * list already and if the corresponding &drm_gem_object actually is an
>>>>>>>>> + * external object.
>>>>>>>>> + */
>>>>>>>>> +void
>>>>>>>>> +drm_gpuvm_bo_extobj_add(struct drm_gpuvm_bo *vm_bo)
>>>>>>>>> +{
>>>>>>>>> +       struct drm_gpuvm *gpuvm = vm_bo->vm;
>>>>>>>>> +
>>>>>>>>> +       if (drm_gpuvm_is_extobj(gpuvm, vm_bo->obj))
>>>>>>>>> +               drm_gpuvm_bo_list_add(vm_bo, extobj);
>>>>>>>>> +}
>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_bo_extobj_add);
>>>>>>>>> +
>>>>>>>>> +/**
>>>>>>>>> + * drm_gpuvm_bo_evict() - add / remove a &drm_gem_object to / from a
>>>>>>>>> + * &drm_gpuvm's evicted list
>>>>>>>>> + * @obj: the &drm_gem_object to add or remove
>>>>>>>>> + * @evict: indicates whether the object is evicted
>>>>>>>>> + *
>>>>>>>>> + * Adds a &drm_gem_object to or removes it from the evicted list of all
>>>>>>>>> + * &drm_gpuvms containing a mapping of this &drm_gem_object.
>>>>>>>>> + */
>>>>>>>>> +void
>>>>>>>>> +drm_gpuvm_bo_evict(struct drm_gem_object *obj, bool evict)
>>>>>>>>> +{
>>>>>>>>> +       struct drm_gpuvm_bo *vm_bo;
>>>>>>>>> +
>>>>>>>>> +       drm_gem_for_each_gpuvm_bo(vm_bo, obj) {
>>>>>>>>> +               if (evict)
>>>>>>>>> +                       drm_gpuvm_bo_list_add(vm_bo, evict);
>>>>>>>>> +               else
>>>>>>>>> +                       drm_gpuvm_bo_list_del(vm_bo, evict);
>>>>>>>>> +       }
>>>>>>>>> +}
>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_bo_evict);
>>>>>>>>> +
>>>>>>>>>     static int
>>>>>>>>>     __drm_gpuva_insert(struct drm_gpuvm *gpuvm,
>>>>>>>>>                     struct drm_gpuva *va)
>>>>>>>>> diff --git a/include/drm/drm_gpuvm.h
>>>>>>>>> b/include/drm/drm_gpuvm.h
>>>>>>>>> index afa50b9059a2..834bb6d6617e 100644
>>>>>>>>> --- a/include/drm/drm_gpuvm.h
>>>>>>>>> +++ b/include/drm/drm_gpuvm.h
>>>>>>>>> @@ -26,10 +26,12 @@
>>>>>>>>>      */
>>>>>>>>>     #include <linux/list.h>
>>>>>>>>> +#include <linux/dma-resv.h>
>>>>>>>>>     #include <linux/rbtree.h>
>>>>>>>>>     #include <linux/types.h>
>>>>>>>>>     #include <drm/drm_gem.h>
>>>>>>>>> +#include <drm/drm_exec.h>
>>>>>>>>>     struct drm_gpuvm;
>>>>>>>>>     struct drm_gpuvm_bo;
>>>>>>>>> @@ -259,6 +261,38 @@ struct drm_gpuvm {
>>>>>>>>>           * space
>>>>>>>>>           */
>>>>>>>>>          struct dma_resv *resv;
>>>>>>>>> +
>>>>>>>>> +       /**
>>>>>>>>> +        * @extobj: structure holding the extobj list
>>>>>>>>> +        */
>>>>>>>>> +       struct {
>>>>>>>>> +               /**
>>>>>>>>> +                * @list: &list_head storing &drm_gpuvm_bos serving as
>>>>>>>>> +                * external objects
>>>>>>>>> +                */
>>>>>>>>> +               struct list_head list;
>>>>>>>>> +
>>>>>>>>> +               /**
>>>>>>>>> +                * @lock: spinlock to protect the extobj list
>>>>>>>>> +                */
>>>>>>>>> +               spinlock_t lock;
>>>>>>>>> +       } extobj;
>>>>>>>>> +
>>>>>>>>> +       /**
>>>>>>>>> +        * @evict: structure holding the evict list and evict
>>>>>>>>> list lock
>>>>>>>>> +        */
>>>>>>>>> +       struct {
>>>>>>>>> +               /**
>>>>>>>>> +                * @list: &list_head storing &drm_gpuvm_bos
>>>>>>>>> currently being
>>>>>>>>> +                * evicted
>>>>>>>>> +                */
>>>>>>>>> +               struct list_head list;
>>>>>>>>> +
>>>>>>>>> +               /**
>>>>>>>>> +                * @lock: spinlock to protect the evict list
>>>>>>>>> +                */
>>>>>>>>> +               spinlock_t lock;
>>>>>>>>> +       } evict;
>>>>>>>>>     };
>>>>>>>>>     void drm_gpuvm_init(struct drm_gpuvm *gpuvm, struct
>>>>>>>>> drm_device *drm,
>>>>>>>>> @@ -268,6 +302,21 @@ void drm_gpuvm_init(struct drm_gpuvm
>>>>>>>>> *gpuvm, struct drm_device *drm,
>>>>>>>>>                      const struct drm_gpuvm_ops *ops);
>>>>>>>>>     void drm_gpuvm_destroy(struct drm_gpuvm *gpuvm);
>>>>>>>>> +/**
>>>>>>>>> + * drm_gpuvm_is_extobj() - indicates whether the given
>>>>>>>>> &drm_gem_object is an
>>>>>>>>> + * external object
>>>>>>>>> + * @gpuvm: the &drm_gpuvm to check
>>>>>>>>> + * @obj: the &drm_gem_object to check
>>>>>>>>> + *
>>>>>>>>> + * Returns: true if the &drm_gem_object's &dma_resv differs from
>>>>>>>>> + * the &drm_gpuvm's &dma_resv, false otherwise
>>>>>>>>> + */
>>>>>>>>> +static inline bool drm_gpuvm_is_extobj(struct drm_gpuvm
>>>>>>>>> *gpuvm,
>>>>>>>>> +                                      struct drm_gem_object
>>>>>>>>> *obj)
>>>>>>>>> +{
>>>>>>>>> +       return obj && obj->resv != gpuvm->resv;
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>>     static inline struct drm_gpuva *
>>>>>>>>>     __drm_gpuva_next(struct drm_gpuva *va)
>>>>>>>>>     {
>>>>>>>>> @@ -346,6 +395,128 @@ __drm_gpuva_next(struct drm_gpuva *va)
>>>>>>>>>     #define drm_gpuvm_for_each_va_safe(va__, next__, gpuvm__)
>>>>>>>>> \
>>>>>>>>>          list_for_each_entry_safe(va__, next__, &(gpuvm__)-
>>>>>>>>>> rb.list, rb.entry)
>>>>>>>>> +/**
>>>>>>>>> + * struct drm_gpuvm_exec - &drm_gpuvm abstraction of
>>>>>>>>> &drm_exec
>>>>>>>>> + *
>>>>>>>>> + * This structure should be created on the stack as
>>>>>>>>> &drm_exec should be.
>>>>>>>>> + *
>>>>>>>>> + * Optionally, @extra can be set in order to lock additional
>>>>>>>>> &drm_gem_objects.
>>>>>>>>> + */
>>>>>>>>> +struct drm_gpuvm_exec {
>>>>>>>>> +       /**
>>>>>>>>> +        * @exec: the &drm_exec structure
>>>>>>>>> +        */
>>>>>>>>> +       struct drm_exec exec;
>>>>>>>>> +
>>>>>>>>> +       /**
>>>>>>>>> +        * @vm: the &drm_gpuvm whose DMA reservations should be locked
>>>>>>>>> +        */
>>>>>>>>> +       struct drm_gpuvm *vm;
>>>>>>>>> +
>>>>>>>>> +       /**
>>>>>>>>> +        * @extra: Callback and corresponding private data
>>>>>>>>> for the driver to
>>>>>>>>> +        * lock arbitrary additional &drm_gem_objects.
>>>>>>>>> +        */
>>>>>>>>> +       struct {
>>>>>>>>> +               /**
>>>>>>>>> +                * @fn: The driver callback to lock
>>>>>>>>> additional &drm_gem_objects.
>>>>>>>>> +                */
>>>>>>>>> +               int (*fn)(struct drm_gpuvm_exec *vm_exec,
>>>>>>>>> +                         unsigned int num_fences);
>>>>>>>>> +
>>>>>>>>> +               /**
>>>>>>>>> +                * @priv: driver private data for the @fn
>>>>>>>>> callback
>>>>>>>>> +                */
>>>>>>>>> +               void *priv;
>>>>>>>>> +       } extra;
>>>>>>>>> +};
>>>>>>>>> +
>>>>>>>>> +/**
>>>>>>>>> + * drm_gpuvm_prepare_vm() - prepare the GPUVM's common dma-resv
>>>>>>>>> + * @gpuvm: the &drm_gpuvm
>>>>>>>>> + * @exec: the &drm_exec context
>>>>>>>>> + * @num_fences: the number of &dma_fences to reserve
>>>>>>>>> + *
>>>>>>>>> + * Calls drm_exec_prepare_obj() for the GPUVM's dummy &drm_gem_object.
>>>>>>>>> + *
>>>>>>>>> + * Using this function directly, it is the driver's responsibility
>>>>>>>>> + * to call drm_exec_init() and drm_exec_fini() accordingly.
>>>>>>>>> + *
>>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>>> + */
>>>>>>>>> +static inline int
>>>>>>>>> +drm_gpuvm_prepare_vm(struct drm_gpuvm *gpuvm,
>>>>>>>>> +                    struct drm_exec *exec,
>>>>>>>>> +                    unsigned int num_fences)
>>>>>>>>> +{
>>>>>>>>> +       return drm_exec_prepare_obj(exec, &gpuvm->d_obj,
>>>>>>>>> num_fences);
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> +int drm_gpuvm_prepare_objects(struct drm_gpuvm *gpuvm,
>>>>>>>>> +                             struct drm_exec *exec,
>>>>>>>>> +                             unsigned int num_fences);
>>>>>>>>> +
>>>>>>>>> +int drm_gpuvm_prepare_range(struct drm_gpuvm *gpuvm,
>>>>>>>>> +                           struct drm_exec *exec,
>>>>>>>>> +                           u64 addr, u64 range,
>>>>>>>>> +                           unsigned int num_fences);
>>>>>>>>> +
>>>>>>>>> +int drm_gpuvm_exec_lock(struct drm_gpuvm_exec *vm_exec,
>>>>>>>>> +                       unsigned int num_fences,
>>>>>>>>> +                       bool interruptible);
>>>>>>>>> +
>>>>>>>>> +int drm_gpuvm_exec_lock_array(struct drm_gpuvm_exec
>>>>>>>>> *vm_exec,
>>>>>>>>> +                             struct drm_gem_object **objs,
>>>>>>>>> +                             unsigned int num_objs,
>>>>>>>>> +                             unsigned int num_fences,
>>>>>>>>> +                             bool interruptible);
>>>>>>>>> +
>>>>>>>>> +int drm_gpuvm_exec_lock_range(struct drm_gpuvm_exec
>>>>>>>>> *vm_exec,
>>>>>>>>> +                             u64 addr, u64 range,
>>>>>>>>> +                             unsigned int num_fences,
>>>>>>>>> +                             bool interruptible);
>>>>>>>>> +
>>>>>>>>> +/**
>>>>>>>>> + * drm_gpuvm_exec_unlock() - unlock all dma-resv of all associated
>>>>>>>>> + * BOs
>>>>>>>>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>>>>>>>>> + *
>>>>>>>>> + * Releases all dma-resv locks of all &drm_gem_objects previously
>>>>>>>>> + * acquired through drm_gpuvm_exec_lock() or its variants.
>>>>>>>>> + */
>>>>>>>>> +static inline void
>>>>>>>>> +drm_gpuvm_exec_unlock(struct drm_gpuvm_exec *vm_exec)
>>>>>>>>> +{
>>>>>>>>> +       drm_exec_fini(&vm_exec->exec);
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> +int drm_gpuvm_validate(struct drm_gpuvm *gpuvm);
>>>>>>>>> +void drm_gpuvm_resv_add_fence(struct drm_gpuvm *gpuvm,
>>>>>>>>> +                             struct drm_exec *exec,
>>>>>>>>> +                             struct dma_fence *fence,
>>>>>>>>> +                             enum dma_resv_usage
>>>>>>>>> private_usage,
>>>>>>>>> +                             enum dma_resv_usage
>>>>>>>>> extobj_usage);
>>>>>>>>> +
>>>>>>>>> +/**
>>>>>>>>> + * drm_gpuvm_exec_resv_add_fence() - add fence to private and all
>>>>>>>>> + * extobj dma-resv
>>>>>>>>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>>>>>>>>> + * @fence: fence to add
>>>>>>>>> + * @private_usage: private dma-resv usage
>>>>>>>>> + * @extobj_usage: extobj dma-resv usage
>>>>>>>>> + *
>>>>>>>>> + * See drm_gpuvm_resv_add_fence().
>>>>>>>>> + */
>>>>>>>>> +static inline void
>>>>>>>>> +drm_gpuvm_exec_resv_add_fence(struct drm_gpuvm_exec
>>>>>>>>> *vm_exec,
>>>>>>>>> +                             struct dma_fence *fence,
>>>>>>>>> +                             enum dma_resv_usage
>>>>>>>>> private_usage,
>>>>>>>>> +                             enum dma_resv_usage
>>>>>>>>> extobj_usage)
>>>>>>>>> +{
>>>>>>>>> +       drm_gpuvm_resv_add_fence(vm_exec->vm, &vm_exec->exec,
>>>>>>>>> fence,
>>>>>>>>> +                                private_usage,
>>>>>>>>> extobj_usage);
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>>     /**
>>>>>>>>>      * struct drm_gpuvm_bo - structure representing a
>>>>>>>>> &drm_gpuvm and
>>>>>>>>>      * &drm_gem_object combination
>>>>>>>>> @@ -398,6 +569,18 @@ struct drm_gpuvm_bo {
>>>>>>>>>                           * gpuva list.
>>>>>>>>>                           */
>>>>>>>>>                          struct list_head gem;
>>>>>>>>> +
>>>>>>>>> +                       /**
>>>>>>>>> +                        * @extobj: List entry to attach to the
>>>>>>>>> +                        * &drm_gpuvm's extobj list.
>>>>>>>>> +                        */
>>>>>>>>> +                       struct list_head extobj;
>>>>>>>>> +
>>>>>>>>> +                       /**
>>>>>>>>> +                        * @evict: List entry to attach to the
>>>>>>>>> +                        * &drm_gpuvm's evict list.
>>>>>>>>> +                        */
>>>>>>>>> +                       struct list_head evict;
>>>>>>>>>                  } entry;
>>>>>>>>>          } list;
>>>>>>>>>     };
>>>>>>>>> @@ -432,6 +615,9 @@ struct drm_gpuvm_bo *
>>>>>>>>>     drm_gpuvm_bo_find(struct drm_gpuvm *gpuvm,
>>>>>>>>>                    struct drm_gem_object *obj);
>>>>>>>>> +void drm_gpuvm_bo_evict(struct drm_gem_object *obj, bool
>>>>>>>>> evict);
>>>>>>>>> +void drm_gpuvm_bo_extobj_add(struct drm_gpuvm_bo *vm_bo);
>>>>>>>>> +
>>>>>>>>>     /**
>>>>>>>>>      * drm_gpuvm_bo_for_each_va() - iterator to walk over a
>>>>>>>>> list of &drm_gpuva
>>>>>>>>>      * @va__: &drm_gpuva structure to assign to in each
>>>>>>>>> iteration step
>>>>>>>>> @@ -837,6 +1023,17 @@ struct drm_gpuvm_ops {
>>>>>>>>>           * used.
>>>>>>>>>           */
>>>>>>>>>          int (*sm_step_unmap)(struct drm_gpuva_op *op, void
>>>>>>>>> *priv);
>>>>>>>>> +
>>>>>>>>> +       /**
>>>>>>>>> +        * @bo_validate: called from drm_gpuvm_validate()
>>>>>>>>> +        *
>>>>>>>>> +        * Drivers receive this callback for every evicted
>>>>>>>>> &drm_gem_object being
>>>>>>>>> +        * mapped in the corresponding &drm_gpuvm.
>>>>>>>>> +        *
>>>>>>>>> +        * Typically, drivers would call their driver-specific
>>>>>>>>> +        * variant of ttm_bo_validate() from within this callback.
>>>>>>>>> +        */
>>>>>>>>> +       int (*bo_validate)(struct drm_gem_object *obj);
>>>>>>>>>     };
>>>>>>>>>     int drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm, void *priv,
>>
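As an aside, for a TTM-based driver the bo_validate hook quoted above
would typically boil down to something like the following sketch. This
is illustrative only; driver_vram_placement and the surrounding wiring
are hypothetical and not part of this series:

static int driver_bo_validate(struct drm_gem_object *obj)
{
	struct ttm_buffer_object *bo =
		container_of(obj, struct ttm_buffer_object, base);
	struct ttm_operation_ctx ctx = {
		.interruptible = true,
		.no_wait_gpu = false,
	};

	/* Move the evicted BO back to a GPU-accessible placement. */
	return ttm_bo_validate(bo, &driver_vram_placement, &ctx);
}

static const struct drm_gpuvm_ops driver_gpuvm_ops = {
	.bo_validate = driver_bo_validate,
};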
Christian König Sept. 13, 2023, 3:33 p.m. UTC | #27
Am 13.09.23 um 17:15 schrieb Danilo Krummrich:
> On 9/13/23 16:26, Christian König wrote:
>> Am 13.09.23 um 14:16 schrieb Danilo Krummrich:
>>> As mentioned in a different mail thread, the reply is based on the 
>>> assumption
>>> that we don't support anything other than GPUVM updates from the IOCTL.
>>
>> I think that this assumption is incorrect.
>
> Well, more precisely I should have said "don't support GPUVM updates
> from within fence signaling critical sections". And looking at the
> code, that doesn't seem to be what you're doing there.
>
>>
>> Vulkan is just one specific use case, but this here should probably
>> be able to handle other use cases as well.
>>
>> Especially with HMM you get the requirement that you need to be able 
>> to invalidate GPUVM mappings without grabbing a reservation lock.
>
> What do you mean with "invalidate GPUVM mappings" in this context? 
> drm_gpuvm_bo_evict()
> should only be called from a ttm_device_funcs::move callback, we 
> should hold the dma-resv
> lock there.

Well the question is which dma-resv lock do we hold?

In the move callback we only hold the dma-resv lock of the BO which is 
moved, but when that is a shared BO then that's not the same as the one 
for the VM.
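
To make that concrete, a minimal sketch (hypothetical driver code, TTM
callback signature abbreviated):

static int driver_bo_move(struct ttm_buffer_object *bo, bool evict,
			  struct ttm_resource *new_mem)
{
	struct drm_gem_object *obj = &bo->base;

	/* TTM guarantees only the moved BO's resv to be held here. */
	dma_resv_assert_held(obj->resv);

	/* For a shared BO obj->resv != gpuvm->resv, i.e. the VM's resv
	 * is not necessarily held, which is why the evict list needs
	 * its own protection.
	 */
	drm_gpuvm_bo_evict(obj, evict);

	return 0;
}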

>
>>
>> See what the eviction lock in amdgpu is doing for example.
>
> The eviction_lock seems to protect a VM state "evicting", i.e. whether
> any BO that is associated with the VM is currently being evicted. At
> the same time amdgpu protects the evicted list of the VM with a
> different lock. So this seems to be entirely unrelated. Tracking a
> "currently evicting" state is not part of the GPUVM implementation
> currently and hence nothing would change for amdgpu there.

Sorry for the confusion we use different terminology in amdgpu.

The eviction lock and evicted state is for the VM page tables, e.g. if 
the whole VM is currently not used and swapped out or even de-allocated.

This is necessary because we have cases where we need to access the VM 
data without holding the dma-resv lock of this VM. Especially figuring 
out which parts of an address space contain mappings and which don't.

This is a requirement which comes with HMM handling, you won't see this 
with Vulkan (or OpenGL, VAAPI etc..).


The invalidation lock on the other hand is what in this discussion is 
called eviction lock. This one is needed because of what I wrote above:
during the move callback only the dma-resv of the BO which is moved is 
locked, but not necessarily the dma-resv of the VM.
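
Schematically, and simplified to the point of being hypothetical rather
than the actual amdgpu code, the pattern is:

struct driver_vm {
	struct drm_gpuvm base;

	/* Taken in paths that must look at VM state while only a BO's
	 * resv (or no resv at all) is held.
	 */
	struct mutex eviction_lock;
	bool evicting;
};

static void driver_vm_set_evicting(struct driver_vm *vm, bool evicting)
{
	mutex_lock(&vm->eviction_lock);
	vm->evicting = evicting;
	mutex_unlock(&vm->eviction_lock);
}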

Regards,
Christian.
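
For reference, the intended fast path with the helpers from this patch,
leaving the list-locking question discussed above aside, would look
roughly like this (driver specifics and the chosen resv usages are made
up for illustration):

static int driver_exec(struct drm_gpuvm *gpuvm, struct driver_job *job)
{
	struct drm_gpuvm_exec vm_exec = {
		.vm = gpuvm,
		/* No additional BOs to lock in this example. */
		.extra.fn = NULL,
	};
	int ret;

	/* Locks the VM's common resv and all extobj resvs. */
	ret = drm_gpuvm_exec_lock(&vm_exec, 1, true);
	if (ret)
		return ret;

	/* Re-validate everything that landed on the evict list. */
	ret = drm_gpuvm_validate(gpuvm);
	if (ret)
		goto out_unlock;

	driver_job_submit(job); /* hypothetical */

	/* Fence private BOs and extobjs with the respective usage. */
	drm_gpuvm_exec_resv_add_fence(&vm_exec, job->fence,
				      DMA_RESV_USAGE_BOOKKEEP,
				      DMA_RESV_USAGE_WRITE);

out_unlock:
	drm_gpuvm_exec_unlock(&vm_exec);
	return ret;
}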

>
>>
>> Regards,
>> Christian.
>>
>>>
>>> On Wed, Sep 13, 2023 at 11:14:46AM +0200, Thomas Hellström wrote:
>>>> Hi!
>>>>
>>>> On Wed, 2023-09-13 at 01:36 +0200, Danilo Krummrich wrote:
>>>>> On Tue, Sep 12, 2023 at 09:23:08PM +0200, Thomas Hellström wrote:
>>>>>> On 9/12/23 18:50, Danilo Krummrich wrote:
>>>>>>> On Tue, Sep 12, 2023 at 06:20:32PM +0200, Thomas Hellström wrote:
>>>>>>>> Hi, Danilo,
>>>>>>>>
>>>>>>>> On 9/9/23 17:31, Danilo Krummrich wrote:
>>>>>>>>> So far the DRM GPUVA manager offers common infrastructure to
>>>>>>>>> track GPU VA
>>>>>>>>> allocations and mappings, generically connect GPU VA mappings
>>>>>>>>> to their
>>>>>>>>> backing buffers and perform more complex mapping operations
>>>>>>>>> on the GPU VA
>>>>>>>>> space.
>>>>>>>>>
>>>>>>>>> However, there are more design patterns commonly used by
>>>>>>>>> drivers, which
>>>>>>>>> can potentially be generalized in order to make the DRM GPUVA
>>>>>>>>> manager
>>>>>>>>> represent a basic GPU-VM implementation. In this context,
>>>>>>>>> this patch aims
>>>>>>>>> at generalizing the following elements.
>>>>>>>>>
>>>>>>>>> 1) Provide a common dma-resv for GEM objects not being used
>>>>>>>>> outside of
>>>>>>>>>       this GPU-VM.
>>>>>>>>>
>>>>>>>>> 2) Provide tracking of external GEM objects (GEM objects
>>>>>>>>> which are
>>>>>>>>>       shared with other GPU-VMs).
>>>>>>>>>
>>>>>>>>> 3) Provide functions to efficiently lock all GEM objects dma-
>>>>>>>>> resv the
>>>>>>>>>       GPU-VM contains mappings of.
>>>>>>>>>
>>>>>>>>> 4) Provide tracking of evicted GEM objects the GPU-VM
>>>>>>>>> contains mappings
>>>>>>>>>       of, such that validation of evicted GEM objects is
>>>>>>>>> accelerated.
>>>>>>>>>
>>>>>>>>> 5) Provide some convinience functions for common patterns.
>>>>>>>>>
>>>>>>>>> Rather than being designed as a "framework", the target is to
>>>>>>>>> make all
>>>>>>>>> features appear as a collection of optional helper functions,
>>>>>>>>> such that
>>>>>>>>> drivers are free to make use of the DRM GPUVA managers basic
>>>>>>>>> functionality and opt-in for other features without setting
>>>>>>>>> any feature
>>>>>>>>> flags, just by making use of the corresponding functions.
>>>>>>>>>
>>>>>>>>> Big kudos to Boris Brezillon for his help to figure out
>>>>>>>>> locking for drivers
>>>>>>>>> updating the GPU VA space within the fence signalling path.
>>>>>>>>>
>>>>>>>>> Suggested-by: Matthew Brost <matthew.brost@intel.com>
>>>>>>>>> Signed-off-by: Danilo Krummrich <dakr@redhat.com>
>>>>>>>>> ---
>>>>>>>>>     drivers/gpu/drm/drm_gpuvm.c | 516
>>>>>>>>> ++++++++++++++++++++++++++++++++++++
>>>>>>>>>     include/drm/drm_gpuvm.h     | 197 ++++++++++++++
>>>>>>>>>     2 files changed, 713 insertions(+)
>>>>>>>>>
>>>>>>>>> diff --git a/drivers/gpu/drm/drm_gpuvm.c
>>>>>>>>> b/drivers/gpu/drm/drm_gpuvm.c
>>>>>>>>> index f4411047dbb3..8e62a043f719 100644
>>>>>>>>> --- a/drivers/gpu/drm/drm_gpuvm.c
>>>>>>>>> +++ b/drivers/gpu/drm/drm_gpuvm.c
>>>>>>>>> @@ -73,6 +73,21 @@
>>>>>>>>>      * &drm_gem_object list of &drm_gpuvm_bos for an existing
>>>>>>>>> instance of this
>>>>>>>>>      * particular combination. If not existent a new instance
>>>>>>>>> is created and linked
>>>>>>>>>      * to the &drm_gem_object.
>>>>>>>>> + *
>>>>>>>>> + * &drm_gpuvm_bo structures, since unique for a given
>>>>>>>>> &drm_gpuvm, are also used
>>>>>>>>> + * as entry for the &drm_gpuvm's lists of external and evicted
>>>>>>>>> + * objects. Those lists are maintained in order to accelerate
>>>>>>>>> + * locking of dma-resv locks and validation of evicted objects bound
>>>>>>>>> + * in a &drm_gpuvm. For instance, all &drm_gem_objects' &dma_resv of
>>>>>>>>> + * a given &drm_gpuvm can be locked by calling
>>>>>>>>> + * drm_gpuvm_exec_lock(). Once locked drivers can call
>>>>>>>>> drm_gpuvm_validate() in
>>>>>>>>> + * order to validate all evicted &drm_gem_objects. It is
>>>>>>>>> also possible to lock
>>>>>>>>> + * additional &drm_gem_objects by providing the
>>>>>>>>> corresponding parameters to
>>>>>>>>> + * drm_gpuvm_exec_lock() as well as open code the &drm_exec
>>>>>>>>> loop while making
>>>>>>>>> + * use of helper functions such as drm_gpuvm_prepare_range()
>>>>>>>>> or
>>>>>>>>> + * drm_gpuvm_prepare_objects().
>>>>>>>>> + *
>>>>>>>>> + * Every bound &drm_gem_object is treated as external object
>>>>>>>>> when its &dma_resv
>>>>>>>>> + * structure is different than the &drm_gpuvm's common
>>>>>>>>> &dma_resv structure.
>>>>>>>>>      */
>>>>>>>>>     /**
>>>>>>>>> @@ -420,6 +435,20 @@
>>>>>>>>>      * Subsequent calls to drm_gpuvm_bo_obtain() for the same
>>>>>>>>> &drm_gpuvm and
>>>>>>>>>      * &drm_gem_object must be able to observe previous
>>>>>>>>> creations and destructions
>>>>>>>>>      * of &drm_gpuvm_bos in order to keep instances unique.
>>>>>>>>> + *
>>>>>>>>> + * The &drm_gpuvm's lists for keeping track of external and
>>>>>>>>> evicted objects are
>>>>>>>>> + * protected against concurrent insertion / removal and
>>>>>>>>> iteration internally.
>>>>>>>>> + *
>>>>>>>>> + * However, drivers still need to make sure to protect concurrent
>>>>>>>>> + * calls to functions iterating those lists, such as
>>>>>>>>> + * drm_gpuvm_validate() and drm_gpuvm_prepare_objects(). Every such
>>>>>>>>> + * function contains a particular comment and lockdep checks if
>>>>>>>>> + * possible.
>>>>>>>>> + *
>>>>>>>>> + * Functions adding or removing entries from those lists, such as
>>>>>>>>> + * drm_gpuvm_bo_evict() or drm_gpuvm_bo_extobj_add(), may be called
>>>>>>>>> + * with external locks being held, e.g. in order to avoid the
>>>>>>>>> + * corresponding list being modified while potentially being
>>>>>>>>> + * iterated by other API functions. However, this is entirely
>>>>>>>>> + * optional.
>>>>>>>>>      */
>>>>>>>>>     /**
>>>>>>>>> @@ -632,6 +661,131 @@
>>>>>>>>>      *   }
>>>>>>>>>      */
>>>>>>>>> +/**
>>>>>>>>> + * get_next_vm_bo_from_list() - get the next vm_bo element
>>>>>>>>> + * @__gpuvm: The GPU VM
>>>>>>>>> + * @__list_name: The name of the list we're iterating on
>>>>>>>>> + * @__local_list: A pointer to the local list used to store
>>>>>>>>> already iterated items
>>>>>>>>> + * @__prev_vm_bo: The previous element we got from
>>>>>>>>> + * get_next_vm_bo_from_list()
>>>>>>>>> + *
>>>>>>>>> + * This helper is here to provide lockless list iteration.
>>>>>>>>> Lockless as in, the
>>>>>>>>> + * iterator releases the lock immediately after picking the
>>>>>>>>> first element from
>>>>>>>>> + * the list, so list insertion and deletion can happen
>>>>>>>>> concurrently.
>>>>>>>> Are the list spinlocks needed for that async state update from
>>>>>>>> within the
>>>>>>>> dma-fence critical section we've discussed previously?
>>>>>>> Yes, but also for other reasons, see below.
>>>>>>>
>>>>>>>> Otherwise it should be sufficient to protect the lists with the
>>>>>>>> gpuvm's resv
>>>>>>>> (or for the extobj list with an outer lock).
>>>>>>>>
>>>>>>>> If those spinlocks are still needed in some situations, perhaps
>>>>>>>> could we
>>>>>>>> have an option to set them to NULL (Like IIRC the maple tree
>>>>>>>> allows for)?
>>>>>>> The evict spinlock is needed in any case, since in
>>>>>>> drm_gpuvm_bo_evict() we're
>>>>>>> holding only the dma-resv lock from the BO this function gets
>>>>>>> called for. Hence,
>>>>>>> the spinlock protects concurrent drm_gpuvm_bo_evict() calls with
>>>>>>> different BOs.
>>>>>> No. Only if you try to add external objects to the vm's evict list
>>>>>> from
>>>>>> within the evict code. That's not necessary since you loop through
>>>>>> all
>>>>>> external objects anyway when locking them so an "evicted" bool in
>>>>>> the vm_bo,
>>>>>> protected by the bo resv would be sufficient. The extobj locking
>>>>>> loop can
>>>>>> then add the bo to the evicted list.
>>>>> And validate() can remove it while still holding all dma-resv locks,
>>>>> neat!
>>>>> However, what if two tasks are trying to lock the VA space
>>>>> concurrently? What
>>>>> do we do when the drm_gpuvm_bo's refcount drops to zero in
>>>>> drm_gpuva_unlink()?
>>>>> Are we guaranteed that at this point of time the drm_gpuvm_bo is not
>>>>> on the
>>>>> evicted list? Because otherwise we would call drm_gpuvm_bo_destroy()
>>>>> with the
>>>>> dma-resv lock held, which wouldn't be allowed, since
>>>>> drm_gpuvm_bo_destroy()
>>>>> might drop the last reference to the drm_gem_object and hence we'd
>>>>> potentially
>>>>> free the dma-resv lock while holding it, at least if it's an external
>>>>> object.
>>>> Easiest way in this scheme is to think of the lists as being protected
>>>> by the vm's resv lock. That means anybody calling unlink() must also
>>>> hold the vm's resv lock. (Which is OK from a UAF point of view, but
>>>> perhaps not from a locking inversion POV for an async list update).
>>> This would mean that on unlink() we'd need to hold the VM's resv 
>>> lock and the
>>> corresponding GEM's resv lock (in case they're not the same anyways) 
>>> because the
>>> VM's resv lock would protect the external / evicted object lists and 
>>> the GEM
>>> objects resv lock protects the GEM's list of drm_gpuvm_bos and the
>>> drm_gpuvm_bo's list of drm_gpuvas.
>>>
>>>>>>> For extobjs an outer lock would be enough in case of Xe, but I
>>>>>>> really would not
>>>>>>> like to add even more complexity just to get the spinlock out of
>>>>>>> the way in case
>>>>>>> the driver already has an outer lock protecting this path.
>>>>>> I must disagree here. These spinlocks and atomic operations are
>>>>>> pretty
>>>>>> costly and as discussed earlier this type of locking was the reason
>>>>>> (at
>>>>>> least according to the commit message) that made Christian drop the
>>>>>> XArray
>>>>>> use in drm_exec for the same set of objects: "The locking overhead
>>>>>> is
>>>>>> unecessary and measurable". IMHO the spinlock is the added
>>>>>> complexity and a
>>>>>> single wide lock following the drm locking guidelines set out by
>>>>>> Daniel and
>>>>>> David should really be the default choice with an opt-in for a
>>>>>> spinlock if
>>>>>> needed for async and pushing out to a wq is not an option.
>>>>> For the external object list an outer lock would work as long as it's
>>>>> not the
>>>>> dma-resv lock of the corresponding GEM object, since here we actually
>>>>> need to
>>>>> remove the list entry from the external object list on
>>>>> drm_gpuvm_bo_destroy().
>>>>> It's just a bit weird design wise that drivers would need to take
>>>>> this outer
>>>>> lock on:
>>>>>
>>>>> - drm_gpuvm_bo_extobj_add()
>>>>> - drm_gpuvm_bo_destroy()        (and hence also drm_gpuvm_bo_put())
>>>>> - drm_gpuva_unlink()            (because it needs to call
>>>>> drm_gpuvm_bo_put())
>>>>> - drm_gpuvm_exec_lock()
>>>>> - drm_gpuvm_exec_lock_array()
>>>>> - drm_gpuvm_prepare_range()
>>>>>
>>>>> Given that it seems reasonable to do all the required locking
>>>>> internally.
>>>>  From a design POV, there has been a clear direction in XE to make
>>>> things similar to mmap() / munmap(), so this outer lock, which in 
>>>> Xe is
>>>> an rwsem, is used in a similar way as the mmap_lock. It's protecting
>>>> the page-table structures and vma rb tree, the userptr structures and
>>>> the extobj list. Basically it's taken early in the exec IOCTL, the
>>>> VM_BIND ioctl, the compute rebind worker and the pagefault handler, so
>>>> all of the above are just asserting that it is taken in the correct
>>>> mode.
>>>>
>>>> But strictly with this scheme one could also use the vm's dma_resv for
>>>> the extobj list since with drm_exec, it's locked before traversing the
>>>> list.
>>>>
>>>> The whole point of this scheme is to rely on locks that you already 
>>>> are
>>>> supposed to be holding for various reasons and is simple to 
>>>> comprehend.
>>> I don't agree that we're supposed to hold the VM's resv lock anyways 
>>> for
>>> functions like drm_gpuvm_bo_put() or drm_gpuva_unlink(), but I'm 
>>> fine using it
>>> for that purpose nevertheless.
>>>
>>>>> In order to at least place lockdep checks, the driver would need to
>>>>> supply the
>>>>> corresponding lock's lockdep_map, because the GPUVM otherwise doesn't
>>>>> know about
>>>>> the lock.
>>>> Yes, that sounds reasonable. One lockdep map per list.
>>> I'd really like to avoid that, especially now that everything got 
>>> simpler. We
>>> should define the actual locks to take instead.
>>>
>>>>> Out of curiosity, what is the overhead of a spin_lock() that doesn't
>>>>> need to
>>>>> spin?
>>>> I guess it's hard to tell exactly, but it is much lower on modern x86
>>>> than what it used to be. Not sure about ARM, which is the other
>>>> architecture important to us. I figure if there is little cache-line
>>>> bouncing the main overhead comes from the implied barriers.
>>>>
>>>>>> A pretty simple way that would not add much code would be
>>>>>>
>>>>>> static void gpuvm_cond_spin_lock(const struct drm_gpuvm *gpuvm,
>>>>>>                                  spinlock_t *lock)
>>>>>> {
>>>>>>      if (!gpuvm->resv_protected_lists)
>>>>>>          spin_lock(lock);
>>>>>> }
>>>>>>
>>>>>>>> For such drivers, that would require anybody calling unlink to
>>>>>>>> hold the vm's
>>>>>>>> resv, though.
>>>>>>> In V4 I want to go back to having a dedicated lock for the GEMs
>>>>>>> gpuva list (or
>>>>>>> VM_BO list to be more precise). We can't just use the dma-resv
>>>>>>> lock for that
>>>>>>> with VM_BO abstractions, because on destruction of a VM_BO we
>>>>>>> otherwise wouldn't
>>>>>>> be allowed to already hold the dma-resv lock. That's the fix I
>>>>>>> was referring to
>>>>>>> earlier.
>>>>>> Yeah, I can see the need for a dedicated lock for the GEM's gpuva
>>>>>> list, but
>>>>>> holding the vm's dma-resv lock across the unlink shouldn't be a
>>>>>> problem. We
>>>>>> may free the object and a pointer to the vm's resv during unlink
>>>>>> but we
>>>>>> don't free the vm's resv.  It'd be a matter of ensuring that any
>>>>>> calls to
>>>>>> unlink from *within* drm_gpuvm allows it to be held.
>>>>> Drivers calling unlink() from the fence signaling path can't use the
>>>>> VM's
>>>>> dma-resv lock.
>>>> Yes, that made me a bit curious because in the current version the 
>>>> code
>>>> required the object's dma_resv for unlink() which can't be grabbed
>>>> either from the fence signaling path. So are there any drivers 
>>>> actually
>>>> wanting to do that? If so, they will either need to resort to the
>>>> current spinlock solution or they will need to call unlink from a
>>>> workqueue item.
>>> As Boris already mentioned we have the dma-resv lock by default or a 
>>> driver
>>> specific GEM gpuva lock as opt-in. Now, we can get rid of the latter.
>>>
>>>>> Also, what if the object is an external object? We can't use the VM's
>>>>> dma-resv
>>>>> lock here.
>>>> Why? Typically (sync) unlink is only ever called from an unbind-like
>>>> operation where it should be trivial to grab the vm's resv. Or, for
>>>> that matter, any outer lock protecting the extobj list. The rule would be
>>>> that drm_gpuvm_bo::entry::extobj and drm_gpuvm_bo::entry::evict would
>>>> be protected by either the vm's dma_resv (or possibly an outer lock in
>>>> the case of the extobj list).
>>> Outer lock wouldn't have been working for updates in the async path, 
>>> but
>>> shouldn't be relevant anymore. We could use the VM's resv for that.
>>>
>>>>>   And we can't have the GEM objs dma-resv lock held when calling
>>>>> unlink(), since unlink() calls drm_gpuvm_bo_put(), which if the
>>>>> refcount drops
>>>>> to zero calls drm_gpuvm_bo_destroy() and drm_gpuvm_bo_destroy() might
>>>>> drop the
>>>>> last reference of the GEM object.
>>>> Yes, but this is a different problem as to what exactly protects
>>>> drm_gpuvm_bo::entry::gem. Either as you suggest an internal per bo 
>>>> list
>>>> lock, or if we want to keep the bo's dma_resv we need to ensure that
>>>> the caller of dma_resv_unlock(obj->resv) actually refcounts its obj
>>>> pointer, and doesn't implicitly rely on the gpuvm_bo's refcount (I 
>>>> know
>>>> Boris didn't like that, but requiring an explicit refcount for a
>>>> pointer you dereference unless you're under a lock that ensures 
>>>> keeping
>>>> the object alive is pretty much required?) But anyway for the
>>>> drm_gpuvm_bo::entry::gem list protection (bo resv or internal 
>>>> spinlock)
>>>> I don't have a strong preference.
>>> We can keep the GEM objects dma-resv lock, however as mentioned above
>>> drm_gpuva_unlink() and drm_gpuvm_bo_put() then requires both the 
>>> VM's resv lock
>>> and the GEM's resv lock in case they differ.
>>>
>>>>>   All those problems go away with a dedicated
>>>>> GEM gpuva list lock.
>>>> I don't think these are real problems.
>>>> With the excepton of the eviction list "trick" where we currently have
>>>> slightly different approach to collect external bos needing rebinding,
>>>> we have this working fine.
>>>>
>>>> TBH I think pretty much the only situation where the spinlock is 
>>>> needed
>>>> is for async updates of these lists, unless a wq item can be used for
>>>> that, but it doesn't really seem like the current code allows for such
>>>> updates anyway? It complicates the code a lot, adds overhead and also
>>>> adds the requirement for refcounting during list traversal.
>>>>
>>>> /Thomas
>>>>
>>>>>> /Thomas
>>>>>>
>>>>>>
>>>>>>>> It seems that with that also the refcount could be make non-
>>>>>>>> atomic.
>>>>>>>>
>>>>>>>> All in the spirit of the drm locking guidelines "use big locks
>>>>>>>> when
>>>>>>>> possible".
>>>>>>>> Lower level locks only when necessary for performance or
>>>>>>>> locking inversion?
>>>>>>>>
>>>>>>>> /Thomas
>>>>>>>>
>>>>>>>>
>>>>>>>>> + *
>>>>>>>>> + * Elements popped from the original list are kept in a
>>>>>>>>> local list, so removal
>>>>>>>>> + * and is_empty checks can still happen while we're
>>>>>>>>> iterating the list.
>>>>>>>>> + */
>>>>>>>>> +#define get_next_vm_bo_from_list(__gpuvm, __list_name, __local_list, __prev_vm_bo) \
>>>>>>>>> +	({ \
>>>>>>>>> +		struct drm_gpuvm_bo *__vm_bo; \
>>>>>>>>> + \
>>>>>>>>> +		drm_gpuvm_bo_put(__prev_vm_bo); \
>>>>>>>>> + \
>>>>>>>>> +		spin_lock(&(__gpuvm)->__list_name.lock); \
>>>>>>>>> +		while (!list_empty(&(__gpuvm)->__list_name.list)) { \
>>>>>>>>> +			__vm_bo = list_first_entry(&(__gpuvm)->__list_name.list, \
>>>>>>>>> +						   struct drm_gpuvm_bo, \
>>>>>>>>> +						   list.entry.__list_name); \
>>>>>>>>> +			if (drm_gpuvm_bo_get_unless_zero(__vm_bo)) { \
>>>>>>>>> +				list_move_tail(&(__vm_bo)->list.entry.__list_name, \
>>>>>>>>> +					       __local_list); \
>>>>>>>>> +				break; \
>>>>>>>>> +			} else { \
>>>>>>>>> +				list_del_init(&(__vm_bo)->list.entry.__list_name); \
>>>>>>>>> +				__vm_bo = NULL; \
>>>>>>>>> +			} \
>>>>>>>>> +		} \
>>>>>>>>> +		spin_unlock(&(__gpuvm)->__list_name.lock); \
>>>>>>>>> + \
>>>>>>>>> +		__vm_bo; \
>>>>>>>>> +	})
>>>>>>>>> +
>>>>>>>>> +/**
>>>>>>>>> + * for_each_vm_bo_in_list() - internal vm_bo list iterator
>>>>>>>>> + * @__gpuvm: The GPU VM
>>>>>>>>> + * @__list_name: The name of the list we're iterating on
>>>>>>>>> + * @__local_list: A pointer to the local list used to store already
>>>>>>>>> + * iterated items
>>>>>>>>> + * @__vm_bo: The &drm_gpuvm_bo iteration cursor
>>>>>>>>> + *
>>>>>>>>> + * This helper is here to provide lockless list iteration.
>>>>>>>>> Lockless as in, the
>>>>>>>>> + * iterator releases the lock immediately after picking the
>>>>>>>>> first element from the
>>>>>>>>> + * list, so list insertion and deletion can happen
>>>>>>>>> concurrently.
>>>>>>>>> + *
>>>>>>>>> + * Typical use:
>>>>>>>>> + *
>>>>>>>>> + *     struct drm_gpuvm_bo *vm_bo;
>>>>>>>>> + *     LIST_HEAD(my_local_list);
>>>>>>>>> + *
>>>>>>>>> + *     ret = 0;
>>>>>>>>> + *     for_each_vm_bo_in_list(gpuvm, <list_name>,
>>>>>>>>> &my_local_list, vm_bo) {
>>>>>>>>> + *             ret = do_something_with_vm_bo(..., vm_bo);
>>>>>>>>> + *             if (ret)
>>>>>>>>> + *                     break;
>>>>>>>>> + *     }
>>>>>>>>> + *     drm_gpuvm_bo_put(vm_bo);
>>>>>>>>> + *     restore_vm_bo_list(gpuvm, <list_name>,
>>>>>>>>> &my_local_list);
>>>>>>>>> + *
>>>>>>>>> + *
>>>>>>>>> + * Only used for internal list iterations, not meant to be
>>>>>>>>> exposed to the outside
>>>>>>>>> + * world.
>>>>>>>>> + */
>>>>>>>>> +#define for_each_vm_bo_in_list(__gpuvm, __list_name, __local_list, __vm_bo) \
>>>>>>>>> +	for (__vm_bo = get_next_vm_bo_from_list(__gpuvm, __list_name, \
>>>>>>>>> +						__local_list, NULL); \
>>>>>>>>> +	     __vm_bo; \
>>>>>>>>> +	     __vm_bo = get_next_vm_bo_from_list(__gpuvm, __list_name, \
>>>>>>>>> +						__local_list, __vm_bo))
>>>>>>>>> +
>>>>>>>>> +/**
>>>>>>>>> + * restore_vm_bo_list() - move vm_bo elements back to their
>>>>>>>>> original list
>>>>>>>>> + * @__gpuvm: The GPU VM
>>>>>>>>> + * @__list_name: The name of the list we're iterating on
>>>>>>>>> + * @__local_list: A pointer to the local list used to store
>>>>>>>>> already iterated items
>>>>>>>>> + *
>>>>>>>>> + * When we're done iterating a vm_bo list, we should call
>>>>>>>>> restore_vm_bo_list()
>>>>>>>>> + * to restore the original state and let new iterations take
>>>>>>>>> place.
>>>>>>>>> + */
>>>>>>>>> +#define restore_vm_bo_list(__gpuvm, __list_name, __local_list) \
>>>>>>>>> +	do { \
>>>>>>>>> +		/* Merge back the two lists, moving local list elements to the \
>>>>>>>>> +		 * head to preserve previous ordering, in case it matters. \
>>>>>>>>> +		 */ \
>>>>>>>>> +		spin_lock(&(__gpuvm)->__list_name.lock); \
>>>>>>>>> +		list_splice(__local_list, &(__gpuvm)->__list_name.list); \
>>>>>>>>> +		spin_unlock(&(__gpuvm)->__list_name.lock); \
>>>>>>>>> +	} while (0)
>>>>>>>>> +/**
>>>>>>>>> + * drm_gpuvm_bo_list_add() - insert a vm_bo into the given
>>>>>>>>> list
>>>>>>>>> + * @__vm_bo: the &drm_gpuvm_bo
>>>>>>>>> + * @__list_name: the name of the list to insert into
>>>>>>>>> + *
>>>>>>>>> + * Inserts the given @__vm_bo into the list specified by
>>>>>>>>> @__list_name and
>>>>>>>>> + * increases the vm_bo's reference count.
>>>>>>>>> + */
>>>>>>>>> +#define drm_gpuvm_bo_list_add(__vm_bo, __list_name) \
>>>>>>>>> +	do { \
>>>>>>>>> +		spin_lock(&(__vm_bo)->vm->__list_name.lock); \
>>>>>>>>> +		if (list_empty(&(__vm_bo)->list.entry.__list_name)) \
>>>>>>>>> +			list_add_tail(&(__vm_bo)->list.entry.__list_name, \
>>>>>>>>> +				      &(__vm_bo)->vm->__list_name.list); \
>>>>>>>>> +		spin_unlock(&(__vm_bo)->vm->__list_name.lock); \
>>>>>>>>> +	} while (0)
>>>>>>>>> +
>>>>>>>>> +/**
>>>>>>>>> + * drm_gpuvm_bo_list_del() - remove a vm_bo from the given
>>>>>>>>> list
>>>>>>>>> + * @__vm_bo: the &drm_gpuvm_bo
>>>>>>>>> + * @__list_name: the name of the list to remove from
>>>>>>>>> + *
>>>>>>>>> + * Removes the given @__vm_bo from the list specified by
>>>>>>>>> @__list_name and
>>>>>>>>> + * decreases the vm_bo's reference count.
>>>>>>>>> + */
>>>>>>>>> +#define drm_gpuvm_bo_list_del(__vm_bo, __list_name) \
>>>>>>>>> +	do { \
>>>>>>>>> +		spin_lock(&(__vm_bo)->vm->__list_name.lock); \
>>>>>>>>> +		if (!list_empty(&(__vm_bo)->list.entry.__list_name)) \
>>>>>>>>> +			list_del_init(&(__vm_bo)->list.entry.__list_name); \
>>>>>>>>> +		spin_unlock(&(__vm_bo)->vm->__list_name.lock); \
>>>>>>>>> +	} while (0)
>>>>>>>>> +
>>>>>>>>> +static int __must_check
>>>>>>>>> +drm_gpuvm_bo_get_unless_zero(struct drm_gpuvm_bo *vm_bo);
>>>>>>>>> +
>>>>>>>>>     #define to_drm_gpuva(__node) container_of((__node), struct
>>>>>>>>> drm_gpuva, rb.node)
>>>>>>>>>     #define GPUVA_START(node) ((node)->va.addr)
>>>>>>>>> @@ -713,6 +867,12 @@ drm_gpuvm_init(struct drm_gpuvm *gpuvm,
>>>>>>>>> struct drm_device *drm,
>>>>>>>>>          gpuvm->rb.tree = RB_ROOT_CACHED;
>>>>>>>>>          INIT_LIST_HEAD(&gpuvm->rb.list);
>>>>>>>>> +       INIT_LIST_HEAD(&gpuvm->extobj.list);
>>>>>>>>> +       spin_lock_init(&gpuvm->extobj.lock);
>>>>>>>>> +
>>>>>>>>> +       INIT_LIST_HEAD(&gpuvm->evict.list);
>>>>>>>>> +       spin_lock_init(&gpuvm->evict.lock);
>>>>>>>>> +
>>>>>>>>>          drm_gpuva_check_overflow(start_offset, range);
>>>>>>>>>          gpuvm->mm_start = start_offset;
>>>>>>>>>          gpuvm->mm_range = range;
>>>>>>>>> @@ -754,10 +914,302 @@ drm_gpuvm_destroy(struct drm_gpuvm
>>>>>>>>> *gpuvm)
>>>>>>>>>          WARN(!RB_EMPTY_ROOT(&gpuvm->rb.tree.rb_root),
>>>>>>>>>               "GPUVA tree is not empty, potentially leaking
>>>>>>>>> memory.\n");
>>>>>>>>> +       WARN(!list_empty(&gpuvm->extobj.list), "Extobj list
>>>>>>>>> should be empty.\n");
>>>>>>>>> +       WARN(!list_empty(&gpuvm->evict.list), "Evict list
>>>>>>>>> should be empty.\n");
>>>>>>>>> +
>>>>>>>>>          drm_gem_private_object_fini(&gpuvm->d_obj);
>>>>>>>>>     }
>>>>>>>>>     EXPORT_SYMBOL_GPL(drm_gpuvm_destroy);
>>>>>>>>> +/**
>>>>>>>>> + * drm_gpuvm_prepare_objects() - prepare all associated BOs
>>>>>>>>> + * @gpuvm: the &drm_gpuvm
>>>>>>>>> + * @exec: the &drm_exec locking context
>>>>>>>>> + * @num_fences: the number of &dma_fences to reserve
>>>>>>>>> + *
>>>>>>>>> + * Calls drm_exec_prepare_obj() for all &drm_gem_objects the
>>>>>>>>> given
>>>>>>>>> + * &drm_gpuvm contains mappings of.
>>>>>>>>> + *
>>>>>>>>> + * Using this function directly, it is the driver's responsibility
>>>>>>>>> + * to call
>>>>>>>>> + * drm_exec_init() and drm_exec_fini() accordingly.
>>>>>>>>> + *
>>>>>>>>> + * Note: This function is safe against concurrent insertion
>>>>>>>>> and removal of
>>>>>>>>> + * external objects; however, it is not safe against
>>>>>>>>> concurrent usage itself.
>>>>>>>>> + *
>>>>>>>>> + * Drivers need to make sure to protect this case either with
>>>>>>>>> + * an outer VM lock
>>>>>>>>> + * or by calling drm_gpuvm_prepare_vm() before this function
>>>>>>>>> within the
>>>>>>>>> + * drm_exec_until_all_locked() loop, such that the GPUVM's
>>>>>>>>> dma-resv lock ensures
>>>>>>>>> + * mutual exclusion.
>>>>>>>>> + *
>>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>>> + */
>>>>>>>>> +int
>>>>>>>>> +drm_gpuvm_prepare_objects(struct drm_gpuvm *gpuvm,
>>>>>>>>> +                         struct drm_exec *exec,
>>>>>>>>> +                         unsigned int num_fences)
>>>>>>>>> +{
>>>>>>>>> +       struct drm_gpuvm_bo *vm_bo;
>>>>>>>>> +       LIST_HEAD(extobjs);
>>>>>>>>> +       int ret = 0;
>>>>>>>>> +
>>>>>>>>> +       for_each_vm_bo_in_list(gpuvm, extobj, &extobjs,
>>>>>>>>> vm_bo) {
>>>>>>>>> +               ret = drm_exec_prepare_obj(exec, vm_bo->obj,
>>>>>>>>> num_fences);
>>>>>>>>> +               if (ret)
>>>>>>>>> +                       break;
>>>>>>>>> +       }
>>>>>>>>> +       /* Drop ref in case we break out of the loop. */
>>>>>>>>> +       drm_gpuvm_bo_put(vm_bo);
>>>>>>>>> +       restore_vm_bo_list(gpuvm, extobj, &extobjs);
>>>>>>>>> +
>>>>>>>>> +       return ret;
>>>>>>>>> +}
>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_prepare_objects);
>>>>>>>>> +
>>>>>>>>> +/**
>>>>>>>>> + * drm_gpuvm_prepare_range() - prepare all BOs mapped within
>>>>>>>>> a given range
>>>>>>>>> + * @gpuvm: the &drm_gpuvm
>>>>>>>>> + * @exec: the &drm_exec locking context
>>>>>>>>> + * @addr: the start address within the VA space
>>>>>>>>> + * @range: the range to iterate within the VA space
>>>>>>>>> + * @num_fences: the number of &dma_fences to reserve
>>>>>>>>> + *
>>>>>>>>> + * Calls drm_exec_prepare_obj() for all &drm_gem_objects
>>>>>>>>> mapped between @addr
>>>>>>>>> + * and @addr + @range.
>>>>>>>>> + *
>>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>>> + */
>>>>>>>>> +int
>>>>>>>>> +drm_gpuvm_prepare_range(struct drm_gpuvm *gpuvm, struct
>>>>>>>>> drm_exec *exec,
>>>>>>>>> +                       u64 addr, u64 range, unsigned int
>>>>>>>>> num_fences)
>>>>>>>>> +{
>>>>>>>>> +       struct drm_gpuva *va;
>>>>>>>>> +       u64 end = addr + range;
>>>>>>>>> +       int ret;
>>>>>>>>> +
>>>>>>>>> +       drm_gpuvm_for_each_va_range(va, gpuvm, addr, end) {
>>>>>>>>> +               struct drm_gem_object *obj = va->gem.obj;
>>>>>>>>> +
>>>>>>>>> +               ret = drm_exec_prepare_obj(exec, obj,
>>>>>>>>> num_fences);
>>>>>>>>> +               if (ret)
>>>>>>>>> +                       return ret;
>>>>>>>>> +       }
>>>>>>>>> +
>>>>>>>>> +       return 0;
>>>>>>>>> +}
>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_prepare_range);
>>>>>>>>> +
>>>>>>>>> +/**
>>>>>>>>> + * drm_gpuvm_exec_lock() - lock all dma-resv of all
>>>>>>>>> associated BOs
>>>>>>>>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>>>>>>>>> + * @num_fences: the number of &dma_fences to reserve
>>>>>>>>> + * @interruptible: sleep interruptible if waiting
>>>>>>>>> + *
>>>>>>>>> + * Acquires all dma-resv locks of all &drm_gem_objects the
>>>>>>>>> given
>>>>>>>>> + * &drm_gpuvm contains mappings of.
>>>>>>>>> + *
>>>>>>>>> + * Additionally, when calling this function with struct
>>>>>>>>> drm_gpuvm_exec::extra
>>>>>>>>> + * being set the driver receives the given @fn callback to
>>>>>>>>> lock additional
>>>>>>>>> + * dma-resv in the context of the &drm_gpuvm_exec instance.
>>>>>>>>> Typically, drivers
>>>>>>>>> + * would call drm_exec_prepare_obj() from within this
>>>>>>>>> callback.
>>>>>>>>> + *
>>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>>> + */
>>>>>>>>> +int
>>>>>>>>> +drm_gpuvm_exec_lock(struct drm_gpuvm_exec *vm_exec,
>>>>>>>>> +                   unsigned int num_fences,
>>>>>>>>> +                   bool interruptible)
>>>>>>>>> +{
>>>>>>>>> +       struct drm_gpuvm *gpuvm = vm_exec->vm;
>>>>>>>>> +       struct drm_exec *exec = &vm_exec->exec;
>>>>>>>>> +       uint32_t flags;
>>>>>>>>> +       int ret;
>>>>>>>>> +
>>>>>>>>> +       flags = (interruptible ? DRM_EXEC_INTERRUPTIBLE_WAIT : 0) |
>>>>>>>>> +               DRM_EXEC_IGNORE_DUPLICATES;
>>>>>>>>> +
>>>>>>>>> +       drm_exec_init(exec, flags);
>>>>>>>>> +
>>>>>>>>> +       drm_exec_until_all_locked(exec) {
>>>>>>>>> +               ret = drm_gpuvm_prepare_vm(gpuvm, exec,
>>>>>>>>> num_fences);
>>>>>>>>> +               drm_exec_retry_on_contention(exec);
>>>>>>>>> +               if (ret)
>>>>>>>>> +                       goto err;
>>>>>>>>> +
>>>>>>>>> +               ret = drm_gpuvm_prepare_objects(gpuvm, exec,
>>>>>>>>> num_fences);
>>>>>>>>> +               drm_exec_retry_on_contention(exec);
>>>>>>>>> +               if (ret)
>>>>>>>>> +                       goto err;
>>>>>>>>> +
>>>>>>>>> +               if (vm_exec->extra.fn) {
>>>>>>>>> +                       ret = vm_exec->extra.fn(vm_exec,
>>>>>>>>> num_fences);
>>>>>>>>> +                       drm_exec_retry_on_contention(exec);
>>>>>>>>> +                       if (ret)
>>>>>>>>> +                               goto err;
>>>>>>>>> +               }
>>>>>>>>> +       }
>>>>>>>>> +
>>>>>>>>> +       return 0;
>>>>>>>>> +
>>>>>>>>> +err:
>>>>>>>>> +       drm_exec_fini(exec);
>>>>>>>>> +       return ret;
>>>>>>>>> +}
>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_exec_lock);
>>>>>>>>> +
>>>>>>>>> +static int
>>>>>>>>> +fn_lock_array(struct drm_gpuvm_exec *vm_exec, unsigned int
>>>>>>>>> num_fences)
>>>>>>>>> +{
>>>>>>>>> +       struct {
>>>>>>>>> +               struct drm_gem_object **objs;
>>>>>>>>> +               unsigned int num_objs;
>>>>>>>>> +       } *args = vm_exec->extra.priv;
>>>>>>>>> +
>>>>>>>>> +       return drm_exec_prepare_array(&vm_exec->exec, args->objs,
>>>>>>>>> +                                     args->num_objs, num_fences);
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> +/**
>>>>>>>>> + * drm_gpuvm_exec_lock_array() - lock all dma-resv of all
>>>>>>>>> associated BOs
>>>>>>>>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>>>>>>>>> + * @objs: additional &drm_gem_objects to lock
>>>>>>>>> + * @num_objs: the number of additional &drm_gem_objects to
>>>>>>>>> lock
>>>>>>>>> + * @num_fences: the number of &dma_fences to reserve
>>>>>>>>> + * @interruptible: sleep interruptible if waiting
>>>>>>>>> + *
>>>>>>>>> + * Acquires all dma-resv locks of all &drm_gem_objects the
>>>>>>>>> given &drm_gpuvm
>>>>>>>>> + * contains mappings of, plus the ones given through @objs.
>>>>>>>>> + *
>>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>>> + */
>>>>>>>>> +int
>>>>>>>>> +drm_gpuvm_exec_lock_array(struct drm_gpuvm_exec *vm_exec,
>>>>>>>>> +                         struct drm_gem_object **objs,
>>>>>>>>> +                         unsigned int num_objs,
>>>>>>>>> +                         unsigned int num_fences,
>>>>>>>>> +                         bool interruptible)
>>>>>>>>> +{
>>>>>>>>> +       struct {
>>>>>>>>> +               struct drm_gem_object **objs;
>>>>>>>>> +               unsigned int num_objs;
>>>>>>>>> +       } args;
>>>>>>>>> +
>>>>>>>>> +       args.objs = objs;
>>>>>>>>> +       args.num_objs = num_objs;
>>>>>>>>> +
>>>>>>>>> +       vm_exec->extra.fn = fn_lock_array;
>>>>>>>>> +       vm_exec->extra.priv = &args;
>>>>>>>>> +
>>>>>>>>> +       return drm_gpuvm_exec_lock(vm_exec, num_fences,
>>>>>>>>> interruptible);
>>>>>>>>> +}
>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_exec_lock_array);
>>>>>>>>> +
>>>>>>>>> +/**
>>>>>>>>> + * drm_gpuvm_exec_lock_range() - prepare all BOs mapped
>>>>>>>>> within a given range
>>>>>>>>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>>>>>>>>> + * @addr: the start address within the VA space
>>>>>>>>> + * @range: the range to iterate within the VA space
>>>>>>>>> + * @num_fences: the number of &dma_fences to reserve
>>>>>>>>> + * @interruptible: sleep interruptible if waiting
>>>>>>>>> + *
>>>>>>>>> + * Acquires all dma-resv locks of all &drm_gem_objects
>>>>>>>>> mapped between @addr and
>>>>>>>>> + * @addr + @range.
>>>>>>>>> + *
>>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>>> + */
>>>>>>>>> +int
>>>>>>>>> +drm_gpuvm_exec_lock_range(struct drm_gpuvm_exec *vm_exec,
>>>>>>>>> +                         u64 addr, u64 range,
>>>>>>>>> +                         unsigned int num_fences,
>>>>>>>>> +                         bool interruptible)
>>>>>>>>> +{
>>>>>>>>> +       struct drm_gpuvm *gpuvm = vm_exec->vm;
>>>>>>>>> +       struct drm_exec *exec = &vm_exec->exec;
>>>>>>>>> +       uint32_t flags;
>>>>>>>>> +       int ret;
>>>>>>>>> +
>>>>>>>>> +       flags = (interruptible ? DRM_EXEC_INTERRUPTIBLE_WAIT : 0) |
>>>>>>>>> +               DRM_EXEC_IGNORE_DUPLICATES;
>>>>>>>>> +
>>>>>>>>> +       drm_exec_init(exec, flags);
>>>>>>>>> +
>>>>>>>>> +       drm_exec_until_all_locked(exec) {
>>>>>>>>> +               ret = drm_gpuvm_prepare_range(gpuvm, exec,
>>>>>>>>> addr, range,
>>>>>>>>> + num_fences);
>>>>>>>>> +               drm_exec_retry_on_contention(exec);
>>>>>>>>> +               if (ret)
>>>>>>>>> +                       goto err;
>>>>>>>>> +       }
>>>>>>>>> +
>>>>>>>>> +       return ret;
>>>>>>>>> +
>>>>>>>>> +err:
>>>>>>>>> +       drm_exec_fini(exec);
>>>>>>>>> +       return ret;
>>>>>>>>> +}
>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_exec_lock_range);
>>>>>>>>> +
>>>>>>>>> +/**
>>>>>>>>> + * drm_gpuvm_validate() - validate all BOs marked as evicted
>>>>>>>>> + * @gpuvm: the &drm_gpuvm to validate evicted BOs
>>>>>>>>> + *
>>>>>>>>> + * Calls the &drm_gpuvm_ops.bo_validate callback for all
>>>>>>>>> evicted buffer
>>>>>>>>> + * objects being mapped in the given &drm_gpuvm.
>>>>>>>>> + *
>>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>>> + */
>>>>>>>>> +int
>>>>>>>>> +drm_gpuvm_validate(struct drm_gpuvm *gpuvm)
>>>>>>>>> +{
>>>>>>>>> +       const struct drm_gpuvm_ops *ops = gpuvm->ops;
>>>>>>>>> +       struct drm_gpuvm_bo *vm_bo;
>>>>>>>>> +       LIST_HEAD(evict);
>>>>>>>>> +       int ret = 0;
>>>>>>>>> +
>>>>>>>>> +       if (unlikely(!ops || !ops->bo_validate))
>>>>>>>>> +               return -ENOTSUPP;
>>>>>>>>> +
>>>>>>>>> +       for_each_vm_bo_in_list(gpuvm, evict, &evict, vm_bo) {
>>>>>>>>> +               dma_resv_assert_held(vm_bo->obj->resv);
>>>>>>>>> +               ret = ops->bo_validate(vm_bo->obj);
>>>>>>>>> +               if (ret)
>>>>>>>>> +                       break;
>>>>>>>>> +       }
>>>>>>>>> +       /* Drop ref in case we break out of the loop. */
>>>>>>>>> +       drm_gpuvm_bo_put(vm_bo);
>>>>>>>>> +       restore_vm_bo_list(gpuvm, evict, &evict);
>>>>>>>>> +
>>>>>>>>> +       return ret;
>>>>>>>>> +}
>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_validate);
>>>>>>>>> +
>>>>>>>>> +/**
>>>>>>>>> + * drm_gpuvm_resv_add_fence() - add fence to private and all
>>>>>>>>> + * extobj dma-resv
>>>>>>>>> + * @gpuvm: the &drm_gpuvm to add a fence to
>>>>>>>>> + * @exec: the &drm_exec locking context
>>>>>>>>> + * @fence: fence to add
>>>>>>>>> + * @private_usage: private dma-resv usage
>>>>>>>>> + * @extobj_usage: extobj dma-resv usage
>>>>>>>>> + */
>>>>>>>>> +void
>>>>>>>>> +drm_gpuvm_resv_add_fence(struct drm_gpuvm *gpuvm,
>>>>>>>>> +                        struct drm_exec *exec,
>>>>>>>>> +                        struct dma_fence *fence,
>>>>>>>>> +                        enum dma_resv_usage private_usage,
>>>>>>>>> +                        enum dma_resv_usage extobj_usage)
>>>>>>>>> +{
>>>>>>>>> +       struct drm_gem_object *obj;
>>>>>>>>> +       unsigned long index;
>>>>>>>>> +
>>>>>>>>> +       drm_exec_for_each_locked_object(exec, index, obj) {
>>>>>>>>> +               dma_resv_assert_held(obj->resv);
>>>>>>>>> +               dma_resv_add_fence(obj->resv, fence,
>>>>>>>>> +                                  drm_gpuvm_is_extobj(gpuvm, obj) ?
>>>>>>>>> +                                  extobj_usage : private_usage);
>>>>>>>>> +       }
>>>>>>>>> +}
>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_resv_add_fence);
>>>>>>>>> +
>>>>>>>>>     /**
>>>>>>>>>      * drm_gpuvm_bo_create() - create a new instance of struct
>>>>>>>>> drm_gpuvm_bo
>>>>>>>>>      * @gpuvm: The &drm_gpuvm the @obj is mapped in.
>>>>>>>>> @@ -790,6 +1242,9 @@ drm_gpuvm_bo_create(struct drm_gpuvm
>>>>>>>>> *gpuvm,
>>>>>>>>>          INIT_LIST_HEAD(&vm_bo->list.gpuva);
>>>>>>>>>          INIT_LIST_HEAD(&vm_bo->list.entry.gem);
>>>>>>>>> +       INIT_LIST_HEAD(&vm_bo->list.entry.extobj);
>>>>>>>>> +       INIT_LIST_HEAD(&vm_bo->list.entry.evict);
>>>>>>>>> +
>>>>>>>>>          drm_gem_object_get(obj);
>>>>>>>>>          return vm_bo;
>>>>>>>>> @@ -807,6 +1262,14 @@ drm_gpuvm_bo_destroy(struct kref *kref)
>>>>>>>>>          drm_gem_gpuva_assert_lock_held(vm_bo->obj);
>>>>>>>>> +       spin_lock(&gpuvm->extobj.lock);
>>>>>>>>> +       list_del(&vm_bo->list.entry.extobj);
>>>>>>>>> +       spin_unlock(&gpuvm->extobj.lock);
>>>>>>>>> +
>>>>>>>>> +       spin_lock(&gpuvm->evict.lock);
>>>>>>>>> +       list_del(&vm_bo->list.entry.evict);
>>>>>>>>> +       spin_unlock(&gpuvm->evict.lock);
>>>>>>>>> +
>>>>>>>>>          list_del(&vm_bo->list.entry.gem);
>>>>>>>>>          drm_gem_object_put(obj);
>>>>>>>>> @@ -822,6 +1285,11 @@ drm_gpuvm_bo_destroy(struct kref *kref)
>>>>>>>>>      * @vm_bo: the &drm_gpuvm_bo to release the reference of
>>>>>>>>>      *
>>>>>>>>>      * This releases a reference to @vm_bo.
>>>>>>>>> + *
>>>>>>>>> + * If the reference count drops to zero, the &gpuvm_bo is
>>>>>>>>> destroyed, which
>>>>>>>>> + * includes removing it from the GEMs gpuva list. Hence, if
>>>>>>>>> a call to this
>>>>>>>>> + * function can potentially let the reference count to zero
>>>>>>>>> the caller must
>>>>>>>>> + * hold the dma-resv or driver specific GEM gpuva lock.
>>>>>>>>>      */
>>>>>>>>>     void
>>>>>>>>>     drm_gpuvm_bo_put(struct drm_gpuvm_bo *vm_bo)
>>>>>>>>> @@ -831,6 +1299,12 @@ drm_gpuvm_bo_put(struct drm_gpuvm_bo
>>>>>>>>> *vm_bo)
>>>>>>>>>     }
>>>>>>>>>     EXPORT_SYMBOL_GPL(drm_gpuvm_bo_put);
>>>>>>>>> +static int __must_check
>>>>>>>>> +drm_gpuvm_bo_get_unless_zero(struct drm_gpuvm_bo *vm_bo)
>>>>>>>>> +{
>>>>>>>>> +       return kref_get_unless_zero(&vm_bo->kref);
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>>     static struct drm_gpuvm_bo *
>>>>>>>>>     __drm_gpuvm_bo_find(struct drm_gpuvm *gpuvm,
>>>>>>>>>                      struct drm_gem_object *obj)
>>>>>>>>> @@ -938,6 +1412,48 @@ drm_gpuvm_bo_obtain_prealloc(struct
>>>>>>>>> drm_gpuvm_bo *__vm_bo)
>>>>>>>>>     }
>>>>>>>>> EXPORT_SYMBOL_GPL(drm_gpuvm_bo_obtain_prealloc);
>>>>>>>>> +/**
>>>>>>>>> + * drm_gpuvm_bo_extobj_add() - adds the &drm_gpuvm_bo to its
>>>>>>>>> + * &drm_gpuvm's extobj list
>>>>>>>>> + * @vm_bo: The &drm_gpuvm_bo to add to its &drm_gpuvm's extobj list.
>>>>>>>>> + *
>>>>>>>>> + * Adds the given @vm_bo to its &drm_gpuvm's extobj list if it is
>>>>>>>>> + * not on the list already and if the corresponding &drm_gem_object
>>>>>>>>> + * actually is an external object.
>>>>>>>>> + */
>>>>>>>>> +void
>>>>>>>>> +drm_gpuvm_bo_extobj_add(struct drm_gpuvm_bo *vm_bo)
>>>>>>>>> +{
>>>>>>>>> +       struct drm_gpuvm *gpuvm = vm_bo->vm;
>>>>>>>>> +
>>>>>>>>> +       if (drm_gpuvm_is_extobj(gpuvm, vm_bo->obj))
>>>>>>>>> +               drm_gpuvm_bo_list_add(vm_bo, extobj);
>>>>>>>>> +}
>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_bo_extobj_add);
>>>>>>>>> +
>>>>>>>>> +/**
>>>>>>>>> + * drm_gpuvm_bo_evict() - add / remove a &drm_gem_object to / from
>>>>>>>>> + * the &drm_gpuvms' evicted lists
>>>>>>>>> + * @obj: the &drm_gem_object to add or remove
>>>>>>>>> + * @evict: indicates whether the object is evicted
>>>>>>>>> + *
>>>>>>>>> + * Adds a &drm_gem_object to or removes it from the evicted list of
>>>>>>>>> + * all &drm_gpuvms containing a mapping of this &drm_gem_object.
>>>>>>>>> + */
>>>>>>>>> +void
>>>>>>>>> +drm_gpuvm_bo_evict(struct drm_gem_object *obj, bool evict)
>>>>>>>>> +{
>>>>>>>>> +       struct drm_gpuvm_bo *vm_bo;
>>>>>>>>> +
>>>>>>>>> +       drm_gem_for_each_gpuvm_bo(vm_bo, obj) {
>>>>>>>>> +               if (evict)
>>>>>>>>> +                       drm_gpuvm_bo_list_add(vm_bo, evict);
>>>>>>>>> +               else
>>>>>>>>> +                       drm_gpuvm_bo_list_del(vm_bo, evict);
>>>>>>>>> +       }
>>>>>>>>> +}
>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_bo_evict);
>>>>>>>>> +
>>>>>>>>>     static int
>>>>>>>>>     __drm_gpuva_insert(struct drm_gpuvm *gpuvm,
>>>>>>>>>                     struct drm_gpuva *va)
>>>>>>>>> diff --git a/include/drm/drm_gpuvm.h
>>>>>>>>> b/include/drm/drm_gpuvm.h
>>>>>>>>> index afa50b9059a2..834bb6d6617e 100644
>>>>>>>>> --- a/include/drm/drm_gpuvm.h
>>>>>>>>> +++ b/include/drm/drm_gpuvm.h
>>>>>>>>> @@ -26,10 +26,12 @@
>>>>>>>>>      */
>>>>>>>>>     #include <linux/list.h>
>>>>>>>>> +#include <linux/dma-resv.h>
>>>>>>>>>     #include <linux/rbtree.h>
>>>>>>>>>     #include <linux/types.h>
>>>>>>>>>     #include <drm/drm_gem.h>
>>>>>>>>> +#include <drm/drm_exec.h>
>>>>>>>>>     struct drm_gpuvm;
>>>>>>>>>     struct drm_gpuvm_bo;
>>>>>>>>> @@ -259,6 +261,38 @@ struct drm_gpuvm {
>>>>>>>>>           * space
>>>>>>>>>           */
>>>>>>>>>          struct dma_resv *resv;
>>>>>>>>> +
>>>>>>>>> +       /**
>>>>>>>>> +        * @extobj: structure holding the extobj list
>>>>>>>>> +        */
>>>>>>>>> +       struct {
>>>>>>>>> +               /**
>>>>>>>>> +                * @list: &list_head storing &drm_gpuvm_bos
>>>>>>>>> serving as
>>>>>>>>> +                * external object
>>>>>>>>> +                */
>>>>>>>>> +               struct list_head list;
>>>>>>>>> +
>>>>>>>>> +               /**
>>>>>>>>> +                * @lock: spinlock to protect the extobj list
>>>>>>>>> +                */
>>>>>>>>> +               spinlock_t lock;
>>>>>>>>> +       } extobj;
>>>>>>>>> +
>>>>>>>>> +       /**
>>>>>>>>> +        * @evict: structure holding the evict list and evict
>>>>>>>>> list lock
>>>>>>>>> +        */
>>>>>>>>> +       struct {
>>>>>>>>> +               /**
>>>>>>>>> +                * @list: &list_head storing &drm_gpuvm_bos
>>>>>>>>> currently being
>>>>>>>>> +                * evicted
>>>>>>>>> +                */
>>>>>>>>> +               struct list_head list;
>>>>>>>>> +
>>>>>>>>> +               /**
>>>>>>>>> +                * @lock: spinlock to protect the evict list
>>>>>>>>> +                */
>>>>>>>>> +               spinlock_t lock;
>>>>>>>>> +       } evict;
>>>>>>>>>     };
>>>>>>>>>     void drm_gpuvm_init(struct drm_gpuvm *gpuvm, struct
>>>>>>>>> drm_device *drm,
>>>>>>>>> @@ -268,6 +302,21 @@ void drm_gpuvm_init(struct drm_gpuvm
>>>>>>>>> *gpuvm, struct drm_device *drm,
>>>>>>>>>                      const struct drm_gpuvm_ops *ops);
>>>>>>>>>     void drm_gpuvm_destroy(struct drm_gpuvm *gpuvm);
>>>>>>>>> +/**
>>>>>>>>> + * drm_gpuvm_is_extobj() - indicates whether the given
>>>>>>>>> &drm_gem_object is an
>>>>>>>>> + * external object
>>>>>>>>> + * @gpuvm: the &drm_gpuvm to check
>>>>>>>>> + * @obj: the &drm_gem_object to check
>>>>>>>>> + *
>>>>>>>>> + * Returns: true if the &drm_gem_object's &dma_resv differs
>>>>>>>>> from the
>>>>>>>>> + * &drm_gpuvm's &dma_resv, false otherwise
>>>>>>>>> + */
>>>>>>>>> +static inline bool drm_gpuvm_is_extobj(struct drm_gpuvm
>>>>>>>>> *gpuvm,
>>>>>>>>> +                                      struct drm_gem_object
>>>>>>>>> *obj)
>>>>>>>>> +{
>>>>>>>>> +       return obj && obj->resv != gpuvm->resv;
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>>     static inline struct drm_gpuva *
>>>>>>>>>     __drm_gpuva_next(struct drm_gpuva *va)
>>>>>>>>>     {
>>>>>>>>> @@ -346,6 +395,128 @@ __drm_gpuva_next(struct drm_gpuva *va)
>>>>>>>>>     #define drm_gpuvm_for_each_va_safe(va__, next__, gpuvm__)
>>>>>>>>> \
>>>>>>>>>          list_for_each_entry_safe(va__, next__, &(gpuvm__)-
>>>>>>>>>> rb.list, rb.entry)
>>>>>>>>> +/**
>>>>>>>>> + * struct drm_gpuvm_exec - &drm_gpuvm abstraction of
>>>>>>>>> &drm_exec
>>>>>>>>> + *
>>>>>>>>> + * This structure should be created on the stack as
>>>>>>>>> &drm_exec should be.
>>>>>>>>> + *
>>>>>>>>> + * Optionally, @extra can be set in order to lock additional
>>>>>>>>> &drm_gem_objects.
>>>>>>>>> + */
>>>>>>>>> +struct drm_gpuvm_exec {
>>>>>>>>> +       /**
>>>>>>>>> +        * @exec: the &drm_exec structure
>>>>>>>>> +        */
>>>>>>>>> +       struct drm_exec exec;
>>>>>>>>> +
>>>>>>>>> +       /**
>>>>>>>>> +        * @vm: the &drm_gpuvm whose DMA reservations are locked
>>>>>>>>> +        */
>>>>>>>>> +       struct drm_gpuvm *vm;
>>>>>>>>> +
>>>>>>>>> +       /**
>>>>>>>>> +        * @extra: Callback and corresponding private data
>>>>>>>>> for the driver to
>>>>>>>>> +        * lock arbitrary additional &drm_gem_objects.
>>>>>>>>> +        */
>>>>>>>>> +       struct {
>>>>>>>>> +               /**
>>>>>>>>> +                * @fn: The driver callback to lock
>>>>>>>>> additional &drm_gem_objects.
>>>>>>>>> +                */
>>>>>>>>> +               int (*fn)(struct drm_gpuvm_exec *vm_exec,
>>>>>>>>> +                         unsigned int num_fences);
>>>>>>>>> +
>>>>>>>>> +               /**
>>>>>>>>> +                * @priv: driver private data for the @fn
>>>>>>>>> callback
>>>>>>>>> +                */
>>>>>>>>> +               void *priv;
>>>>>>>>> +       } extra;
>>>>>>>>> +};
>>>>>>>>> +
>>>>>>>>> +/**
>>>>>>>>> + * drm_gpuvm_prepare_vm() - prepare the GPUVMs common dma-
>>>>>>>>> resv
>>>>>>>>> + * @gpuvm: the &drm_gpuvm
>>>>>>>>> + * @exec: the &drm_exec context
>>>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>>>> + *
>>>>>>>>> + * Calls drm_exec_prepare_obj() for the GPUVM's dummy
>>>>>>>>> &drm_gem_object.
>>>>>>>>> + *
>>>>>>>>> + * Using this function directly, it is the driver's
>>>>>>>>> responsibility to call
>>>>>>>>> + * drm_exec_init() and drm_exec_fini() accordingly.
>>>>>>>>> + *
>>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>>> + */
>>>>>>>>> +static inline int
>>>>>>>>> +drm_gpuvm_prepare_vm(struct drm_gpuvm *gpuvm,
>>>>>>>>> +                    struct drm_exec *exec,
>>>>>>>>> +                    unsigned int num_fences)
>>>>>>>>> +{
>>>>>>>>> +       return drm_exec_prepare_obj(exec, &gpuvm->d_obj,
>>>>>>>>> num_fences);
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> +int drm_gpuvm_prepare_objects(struct drm_gpuvm *gpuvm,
>>>>>>>>> +                             struct drm_exec *exec,
>>>>>>>>> +                             unsigned int num_fences);
>>>>>>>>> +
>>>>>>>>> +int drm_gpuvm_prepare_range(struct drm_gpuvm *gpuvm,
>>>>>>>>> +                           struct drm_exec *exec,
>>>>>>>>> +                           u64 addr, u64 range,
>>>>>>>>> +                           unsigned int num_fences);
>>>>>>>>> +
>>>>>>>>> +int drm_gpuvm_exec_lock(struct drm_gpuvm_exec *vm_exec,
>>>>>>>>> +                       unsigned int num_fences,
>>>>>>>>> +                       bool interruptible);
>>>>>>>>> +
>>>>>>>>> +int drm_gpuvm_exec_lock_array(struct drm_gpuvm_exec
>>>>>>>>> *vm_exec,
>>>>>>>>> +                             struct drm_gem_object **objs,
>>>>>>>>> +                             unsigned int num_objs,
>>>>>>>>> +                             unsigned int num_fences,
>>>>>>>>> +                             bool interruptible);
>>>>>>>>> +
>>>>>>>>> +int drm_gpuvm_exec_lock_range(struct drm_gpuvm_exec
>>>>>>>>> *vm_exec,
>>>>>>>>> +                             u64 addr, u64 range,
>>>>>>>>> +                             unsigned int num_fences,
>>>>>>>>> +                             bool interruptible);
>>>>>>>>> +
>>>>>>>>> +/**
>>>>>>>>> + * drm_gpuvm_exec_unlock() - unlock all dma-resv of all
>>>>>>>>> associated BOs
>>>>>>>>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>>>>>>>>> + *
>>>>>>>>> + * Releases all dma-resv locks of all &drm_gem_objects
>>>>>>>>> previously acquired
>>>>>>>>> + * through drm_gpuvm_exec_lock() or its variants.
>>>>>>>>> + */
>>>>>>>>> +static inline void
>>>>>>>>> +drm_gpuvm_exec_unlock(struct drm_gpuvm_exec *vm_exec)
>>>>>>>>> +{
>>>>>>>>> +       drm_exec_fini(&vm_exec->exec);
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> +int drm_gpuvm_validate(struct drm_gpuvm *gpuvm);
>>>>>>>>> +void drm_gpuvm_resv_add_fence(struct drm_gpuvm *gpuvm,
>>>>>>>>> +                             struct drm_exec *exec,
>>>>>>>>> +                             struct dma_fence *fence,
>>>>>>>>> +                             enum dma_resv_usage
>>>>>>>>> private_usage,
>>>>>>>>> +                             enum dma_resv_usage
>>>>>>>>> extobj_usage);
>>>>>>>>> +
>>>>>>>>> +/**
>>>>>>>>> + * drm_gpuvm_exec_resv_add_fence() - add fence to private and
>>>>>>>>> all extobj dma-resv
>>>>>>>>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>>>>>>>>> + * @fence: fence to add
>>>>>>>>> + * @private_usage: private dma-resv usage
>>>>>>>>> + * @extobj_usage: extobj dma-resv usage
>>>>>>>>> + *
>>>>>>>>> + * See drm_gpuvm_resv_add_fence().
>>>>>>>>> + */
>>>>>>>>> +static inline void
>>>>>>>>> +drm_gpuvm_exec_resv_add_fence(struct drm_gpuvm_exec
>>>>>>>>> *vm_exec,
>>>>>>>>> +                             struct dma_fence *fence,
>>>>>>>>> +                             enum dma_resv_usage
>>>>>>>>> private_usage,
>>>>>>>>> +                             enum dma_resv_usage
>>>>>>>>> extobj_usage)
>>>>>>>>> +{
>>>>>>>>> +       drm_gpuvm_resv_add_fence(vm_exec->vm, &vm_exec->exec,
>>>>>>>>> fence,
>>>>>>>>> +                                private_usage,
>>>>>>>>> extobj_usage);
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>>     /**
>>>>>>>>>      * struct drm_gpuvm_bo - structure representing a
>>>>>>>>> &drm_gpuvm and
>>>>>>>>>      * &drm_gem_object combination
>>>>>>>>> @@ -398,6 +569,18 @@ struct drm_gpuvm_bo {
>>>>>>>>>                           * gpuva list.
>>>>>>>>>                           */
>>>>>>>>>                          struct list_head gem;
>>>>>>>>> +
>>>>>>>>> +                       /**
>>>>>>>>> +                        * @extobj: List entry to attach to
>>>>>>>>> the &drm_gpuvm's
>>>>>>>>> +                        * extobj list.
>>>>>>>>> +                        */
>>>>>>>>> +                       struct list_head extobj;
>>>>>>>>> +
>>>>>>>>> +                       /**
>>>>>>>>> +                        * @evict: List entry to attach to
>>>>>>>>> the &drm_gpuvms evict
>>>>>>>>> +                        * list.
>>>>>>>>> +                        */
>>>>>>>>> +                       struct list_head evict;
>>>>>>>>>                  } entry;
>>>>>>>>>          } list;
>>>>>>>>>     };
>>>>>>>>> @@ -432,6 +615,9 @@ struct drm_gpuvm_bo *
>>>>>>>>>     drm_gpuvm_bo_find(struct drm_gpuvm *gpuvm,
>>>>>>>>>                    struct drm_gem_object *obj);
>>>>>>>>> +void drm_gpuvm_bo_evict(struct drm_gem_object *obj, bool
>>>>>>>>> evict);
>>>>>>>>> +void drm_gpuvm_bo_extobj_add(struct drm_gpuvm_bo *vm_bo);
>>>>>>>>> +
>>>>>>>>>     /**
>>>>>>>>>      * drm_gpuvm_bo_for_each_va() - iterator to walk over a
>>>>>>>>> list of &drm_gpuva
>>>>>>>>>      * @va__: &drm_gpuva structure to assign to in each
>>>>>>>>> iteration step
>>>>>>>>> @@ -837,6 +1023,17 @@ struct drm_gpuvm_ops {
>>>>>>>>>           * used.
>>>>>>>>>           */
>>>>>>>>>          int (*sm_step_unmap)(struct drm_gpuva_op *op, void
>>>>>>>>> *priv);
>>>>>>>>> +
>>>>>>>>> +       /**
>>>>>>>>> +        * @bo_validate: called from drm_gpuvm_validate()
>>>>>>>>> +        *
>>>>>>>>> +        * Drivers receive this callback for every evicted
>>>>>>>>> &drm_gem_object being
>>>>>>>>> +        * mapped in the corresponding &drm_gpuvm.
>>>>>>>>> +        *
>>>>>>>>> +        * Typically, drivers would call their driver
>>>>>>>>> specific variant of
>>>>>>>>> +        * ttm_bo_validate() from within this callback.
>>>>>>>>> +        */
>>>>>>>>> +       int (*bo_validate)(struct drm_gem_object *obj);
>>>>>>>>>     };
>>>>>>>>>     int drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm, void *priv,
>>
>
Danilo Krummrich Sept. 13, 2023, 3:46 p.m. UTC | #28
On 9/13/23 17:33, Christian König wrote:
> Am 13.09.23 um 17:15 schrieb Danilo Krummrich:
>> On 9/13/23 16:26, Christian König wrote:
>>> Am 13.09.23 um 14:16 schrieb Danilo Krummrich:
>>>> As mentioned in a different mail thread, the reply is based on the assumption
>>>> that we don't support anything other than GPUVM updates from the IOCTL.
>>>
>>> I think that this assumption is incorrect.
>>
>> Well, more precisely I should have said "don't support GPUVM updates from within
>> fence signaling critical sections". And looking at the code, that doesn't seem to be
>> what you're doing there.
>>
>>>
>>> Vulkan is just once specific use case, but this here should probably be able to handle other use cases as well.
>>>
>>> Especially with HMM you get the requirement that you need to be able to invalidate GPUVM mappings without grabbing a reservation lock.
>>
>> What do you mean with "invalidate GPUVM mappings" in this context? drm_gpuvm_bo_evict()
>> should only be called from a ttm_device_funcs::move callback, we should hold the dma-resv
>> lock there.
> 
> Well the question is which dma-resv lock do we hold?
> 
> In the move callback we only hold the dma-resv lock of the BO which is moved, but when that is a shared BO then that's not the same as the one for the VM.

Correct, Thomas' idea was to use the GEM's dma_resv lock to protect drm_gpuvm_bo::evicted
and then actually move the drm_gpuvm_bo to the VM's evicted list once we grabbed all
dma-resv locks when locking the VM's BOs using drm_exec. We can remove them from the evicted
list on validate(). This way we never touch the evicted list without holding at least the VM's
dma-resv lock.

Do you have any concerns about that?
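
For reference, a rough sketch of that scheme (the evicted flag on the
drm_gpuvm_bo is hypothetical, not part of this patch; the list names are
the ones from v3):

    /* move callback: only the GEM's dma-resv is held. */
    drm_gem_for_each_gpuvm_bo(vm_bo, obj) {
            dma_resv_assert_held(obj->resv);
            vm_bo->evicted = evict;
    }

    /* exec path: all dma-resv locks grabbed via drm_exec, incl. the VM's. */
    dma_resv_assert_held(gpuvm->resv);
    if (vm_bo->evicted)
            list_move_tail(&vm_bo->list.entry.evict, &gpuvm->evict.list);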

> 
>>
>>>
>>> See what the eviction lock in amdgpu is doing for example.
>>
>> The eviction_lock seems to protect a VM state "evicting", i.e. whether any BO that
>> is associated with the VM is currently being evicted. At the same time amdgpu protects
>> the evicted list of the VM with a different lock. So this seems to be entirely
>> unrelated. Tracking a "currently evicting" state is not part of the GPUVM
>> implementation currently and hence nothing would change for amdgpu there.
> 
> Sorry for the confusion we use different terminology in amdgpu.
> 
> The eviction lock and evicted state is for the VM page tables, e.g. if the whole VM is currently not used and swapped out or even de-allocated.
> 
> This is necessary because we have cases where we need to access the VM data without holding the dma-resv lock of this VM. Especially figuring out which parts of an address space contain mappings and which doesn't.

I think this is fine, this has nothing to do with lists of evicted GEM objects or external GEM
objects, right? Marking mappings (drm_gpuva) as invalidated (DRM_GPUVA_INVALIDATED) or accessing
the VA space does not require any dma-resv locks.
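
(For illustration: with the existing helpers that's roughly

    drm_gem_for_each_gpuvm_bo(vm_bo, obj)
            drm_gpuvm_bo_for_each_va(va, vm_bo)
                    drm_gpuva_invalidate(va, true);

under the GEM's gpuva lock, no dma-resv involved.)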

> 
> This is a requirement which comes with HMM handling, you won't see this with Vulkan (or OpenGL, VAAPI etc..).
> 
> 
> The invalidation lock on the other hand is what in this discussion is called eviction lock. This one is needed because what I wrote above, during the move callback only the dma-resv of the BO which is moved is locked, but not necessarily the dma-resv of the VM.

That's yet another thing, right? This is used to track whether *any* BO that belongs to the VM is
currently being evicted, correct? As mentioned, as of now this is not supported in GPUVM and hence
would be the same driver specific code with the same driver specific lock.

> 
> Regards,
> Christian.
> 
>>
>>>
>>> Regards,
>>> Christian.
>>>
>>>>
>>>> On Wed, Sep 13, 2023 at 11:14:46AM +0200, Thomas Hellström wrote:
>>>>> Hi!
>>>>>
>>>>> On Wed, 2023-09-13 at 01:36 +0200, Danilo Krummrich wrote:
>>>>>> On Tue, Sep 12, 2023 at 09:23:08PM +0200, Thomas Hellström wrote:
>>>>>>> On 9/12/23 18:50, Danilo Krummrich wrote:
>>>>>>>> On Tue, Sep 12, 2023 at 06:20:32PM +0200, Thomas Hellström wrote:
>>>>>>>>> Hi, Danilo,
>>>>>>>>>
>>>>>>>>> On 9/9/23 17:31, Danilo Krummrich wrote:
>>>>>>>>>> So far the DRM GPUVA manager offers common infrastructure to
>>>>>>>>>> track GPU VA
>>>>>>>>>> allocations and mappings, generically connect GPU VA mappings
>>>>>>>>>> to their
>>>>>>>>>> backing buffers and perform more complex mapping operations
>>>>>>>>>> on the GPU VA
>>>>>>>>>> space.
>>>>>>>>>>
>>>>>>>>>> However, there are more design patterns commonly used by
>>>>>>>>>> drivers, which
>>>>>>>>>> can potentially be generalized in order to make the DRM GPUVA
>>>>>>>>>> manager
>>>>>>>>>> represent a basic GPU-VM implementation. In this context,
>>>>>>>>>> this patch aims
>>>>>>>>>> at generalizing the following elements.
>>>>>>>>>>
>>>>>>>>>> 1) Provide a common dma-resv for GEM objects not being used
>>>>>>>>>> outside of
>>>>>>>>>>       this GPU-VM.
>>>>>>>>>>
>>>>>>>>>> 2) Provide tracking of external GEM objects (GEM objects
>>>>>>>>>> which are
>>>>>>>>>>       shared with other GPU-VMs).
>>>>>>>>>>
>>>>>>>>>> 3) Provide functions to efficiently lock all GEM objects dma-
>>>>>>>>>> resv the
>>>>>>>>>>       GPU-VM contains mappings of.
>>>>>>>>>>
>>>>>>>>>> 4) Provide tracking of evicted GEM objects the GPU-VM
>>>>>>>>>> contains mappings
>>>>>>>>>>       of, such that validation of evicted GEM objects is
>>>>>>>>>> accelerated.
>>>>>>>>>>
>>>>>>>>>> 5) Provide some convenience functions for common patterns.
>>>>>>>>>>
>>>>>>>>>> Rather than being designed as a "framework", the target is to
>>>>>>>>>> make all
>>>>>>>>>> features appear as a collection of optional helper functions,
>>>>>>>>>> such that
>>>>>>>>>> drivers are free to make use of the DRM GPUVA managers basic
>>>>>>>>>> functionality and opt-in for other features without setting
>>>>>>>>>> any feature
>>>>>>>>>> flags, just by making use of the corresponding functions.
>>>>>>>>>>
>>>>>>>>>> Big kudos to Boris Brezillon for his help to figure out
>>>>>>>>>> locking for drivers
>>>>>>>>>> updating the GPU VA space within the fence signalling path.
>>>>>>>>>>
>>>>>>>>>> Suggested-by: Matthew Brost <matthew.brost@intel.com>
>>>>>>>>>> Signed-off-by: Danilo Krummrich <dakr@redhat.com>
>>>>>>>>>> ---
>>>>>>>>>>     drivers/gpu/drm/drm_gpuvm.c | 516
>>>>>>>>>> ++++++++++++++++++++++++++++++++++++
>>>>>>>>>>     include/drm/drm_gpuvm.h     | 197 ++++++++++++++
>>>>>>>>>>     2 files changed, 713 insertions(+)
>>>>>>>>>>
>>>>>>>>>> diff --git a/drivers/gpu/drm/drm_gpuvm.c
>>>>>>>>>> b/drivers/gpu/drm/drm_gpuvm.c
>>>>>>>>>> index f4411047dbb3..8e62a043f719 100644
>>>>>>>>>> --- a/drivers/gpu/drm/drm_gpuvm.c
>>>>>>>>>> +++ b/drivers/gpu/drm/drm_gpuvm.c
>>>>>>>>>> @@ -73,6 +73,21 @@
>>>>>>>>>>      * &drm_gem_object list of &drm_gpuvm_bos for an existing
>>>>>>>>>> instance of this
>>>>>>>>>>      * particular combination. If not existent a new instance
>>>>>>>>>> is created and linked
>>>>>>>>>>      * to the &drm_gem_object.
>>>>>>>>>> + *
>>>>>>>>>> + * &drm_gpuvm_bo structures, since unique for a given
>>>>>>>>>> &drm_gpuvm, are also used
>>>>>>>>>> + * as entry for the &drm_gpuvm's lists of external and
>>>>>>>>>> evicted objects. Those
>>>>>>>>>> + * lists are maintained in order to accelerate locking of
>>>>>>>>>> dma-resv locks and
>>>>>>>>>> + * validation of evicted objects bound in a &drm_gpuvm. For
>>>>>>>>>> instance all
>>>>>>>>>> + * &drm_gem_object's &dma_resv of a given &drm_gpuvm can be
>>>>>>>>>> locked by calling
>>>>>>>>>> + * drm_gpuvm_exec_lock(). Once locked drivers can call
>>>>>>>>>> drm_gpuvm_validate() in
>>>>>>>>>> + * order to validate all evicted &drm_gem_objects. It is
>>>>>>>>>> also possible to lock
>>>>>>>>>> + * additional &drm_gem_objects by providing the
>>>>>>>>>> corresponding parameters to
>>>>>>>>>> + * drm_gpuvm_exec_lock() as well as open code the &drm_exec
>>>>>>>>>> loop while making
>>>>>>>>>> + * use of helper functions such as drm_gpuvm_prepare_range()
>>>>>>>>>> or
>>>>>>>>>> + * drm_gpuvm_prepare_objects().
>>>>>>>>>> + *
>>>>>>>>>> + * Every bound &drm_gem_object is treated as external object
>>>>>>>>>> when its &dma_resv
>>>>>>>>>> + * structure is different than the &drm_gpuvm's common
>>>>>>>>>> &dma_resv structure.
>>>>>>>>>>      */
>>>>>>>>>>     /**
>>>>>>>>>> @@ -420,6 +435,20 @@
>>>>>>>>>>      * Subsequent calls to drm_gpuvm_bo_obtain() for the same
>>>>>>>>>> &drm_gpuvm and
>>>>>>>>>>      * &drm_gem_object must be able to observe previous
>>>>>>>>>> creations and destructions
>>>>>>>>>>      * of &drm_gpuvm_bos in order to keep instances unique.
>>>>>>>>>> + *
>>>>>>>>>> + * The &drm_gpuvm's lists for keeping track of external and
>>>>>>>>>> evicted objects are
>>>>>>>>>> + * protected against concurrent insertion / removal and
>>>>>>>>>> iteration internally.
>>>>>>>>>> + *
>>>>>>>>>> + * However, drivers still need to protect concurrent
>>>>>>>>>> calls to functions
>>>>>>>>>> + * iterating those lists, such as drm_gpuvm_validate() and
>>>>>>>>>> + * drm_gpuvm_prepare_objects(). Every such function contains
>>>>>>>>>> a particular
>>>>>>>>>> + * comment and lockdep checks if possible.
>>>>>>>>>> + *
>>>>>>>>>> + * Functions adding or removing entries from those lists,
>>>>>>>>>> such as
>>>>>>>>>> + * drm_gpuvm_bo_evict() or drm_gpuvm_bo_extobj_add() may be
>>>>>>>>>> called with external
>>>>>>>>>> + * locks being held, e.g. in order to avoid the
>>>>>>>>>> corresponding list being
>>>>>>>>>> + * modified while potentially being iterated by
>>>>>>>>>> other API functions.
>>>>>>>>>> + * However, this is entirely optional.
>>>>>>>>>>      */
>>>>>>>>>>     /**
>>>>>>>>>> @@ -632,6 +661,131 @@
>>>>>>>>>>      *   }
>>>>>>>>>>      */
>>>>>>>>>> +/**
>>>>>>>>>> + * get_next_vm_bo_from_list() - get the next vm_bo element
>>>>>>>>>> + * @__gpuvm: The GPU VM
>>>>>>>>>> + * @__list_name: The name of the list we're iterating on
>>>>>>>>>> + * @__local_list: A pointer to the local list used to store
>>>>>>>>>> already iterated items
>>>>>>>>>> + * @__prev_vm_bo: The previous element we got from
>>>>>>>>>> get_next_vm_bo_from_list()
>>>>>>>>>> + *
>>>>>>>>>> + * This helper is here to provide lockless list iteration.
>>>>>>>>>> Lockless as in, the
>>>>>>>>>> + * iterator releases the lock immediately after picking the
>>>>>>>>>> first element from
>>>>>>>>>> + * the list, so list insertion deletion can happen
>>>>>>>>>> concurrently.
>>>>>>>>> Are the list spinlocks needed for that async state update from
>>>>>>>>> within the
>>>>>>>>> dma-fence critical section we've discussed previously?
>>>>>>>> Yes, but also for other reasons, see below.
>>>>>>>>
>>>>>>>>> Otherwise it should be sufficient to protect the lists with the
>>>>>>>>> gpuvm's resv
>>>>>>>>> (or for the extobj list with an outer lock).
>>>>>>>>>
>>>>>>>>> If those spinlocks are still needed in some situations, perhaps
>>>>>>>>> could we
>>>>>>>>> have an option to set them to NULL (Like IIRC the maple tree
>>>>>>>>> allows for)?
>>>>>>>> The evict spinlock is needed in any case, since in
>>>>>>>> drm_gpuvm_bo_evict() we're
>>>>>>>> holding only the dma-resv lock from the BO this function gets
>>>>>>>> called for. Hence,
>>>>>>>> the spinlock protects concurrent drm_gpuvm_bo_evict() calls with
>>>>>>>> different BOs.
>>>>>>> No. Only if you try to add external objects to the vm's evict list
>>>>>>> from
>>>>>>> within the evict code. That's not necessary since you loop through
>>>>>>> all
>>>>>>> external objects anyway when locking them so an "evicted" bool in
>>>>>>> the vm_bo,
>>>>>>> protected by the bo resv would be sufficient. The extobj locking
>>>>>>> loop can
>>>>>>> then add the bo to the evicted list.
>>>>>> And validate() can remove it while still holding all dma-resv locks,
>>>>>> neat!
>>>>>> However, what if two tasks are trying to lock the VA space
>>>>>> concurrently? What
>>>>>> do we do when the drm_gpuvm_bo's refcount drops to zero in
>>>>>> drm_gpuva_unlink()?
>>>>>> Are we guaranteed that at this point of time the drm_gpuvm_bo is not
>>>>>> on the
>>>>>> evicted list? Because otherwise we would call drm_gpuvm_bo_destroy()
>>>>>> with the
>>>>>> dma-resv lock held, which wouldn't be allowed, since
>>>>>> drm_gpuvm_bo_destroy()
>>>>>> might drop the last reference to the drm_gem_object and hence we'd
>>>>>> potentially
>>>>>> free the dma-resv lock while holding it, at least if it's an external
>>>>>> object.
>>>>> Easiest way in this scheme is to think of the lists as being protected
>>>>> by the vm's resv lock. That means anybody calling unlink() must also
>>>>> hold the vm's resv lock. (Which is OK from an UAF point of view, but
>>>>> perhaps not from a locking inversion POV from an async list update).
>>>> This would mean that on unlink() we'd need to hold the VM's resv lock and the
>>>> corresponding GEM's resv lock (in case they're not the same anyways) because the
>>>> VM's resv lock would protect the external / evicted object lists and the GEM
>>>> objects resv lock protects the GEM's list of drm_gpuvm_bos and the
>>>> drm_gpuvm_bo's list of drm_gpuvas.
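>>>>
>>>> For illustration, the unlink path under that rule would roughly look
>>>> like the following sketch (real code would need a ww_acquire_ctx or
>>>> drm_exec to take the two resv locks):
>>>>
>>>>     dma_resv_lock(gpuvm->resv, NULL);
>>>>     if (drm_gpuvm_is_extobj(gpuvm, obj))
>>>>             dma_resv_lock(obj->resv, NULL);
>>>>
>>>>     drm_gpuva_unlink(va); /* may drop the last vm_bo reference */
>>>>
>>>>     if (drm_gpuvm_is_extobj(gpuvm, obj))
>>>>             dma_resv_unlock(obj->resv);
>>>>     dma_resv_unlock(gpuvm->resv);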
>>>>
>>>>>>>> For extobjs an outer lock would be enough in case of Xe, but I
>>>>>>>> really would not
>>>>>>>> like to add even more complexity just to get the spinlock out of
>>>>>>>> the way in case
>>>>>>>> the driver already has an outer lock protecting this path.
>>>>>>> I must disagree here. These spinlocks and atomic operations are
>>>>>>> pretty
>>>>>>> costly and as discussed earlier this type of locking was the reason
>>>>>>> (at
>>>>>>> least according to the commit message) that made Christian drop the
>>>>>>> XArray
>>>>>>> use in drm_exec for the same set of objects: "The locking overhead
>>>>>>> is
>>>>>>> unecessary and measurable". IMHO the spinlock is the added
>>>>>>> complexity and a
>>>>>>> single wide lock following the drm locking guidelines set out by
>>>>>>> Daniel and
>>>>>>> David should really be the default choice with an opt-in for a
>>>>>>> spinlock if
>>>>>>> needed for async and pushing out to a wq is not an option.
>>>>>> For the external object list an outer lock would work as long as it's
>>>>>> not the
>>>>>> dma-resv lock of the corresponding GEM object, since here we actually
>>>>>> need to
>>>>>> remove the list entry from the external object list on
>>>>>> drm_gpuvm_bo_destroy().
>>>>>> It's just a bit weird design wise that drivers would need to take
>>>>>> this outer
>>>>>> lock on:
>>>>>>
>>>>>> - drm_gpuvm_bo_extobj_add()
>>>>>> - drm_gpuvm_bo_destroy()        (and hence also drm_gpuvm_bo_put())
>>>>>> - drm_gpuva_unlink()            (because it needs to call
>>>>>> drm_gpuvm_bo_put())
>>>>>> - drm_gpuvm_exec_lock()
>>>>>> - drm_gpuvm_exec_lock_array()
>>>>>> - drm_gpuvm_prepare_range()
>>>>>>
>>>>>> Given that it seems reasonable to do all the required locking
>>>>>> internally.
>>>>>  From a design POV, there has been a clear direction in XE to make
>>>>> things similar to mmap() / munmap(), so this outer lock, which in Xe is
>>>>> an rwsem, is used in a similar way as the mmap_lock. It's protecting
>>>>> the page-table structures and vma rb tree, the userptr structures and
>>>>> the extobj list. Basically it's taken early in the exec IOCTL, the
>>>>> VM_BIND ioctl, the compute rebind worker and the pagefault handler, so
>>>>> all of the above are just asserting that it is taken in the correct
>>>>> mode.
>>>>>
>>>>> But strictly with this scheme one could also use the vm's dma_resv for
>>>>> the extobj list since with drm_exec, it's locked before traversing the
>>>>> list.
>>>>>
>>>>> The whole point of this scheme is to rely on locks that you already are
>>>>> supposed to be holding for various reasons and is simple to comprehend.
>>>> I don't agree that we're supposed to hold the VM's resv lock anyways for
>>>> functions like drm_gpuvm_bo_put() or drm_gpuva_unlink(), but I'm fine using it
>>>> for that purpose nevertheless.
>>>>
>>>>>> In order to at least place lockdep checks, the driver would need to
>>>>>> supply the
>>>>>> corresponding lock's lockdep_map, because the GPUVM otherwise doesn't
>>>>>> know about
>>>>>> the lock.
>>>>> Yes, that sounds reasonable. One lockdep map per list.
>>>> I'd really like to avoid that, especially now that everything got simpler. We
>>>> should define the actual locks to take instead.
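>>>>
>>>> E.g., a sketch assuming the VM's resv ends up protecting both lists:
>>>>
>>>>     #define drm_gpuvm_bo_list_add(__vm_bo, __list_name)                 \
>>>>             do {                                                        \
>>>>                     dma_resv_assert_held((__vm_bo)->vm->resv);          \
>>>>                     if (list_empty(&(__vm_bo)->list.entry.__list_name)) \
>>>>                             list_add_tail(&(__vm_bo)->list.entry.__list_name, \
>>>>                                           &(__vm_bo)->vm->__list_name.list); \
>>>>             } while (0)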
>>>>
>>>>>> Out of curiosity, what is the overhead of a spin_lock() that doesn't
>>>>>> need to
>>>>>> spin?
>>>>> I guess it's hard to tell exactly, but it is much lower on modern x86
>>>>> than what it used to be. Not sure about ARM, which is the other
>>>>> architecture important to us. I figure if there is little cache-line
>>>>> bouncing the main overhead comes from the implied barriers.
>>>>>
>>>>>>> A pretty simple way that would not add much code would be
>>>>>>>
>>>>>>> static void gpuvm_cond_spin_lock(const struct drm_gpuvm *gpuvm,
>>>>>>>                                  spinlock_t *lock)
>>>>>>> {
>>>>>>>      if (!gpuvm->resv_protected_lists)
>>>>>>>          spin_lock(lock);
>>>>>>> }
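>>>>>>>
>>>>>>> and a matching gpuvm_cond_spin_unlock(); the list helpers would
>>>>>>> then look roughly like (sketch):
>>>>>>>
>>>>>>>      gpuvm_cond_spin_lock(vm_bo->vm, &vm_bo->vm->extobj.lock);
>>>>>>>      if (list_empty(&vm_bo->list.entry.extobj))
>>>>>>>          list_add_tail(&vm_bo->list.entry.extobj,
>>>>>>>                        &vm_bo->vm->extobj.list);
>>>>>>>      gpuvm_cond_spin_unlock(vm_bo->vm, &vm_bo->vm->extobj.lock);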
>>>>>>>
>>>>>>>>> For such drivers, that would require anybody calling unlink to
>>>>>>>>> hold the vm's
>>>>>>>>> resv, though.
>>>>>>>> In V4 I want to go back to having a dedicated lock for the GEMs
>>>>>>>> gpuva list (or
>>>>>>>> VM_BO list to be more precise). We can't just use the dma-resv
>>>>>>>> lock for that
>>>>>>>> with VM_BO abstractions, because on destruction of a VM_BO we
>>>>>>>> otherwise wouldn't
>>>>>>>> be allowed to already hold the dma-resv lock. That's the fix I
>>>>>>>> was referring to
>>>>>>>> earlier.
>>>>>>> Yeah, I can see the need for a dedicated lock for the GEM's gpuva
>>>>>>> list, but
>>>>>>> holding the vm's dma-resv lock across the unlink shouldn't be a
>>>>>>> problem. We
>>>>>>> may free the object and a pointer to the vm's resv during unlink
>>>>>>> but we
>>>>>>> don't free the vm's resv.  It'd be a matter of ensuring that any
>>>>>>> calls to
>>>>>>> unlink from *within* drm_gpuvm allows it to be held.
>>>>>> Drivers calling unlink() from the fence signaling path can't use the
>>>>>> VM's
>>>>>> dma-resv lock.
>>>>> Yes, that made me a bit curious because in the current version the code
>>>>> required the object's dma_resv for unlink() which can't be grabbed
>>>>> either from the fence signaling path. So are there any drivers actually
>>>>> wanting to do that? If so, they will either need to resort to the
>>>>> current spinlock solution or they will need to call unlink from a
>>>>> workqueue item.
>>>> As Boris already mentioned we have the dma-resv lock by default or a driver
>>>> specific GEM gpuva lock as opt-in. Now, we can get rid of the latter.
>>>>
>>>>>> Also, what if the object is an external object? We can't use the VM's
>>>>>> dma-resv
>>>>>> lock here.
>>>>> Why? Typically (sync) unlink is only ever called from an unbind-like
>>>>> operation where it should be trivial to grab the vm's resv. Or, for
>>>>> that matter any outer lock protecting the extobj list. Rule would be
>>>>> the drm_gpuvm_bo::entry::extobj  and drm_gpuvm_bo::entry::evict would
>>>>> be protected by either the vm's dma_resv (or possibly an outer lock in
>>>>> the case of the extobj list).
>>>> Outer lock wouldn't have been working for updates in the async path, but
>>>> shouldn't be relevant anymore. We could use the VM's resv for that.
>>>>
>>>>>>   And we can't have the GEM objs dma-resv lock held when calling
>>>>>> unlink(), since unlink() calls drm_gpuvm_bo_put(), which if the
>>>>>> refcount drops
>>>>>> to zero calls drm_gpuvm_bo_destroy() and drm_gpuvm_bo_destroy() might
>>>>>> drop the
>>>>>> last reference of the GEM object.
>>>>> Yes, but this is a different problem as to what exactly protects
>>>>> drm_gpuvm_bo::entry::gem. Either as you suggest an internal per bo list
>>>>> lock, or if we want to keep the bo's dma_resv we need to ensure that
>>>>> the caller of dma_resv_unlock(obj->resv) actually refcounts its obj
>>>>> pointer, and doesn't implicitly rely on the gpuvm_bo's refcount (I know
>>>>> Boris didn't like that, but requiring an explicit refcount for a
>>>>> pointer you dereference unless you're under a lock that ensures keeping
>>>>> the object alive is pretty much required?) But anyway for the
>>>>> drm_gpuvm_bo::entry::gem list protection (bo resv or internal spinlock)
>>>>> I don't have a strong preference.
>>>> We can keep the GEM objects dma-resv lock, however as mentioned above
>>>> drm_gpuva_unlink() and drm_gpuvm_bo_put() then requires both the VM's resv lock
>>>> and the GEM's resv lock in case they differ.
>>>>
>>>>>>   All those problems go away with a dedicated
>>>>>> GEM gpuva list lock.
>>>>> I don't think these are real problems.
>>>>> With the exception of the eviction list "trick", where we currently have
>>>>> a slightly different approach to collect external bos needing rebinding,
>>>>> we have this working fine.
>>>>>
>>>>> TBH I think pretty much the only situation where the spinlock is needed
>>>>> is for async updates of these lists, unless a wq item can be used for
>>>>> that, but it doesn't really seem like the current code allows for such
>>>>> updates anyway? It complicates the code a lot, adds overhead and also
>>>>> adds the requirement for refcounting during list traversal.
>>>>>
>>>>> /Thomas
>>>>>
>>>>>>> /Thomas
>>>>>>>
>>>>>>>
>>>>>>>>> It seems that with that also the refcount could be make non-
>>>>>>>>> atomic.
>>>>>>>>>
>>>>>>>>> All in the spirit of the drm locking guidelines "use big locks
>>>>>>>>> when
>>>>>>>>> possible".
>>>>>>>>> Lower level locks only when necessary for performance or
>>>>>>>>> locking inversion?
>>>>>>>>>
>>>>>>>>> /Thomas
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> + *
>>>>>>>>>> + * Elements popped from the original list are kept in a
>>>>>>>>>> local list, so removal
>>>>>>>>>> + * and is_empty checks can still happen while we're
>>>>>>>>>> iterating the list.
>>>>>>>>>> + */
>>>>>>>>>> +#define get_next_vm_bo_from_list(__gpuvm, __list_name, __local_list, __prev_vm_bo) \
>>>>>>>>>> +       ({                                                                      \
>>>>>>>>>> +               struct drm_gpuvm_bo *__vm_bo;                                   \
>>>>>>>>>> +                                                                               \
>>>>>>>>>> +               drm_gpuvm_bo_put(__prev_vm_bo);                                 \
>>>>>>>>>> +                                                                               \
>>>>>>>>>> +               spin_lock(&(__gpuvm)->__list_name.lock);                        \
>>>>>>>>>> +               while (!list_empty(&(__gpuvm)->__list_name.list)) {             \
>>>>>>>>>> +                       __vm_bo = list_first_entry(&(__gpuvm)->__list_name.list, \
>>>>>>>>>> +                                                  struct drm_gpuvm_bo,         \
>>>>>>>>>> +                                                  list.entry.__list_name);     \
>>>>>>>>>> +                       if (drm_gpuvm_bo_get_unless_zero(__vm_bo)) {            \
>>>>>>>>>> +                               list_move_tail(&(__vm_bo)->list.entry.__list_name, \
>>>>>>>>>> +                                              __local_list);                   \
>>>>>>>>>> +                               break;                                          \
>>>>>>>>>> +                       } else {                                                \
>>>>>>>>>> +                               list_del_init(&(__vm_bo)->list.entry.__list_name); \
>>>>>>>>>> +                               __vm_bo = NULL;                                 \
>>>>>>>>>> +                       }                                                       \
>>>>>>>>>> +               }                                                               \
>>>>>>>>>> +               spin_unlock(&(__gpuvm)->__list_name.lock);                      \
>>>>>>>>>> +                                                                               \
>>>>>>>>>> +               __vm_bo;                                                        \
>>>>>>>>>> +       })
>>>>>>>>>> +
>>>>>>>>>> +/**
>>>>>>>>>> + * for_each_vm_bo_in_list() - internal vm_bo list iterator
>>>>>>>>>> + * @__gpuvm: The GPU VM
>>>>>>>>>> + * @__list_name: The name of the list we're iterating on
>>>>>>>>>> + * @__local_list: A pointer to the local list used to store
>>>>>>>>>> already iterated items
>>>>>>>>>> + * @__vm_bo: The current &drm_gpuvm_bo
>>>>>>>>>> + *
>>>>>>>>>> + * This helper is here to provide lockless list iteration.
>>>>>>>>>> Lockless as in, the
>>>>>>>>>> + * iterator releases the lock immediately after picking the
>>>>>>>>>> first element from the
>>>>>>>>>> + * list, so list insertion and deletion can happen
>>>>>>>>>> concurrently.
>>>>>>>>>> + *
>>>>>>>>>> + * Typical use:
>>>>>>>>>> + *
>>>>>>>>>> + *     struct drm_gpuvm_bo *vm_bo;
>>>>>>>>>> + *     LIST_HEAD(my_local_list);
>>>>>>>>>> + *
>>>>>>>>>> + *     ret = 0;
>>>>>>>>>> + *     for_each_vm_bo_in_list(gpuvm, <list_name>,
>>>>>>>>>> &my_local_list, vm_bo) {
>>>>>>>>>> + *             ret = do_something_with_vm_bo(..., vm_bo);
>>>>>>>>>> + *             if (ret)
>>>>>>>>>> + *                     break;
>>>>>>>>>> + *     }
>>>>>>>>>> + *     drm_gpuvm_bo_put(vm_bo);
>>>>>>>>>> + *     restore_vm_bo_list(gpuvm, <list_name>,
>>>>>>>>>> &my_local_list);
>>>>>>>>>> + *
>>>>>>>>>> + *
>>>>>>>>>> + * Only used for internal list iterations, not meant to be
>>>>>>>>>> exposed to the outside
>>>>>>>>>> + * world.
>>>>>>>>>> + */
>>>>>>>>>> +#define for_each_vm_bo_in_list(__gpuvm, __list_name, __local_list, __vm_bo) \
>>>>>>>>>> +       for (__vm_bo = get_next_vm_bo_from_list(__gpuvm, __list_name,  \
>>>>>>>>>> +                                               __local_list, NULL);   \
>>>>>>>>>> +            __vm_bo;                                                  \
>>>>>>>>>> +            __vm_bo = get_next_vm_bo_from_list(__gpuvm, __list_name,  \
>>>>>>>>>> +                                               __local_list, __vm_bo))
>>>>>>>>>> +
>>>>>>>>>> +/**
>>>>>>>>>> + * restore_vm_bo_list() - move vm_bo elements back to their
>>>>>>>>>> original list
>>>>>>>>>> + * @__gpuvm: The GPU VM
>>>>>>>>>> + * @__list_name: The name of the list we're iterating on
>>>>>>>>>> + * @__local_list: A pointer to the local list used to store
>>>>>>>>>> already iterated items
>>>>>>>>>> + *
>>>>>>>>>> + * When we're done iterating a vm_bo list, we should call
>>>>>>>>>> restore_vm_bo_list()
>>>>>>>>>> + * to restore the original state and let new iterations take
>>>>>>>>>> place.
>>>>>>>>>> + */
>>>>>>>>>> +#define restore_vm_bo_list(__gpuvm, __list_name, __local_list)        \
>>>>>>>>>> +       do {                                                           \
>>>>>>>>>> +               /* Merge back the two lists, moving local list elements \
>>>>>>>>>> +                * to the head to preserve previous ordering, in case   \
>>>>>>>>>> +                * it matters.                                          \
>>>>>>>>>> +                */                                                     \
>>>>>>>>>> +               spin_lock(&(__gpuvm)->__list_name.lock);               \
>>>>>>>>>> +               list_splice(__local_list, &(__gpuvm)->__list_name.list); \
>>>>>>>>>> +               spin_unlock(&(__gpuvm)->__list_name.lock);             \
>>>>>>>>>> +       } while (0)
>>>>>>>>>> +/**
>>>>>>>>>> + * drm_gpuvm_bo_list_add() - insert a vm_bo into the given
>>>>>>>>>> list
>>>>>>>>>> + * @__vm_bo: the &drm_gpuvm_bo
>>>>>>>>>> + * @__list_name: the name of the list to insert into
>>>>>>>>>> + *
>>>>>>>>>> + * Inserts the given @__vm_bo into the list specified by
>>>>>>>>>> @__list_name and
>>>>>>>>>> + * increases the vm_bo's reference count.
>>>>>>>>>> + */
>>>>>>>>>> +#define drm_gpuvm_bo_list_add(__vm_bo, __list_name)                   \
>>>>>>>>>> +       do {                                                           \
>>>>>>>>>> +               spin_lock(&(__vm_bo)->vm->__list_name.lock);           \
>>>>>>>>>> +               if (list_empty(&(__vm_bo)->list.entry.__list_name))    \
>>>>>>>>>> +                       list_add_tail(&(__vm_bo)->list.entry.__list_name, \
>>>>>>>>>> +                                     &(__vm_bo)->vm->__list_name.list); \
>>>>>>>>>> +               spin_unlock(&(__vm_bo)->vm->__list_name.lock);         \
>>>>>>>>>> +       } while (0)
>>>>>>>>>> +
>>>>>>>>>> +/**
>>>>>>>>>> + * drm_gpuvm_bo_list_del() - remove a vm_bo from the given
>>>>>>>>>> list
>>>>>>>>>> + * @__vm_bo: the &drm_gpuvm_bo
>>>>>>>>>> + * @__list_name: the name of the list to remove from
>>>>>>>>>> + *
>>>>>>>>>> + * Removes the given @__vm_bo from the list specified by
>>>>>>>>>> @__list_name and
>>>>>>>>>> + * decreases the vm_bo's reference count.
>>>>>>>>>> + */
>>>>>>>>>> +#define drm_gpuvm_bo_list_del(__vm_bo, __list_name)                   \
>>>>>>>>>> +       do {                                                           \
>>>>>>>>>> +               spin_lock(&(__vm_bo)->vm->__list_name.lock);           \
>>>>>>>>>> +               if (!list_empty(&(__vm_bo)->list.entry.__list_name))   \
>>>>>>>>>> +                       list_del_init(&(__vm_bo)->list.entry.__list_name); \
>>>>>>>>>> +               spin_unlock(&(__vm_bo)->vm->__list_name.lock);         \
>>>>>>>>>> +       } while (0)
>>>>>>>>>> +
>>>>>>>>>> +static int __must_check
>>>>>>>>>> +drm_gpuvm_bo_get_unless_zero(struct drm_gpuvm_bo *vm_bo);
>>>>>>>>>> +
>>>>>>>>>>     #define to_drm_gpuva(__node) container_of((__node), struct
>>>>>>>>>> drm_gpuva, rb.node)
>>>>>>>>>>     #define GPUVA_START(node) ((node)->va.addr)
>>>>>>>>>> @@ -713,6 +867,12 @@ drm_gpuvm_init(struct drm_gpuvm *gpuvm,
>>>>>>>>>> struct drm_device *drm,
>>>>>>>>>>          gpuvm->rb.tree = RB_ROOT_CACHED;
>>>>>>>>>>          INIT_LIST_HEAD(&gpuvm->rb.list);
>>>>>>>>>> +       INIT_LIST_HEAD(&gpuvm->extobj.list);
>>>>>>>>>> +       spin_lock_init(&gpuvm->extobj.lock);
>>>>>>>>>> +
>>>>>>>>>> +       INIT_LIST_HEAD(&gpuvm->evict.list);
>>>>>>>>>> +       spin_lock_init(&gpuvm->evict.lock);
>>>>>>>>>> +
>>>>>>>>>>          drm_gpuva_check_overflow(start_offset, range);
>>>>>>>>>>          gpuvm->mm_start = start_offset;
>>>>>>>>>>          gpuvm->mm_range = range;
>>>>>>>>>> @@ -754,10 +914,302 @@ drm_gpuvm_destroy(struct drm_gpuvm
>>>>>>>>>> *gpuvm)
>>>>>>>>>>          WARN(!RB_EMPTY_ROOT(&gpuvm->rb.tree.rb_root),
>>>>>>>>>>               "GPUVA tree is not empty, potentially leaking
>>>>>>>>>> memory.\n");
>>>>>>>>>> +       WARN(!list_empty(&gpuvm->extobj.list), "Extobj list
>>>>>>>>>> should be empty.\n");
>>>>>>>>>> +       WARN(!list_empty(&gpuvm->evict.list), "Evict list
>>>>>>>>>> should be empty.\n");
>>>>>>>>>> +
>>>>>>>>>>          drm_gem_private_object_fini(&gpuvm->d_obj);
>>>>>>>>>>     }
>>>>>>>>>>     EXPORT_SYMBOL_GPL(drm_gpuvm_destroy);
>>>>>>>>>> +/**
>>>>>>>>>> + * drm_gpuvm_prepare_objects() - prepare all associated BOs
>>>>>>>>>> + * @gpuvm: the &drm_gpuvm
>>>>>>>>>> + * @exec: the &drm_exec locking context
>>>>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>>>>> + *
>>>>>>>>>> + * Calls drm_exec_prepare_obj() for all &drm_gem_objects the
>>>>>>>>>> given
>>>>>>>>>> + * &drm_gpuvm contains mappings of.
>>>>>>>>>> + *
>>>>>>>>>> + * Using this function directly, it is the driver's
>>>>>>>>>> responsibility to call
>>>>>>>>>> + * drm_exec_init() and drm_exec_fini() accordingly.
>>>>>>>>>> + *
>>>>>>>>>> + * Note: This function is safe against concurrent insertion
>>>>>>>>>> and removal of
>>>>>>>>>> + * external objects, however it is not safe against
>>>>>>>>>> concurrent usage itself.
>>>>>>>>>> + *
>>>>>>>>>> + * Drivers need to make sure to protect this case with
>>>>>>>>>> either an outer VM lock
>>>>>>>>>> + * or by calling drm_gpuvm_prepare_vm() before this function
>>>>>>>>>> within the
>>>>>>>>>> + * drm_exec_until_all_locked() loop, such that the GPUVM's
>>>>>>>>>> dma-resv lock ensures
>>>>>>>>>> + * mutual exclusion.
>>>>>>>>>> + *
>>>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>>>> + */
>>>>>>>>>> +int
>>>>>>>>>> +drm_gpuvm_prepare_objects(struct drm_gpuvm *gpuvm,
>>>>>>>>>> +                         struct drm_exec *exec,
>>>>>>>>>> +                         unsigned int num_fences)
>>>>>>>>>> +{
>>>>>>>>>> +       struct drm_gpuvm_bo *vm_bo;
>>>>>>>>>> +       LIST_HEAD(extobjs);
>>>>>>>>>> +       int ret = 0;
>>>>>>>>>> +
>>>>>>>>>> +       for_each_vm_bo_in_list(gpuvm, extobj, &extobjs,
>>>>>>>>>> vm_bo) {
>>>>>>>>>> +               ret = drm_exec_prepare_obj(exec, vm_bo->obj,
>>>>>>>>>> num_fences);
>>>>>>>>>> +               if (ret)
>>>>>>>>>> +                       break;
>>>>>>>>>> +       }
>>>>>>>>>> +       /* Drop ref in case we break out of the loop. */
>>>>>>>>>> +       drm_gpuvm_bo_put(vm_bo);
>>>>>>>>>> +       restore_vm_bo_list(gpuvm, extobj, &extobjs);
>>>>>>>>>> +
>>>>>>>>>> +       return ret;
>>>>>>>>>> +}
>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_prepare_objects);
>>>>>>>>>> +
>>>>>>>>>> +/**
>>>>>>>>>> + * drm_gpuvm_prepare_range() - prepare all BOs mapped within
>>>>>>>>>> a given range
>>>>>>>>>> + * @gpuvm: the &drm_gpuvm
>>>>>>>>>> + * @exec: the &drm_exec locking context
>>>>>>>>>> + * @addr: the start address within the VA space
>>>>>>>>>> + * @range: the range to iterate within the VA space
>>>>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>>>>> + *
>>>>>>>>>> + * Calls drm_exec_prepare_obj() for all &drm_gem_objects
>>>>>>>>>> mapped between @addr
>>>>>>>>>> + * and @addr + @range.
>>>>>>>>>> + *
>>>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>>>> + */
>>>>>>>>>> +int
>>>>>>>>>> +drm_gpuvm_prepare_range(struct drm_gpuvm *gpuvm, struct
>>>>>>>>>> drm_exec *exec,
>>>>>>>>>> +                       u64 addr, u64 range, unsigned int
>>>>>>>>>> num_fences)
>>>>>>>>>> +{
>>>>>>>>>> +       struct drm_gpuva *va;
>>>>>>>>>> +       u64 end = addr + range;
>>>>>>>>>> +       int ret;
>>>>>>>>>> +
>>>>>>>>>> +       drm_gpuvm_for_each_va_range(va, gpuvm, addr, end) {
>>>>>>>>>> +               struct drm_gem_object *obj = va->gem.obj;
>>>>>>>>>> +
>>>>>>>>>> +               ret = drm_exec_prepare_obj(exec, obj,
>>>>>>>>>> num_fences);
>>>>>>>>>> +               if (ret)
>>>>>>>>>> +                       return ret;
>>>>>>>>>> +       }
>>>>>>>>>> +
>>>>>>>>>> +       return 0;
>>>>>>>>>> +}
>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_prepare_range);
>>>>>>>>>> +
>>>>>>>>>> +/**
>>>>>>>>>> + * drm_gpuvm_exec_lock() - lock all dma-resv of all
>>>>>>>>>> associated BOs
>>>>>>>>>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>>>>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>>>>> + * @interruptible: sleep interruptible if waiting
>>>>>>>>>> + *
>>>>>>>>>> + * Acquires all dma-resv locks of all &drm_gem_objects the
>>>>>>>>>> given
>>>>>>>>>> + * &drm_gpuvm contains mappings of.
>>>>>>>>>> + *
>>>>>>>>>> + * Additionally, when calling this function with struct
>>>>>>>>>> drm_gpuvm_exec::extra
>>>>>>>>>> + * being set the driver receives the given @fn callback to
>>>>>>>>>> lock additional
>>>>>>>>>> + * dma-resv in the context of the &drm_gpuvm_exec instance.
>>>>>>>>>> Typically, drivers
>>>>>>>>>> + * would call drm_exec_prepare_obj() from within this
>>>>>>>>>> callback.
>>>>>>>>>> + *
>>>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>>>> + */
>>>>>>>>>> +int
>>>>>>>>>> +drm_gpuvm_exec_lock(struct drm_gpuvm_exec *vm_exec,
>>>>>>>>>> +                   unsigned int num_fences,
>>>>>>>>>> +                   bool interruptible)
>>>>>>>>>> +{
>>>>>>>>>> +       struct drm_gpuvm *gpuvm = vm_exec->vm;
>>>>>>>>>> +       struct drm_exec *exec = &vm_exec->exec;
>>>>>>>>>> +       uint32_t flags;
>>>>>>>>>> +       int ret;
>>>>>>>>>> +
>>>>>>>>>> +       flags = (interruptible ? DRM_EXEC_INTERRUPTIBLE_WAIT : 0) |
>>>>>>>>>> +               DRM_EXEC_IGNORE_DUPLICATES;
>>>>>>>>>> +
>>>>>>>>>> +       drm_exec_init(exec, flags);
>>>>>>>>>> +
>>>>>>>>>> +       drm_exec_until_all_locked(exec) {
>>>>>>>>>> +               ret = drm_gpuvm_prepare_vm(gpuvm, exec,
>>>>>>>>>> num_fences);
>>>>>>>>>> +               drm_exec_retry_on_contention(exec);
>>>>>>>>>> +               if (ret)
>>>>>>>>>> +                       goto err;
>>>>>>>>>> +
>>>>>>>>>> +               ret = drm_gpuvm_prepare_objects(gpuvm, exec,
>>>>>>>>>> num_fences);
>>>>>>>>>> +               drm_exec_retry_on_contention(exec);
>>>>>>>>>> +               if (ret)
>>>>>>>>>> +                       goto err;
>>>>>>>>>> +
>>>>>>>>>> +               if (vm_exec->extra.fn) {
>>>>>>>>>> +                       ret = vm_exec->extra.fn(vm_exec,
>>>>>>>>>> num_fences);
>>>>>>>>>> +                       drm_exec_retry_on_contention(exec);
>>>>>>>>>> +                       if (ret)
>>>>>>>>>> +                               goto err;
>>>>>>>>>> +               }
>>>>>>>>>> +       }
>>>>>>>>>> +
>>>>>>>>>> +       return 0;
>>>>>>>>>> +
>>>>>>>>>> +err:
>>>>>>>>>> +       drm_exec_fini(exec);
>>>>>>>>>> +       return ret;
>>>>>>>>>> +}
>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_exec_lock);
>>>>>>>>>> +
>>>>>>>>>> +static int
>>>>>>>>>> +fn_lock_array(struct drm_gpuvm_exec *vm_exec, unsigned int
>>>>>>>>>> num_fences)
>>>>>>>>>> +{
>>>>>>>>>> +       struct {
>>>>>>>>>> +               struct drm_gem_object **objs;
>>>>>>>>>> +               unsigned int num_objs;
>>>>>>>>>> +       } *args = vm_exec->extra.priv;
>>>>>>>>>> +
>>>>>>>>>> +       return drm_exec_prepare_array(&vm_exec->exec, args->objs,
>>>>>>>>>> +                                     args->num_objs, num_fences);
>>>>>>>>>> +}
>>>>>>>>>> +
>>>>>>>>>> +/**
>>>>>>>>>> + * drm_gpuvm_exec_lock_array() - lock all dma-resv of all
>>>>>>>>>> associated BOs
>>>>>>>>>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>>>>>>>>>> + * @objs: additional &drm_gem_objects to lock
>>>>>>>>>> + * @num_objs: the number of additional &drm_gem_objects to
>>>>>>>>>> lock
>>>>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>>>>> + * @interruptible: sleep interruptible if waiting
>>>>>>>>>> + *
>>>>>>>>>> + * Acquires all dma-resv locks of all &drm_gem_objects the
>>>>>>>>>> given &drm_gpuvm
>>>>>>>>>> + * contains mappings of, plus the ones given through @objs.
>>>>>>>>>> + *
>>>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>>>> + */
>>>>>>>>>> +int
>>>>>>>>>> +drm_gpuvm_exec_lock_array(struct drm_gpuvm_exec *vm_exec,
>>>>>>>>>> +                         struct drm_gem_object **objs,
>>>>>>>>>> +                         unsigned int num_objs,
>>>>>>>>>> +                         unsigned int num_fences,
>>>>>>>>>> +                         bool interruptible)
>>>>>>>>>> +{
>>>>>>>>>> +       struct {
>>>>>>>>>> +               struct drm_gem_object **objs;
>>>>>>>>>> +               unsigned int num_objs;
>>>>>>>>>> +       } args;
>>>>>>>>>> +
>>>>>>>>>> +       args.objs = objs;
>>>>>>>>>> +       args.num_objs = num_objs;
>>>>>>>>>> +
>>>>>>>>>> +       vm_exec->extra.fn = fn_lock_array;
>>>>>>>>>> +       vm_exec->extra.priv = &args;
>>>>>>>>>> +
>>>>>>>>>> +       return drm_gpuvm_exec_lock(vm_exec, num_fences,
>>>>>>>>>> interruptible);
>>>>>>>>>> +}
>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_exec_lock_array);
>>>>>>>>>> +
>>>>>>>>>> +/**
>>>>>>>>>> + * drm_gpuvm_exec_lock_range() - prepare all BOs mapped
>>>>>>>>>> within a given range
>>>>>>>>>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>>>>>>>>>> + * @addr: the start address within the VA space
>>>>>>>>>> + * @range: the range to iterate within the VA space
>>>>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>>>>> + * @interruptible: sleep interruptible if waiting
>>>>>>>>>> + *
>>>>>>>>>> + * Acquires all dma-resv locks of all &drm_gem_objects
>>>>>>>>>> mapped between @addr and
>>>>>>>>>> + * @addr + @range.
>>>>>>>>>> + *
>>>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>>>> + */
>>>>>>>>>> +int
>>>>>>>>>> +drm_gpuvm_exec_lock_range(struct drm_gpuvm_exec *vm_exec,
>>>>>>>>>> +                         u64 addr, u64 range,
>>>>>>>>>> +                         unsigned int num_fences,
>>>>>>>>>> +                         bool interruptible)
>>>>>>>>>> +{
>>>>>>>>>> +       struct drm_gpuvm *gpuvm = vm_exec->vm;
>>>>>>>>>> +       struct drm_exec *exec = &vm_exec->exec;
>>>>>>>>>> +       uint32_t flags;
>>>>>>>>>> +       int ret;
>>>>>>>>>> +
>>>>>>>>>> +       flags = (interruptible ? DRM_EXEC_INTERRUPTIBLE_WAIT : 0) |
>>>>>>>>>> +               DRM_EXEC_IGNORE_DUPLICATES;
>>>>>>>>>> +
>>>>>>>>>> +       drm_exec_init(exec, flags);
>>>>>>>>>> +
>>>>>>>>>> +       drm_exec_until_all_locked(exec) {
>>>>>>>>>> +               ret = drm_gpuvm_prepare_range(gpuvm, exec,
>>>>>>>>>> addr, range,
>>>>>>>>>> + num_fences);
>>>>>>>>>> +               drm_exec_retry_on_contention(exec);
>>>>>>>>>> +               if (ret)
>>>>>>>>>> +                       goto err;
>>>>>>>>>> +       }
>>>>>>>>>> +
>>>>>>>>>> +       return ret;
>>>>>>>>>> +
>>>>>>>>>> +err:
>>>>>>>>>> +       drm_exec_fini(exec);
>>>>>>>>>> +       return ret;
>>>>>>>>>> +}
>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_exec_lock_range);
>>>>>>>>>> +
>>>>>>>>>> +/**
>>>>>>>>>> + * drm_gpuvm_validate() - validate all BOs marked as evicted
>>>>>>>>>> + * @gpuvm: the &drm_gpuvm to validate evicted BOs
>>>>>>>>>> + *
>>>>>>>>>> + * Calls the &drm_gpuvm_ops.bo_validate callback for all
>>>>>>>>>> evicted buffer
>>>>>>>>>> + * objects being mapped in the given &drm_gpuvm.
>>>>>>>>>> + *
>>>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>>>> + */
>>>>>>>>>> +int
>>>>>>>>>> +drm_gpuvm_validate(struct drm_gpuvm *gpuvm)
>>>>>>>>>> +{
>>>>>>>>>> +       const struct drm_gpuvm_ops *ops = gpuvm->ops;
>>>>>>>>>> +       struct drm_gpuvm_bo *vm_bo;
>>>>>>>>>> +       LIST_HEAD(evict);
>>>>>>>>>> +       int ret = 0;
>>>>>>>>>> +
>>>>>>>>>> +       if (unlikely(!ops || !ops->bo_validate))
>>>>>>>>>> +               return -ENOTSUPP;
>>>>>>>>>> +
>>>>>>>>>> +       for_each_vm_bo_in_list(gpuvm, evict, &evict, vm_bo) {
>>>>>>>>>> +               dma_resv_assert_held(vm_bo->obj->resv);
>>>>>>>>>> +               ret = ops->bo_validate(vm_bo->obj);
>>>>>>>>>> +               if (ret)
>>>>>>>>>> +                       break;
>>>>>>>>>> +       }
>>>>>>>>>> +       /* Drop ref in case we break out of the loop. */
>>>>>>>>>> +       drm_gpuvm_bo_put(vm_bo);
>>>>>>>>>> +       restore_vm_bo_list(gpuvm, evict, &evict);
>>>>>>>>>> +
>>>>>>>>>> +       return ret;
>>>>>>>>>> +}
>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_validate);
>>>>>>>>>> +
>>>>>>>>>> +/**
>>>>>>>>>> + * drm_gpuvm_resv_add_fence() - add fence to private and all
>>>>>>>>>> extobj
>>>>>>>>>> + * dma-resv
>>>>>>>>>> + * @gpuvm: the &drm_gpuvm to add a fence to
>>>>>>>>>> + * @exec: the &drm_exec locking context
>>>>>>>>>> + * @fence: fence to add
>>>>>>>>>> + * @private_usage: private dma-resv usage
>>>>>>>>>> + * @extobj_usage: extobj dma-resv usage
>>>>>>>>>> + */
>>>>>>>>>> +void
>>>>>>>>>> +drm_gpuvm_resv_add_fence(struct drm_gpuvm *gpuvm,
>>>>>>>>>> +                        struct drm_exec *exec,
>>>>>>>>>> +                        struct dma_fence *fence,
>>>>>>>>>> +                        enum dma_resv_usage private_usage,
>>>>>>>>>> +                        enum dma_resv_usage extobj_usage)
>>>>>>>>>> +{
>>>>>>>>>> +       struct drm_gem_object *obj;
>>>>>>>>>> +       unsigned long index;
>>>>>>>>>> +
>>>>>>>>>> +       drm_exec_for_each_locked_object(exec, index, obj) {
>>>>>>>>>> +               dma_resv_assert_held(obj->resv);
>>>>>>>>>> +               dma_resv_add_fence(obj->resv, fence,
>>>>>>>>>> +                                  drm_gpuvm_is_extobj(gpuvm, obj) ?
>>>>>>>>>> +                                  extobj_usage : private_usage);
>>>>>>>>>> +       }
>>>>>>>>>> +}
>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_resv_add_fence);
>>>>>>>>>> +
>>>>>>>>>>     /**
>>>>>>>>>>      * drm_gpuvm_bo_create() - create a new instance of struct
>>>>>>>>>> drm_gpuvm_bo
>>>>>>>>>>      * @gpuvm: The &drm_gpuvm the @obj is mapped in.
>>>>>>>>>> @@ -790,6 +1242,9 @@ drm_gpuvm_bo_create(struct drm_gpuvm
>>>>>>>>>> *gpuvm,
>>>>>>>>>>          INIT_LIST_HEAD(&vm_bo->list.gpuva);
>>>>>>>>>>          INIT_LIST_HEAD(&vm_bo->list.entry.gem);
>>>>>>>>>> +       INIT_LIST_HEAD(&vm_bo->list.entry.extobj);
>>>>>>>>>> +       INIT_LIST_HEAD(&vm_bo->list.entry.evict);
>>>>>>>>>> +
>>>>>>>>>>          drm_gem_object_get(obj);
>>>>>>>>>>          return vm_bo;
>>>>>>>>>> @@ -807,6 +1262,14 @@ drm_gpuvm_bo_destroy(struct kref *kref)
>>>>>>>>>>          drm_gem_gpuva_assert_lock_held(vm_bo->obj);
>>>>>>>>>> +       spin_lock(&gpuvm->extobj.lock);
>>>>>>>>>> +       list_del(&vm_bo->list.entry.extobj);
>>>>>>>>>> +       spin_unlock(&gpuvm->extobj.lock);
>>>>>>>>>> +
>>>>>>>>>> +       spin_lock(&gpuvm->evict.lock);
>>>>>>>>>> +       list_del(&vm_bo->list.entry.evict);
>>>>>>>>>> +       spin_unlock(&gpuvm->evict.lock);
>>>>>>>>>> +
>>>>>>>>>>          list_del(&vm_bo->list.entry.gem);
>>>>>>>>>>          drm_gem_object_put(obj);
>>>>>>>>>> @@ -822,6 +1285,11 @@ drm_gpuvm_bo_destroy(struct kref *kref)
>>>>>>>>>>      * @vm_bo: the &drm_gpuvm_bo to release the reference of
>>>>>>>>>>      *
>>>>>>>>>>      * This releases a reference to @vm_bo.
>>>>>>>>>> + *
>>>>>>>>>> + * If the reference count drops to zero, the &gpuvm_bo is
>>>>>>>>>> destroyed, which
>>>>>>>>>> + * includes removing it from the GEM's gpuva list. Hence, if
>>>>>>>>>> a call to this
>>>>>>>>>> + * function can potentially let the reference count drop to zero,
>>>>>>>>>> the caller must
>>>>>>>>>> + * hold the dma-resv or driver specific GEM gpuva lock.
>>>>>>>>>>      */
>>>>>>>>>>     void
>>>>>>>>>>     drm_gpuvm_bo_put(struct drm_gpuvm_bo *vm_bo)
>>>>>>>>>> @@ -831,6 +1299,12 @@ drm_gpuvm_bo_put(struct drm_gpuvm_bo
>>>>>>>>>> *vm_bo)
>>>>>>>>>>     }
>>>>>>>>>>     EXPORT_SYMBOL_GPL(drm_gpuvm_bo_put);
>>>>>>>>>> +static int __must_check
>>>>>>>>>> +drm_gpuvm_bo_get_unless_zero(struct drm_gpuvm_bo *vm_bo)
>>>>>>>>>> +{
>>>>>>>>>> +       return kref_get_unless_zero(&vm_bo->kref);
>>>>>>>>>> +}
>>>>>>>>>> +
>>>>>>>>>>     static struct drm_gpuvm_bo *
>>>>>>>>>>     __drm_gpuvm_bo_find(struct drm_gpuvm *gpuvm,
>>>>>>>>>>                      struct drm_gem_object *obj)
>>>>>>>>>> @@ -938,6 +1412,48 @@ drm_gpuvm_bo_obtain_prealloc(struct drm_gpuvm_bo *__vm_bo)
>>>>>>>>>>     }
>>>>>>>>>> EXPORT_SYMBOL_GPL(drm_gpuvm_bo_obtain_prealloc);
>>>>>>>>>> +/**
>>>>>>>>>> + * drm_gpuvm_bo_extobj_add() - adds the &drm_gpuvm_bo to its
>>>>>>>>>> + * &drm_gpuvm's extobj list
>>>>>>>>>> + * @vm_bo: The &drm_gpuvm_bo to add to its &drm_gpuvm's extobj list.
>>>>>>>>>> + *
>>>>>>>>>> + * Adds the given @vm_bo to its &drm_gpuvm's extobj list if it is not
>>>>>>>>>> + * on the list already and if the corresponding &drm_gem_object
>>>>>>>>>> + * actually is an external object.
>>>>>>>>>> + */
>>>>>>>>>> +void
>>>>>>>>>> +drm_gpuvm_bo_extobj_add(struct drm_gpuvm_bo *vm_bo)
>>>>>>>>>> +{
>>>>>>>>>> +       struct drm_gpuvm *gpuvm = vm_bo->vm;
>>>>>>>>>> +
>>>>>>>>>> +       if (drm_gpuvm_is_extobj(gpuvm, vm_bo->obj))
>>>>>>>>>> +               drm_gpuvm_bo_list_add(vm_bo, extobj);
>>>>>>>>>> +}
>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_bo_extobj_add);
>>>>>>>>>> +
>>>>>>>>>> +/**
>>>>>>>>>> + * drm_gpuvm_bo_evict() - add / remove a &drm_gem_object to / from a
>>>>>>>>>> + * &drm_gpuvms evicted list
>>>>>>>>>> + * @obj: the &drm_gem_object to add or remove
>>>>>>>>>> + * @evict: indicates whether the object is evicted
>>>>>>>>>> + *
>>>>>>>>>> + * Adds a &drm_gem_object to or removes it from the evicted list of all
>>>>>>>>>> + * &drm_gpuvms containing a mapping of this &drm_gem_object.
>>>>>>>>>> + */
>>>>>>>>>> +void
>>>>>>>>>> +drm_gpuvm_bo_evict(struct drm_gem_object *obj, bool evict)
>>>>>>>>>> +{
>>>>>>>>>> +       struct drm_gpuvm_bo *vm_bo;
>>>>>>>>>> +
>>>>>>>>>> +       drm_gem_for_each_gpuvm_bo(vm_bo, obj) {
>>>>>>>>>> +               if (evict)
>>>>>>>>>> +                       drm_gpuvm_bo_list_add(vm_bo, evict);
>>>>>>>>>> +               else
>>>>>>>>>> +                       drm_gpuvm_bo_list_del(vm_bo, evict);
>>>>>>>>>> +       }
>>>>>>>>>> +}
>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_bo_evict);
>>>>>>>>>> +
>>>>>>>>>>     static int
>>>>>>>>>>     __drm_gpuva_insert(struct drm_gpuvm *gpuvm,
>>>>>>>>>>                     struct drm_gpuva *va)
>>>>>>>>>> diff --git a/include/drm/drm_gpuvm.h b/include/drm/drm_gpuvm.h
>>>>>>>>>> index afa50b9059a2..834bb6d6617e 100644
>>>>>>>>>> --- a/include/drm/drm_gpuvm.h
>>>>>>>>>> +++ b/include/drm/drm_gpuvm.h
>>>>>>>>>> @@ -26,10 +26,12 @@
>>>>>>>>>>      */
>>>>>>>>>>     #include <linux/list.h>
>>>>>>>>>> +#include <linux/dma-resv.h>
>>>>>>>>>>     #include <linux/rbtree.h>
>>>>>>>>>>     #include <linux/types.h>
>>>>>>>>>>     #include <drm/drm_gem.h>
>>>>>>>>>> +#include <drm/drm_exec.h>
>>>>>>>>>>     struct drm_gpuvm;
>>>>>>>>>>     struct drm_gpuvm_bo;
>>>>>>>>>> @@ -259,6 +261,38 @@ struct drm_gpuvm {
>>>>>>>>>>           * space
>>>>>>>>>>           */
>>>>>>>>>>          struct dma_resv *resv;
>>>>>>>>>> +
>>>>>>>>>> +       /**
>>>>>>>>>> +        * @extobj: structure holding the extobj list
>>>>>>>>>> +        */
>>>>>>>>>> +       struct {
>>>>>>>>>> +               /**
>>>>>>>>>> +                * @list: &list_head storing &drm_gpuvm_bos serving as
>>>>>>>>>> +                * external object
>>>>>>>>>> +                */
>>>>>>>>>> +               struct list_head list;
>>>>>>>>>> +
>>>>>>>>>> +               /**
>>>>>>>>>> +                * @lock: spinlock to protect the extobj list
>>>>>>>>>> +                */
>>>>>>>>>> +               spinlock_t lock;
>>>>>>>>>> +       } extobj;
>>>>>>>>>> +
>>>>>>>>>> +       /**
>>>>>>>>>> +        * @evict: structure holding the evict list and evict list lock
>>>>>>>>>> +        */
>>>>>>>>>> +       struct {
>>>>>>>>>> +               /**
>>>>>>>>>> +                * @list: &list_head storing &drm_gpuvm_bos currently
>>>>>>>>>> +                * being evicted
>>>>>>>>>> +                */
>>>>>>>>>> +               struct list_head list;
>>>>>>>>>> +
>>>>>>>>>> +               /**
>>>>>>>>>> +                * @lock: spinlock to protect the evict list
>>>>>>>>>> +                */
>>>>>>>>>> +               spinlock_t lock;
>>>>>>>>>> +       } evict;
>>>>>>>>>>     };
>>>>>>>>>>     void drm_gpuvm_init(struct drm_gpuvm *gpuvm, struct drm_device *drm,
>>>>>>>>>> @@ -268,6 +302,21 @@ void drm_gpuvm_init(struct drm_gpuvm *gpuvm, struct drm_device *drm,
>>>>>>>>>>                      const struct drm_gpuvm_ops *ops);
>>>>>>>>>>     void drm_gpuvm_destroy(struct drm_gpuvm *gpuvm);
>>>>>>>>>> +/**
>>>>>>>>>> + * drm_gpuvm_is_extobj() - indicates whether the given &drm_gem_object
>>>>>>>>>> + * is an external object
>>>>>>>>>> + * @gpuvm: the &drm_gpuvm to check
>>>>>>>>>> + * @obj: the &drm_gem_object to check
>>>>>>>>>> + *
>>>>>>>>>> + * Returns: true if the &drm_gem_object &dma_resv differs from the
>>>>>>>>>> + * &drm_gpuvms &dma_resv, false otherwise
>>>>>>>>>> + */
>>>>>>>>>> +static inline bool drm_gpuvm_is_extobj(struct drm_gpuvm *gpuvm,
>>>>>>>>>> +                                      struct drm_gem_object *obj)
>>>>>>>>>> +{
>>>>>>>>>> +       return obj && obj->resv != gpuvm->resv;
>>>>>>>>>> +}
>>>>>>>>>> +
>>>>>>>>>>     static inline struct drm_gpuva *
>>>>>>>>>>     __drm_gpuva_next(struct drm_gpuva *va)
>>>>>>>>>>     {
>>>>>>>>>> @@ -346,6 +395,128 @@ __drm_gpuva_next(struct drm_gpuva *va)
>>>>>>>>>>     #define drm_gpuvm_for_each_va_safe(va__, next__, gpuvm__) \
>>>>>>>>>>          list_for_each_entry_safe(va__, next__, &(gpuvm__)->rb.list, rb.entry)
>>>>>>>>>> +/**
>>>>>>>>>> + * struct drm_gpuvm_exec - &drm_gpuvm abstraction of &drm_exec
>>>>>>>>>> + *
>>>>>>>>>> + * This structure should be created on the stack as &drm_exec should be.
>>>>>>>>>> + *
>>>>>>>>>> + * Optionally, @extra can be set in order to lock additional &drm_gem_objects.
>>>>>>>>>> + */
>>>>>>>>>> +struct drm_gpuvm_exec {
>>>>>>>>>> +       /**
>>>>>>>>>> +        * @exec: the &drm_exec structure
>>>>>>>>>> +        */
>>>>>>>>>> +       struct drm_exec exec;
>>>>>>>>>> +
>>>>>>>>>> +       /**
>>>>>>>>>> +        * @vm: the &drm_gpuvm to lock its DMA reservations
>>>>>>>>>> +        */
>>>>>>>>>> +       struct drm_gpuvm *vm;
>>>>>>>>>> +
>>>>>>>>>> +       /**
>>>>>>>>>> +        * @extra: Callback and corresponding private data for the
>>>>>>>>>> +        * driver to lock arbitrary additional &drm_gem_objects.
>>>>>>>>>> +        */
>>>>>>>>>> +       struct {
>>>>>>>>>> +               /**
>>>>>>>>>> +                * @fn: The driver callback to lock additional &drm_gem_objects.
>>>>>>>>>> +                */
>>>>>>>>>> +               int (*fn)(struct drm_gpuvm_exec *vm_exec,
>>>>>>>>>> +                         unsigned int num_fences);
>>>>>>>>>> +
>>>>>>>>>> +               /**
>>>>>>>>>> +                * @priv: driver private data for the @fn callback
>>>>>>>>>> +                */
>>>>>>>>>> +               void *priv;
>>>>>>>>>> +       } extra;
>>>>>>>>>> +};
>>>>>>>>>> +
>>>>>>>>>> +/**
>>>>>>>>>> + * drm_gpuvm_prepare_vm() - prepare the GPUVMs common dma-resv
>>>>>>>>>> + * @gpuvm: the &drm_gpuvm
>>>>>>>>>> + * @exec: the &drm_exec context
>>>>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>>>>> + *
>>>>>>>>>> + * Calls drm_exec_prepare_obj() for the GPUVMs dummy &drm_gem_object.
>>>>>>>>>> + *
>>>>>>>>>> + * Using this function directly, it is the driver's responsibility to
>>>>>>>>>> + * call drm_exec_init() and drm_exec_fini() accordingly.
>>>>>>>>>> + *
>>>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>>>> + */
>>>>>>>>>> +static inline int
>>>>>>>>>> +drm_gpuvm_prepare_vm(struct drm_gpuvm *gpuvm,
>>>>>>>>>> +                    struct drm_exec *exec,
>>>>>>>>>> +                    unsigned int num_fences)
>>>>>>>>>> +{
>>>>>>>>>> +       return drm_exec_prepare_obj(exec, &gpuvm->d_obj, num_fences);
>>>>>>>>>> +}
>>>>>>>>>> +
>>>>>>>>>> +int drm_gpuvm_prepare_objects(struct drm_gpuvm *gpuvm,
>>>>>>>>>> +                             struct drm_exec *exec,
>>>>>>>>>> +                             unsigned int num_fences);
>>>>>>>>>> +
>>>>>>>>>> +int drm_gpuvm_prepare_range(struct drm_gpuvm *gpuvm,
>>>>>>>>>> +                           struct drm_exec *exec,
>>>>>>>>>> +                           u64 addr, u64 range,
>>>>>>>>>> +                           unsigned int num_fences);
>>>>>>>>>> +
>>>>>>>>>> +int drm_gpuvm_exec_lock(struct drm_gpuvm_exec *vm_exec,
>>>>>>>>>> +                       unsigned int num_fences,
>>>>>>>>>> +                       bool interruptible);
>>>>>>>>>> +
>>>>>>>>>> +int drm_gpuvm_exec_lock_array(struct drm_gpuvm_exec *vm_exec,
>>>>>>>>>> +                             struct drm_gem_object **objs,
>>>>>>>>>> +                             unsigned int num_objs,
>>>>>>>>>> +                             unsigned int num_fences,
>>>>>>>>>> +                             bool interruptible);
>>>>>>>>>> +
>>>>>>>>>> +int drm_gpuvm_exec_lock_range(struct drm_gpuvm_exec *vm_exec,
>>>>>>>>>> +                             u64 addr, u64 range,
>>>>>>>>>> +                             unsigned int num_fences,
>>>>>>>>>> +                             bool interruptible);
>>>>>>>>>> +
>>>>>>>>>> +/**
>>>>>>>>>> + * drm_gpuvm_exec_unlock() - unlock all dma-resv of all associated BOs
>>>>>>>>>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>>>>>>>>>> + *
>>>>>>>>>> + * Releases all dma-resv locks of all &drm_gem_objects previously
>>>>>>>>>> + * acquired through drm_gpuvm_exec_lock() or its variants.
>>>>>>>>>> + */
>>>>>>>>>> +static inline void
>>>>>>>>>> +drm_gpuvm_exec_unlock(struct drm_gpuvm_exec *vm_exec)
>>>>>>>>>> +{
>>>>>>>>>> +       drm_exec_fini(&vm_exec->exec);
>>>>>>>>>> +}
>>>>>>>>>> +
>>>>>>>>>> +int drm_gpuvm_validate(struct drm_gpuvm *gpuvm);
>>>>>>>>>> +void drm_gpuvm_resv_add_fence(struct drm_gpuvm *gpuvm,
>>>>>>>>>> +                             struct drm_exec *exec,
>>>>>>>>>> +                             struct dma_fence *fence,
>>>>>>>>>> +                             enum dma_resv_usage private_usage,
>>>>>>>>>> +                             enum dma_resv_usage extobj_usage);
>>>>>>>>>> +
>>>>>>>>>> +/**
>>>>>>>>>> + * drm_gpuvm_exec_resv_add_fence()
>>>>>>>>>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>>>>>>>>>> + * @fence: fence to add
>>>>>>>>>> + * @private_usage: private dma-resv usage
>>>>>>>>>> + * @extobj_usage: extobj dma-resv usage
>>>>>>>>>> + *
>>>>>>>>>> + * See drm_gpuvm_resv_add_fence().
>>>>>>>>>> + */
>>>>>>>>>> +static inline void
>>>>>>>>>> +drm_gpuvm_exec_resv_add_fence(struct drm_gpuvm_exec *vm_exec,
>>>>>>>>>> +                             struct dma_fence *fence,
>>>>>>>>>> +                             enum dma_resv_usage private_usage,
>>>>>>>>>> +                             enum dma_resv_usage extobj_usage)
>>>>>>>>>> +{
>>>>>>>>>> +       drm_gpuvm_resv_add_fence(vm_exec->vm, &vm_exec->exec, fence,
>>>>>>>>>> +                                private_usage, extobj_usage);
>>>>>>>>>> +}
>>>>>>>>>> +
>>>>>>>>>>     /**
>>>>>>>>>>      * struct drm_gpuvm_bo - structure representing a &drm_gpuvm and
>>>>>>>>>>      * &drm_gem_object combination
>>>>>>>>>> @@ -398,6 +569,18 @@ struct drm_gpuvm_bo {
>>>>>>>>>>                           * gpuva list.
>>>>>>>>>>                           */
>>>>>>>>>>                          struct list_head gem;
>>>>>>>>>> +
>>>>>>>>>> +                       /**
>>>>>>>>>> +                        * @extobj: List entry to attach to the
>>>>>>>>>> +                        * &drm_gpuvms extobj list.
>>>>>>>>>> +                        */
>>>>>>>>>> +                       struct list_head extobj;
>>>>>>>>>> +
>>>>>>>>>> +                       /**
>>>>>>>>>> +                        * @evict: List entry to attach to the
>>>>>>>>>> +                        * &drm_gpuvms evict list.
>>>>>>>>>> +                        */
>>>>>>>>>> +                       struct list_head evict;
>>>>>>>>>>                  } entry;
>>>>>>>>>>          } list;
>>>>>>>>>>     };
>>>>>>>>>> @@ -432,6 +615,9 @@ struct drm_gpuvm_bo *
>>>>>>>>>>     drm_gpuvm_bo_find(struct drm_gpuvm *gpuvm,
>>>>>>>>>>                    struct drm_gem_object *obj);
>>>>>>>>>> +void drm_gpuvm_bo_evict(struct drm_gem_object *obj, bool evict);
>>>>>>>>>> +void drm_gpuvm_bo_extobj_add(struct drm_gpuvm_bo *vm_bo);
>>>>>>>>>> +
>>>>>>>>>>     /**
>>>>>>>>>>      * drm_gpuvm_bo_for_each_va() - iterator to walk over a list of &drm_gpuva
>>>>>>>>>>      * @va__: &drm_gpuva structure to assign to in each iteration step
>>>>>>>>>> @@ -837,6 +1023,17 @@ struct drm_gpuvm_ops {
>>>>>>>>>>           * used.
>>>>>>>>>>           */
>>>>>>>>>>          int (*sm_step_unmap)(struct drm_gpuva_op *op, void *priv);
>>>>>>>>>> +
>>>>>>>>>> +       /**
>>>>>>>>>> +        * @bo_validate: called from drm_gpuvm_validate()
>>>>>>>>>> +        *
>>>>>>>>>> +        * Drivers receive this callback for every evicted
>>>>>>>>>> +        * &drm_gem_object being mapped in the corresponding &drm_gpuvm.
>>>>>>>>>> +        *
>>>>>>>>>> +        * Typically, drivers would call their driver specific variant
>>>>>>>>>> +        * of ttm_bo_validate() from within this callback.
>>>>>>>>>> +        */
>>>>>>>>>> +       int (*bo_validate)(struct drm_gem_object *obj);
>>>>>>>>>>     };
>>>>>>>>>>     int drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm, void *priv,
>>>
>>
>
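
[For orientation: a minimal sketch, not part of the patch, of how a driver's
exec path might use the helpers introduced above. The function name, the
origin of job_fence and the choice of DMA_RESV_USAGE_BOOKKEEP are
illustrative assumptions.]

static int driver_exec(struct drm_gpuvm *gpuvm, struct dma_fence *job_fence)
{
	struct drm_gpuvm_exec vm_exec = { .vm = gpuvm };
	int ret;

	/* Locks the VM's common dma-resv and all extobj dma-resvs. */
	ret = drm_gpuvm_exec_lock(&vm_exec, 1, true);
	if (ret)
		return ret;

	/* Re-validates evicted BOs through the bo_validate() callback. */
	ret = drm_gpuvm_validate(gpuvm);
	if (!ret)
		/* job_fence is assumed to come from the submitted job. */
		drm_gpuvm_resv_add_fence(gpuvm, &vm_exec.exec, job_fence,
					 DMA_RESV_USAGE_BOOKKEEP,
					 DMA_RESV_USAGE_BOOKKEEP);

	drm_gpuvm_exec_unlock(&vm_exec);
	return ret;
}
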
Boris Brezillon Sept. 14, 2023, 8:20 a.m. UTC | #29
On Wed, 13 Sep 2023 15:22:56 +0200
Thomas Hellström <thomas.hellstrom@linux.intel.com> wrote:

> On 9/13/23 13:33, Boris Brezillon wrote:
> > On Wed, 13 Sep 2023 12:39:01 +0200
> > Thomas Hellström <thomas.hellstrom@linux.intel.com> wrote:
> >  
> >> Hi,
> >>
> >> On 9/13/23 09:19, Boris Brezillon wrote:  
> >>> On Wed, 13 Sep 2023 17:05:42 +1000
> >>> Dave Airlie <airlied@gmail.com> wrote:
> >>>     
> >>>> On Wed, 13 Sept 2023 at 17:03, Boris Brezillon
> >>>> <boris.brezillon@collabora.com> wrote:  
> >>>>> On Tue, 12 Sep 2023 18:20:32 +0200
> >>>>> Thomas Hellström <thomas.hellstrom@linux.intel.com> wrote:
> >>>>>        
> >>>>>>> +/**
> >>>>>>> + * get_next_vm_bo_from_list() - get the next vm_bo element
> >>>>>>> + * @__gpuvm: The GPU VM
> >>>>>>> + * @__list_name: The name of the list we're iterating on
> >>>>>>> + * @__local_list: A pointer to the local list used to store already iterated items
> >>>>>>> + * @__prev_vm_bo: The previous element we got from drm_gpuvm_get_next_cached_vm_bo()
> >>>>>>> + *
> >>>>>>> + * This helper is here to provide lockless list iteration. Lockless as in, the
> >>>>>>> + * iterator releases the lock immediately after picking the first element from
> >>>>>>> + * the list, so list insertion deletion can happen concurrently.  
> >>>>>> Are the list spinlocks needed for that async state update from within
> >>>>>> the dma-fence critical section we've discussed previously?  
> >>>>> Any driver calling _[un]link() from its drm_gpu_scheduler::run_job()
> >>>>> hook will be in this situation (Panthor at the moment, PowerVR soon). I
> >>>>> get that Xe and Nouveau don't need that because they update the VM
> >>>>> state early (in the ioctl path), but I keep thinking this will hurt us
> >>>>> if we don't think it through from the beginning, because once you've
> >>>>> set this logic to depend only on resv locks, it will be pretty hard to
> >>>>> get back to a solution which lets synchronous VM_BINDs take precedence
> >>>>> on asynchronous request, and, with vkQueueBindSparse() passing external
> >>>>> deps (plus the fact the VM_BIND queue might be pretty deep), it can
> >>>>> take a long time to get your synchronous VM_BIND executed...  
> >> So this would boil down to either (possibly opt-in) keeping the spinlock
> >> approach or pushing the unlink out to a wq then?  
> > Deferred _unlink() would not be an issue, since I already defer the
> > drm_gpuva destruction to a wq, it would just be a matter of moving the
> > _unlink() call there as well. But _link() also takes the GEM gpuva list
> > lock, and that one is a bit tricky, in that sm_map() can trigger 2 more
> > _link() calls for the prev/next mappings, which we can't guess until we
> > get to execute the VM update. If we mandate the use of the GEM resv
> > lock, that simply means async VM updates (AKA calling
> > drm_gpuvm_sm_[un]map()) are not an option. And if this is what everyone
> > agrees on, then I'd like the APIs that make this sort of async VM
> > update possible (drm_gpuvm_sm_[un]map(), the drm_gpuvm_ops::sm_step*
> > methods, and probably other things) to be dropped, so we don't make it
> > look like it's something we support.
> >  
> >> BTW, as also asked in a reply to Danilo, how do you call unlink from
> >> run_job() when it was requiring the obj->dma_resv lock, or was that a WIP?  
> > _unlink() makes sure the GEM gpuva list lock is taken, but this can be
> > a custom lock (see drm_gem_gpuva_set_lock()). In panthor we have
> > panthor_gem_object::gpuva_list_lock that's dedicated to the gpuva list
> > protection. We make sure we never take this lock while allocating
> > memory to guarantee the dma-signalling path can't deadlock.
> >  
> >>>>>        
> >>>> btw what is the use case for this? do we have actual vulkan
> >>>> applications we know will have problems here?  
> >>> I don't, but I think that's a concern Faith raised at some point (dates
> >>> back from when I was reading threads describing how VM_BIND on i915
> >>> should work, and I was clearly discovering this whole VM_BIND thing at
> >>> that time, so maybe I misunderstood).
> >>>     
> >>>> it feels like a bit of premature optimisation, but maybe we have use cases.  
> >>> Might be, but that's the sort of thing that would put us in a corner if
> >>> we don't have a plan for when the needs arise. Besides, if we don't
> >>> want to support that case because it's too complicated, I'd recommend
> >>> dropping all the drm_gpuvm APIs that let people think this mode is
> >>> valid/supported (map/remap/unmap hooks in drm_gpuvm_ops,
> >>> drm_gpuvm_sm_[un]map helpers, etc). Keeping them around just adds to the
> >>> confusion.  
> >> Xe allows bypassing the bind-queue with another bind-queue, but to
> >> completely avoid dependencies between queues the Operations may not
> >> overlap.  
> > So, you check the VM state with some VM lock held (would be the VM resv
> > in my case), and if the mapping is new (no overlaps with pre-existing
> > mappings), you queue it to the fast-track/sync-VM_BIND queue. What would
> > be missing I guess is a way to know if the mapping is active (MMU has
> > been updated) or pending (MMU update queued to the bind-queue), so I can
> > fast-track mapping/unmapping of active mappings.

Ok, so I started modifying the implementation, and quickly realized the
overlap test can't be done without your xe_range_fence tree because of
unmaps. Since we call drm_gpuva_unmap() early/in the IOCTL path (IOW,
before the mapping teardown is effective), we lose track of this
yet-to-be-executed-unmap operation, and if we do our
va_range_overlaps_with_existing_mappings() test after such an unmap has
been queued using just the drm_gpuvm tree, we might get false even if
the mapping still exists and is expected to be torn down when the
VM_BIND(unmap) job is executed on the bind-queue. As a result, this
might execute the VM_BIND(map,sync) immediately (because the dependency
went undetected), and then the vm_bind_run_job() function kicks in and
undoes what the synchronous VM_BIND(map) did. Am I missing something?

If I'm correct, that means I'm back to having synchronous VM_BIND ops
queued after all asynchronous ones unless I use something like your
xe_range_fence solution (which I was hoping I could postpone until we
decide to expose multiple bind queues).
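
[For concreteness, a naive version of that overlap test, based only on the
drm_gpuvm tree, might look like the hypothetical sketch below;
drm_gpuva_find_first() is the existing tree lookup helper.]

static bool
va_range_overlaps_with_existing_mappings(struct drm_gpuvm *gpuvm,
					 u64 addr, u64 range)
{
	/* Only sees what is currently in the drm_gpuvm tree. */
	return !!drm_gpuva_find_first(gpuvm, addr, range);
}

Because drm_gpuva_unmap() already removed the va from the tree when the
VM_BIND(unmap) job was queued, this returns false while the mapping is
still live on the GPU, which is exactly the undetected dependency
described above.
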

I'm still a bit skeptical about this 'update VM mappings tree early,
defer MMU page table updates' approach, where the VM state and the
actual page table tree are temporarily out of sync until all operations
have been flushed on all queues targeting a VM. This means any test we
do on the gpuvm, like, 'give me the BO mapped at VA xxx', is subject to
'is this the current state or the future state?' questioning. Note that
we can't even get the current VM state anymore, because all the
drm_gpuvm::tree stores with this solution is the future state, and
to-be-unmapped mappings are lost during the transitioning period (when
vm_bind jobs are queued but not executed yet).

> > This would leave
> > overlapping sync/async VM updates, which can't happen in practice
> > unless userspace is doing something wrong (sparse bindings always go
> > through vkQueueBindSparse).  
> 
> User-space is allowed to create new bind queues at will, and they 
> execute independently save for range overlaps.
> 
> And the overlapping granularity depends very much on the detail of the 
> range tracking.
> We drafted this fenced range utility
>
> https://gitlab.freedesktop.org/drm/xe/kernel/-/merge_requests/353

I'll try to see if there's a way we can have something generic shared
at the gpuvm level.
Thomas Hellstrom Sept. 14, 2023, 10:45 a.m. UTC | #30
On 9/14/23 10:20, Boris Brezillon wrote:
> <snip>
>>> So, you check the VM state with some VM lock held (would be the VM resv
>>> in my case), and if the mapping is new (no overlaps with pre-existing
>>> mappings), you queue it to the fast-track/sync-VM_BIND queue. What would
>>> be missing I guess is a way to know if the mapping is active (MMU has
>>> been updated) or pending (MMU update queued to the bind-queue), so I can
>>> fast-track mapping/unmapping of active mappings.
> Ok, so I started modifying the implementation, and quickly realized the
> overlap test can't be done without your xe_range_fence tree because of
> unmaps. Since we call drm_gpuva_unmap() early/in the IOCTL path (IOW,
> before the mapping teardown is effective), we lose track of this
> yet-to-be-executed-unmap operation, and if we do our
> va_range_overlaps_with_existing_mappings() test after such an unmap has
> been queued using just the drm_gpuvm tree, we might get false even if
> the mapping still exists and is expected to be torn down when the
> VM_BIND(unmap) job is executed on the bind-queue. As a result, this
> might execute the VM_BIND(map,sync) immediately (because the dependency
> went undetected), and then the vm_bind_run_job() function kicks in and
> undoes what the synchronous VM_BIND(map) did. Am I missing something?
>
> If I'm correct, that means I'm back to having synchronous VM_BIND ops
> queued after all asynchronous ones unless I use something like your
> xe_range_fence solution (which I was hoping I could postpone until we
> decide to expose multiple bind queues).

Yes, unfortunately fine-granular async range-tracking comes with a cost. 
Still, if you are doing page-table updates solely with the CPU, you 
could probably short-circuit the fence part of the fenced ranges?


>
> I'm still a bit skeptical about this 'update VM mappings tree early,
> defer MMU page table updates' approach, where the VM state and the
> actual page table tree are temporarily out of sync until all operations
> have been flushed on all queues targeting a VM. This means any test we
> do on the gpuvm, like, 'give me the BO mapped at VA xxx', is subject to
> 'is this the current state or the future state?' questioning. Note that
> we can't even get the current VM state anymore, because all the
> drm_gpuvm::tree stores with this solution is the future state, and
> to-be-unmapped mappings are lost during the transitioning period (when
> vm_bind jobs are queued but not executed yet).

Understandable. But this is the way we historically have been doing
things (I think the whole async atomic page-flipping is using the same
concept), but rather than referring to it as current state and future
state, I'd like to think of it as synchronous CPU state (what an API user
sees) vs GPU state (what the GPU sees where it's currently executing).
To bring them in sync you need to wait for fences. And ideally the async
work should never fail.

If one wants to push async work out to be handled solely by the GPU,
this is the way things must be done since the GPU can't take locks or
allocate memory, but as part or all of the async work is sometimes done
using the CPU, it might make sense to challenge that to some extent.
There are indeed pros and cons with both approaches.

/Thomas

Danilo Krummrich Sept. 14, 2023, 10:57 a.m. UTC | #31
On 9/13/23 14:16, Danilo Krummrich wrote:

<snip>

>>> And validate() can remove it while still holding all dma-resv locks,
>>> neat!
>>> However, what if two tasks are trying to lock the VA space
>>> concurrently? What do we do when the drm_gpuvm_bo's refcount drops to
>>> zero in drm_gpuva_unlink()? Are we guaranteed that at this point of
>>> time the drm_gpuvm_bo is not on the evicted list? Because otherwise we
>>> would call drm_gpuvm_bo_destroy() with the dma-resv lock held, which
>>> wouldn't be allowed, since drm_gpuvm_bo_destroy() might drop the last
>>> reference to the drm_gem_object and hence we'd potentially free the
>>> dma-resv lock while holding it, at least if it's an external object.
>>
>> Easiest way in this scheme is to think of the lists as being protected
>> by the vm's resv lock. That means anybody calling unlink() must also
>> hold the vm's resv lock. (Which is OK from an UAF point of view, but
>> perhaps not from a locking inversion POW from an async list update).
> 
> This would mean that on unlink() we'd need to hold the VM's resv lock and the
> corresponding GEM's resv lock (in case they're not the same anyways) because the
> VM's resv lock would protect the external / evicted object lists and the GEM
> objects resv lock protects the GEM's list of drm_gpuvm_bos and the
> drm_gpuvm_bo's list of drm_gpuvas.

The same applies for drm_gpuvm_bo_put() since it might
destroy the vm_bo, which includes removing the vm_bo from the external /
evicted object lists and the GEM's list of vm_bos.

As mentioned, if the GEM's dma-resv is different from the VM's dma-resv we need
to take both locks. Ultimately, this would mean we need a drm_exec loop, because
we can't know the order in which to take these locks. Doing a full drm_exec loop
just to put() a vm_bo doesn't sound reasonable to me.

Can we instead just have an internal mutex for locking the lists such that we
avoid taking and dropping the spinlocks, which we use currently, in a loop?
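
For concreteness, a sketch of that idea (the lock name is made up):

	/* In struct drm_gpuvm, replacing the two spinlocks: */
	struct mutex list_lock; /* protects extobj.list and evict.list */

	/* In drm_gpuvm_bo_destroy(): */
	mutex_lock(&gpuvm->list_lock);
	list_del(&vm_bo->list.entry.extobj);
	list_del(&vm_bo->list.entry.evict);
	mutex_unlock(&gpuvm->list_lock);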

- Danilo

Thomas Hellstrom Sept. 14, 2023, 11:32 a.m. UTC | #32
On 9/14/23 12:57, Danilo Krummrich wrote:
> On 9/13/23 14:16, Danilo Krummrich wrote:
>
> <snip>
>
>>>> <snip>
>>>
>>> Easiest way in this scheme is to think of the lists as being protected
>>> by the vm's resv lock. That means anybody calling unlink() must also
>>> hold the vm's resv lock. (Which is OK from an UAF point of view, but
>>> perhaps not from a locking inversion POW from an async list update).
>>
>> This would mean that on unlink() we'd need to hold the VM's resv lock
>> and the corresponding GEM's resv lock (in case they're not the same
>> anyways) because the VM's resv lock would protect the external /
>> evicted object lists and the GEM objects resv lock protects the GEM's
>> list of drm_gpuvm_bos and the drm_gpuvm_bo's list of drm_gpuvas.
>
> The same applies for drm_gpuvm_bo_put() since it might
> destroy the vm_bo, which includes removing the vm_bo from the external /
> evicted object lists and the GEM's list of vm_bos.
>
> As mentioned, if the GEM's dma-resv is different from the VM's dma-resv
> we need to take both locks. Ultimately, this would mean we need a
> drm_exec loop, because we can't know the order in which to take these
> locks. Doing a full drm_exec loop just to put() a vm_bo doesn't sound
> reasonable to me.
>
> Can we instead just have an internal mutex for locking the lists such
> that we avoid taking and dropping the spinlocks, which we use currently,
> in a loop?

You'd have the same locking inversion problem with a mutex, right? Since 
in the eviction path you have resv->mutex, from exec you have 
resv->mutex->resv because validate would attempt to grab resv.
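
Spelled out, the two lock chains that would collide (sketch, assuming a
hypothetical gpuvm-wide list mutex):

	/* Eviction path:                  Exec / validate path:
	 *
	 *   dma_resv_lock(obj->resv);       mutex_lock(&gpuvm->list_lock);
	 *   mutex_lock(&gpuvm->list_lock);  ...
	 *   ...                             dma_resv_lock(obj->resv);
	 *
	 * resv -> mutex in one path, mutex -> resv in the other.
	 */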

That said, xe currently indeed does the vm+bo exec dance on vma put.
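
For illustration, a sketch of that dance (assuming this series' gpuvm->d_obj
dummy object carries the VM's resv; error paths trimmed):

	struct drm_exec exec;
	int ret = 0;

	drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT);
	drm_exec_until_all_locked(&exec) {
		/* Lock the VM resv first, then the BO resv. */
		ret = drm_exec_lock_obj(&exec, &gpuvm->d_obj);
		drm_exec_retry_on_contention(&exec);
		if (ret)
			break;

		ret = drm_exec_lock_obj(&exec, vm_bo->obj);
		drm_exec_retry_on_contention(&exec);
		if (ret)
			break;
	}

	if (!ret)
		drm_gpuvm_bo_put(vm_bo);

	drm_exec_fini(&exec);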

One reason why that seemingly horrible construct is good is that when
evicting an extobj and you need to access individual vmas to zap page
table entries or flush TLBs, those VMAs are not allowed to go away (we're
not refcounting them). Holding the bo resv on gpuva put prevents that
from happening. Possibly one could use another mutex to protect the
gem->vm_bo list to achieve the same, but we'd need to hold it on gpuva put.

/Thomas


Boris Brezillon Sept. 14, 2023, 11:54 a.m. UTC | #33
On Thu, 14 Sep 2023 12:45:44 +0200
Thomas Hellström <thomas.hellstrom@linux.intel.com> wrote:

> On 9/14/23 10:20, Boris Brezillon wrote:
> > On Wed, 13 Sep 2023 15:22:56 +0200
> > Thomas Hellström <thomas.hellstrom@linux.intel.com> wrote:
> > <snip>
> >>> So, you check the VM state with some VM lock held (would be the VM resv
> >>> in my case), and if the mapping is new (no overlaps with pre-existing
> >>> mappings), you queue it to the fast-track/sync-VM_BIND queue. What would
> >>> be missing I guess is a way to know if the mapping is active (MMU has
> >>> been updated) or pending (MMU update queued to the bind-queue), so I can
> >>> fast-track mapping/unmapping of active mappings.  
> > Ok, so I started modifying the implementation, and quickly realized the
> > overlap test can't be done without your xe_range_fence tree because of
> > unmaps. Since we call drm_gpuva_unmap() early/in the IOCTL path (IOW,
> > before the mapping teardown is effective), we lose track of this
> > yet-to-be-executed-unmap operation, and if we do our
> > va_range_overlaps_with_existing_mappings() test after such an unmap has
> > been queued using just the drm_gpuvm tree, we might get false even if
> > the mapping still exists and is expected to be torn down when the
> > VM_BIND(unmap) job is executed on the bind-queue. As a result, this
> > might execute the VM_BIND(map,sync) immediately (because the dependency
> > went undetected), and then the vm_bind_run_job() function kicks in and
> > undoes what the synchronous VM_BIND(map) did. Am I missing something?
> >
> > If I'm correct, that means I'm back to having synchronous VM_BIND ops
> > queued after all asynchronous ones unless I use something like your
> > xe_range_fence solution (which I was hoping I could postpone until we
> > decide to expose multiple bind queues).  
> 
> Yes, unfortunately fine-granular async range-tracking comes with a cost. 
> Still, if you are doing page-table updates solely with the CPU, you 
> could probably short-circuit the fence part of the fenced ranges?

I'm doing it with the CPU, but asynchronously (bind-queue), so I'm
facing pretty much the same problems, I think.

> 
> 
> >
> > I'm still a bit skeptical about this 'update VM mappings tree early,
> > defer MMU page table updates' approach [...]
> 
> Understandable. But this is the way we historically have been doing 
> things, (I think the whole async atomic page-flipping is using the same 
> concept), but rather than refering to it as current state and future 
> state, I'd like to think it as Synchronous CPU state (What an API user 
> sees) vs GPU state (What the GPU sees where it's currently executing).

Actually, the latency incurred by the fact the page table updates are
done by the GPU is one thing, and I guess I could agree with you if that
was the only difference between the GPU and CPU view. But the fact
VM_BIND jobs can have external dependencies makes things a lot more
confusing. I might be wrong, but I think atomic page-flip is simpler.
Yes you can have implicit deps on your scanout buffer, and yes the HW
will wait for these fences to signal before updating the plane pointer,
but that's still just a simple pipeline with one resource to deal with.
A VM is a whole range with virtual memory regions being attached to
physical mem chunks, possibly with each range having its own lifecycle,
etc. It'd make more sense to me to have a way to know the current
state, and the future state.

Just one example, say you have a GPU job that triggers some fault
that's supposed to be handled by the kernel driver to unblock the
situation. In order to have some context, the kernel driver needs to
read a GPU buffer that's passed back as a virtual address by the GPU/FW,
so it calls drm_gpuvm_bo_find(), and now it might potentially get a BO
that's not the current BO being mapped at this address, but the future
BO after some asynchronous VM_BIND(map) has been executed, and of
course, the VM_BIND job leading to this future state, could have a
dependency on the GPU job, because this GPU job was using the old
mapping. It might sound completely hypothetical, but that's actually
the sort of thing the Mali FW does on a few occasions.

So yeah, I'm still not convinced we can always get away with just the
future representation of the VM. Sometimes you have to know what's
mapped at the moment.
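
To illustrate with a sketch (the fault-handler helper is hypothetical;
drm_gpuva_find_first() and va->gem.obj are the actual gpuvm bits it
relies on):

/* Hypothetical fault-handler lookup. If the gpuvm tree only encodes
 * the future state, the BO returned here may not be the one the
 * faulting GPU job was actually using. */
static struct drm_gem_object *
fault_handler_bo_at(struct drm_gpuvm *gpuvm, u64 gpu_va)
{
	struct drm_gpuva *va;

	va = drm_gpuva_find_first(gpuvm, gpu_va, 1);
	if (!va)
		return NULL;

	/* Might be the future BO, not the currently mapped one. */
	return va->gem.obj;
}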

> To bring them in sync you need to wait for fences.

Wouldn't solve the case I mentioned above, AFAICT.

> And ideally the async 
> work should never fail.

Sure, that I took for granted. If async VM_BIND fails, we just
flag the VM as unusable, and cancel any GPU job submission happening on
the VM. The user then has to recreate the VM to take a fresh start
(DEVICE_LOST situation).

It's a bit tricky when we want to clean things up after a failure,
because we might have lost track of some of the mappings (early
gpuva_unmap(), but the MMU page tables are still lying around). In our
case (Panthor) that's not really an issue though, because
free_io_pgtable_ops() will take care of that for us.
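
For reference, a minimal sketch of that DEVICE_LOST policy (all names
here are hypothetical, not actual Panthor code):

struct sketch_vm {				/* hypothetical */
	struct drm_gpuvm base;
	bool unusable;			/* set once an async VM_BIND failed */
};

static void sketch_vm_bind_failed(struct sketch_vm *vm)
{
	/* No recovery attempt; userspace has to recreate the VM. */
	WRITE_ONCE(vm->unusable, true);
}

static int sketch_job_submit(struct sketch_vm *vm)
{
	/* Checked on every submission, GPU jobs and VM_BINDs alike. */
	if (READ_ONCE(vm->unusable))
		return -ENODEV;		/* DEVICE_LOST to userspace */

	/* ... normal submission path ... */
	return 0;
}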

> 
> If one wants to push async work out to be handled solely by the GPU, 
> this is the way things must be done since the GPU can't take locks or 
> allocate memory, but as part or all of async work is sometimes done 
> using the CPU, it might make sense to challenge that to some extent. 

I think updating the VM state in the run_job() with drm_gpuva_[un]map()
would still account for the GPU-is-executing-pgtable-updates latency,
and that's not really the sort of desynchronization I'm worried about,
because when you get to submit your VM_BIND job, you know all the job
deps are met, and the VM update is about to happen. What I'm worried
about is the desynchronization incurred by complex VM_BIND job deps
that make it hard to know what's the diff between the drm_gpuvm state
(predicting the future) and the VM state a GPU job expects (the
present).
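
To make the run_job() side concrete, here's a sketch (the job struct
and fence handling are hypothetical; drm_gpuvm_sm_map() and the sm_step
callbacks are the real helpers, and the callback-based variant doesn't
allocate, which is what makes it usable here):

struct bind_job {			/* hypothetical driver struct */
	struct drm_sched_job base;
	struct drm_gpuvm *vm;
	struct drm_gem_object *bo;
	u64 addr, range, bo_offset;
	struct dma_fence *done_fence;	/* tracks the pgtable update */
};

static struct dma_fence *bind_job_run(struct drm_sched_job *sched_job)
{
	struct bind_job *job = container_of(sched_job, struct bind_job, base);
	int ret;

	/* All deps are signalled at this point, so the gpuvm tree reflects
	 * the present and the sm_step callbacks (firing drm_gpuva_map(),
	 * drm_gpuva_remap(), drm_gpuva_unmap()) compute the correct split.
	 * No allocation and no blocking locks allowed from here on.
	 */
	ret = drm_gpuvm_sm_map(job->vm, job, job->addr, job->range,
			       job->bo, job->bo_offset);
	if (ret)
		return ERR_PTR(ret);

	return dma_fence_get(job->done_fence);
}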
Thomas Hellstrom Sept. 14, 2023, 1:33 p.m. UTC | #34
Hi,

On 9/14/23 13:54, Boris Brezillon wrote:
> On Thu, 14 Sep 2023 12:45:44 +0200
> Thomas Hellström <thomas.hellstrom@linux.intel.com> wrote:
>
>> On 9/14/23 10:20, Boris Brezillon wrote:
>>> On Wed, 13 Sep 2023 15:22:56 +0200
>>> Thomas Hellström <thomas.hellstrom@linux.intel.com> wrote:
>>>   
>>>> On 9/13/23 13:33, Boris Brezillon wrote:
>>>>> On Wed, 13 Sep 2023 12:39:01 +0200
>>>>> Thomas Hellström <thomas.hellstrom@linux.intel.com> wrote:
>>>>>      
>>>>>> Hi,
>>>>>>
>>>>>> On 9/13/23 09:19, Boris Brezillon wrote:
>>>>>>> On Wed, 13 Sep 2023 17:05:42 +1000
>>>>>>> Dave Airlie <airlied@gmail.com> wrote:
>>>>>>>         
>>>>>>>> On Wed, 13 Sept 2023 at 17:03, Boris Brezillon
>>>>>>>> <boris.brezillon@collabora.com> wrote:
>>>>>>>>> On Tue, 12 Sep 2023 18:20:32 +0200
>>>>>>>>> Thomas Hellström <thomas.hellstrom@linux.intel.com> wrote:
>>>>>>>>>            
>>>>>>>>>>> +/**
>>>>>>>>>>> + * get_next_vm_bo_from_list() - get the next vm_bo element
>>>>>>>>>>> + * @__gpuvm: The GPU VM
>>>>>>>>>>> + * @__list_name: The name of the list we're iterating on
>>>>>>>>>>> + * @__local_list: A pointer to the local list used to store already iterated items
>>>>>>>>>>> + * @__prev_vm_bo: The previous element we got from drm_gpuvm_get_next_cached_vm_bo()
>>>>>>>>>>> + *
>>>>>>>>>>> + * This helper is here to provide lockless list iteration. Lockless as in, the
>>>>>>>>>>> + * iterator releases the lock immediately after picking the first element from
>>>>>>>>>>> + * the list, so list insertion deletion can happen concurrently.
>>>>>>>>>> Are the list spinlocks needed for that async state update from within
>>>>>>>>>> the dma-fence critical section we've discussed previously?
>>>>>>>>> Any driver calling _[un]link() from its drm_gpu_scheduler::run_job()
>>>>>>>>> hook will be in this situation (Panthor at the moment, PowerVR soon). I
>>>>>>>>> get that Xe and Nouveau don't need that because they update the VM
>>>>>>>>> state early (in the ioctl path), but I keep thinking this will hurt us
>>>>>>>>> if we don't think it through from the beginning, because once you've
>>>>>>>>> set this logic to depend only on resv locks, it will be pretty hard to
>>>>>>>>> get back to a solution which lets synchronous VM_BINDs take precedence
>>>>>>>>> on asynchronous request, and, with vkQueueBindSparse() passing external
>>>>>>>>> deps (plus the fact the VM_BIND queue might be pretty deep), it can
>>>>>>>>> take a long time to get your synchronous VM_BIND executed...
>>>>>> So this would boil down to either (possibly opt-in) keeping the spinlock
>>>>>> approach or pushing the unlink out to a wq then?
>>>>> Deferred _unlink() would not be an issue, since I already defer the
>>>>> drm_gpuva destruction to a wq, it would just a be a matter of moving the
>>>>> _unlink() call there as well. But _link() also takes the GEM gpuva list
>>>>> lock, and that one is bit tricky, in that sm_map() can trigger 2 more
>>>>> _link() calls for the prev/next mappings, which we can't guess until we
>>>>> get to execute the VM update. If we mandate the use of the GEM resv
>>>>> lock, that simply means async VM updates (AKA calling
>>>>> drm_gpuvm_sm_[un]map()) are not an option. And if this is what everyone
>>>>> agrees on, then I'd like the APIs that make this sort of async VM
>>>>> update possible (drm_gpuvm_sm_[un]map(), the drm_gpuvm_ops::sm_step*
>>>>> methods, and probably other things) to be dropped, so we don't make it
>>>>> look like it's something we support.
>>>>>      
>>>>>> BTW, as also asked in a reply to Danilo, how do you call unlink from
>>>>>> run_job() when it was requiring the obj->dma_resv lock, or was that a WIP?
>>>>> _unlink() makes sure the GEM gpuva list lock is taken, but this can be
>>>>> a custom lock (see drm_gem_gpuva_set_lock()). In panthor we have
>>>>> panthor_gem_object::gpuva_list_lock that's dedicated the gpuva list
>>>>> protection. We make sure we never take this lock while allocating
>>>>> memory to guarantee the dma-signalling path can't deadlock.
>>>>>      
>>>>>>>>>            
>>>>>>>> btw what is the use case for this? do we have actual vulkan
>>>>>>>> applications we know will have problems here?
>>>>>>> I don't, but I think that's a concern Faith raised at some point (dates
>>>>>>> back from when I was reading threads describing how VM_BIND on i915
>>>>>>> should work, and I was clearly discovering this whole VM_BIND thing at
>>>>>>> that time, so maybe I misunderstood).
>>>>>>>         
>>>>>>>> it feels like a bit of premature optimisation, but maybe we have use cases.
>>>>>>> Might be, but that's the sort of thing that would put us in a corner if
>>>>>>> we don't have a plan for when the needs arise. Besides, if we don't
>>>>>>> want to support that case because it's too complicated, I'd recommend
>>>>>>> dropping all the drm_gpuvm APIs that let people think this mode is
>>>>>>> valid/supported (map/remap/unmap hooks in drm_gpuvm_ops,
>>>>>>> drm_gpuvm_sm_[un]map helpers, etc). Keeping them around just adds to the
>>>>>>> confusion.
>>>>>> Xe allows bypassing the bind-queue with another bind-queue, but to
>>>>>> completely avoid dependencies between queues the Operations may not
>>>>>> overlap.
>>>>> So, you check the VM state with some VM lock held (would be the VM resv
>>>>> in my case), and if the mapping is new (no overlaps with pre-existing
>>>>> mappings), you queue it to the fast-track/sync-VM_BIND queue. What would
>>>>> be missing I guess is a way to know if the mapping is active (MMU has
>>>>> been updated) or pending (MMU update queued to the bind-queue), so I can
>>>>> fast-track mapping/unmapping of active mappings.
>>> Ok, so I started modifying the implementation, and quickly realized the
>>> overlap test can't be done without your xe_range_fence tree because of
>>> unmaps. Since we call drm_gpuva_unmap() early/in the IOCTL path (IOW,
>>> before the mapping teardown is effective), we lose track of this
>>> yet-to-be-executed-unmap operation, and if we do our
>>> va_range_overlaps_with_existing_mappings() test after such an unmap has
>>> been queued using just the drm_gpuvm tree, we might get false even if
>>> the mapping still exists and is expected to be torn down when the
>>> VM_BIND(unmap) job is executed on the bind-queue. As a result, this
>>> might execute the VM_BIND(map,sync) immediately (because the dependency
>>> went undetected), and then the vm_bind_run_job() function kicks in and
>>> undoes what the synchronous VM_BIND(map) did. Am I missing something?
>>>
>>> If I'm correct, that means I'm back to having synchronous VM_BIND ops
>>> queued after all asynchronous ones unless I use something like your
>>> xe_range_fence solution (which I was hoping I could postpone until we
>>> decide to expose multiple bind queues).
>> Yes, unfortunately fine-granular async range-tracking comes with a cost.
>> Still, if you are doing page-table updates solely with the CPU, you
>> could probably short-circuit the fence part of the fenced ranges?
> I'm doing it with the CPU, but asynchronously (bind-queue), so I'm
> facing pretty much the same problems, I think.
>
>>
>>> I'm still a bit skeptical about this 'update VM mappings tree early,
>>> defer MMU page table updates' approach, where the VM state and the
>>> actual page table tree are temporarily out of sync until all operations
>>> have been flushed on all queues targeting a VM. This means any test we
>>> do on the gpuvm, like, 'give me the BO mapped at VA xxx', is subject to
>>> 'is this the current state or the future state?' questioning. Note that
>>> we can't even get the current VM state anymore, because all the
>>> drm_gpuvm::tree stores with this solution is the future state, and
>>> to-be-unmapped mappings are lost during the transitioning period (when
>>> vm_bind jobs are queued but not executed yet).
>> Understandable. But this is the way we historically have been doing
>> things, (I think the whole async atomic page-flipping is using the same
>> concept), but rather than refering to it as current state and future
>> state, I'd like to think it as Synchronous CPU state (What an API user
>> sees) vs GPU state (What the GPU sees where it's currently executing).
> Actually, the latency incurred by the fact the page table updates are
> done by the GPU is one thing, and I guess I could agree with you if that
> was the only difference between the GPU and CPU view. But the fact
> VM_BIND jobs can have external dependencies makes things a lot more
> confusing. I might be wrong, but I think atomic page-flip is simpler.
> Yes you can have implicit deps on your scanout buffer, and yes the HW
> will wait for these fences to signal before updating the plane pointer,
> but that's still just a simple pipeline with one resource to deal with.
> A VM is a whole range with virtual memory regions being attached
> physical mem chunks, possibly with each range having its own lifecycle,
> etc. It'd make more sense to me to have a way to know the current
> state, and the future state.

Yeah so in Xe we support async bind jobs solely to be able to do deep 
pipelining, and it's not only the pagetable jobs: you could have multiple 
bind-evict-restore-exec-unbind-bind-evict-restore-exec sequences all 
pipelined, and only the available memory resources set the limit. In fact 
you can even have physical VRAM assigned to a bo which won't be used until 
exec #5 in the pipeline and released in exec #4, since TTM is aware of 
async memory management.

So something needs to absorb the state discrepancy between what you 
refer to as the current state and the future state. The question is what 
should absorb it? Should it be the gpuvm or some associated driver state 
tracking?

Now let's say that you have a deferred bind state-update pending and 
track the *current* state in the gpuvm, so that a number of vma unmaps 
and maps aren't yet visible to gpuvm, and then you submit an exec ioctl. 
How does the exec ioctl know the gpuvm state? Like external bos to 
validate or bos that become evicted, userptr vmas that have been 
invalidated? Does the exec need to block waiting for the bind fence to 
complete so that it can assess the VM state that UMD intended to be there?

>
> Just one example, say you have a GPU job that triggers some fault
> that's supposed to be handled by the kernel driver to unblock the
> situation. In order to have some context, the kernel driver needs to
> read a GPU buffer that's passed back as a virtual address by the GPU/FW,
> so it calls drm_gpuvm_bo_find(), and now it might potentially get a BO
> that's not the current BO being mapped at this address, but the future
> BO after some asynchronous VM_BIND(map) has been executed, and of
> course, the VM_BIND job leading to this future state, could have a
> dependency on the GPU job, because this GPU job was using the old
> mapping. It might sound completely hypothetical, but that's actually
> the sort of things the Mali FW does in a few occasions.

Recoverable faults typically require some sort of memory operation 
that requires the dma_resv or outer lock, like validation or 
get_user_pages(), and can thus not be performed in the fence signalling 
critical path and on Xe they are reserved for Long-Running VMs. On 
those, pipelining is not really needed and is disallowed in Xe to avoid 
having to deal with the state discrepancy.

But to the actual problem you mention: let's say it's a fault that 
triggers a need to dump bo contents. Then yes, in order to be able to do 
deep pipelining in this way the driver needs to track some state 
discrepancy, and that's an additional overhead.

>
> So yeah, I'm still not convinced we can always get away with just the
> future representation of the VM. Sometimes you have to know what's
> mapped at the moment.
>
>> To bring them in sync you need to wait for fences.
> Wouldn't solve the case I mentioned above, AFAICT.
>
>> And ideally the async
>> work should never fail.
> Sure, that I considered for granted. If async VM_BIND fails, we just
> flag the VM as unusable, and cancel any GPU job submission happening on
> the VM. The user then has to recreate the VM to take a fresh start
> (DEVICE_LOST situation).
>
> It a bit tricky when we want to clean things up after a failure,
> because we might have lost track of some of mappings (early
> gpuva_unmap(), but the MMU page tables are still lying around). In our
> case (Panthor) that's not really an issue though, because
> free_io_pgtable_ops() will take care of that for us.
>
>> If one wants to push async work out to be handled solely by the GPU,
>> this is the way things must be done since the GPU can't take locks or
>> allocate memory, but as part or all of async work is sometimes done
>> using the CPU, it might make sense to challenge that to some extent.
> I think updating the VM state in the run_job() with drm_gpuva_[un]map()
> would still account for the GPU-is-executing-pgtable-updates latency,
> and that's not really the sort of desynchronization I'm worried about,
> because when you get to submit your VM_BIND job, you know all the job
> deps are met, and the VM update is about to happen. What I'm worried
> about is the desynchronization incurred by complex VM_BIND job deps
> that make it hard to know what's the diff between the drm_gpuvm state
> (predicting the future) and the VM state a GPU job expects (the
> present).

Yes, that sort of deep pipelining requires additional "current" state 
tracking for some situations, but waiting in exec for the current state 
to catch up with future state, which it seems is a consequence of async 
state updates, isn't really an option for us.

Now if you think the decision to remove those spinlocks from drm_gpuvm 
was premature, I'm fully OK to have them in there again, but opt-in so 
that we have helpers that fit all purposes.

/Thomas
Thomas Hellstrom Sept. 14, 2023, 1:48 p.m. UTC | #35
Hi, Danilo

Some additional minor comments as xe conversion progresses.

On 9/9/23 17:31, Danilo Krummrich wrote:
> So far the DRM GPUVA manager offers common infrastructure to track GPU VA
> allocations and mappings, generically connect GPU VA mappings to their
> backing buffers and perform more complex mapping operations on the GPU VA
> space.
>
> However, there are more design patterns commonly used by drivers, which
> can potentially be generalized in order to make the DRM GPUVA manager
> represent a basic GPU-VM implementation. In this context, this patch aims
> at generalizing the following elements.
>
> 1) Provide a common dma-resv for GEM objects not being used outside of
>     this GPU-VM.
>
> 2) Provide tracking of external GEM objects (GEM objects which are
>     shared with other GPU-VMs).
>
> 3) Provide functions to efficiently lock all GEM objects dma-resv the
>     GPU-VM contains mappings of.
>
> 4) Provide tracking of evicted GEM objects the GPU-VM contains mappings
>     of, such that validation of evicted GEM objects is accelerated.
>
> 5) Provide some convinience functions for common patterns.
>
> Rather than being designed as a "framework", the target is to make all
> features appear as a collection of optional helper functions, such that
> drivers are free to make use of the DRM GPUVA managers basic
> functionality and opt-in for other features without setting any feature
> flags, just by making use of the corresponding functions.
>
> Big kudos to Boris Brezillon for his help to figure out locking for drivers
> updating the GPU VA space within the fence signalling path.
>
> Suggested-by: Matthew Brost <matthew.brost@intel.com>
> Signed-off-by: Danilo Krummrich <dakr@redhat.com>
> ---
>
> +/**
> + * drm_gpuvm_bo_evict() - add / remove a &drm_gem_object to / from a
> + * &drm_gpuvms evicted list
> + * @obj: the &drm_gem_object to add or remove
> + * @evict: indicates whether the object is evicted
> + *
> + * Adds a &drm_gem_object to or removes it from all &drm_gpuvms evicted
> + * list containing a mapping of this &drm_gem_object.
> + */
> +void
> +drm_gpuvm_bo_evict(struct drm_gem_object *obj, bool evict)
> +{
> +	struct drm_gpuvm_bo *vm_bo;
> +
> +	drm_gem_for_each_gpuvm_bo(vm_bo, obj) {
> +		if (evict)
> +			drm_gpuvm_bo_list_add(vm_bo, evict);
> +		else
> +			drm_gpuvm_bo_list_del(vm_bo, evict);
> +	}
> +}
> +EXPORT_SYMBOL_GPL(drm_gpuvm_bo_evict);
> +

We need a drm_gpuvm_bo_evict(struct drm_gpuvm_bo *vm_bo, ...) that puts 
a single gpuvm_bo on the list; the above function could perhaps be 
renamed as drm_gpuvm_gem_obj_evict(obj, ....).

The reason is that some VMs are faulting VMs which don't have an evict 
list, but validate from the pagefault handler. Also, evict == false is 
dangerous because, if called from within an exec, it might remove the 
obj from other VMs' evict lists before they've had a chance to rebind 
their VMAs.
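
Something like this (just a sketch of the proposed split, reusing the 
patch's internal list helpers; not tested):

void drm_gpuvm_bo_evict(struct drm_gpuvm_bo *vm_bo, bool evict)
{
	if (evict)
		drm_gpuvm_bo_list_add(vm_bo, evict);
	else
		drm_gpuvm_bo_list_del(vm_bo, evict);
}

/* The current all-VMs helper, renamed as proposed. */
void drm_gpuvm_gem_obj_evict(struct drm_gem_object *obj, bool evict)
{
	struct drm_gpuvm_bo *vm_bo;

	drm_gem_for_each_gpuvm_bo(vm_bo, obj)
		drm_gpuvm_bo_evict(vm_bo, evict);
}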

>   static int
>   __drm_gpuva_insert(struct drm_gpuvm *gpuvm,
>   		   struct drm_gpuva *va)
> diff --git a/include/drm/drm_gpuvm.h b/include/drm/drm_gpuvm.h
> index afa50b9059a2..834bb6d6617e 100644
> --- a/include/drm/drm_gpuvm.h
> +++ b/include/drm/drm_gpuvm.h
> @@ -26,10 +26,12 @@
>    */
>   
>   #include <linux/list.h>
> +#include <linux/dma-resv.h>
>   #include <linux/rbtree.h>
>   #include <linux/types.h>
>   
>   #include <drm/drm_gem.h>
> +#include <drm/drm_exec.h>
>   
>   struct drm_gpuvm;
>   struct drm_gpuvm_bo;
> @@ -259,6 +261,38 @@ struct drm_gpuvm {
>   	 * space
>   	 */
>   	struct dma_resv *resv;
> +
> +	/**
> +	 * @extobj: structure holding the extobj list
> +	 */
> +	struct {
> +		/**
> +		 * @list: &list_head storing &drm_gpuvm_bos serving as
> +		 * external object
> +		 */
> +		struct list_head list;
> +
> +		/**
> +		 * @lock: spinlock to protect the extobj list
> +		 */
> +		spinlock_t lock;
> +	} extobj;
> +
> +	/**
> +	 * @evict: structure holding the evict list and evict list lock
> +	 */
> +	struct {
> +		/**
> +		 * @list: &list_head storing &drm_gpuvm_bos currently being
> +		 * evicted
> +		 */
> +		struct list_head list;
> +
> +		/**
> +		 * @lock: spinlock to protect the evict list
> +		 */
> +		spinlock_t lock;
> +	} evict;
>   };
>   
>   void drm_gpuvm_init(struct drm_gpuvm *gpuvm, struct drm_device *drm,
> @@ -268,6 +302,21 @@ void drm_gpuvm_init(struct drm_gpuvm *gpuvm, struct drm_device *drm,
>   		    const struct drm_gpuvm_ops *ops);
>   void drm_gpuvm_destroy(struct drm_gpuvm *gpuvm);
>   
> +/**
> + * drm_gpuvm_is_extobj() - indicates whether the given &drm_gem_object is an
> + * external object
> + * @gpuvm: the &drm_gpuvm to check
> + * @obj: the &drm_gem_object to check
> + *
> + * Returns: true if the &drm_gem_object &dma_resv differs from the
> + * &drm_gpuvms &dma_resv, false otherwise
> + */
> +static inline bool drm_gpuvm_is_extobj(struct drm_gpuvm *gpuvm,
> +				       struct drm_gem_object *obj)
> +{
> +	return obj && obj->resv != gpuvm->resv;
> +}
> +
>   static inline struct drm_gpuva *
>   __drm_gpuva_next(struct drm_gpuva *va)
>   {
> @@ -346,6 +395,128 @@ __drm_gpuva_next(struct drm_gpuva *va)
>   #define drm_gpuvm_for_each_va_safe(va__, next__, gpuvm__) \
>   	list_for_each_entry_safe(va__, next__, &(gpuvm__)->rb.list, rb.entry)
>   
> +/**
> + * struct drm_gpuvm_exec - &drm_gpuvm abstraction of &drm_exec
> + *
> + * This structure should be created on the stack as &drm_exec should be.
> + *
> + * Optionally, @extra can be set in order to lock additional &drm_gem_objects.
> + */
> +struct drm_gpuvm_exec {
> +	/**
> +	 * @exec: the &drm_exec structure
> +	 */
> +	struct drm_exec exec;
> +
> +	/**
> +	 * @vm: the &drm_gpuvm to lock its DMA reservations
> +	 */
> +	struct drm_gpuvm *vm;
> +
> +	/**
> +	 * @extra: Callback and corresponding private data for the driver to
> +	 * lock arbitrary additional &drm_gem_objects.
> +	 */
> +	struct {
> +		/**
> +		 * @fn: The driver callback to lock additional &drm_gem_objects.
> +		 */
> +		int (*fn)(struct drm_gpuvm_exec *vm_exec,
> +			  unsigned int num_fences);
> +
> +		/**
> +		 * @priv: driver private data for the @fn callback
> +		 */
> +		void *priv;
> +	} extra;
> +};
> +
> +/**
> + * drm_gpuvm_prepare_vm() - prepare the GPUVMs common dma-resv
> + * @gpuvm: the &drm_gpuvm
> + * @exec: the &drm_exec context
> + * @num_fences: the amount of &dma_fences to reserve
> + *
> + * Calls drm_exec_prepare_obj() for the GPUVMs dummy &drm_gem_object.
> + *
> + * Using this function directly, it is the drivers responsibility to call
> + * drm_exec_init() and drm_exec_fini() accordingly.
> + *
> + * Returns: 0 on success, negative error code on failure.
> + */
> +static inline int
> +drm_gpuvm_prepare_vm(struct drm_gpuvm *gpuvm,
> +		     struct drm_exec *exec,
> +		     unsigned int num_fences)
> +{
> +	return drm_exec_prepare_obj(exec, &gpuvm->d_obj, num_fences);
> +}
> +
> +int drm_gpuvm_prepare_objects(struct drm_gpuvm *gpuvm,
> +			      struct drm_exec *exec,
> +			      unsigned int num_fences);
> +
> +int drm_gpuvm_prepare_range(struct drm_gpuvm *gpuvm,
> +			    struct drm_exec *exec,
> +			    u64 addr, u64 range,
> +			    unsigned int num_fences);
> +
> +int drm_gpuvm_exec_lock(struct drm_gpuvm_exec *vm_exec,
> +			unsigned int num_fences,
> +			bool interruptible);
> +
> +int drm_gpuvm_exec_lock_array(struct drm_gpuvm_exec *vm_exec,
> +			      struct drm_gem_object **objs,
> +			      unsigned int num_objs,
> +			      unsigned int num_fences,
> +			      bool interruptible);
> +
> +int drm_gpuvm_exec_lock_range(struct drm_gpuvm_exec *vm_exec,
> +			      u64 addr, u64 range,
> +			      unsigned int num_fences,
> +			      bool interruptible);
> +
> +/**
> + * drm_gpuvm_exec_unlock() - unlock all dma-resv of all associated BOs
> + * @vm_exec: the &drm_gpuvm_exec abstraction
> + *
> + * Releases all dma-resv locks of all &drm_gem_objects previously acquired
> + * through drm_gpuvm_exec_lock() or its variants.
> + */
> +static inline void
> +drm_gpuvm_exec_unlock(struct drm_gpuvm_exec *vm_exec)
> +{
> +	drm_exec_fini(&vm_exec->exec);
> +}
> +
> +int drm_gpuvm_validate(struct drm_gpuvm *gpuvm);
> +void drm_gpuvm_resv_add_fence(struct drm_gpuvm *gpuvm,
> +			      struct drm_exec *exec,
> +			      struct dma_fence *fence,
> +			      enum dma_resv_usage private_usage,
> +			      enum dma_resv_usage extobj_usage);
> +
> +/**
> + * drm_gpuvm_exec_resv_add_fence()
> + * @vm_exec: the &drm_gpuvm_exec abstraction
> + * @fence: fence to add
> + * @private_usage: private dma-resv usage
> + * @extobj_usage: extobj dma-resv usage
> + *
> + * See drm_gpuvm_resv_add_fence().
> + */
> +static inline void
> +drm_gpuvm_exec_resv_add_fence(struct drm_gpuvm_exec *vm_exec,
> +			      struct dma_fence *fence,
> +			      enum dma_resv_usage private_usage,
> +			      enum dma_resv_usage extobj_usage)
> +{
> +	drm_gpuvm_resv_add_fence(vm_exec->vm, &vm_exec->exec, fence,
> +				 private_usage, extobj_usage);
> +}
> +
>   /**
>    * struct drm_gpuvm_bo - structure representing a &drm_gpuvm and
>    * &drm_gem_object combination
> @@ -398,6 +569,18 @@ struct drm_gpuvm_bo {
>   			 * gpuva list.
>   			 */
>   			struct list_head gem;
> +
> +			/**
> +			 * @extobj: List entry to attach to the &drm_gpuvms
> +			 * extobj list.
> +			 */
> +			struct list_head extobj;
> +
> +			/**
> +			 * @evict: List entry to attach to the &drm_gpuvms evict
> +			 * list.
> +			 */
> +			struct list_head evict;
>   		} entry;
>   	} list;
>   };
> @@ -432,6 +615,9 @@ struct drm_gpuvm_bo *
>   drm_gpuvm_bo_find(struct drm_gpuvm *gpuvm,
>   		  struct drm_gem_object *obj);
>   
> +void drm_gpuvm_bo_evict(struct drm_gem_object *obj, bool evict);
> +void drm_gpuvm_bo_extobj_add(struct drm_gpuvm_bo *vm_bo);
> +
>   /**
>    * drm_gpuvm_bo_for_each_va() - iterator to walk over a list of &drm_gpuva
>    * @va__: &drm_gpuva structure to assign to in each iteration step
> @@ -837,6 +1023,17 @@ struct drm_gpuvm_ops {
>   	 * used.
>   	 */
>   	int (*sm_step_unmap)(struct drm_gpuva_op *op, void *priv);
> +
> +	/**
> +	 * @bo_validate: called from drm_gpuvm_validate()
> +	 *
> +	 * Drivers receive this callback for every evicted &drm_gem_object being
> +	 * mapped in the corresponding &drm_gpuvm.
> +	 *
> +	 * Typically, drivers would call their driver specific variant of
> +	 * ttm_bo_validate() from within this callback.
> +	 */
> +	int (*bo_validate)(struct drm_gem_object *obj);

Same here. Could we have a vm_bo as an argument instead, so that the 
callback knows what gpuvm we're targeting and can mark all its gpu_vas 
for revalidation? Or is that intended to be done elsewhere?
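
I.e. something like this (sketch of the suggested signature; the 
driver-side callback and helper are made up):

	int (*bo_validate)(struct drm_gpuvm_bo *vm_bo);

static int driver_bo_validate(struct drm_gpuvm_bo *vm_bo)
{
	/* Which VM is validating... */
	struct drm_gpuvm *vm = vm_bo->vm;
	/* ...and which object / whose VAs to mark for rebind. */
	struct drm_gem_object *obj = vm_bo->obj;

	return driver_validate_bo_for_vm(vm, obj);	/* hypothetical */
}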

>   };
>   
>   int drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm, void *priv,

Thanks,

Thomas
Danilo Krummrich Sept. 14, 2023, 3:27 p.m. UTC | #36
On 9/14/23 13:32, Thomas Hellström wrote:
> 
> On 9/14/23 12:57, Danilo Krummrich wrote:
>> On 9/13/23 14:16, Danilo Krummrich wrote:
>>
>> <snip>
>>
>>>>> And validate() can remove it while still holding all dma-resv locks,
>>>>> neat!
>>>>> However, what if two tasks are trying to lock the VA space
>>>>> concurrently? What
>>>>> do we do when the drm_gpuvm_bo's refcount drops to zero in
>>>>> drm_gpuva_unlink()?
>>>>> Are we guaranteed that at this point of time the drm_gpuvm_bo is not
>>>>> on the
>>>>> evicted list? Because otherwise we would call drm_gpuvm_bo_destroy()
>>>>> with the
>>>>> dma-resv lock held, which wouldn't be allowed, since
>>>>> drm_gpuvm_bo_destroy()
>>>>> might drop the last reference to the drm_gem_object and hence we'd
>>>>> potentially
>>>>> free the dma-resv lock while holding it, at least if it's an external
>>>>> object.
>>>>
>>>> Easiest way in this scheme is to think of the lists as being protected
>>>> by the vm's resv lock. That means anybody calling unlink() must also
>>>> hold the vm's resv lock. (Which is OK from an UAF point of view, but
>>>> perhaps not from a locking inversion POW from an async list update).
>>>
>>> This would mean that on unlink() we'd need to hold the VM's resv lock and the
>>> corresponding GEM's resv lock (in case they're not the same anyways) because the
>>> VM's resv lock would protect the external / evicted object lists and the GEM
>>> objects resv lock protects the GEM's list of drm_gpuvm_bos and the
>>> drm_gpuvm_bo's list of drm_gpuvas.
>>
>> As mentioned below the same applies for drm_gpuvm_bo_put() since it might
>> destroy the vm_bo, which includes removing the vm_bo from external / evicted
>> object lists and the GEMs list of vm_bos.
>>
>> As mentioned, if the GEM's dma-resv is different from the VM's dma-resv we need
>> to take both locks. Ultimately, this would mean we need a drm_exec loop, because
>> we can't know the order in which to take these locks. Doing a full drm_exec loop
>> just to put() a vm_bo doesn't sound reasonable to me.
>>
>> Can we instead just have an internal mutex for locking the lists such that we
>> avoid taking and dropping the spinlocks, which we use currently, in a loop?
> 
> You'd have the same locking inversion problem with a mutex, right? Since in the eviction path you have resv->mutex, from exec you have resv->mutex->resv because validate would attempt to grab resv.

Both lists, evict and extobj, would need to have a separate mutex, not a common one.
We'd also need a dedicated GEM gpuva lock. Then the only rule would be that you can't
hold the dma-resv lock when calling put(). Which I admit is not that nice.

With the current spinlock solution drivers wouldn't need to worry about anything locking-related,
though. So maybe I come back to your proposal of having a switch for external
locking with dma-resv locks entirely. Such that with external dma-resv locking I skip
all the spinlocks and add lockdep checks instead.

I think that makes the most sense in terms of taking advantage of external dma-resv locking
where possible and on the other hand having a self-contained solution if not. This should
get all concerns out of the way, yours, Christian's and Boris'.
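
Roughly like this (sketch only; reusing the resv_protected_lists name 
from your pseudocode quoted further down, everything else as in this 
patch):

static void drm_gpuvm_bo_extobj_list_add(struct drm_gpuvm_bo *vm_bo)
{
	struct drm_gpuvm *gpuvm = vm_bo->vm;

	if (gpuvm->resv_protected_lists) {
		/* Opt-in: the driver holds the VM's dma-resv around this. */
		dma_resv_assert_held(gpuvm->resv);
		list_add_tail(&vm_bo->list.entry.extobj, &gpuvm->extobj.list);
	} else {
		/* Self-contained fallback, safe in the signalling path. */
		spin_lock(&gpuvm->extobj.lock);
		list_add_tail(&vm_bo->list.entry.extobj, &gpuvm->extobj.list);
		spin_unlock(&gpuvm->extobj.lock);
	}
}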

> 
> That said, xe currently indeed does the vm+bo exec dance on vma put.
> 
> One reason why that seemingly horrible construct is good, is that when evicting an extobj and you need to access individual vmas to Zap page table entries or TLB flush, those VMAs are not allowed to go away (we're not refcounting them). Holding the bo resv on gpuva put prevents that from happening. Possibly one could use another mutex to protect the gem->vm_bo list to achieve the same, but we'd need to hold it on gpuva put.
> 
> /Thomas
> 
> 
>>
>> - Danilo
>>
>>>
>>>>
>>>>>
>>>>>>>
>>>>>>> For extobjs an outer lock would be enough in case of Xe, but I
>>>>>>> really would not
>>>>>>> like to add even more complexity just to get the spinlock out of
>>>>>>> the way in case
>>>>>>> the driver already has an outer lock protecting this path.
>>>>>>
>>>>>> I must disagree here. These spinlocks and atomic operations are
>>>>>> pretty
>>>>>> costly and as discussed earlier this type of locking was the reason
>>>>>> (at
>>>>>> least according to the commit message) that made Christian drop the
>>>>>> XArray
>>>>>> use in drm_exec for the same set of objects: "The locking overhead
>>>>>> is
>>>>>> unecessary and measurable". IMHO the spinlock is the added
>>>>>> complexity and a
>>>>>> single wide lock following the drm locking guidelines set out by
>>>>>> Daniel and
>>>>>> David should really be the default choice with an opt-in for a
>>>>>> spinlock if
>>>>>> needed for async and pushing out to a wq is not an option.
>>>>>
>>>>> For the external object list an outer lock would work as long as it's
>>>>> not the
>>>>> dma-resv lock of the corresponding GEM object, since here we actually
>>>>> need to
>>>>> remove the list entry from the external object list on
>>>>> drm_gpuvm_bo_destroy().
>>>>> It's just a bit weird design wise that drivers would need to take
>>>>> this outer
>>>>> lock on:
>>>>>
>>>>> - drm_gpuvm_bo_extobj_add()
>>>>> - drm_gpuvm_bo_destroy()        (and hence also drm_gpuvm_bo_put())
>>>>> - drm_gpuva_unlink()            (because it needs to call
>>>>> drm_gpuvm_bo_put())
>>>>> - drm_gpuvm_exec_lock()
>>>>> - drm_gpuvm_exec_lock_array()
>>>>> - drm_gpuvm_prepare_range()
>>>>>
>>>>> Given that it seems reasonable to do all the required locking
>>>>> internally.
>>>>
>>>>  From a design POW, there has been a clear direction in XE to make
>>>> things similar to mmap() / munmap(), so this outer lock, which in Xe is
>>>> an rwsem, is used in a similar way as the mmap_lock. It's protecting
>>>> the page-table structures and vma rb tree, the userptr structures and
>>>> the extobj list. Basically it's taken early in the exec IOCTL, the
>>>> VM_BIND ioctl, the compute rebind worker and the pagefault handler, so
>>>> all of the above are just asserting that it is taken in the correct
>>>> mode.
>>>>
>>>> But strictly with this scheme one could also use the vm's dma_resv for
>>>> the extobj list since with drm_exec, it's locked before traversing the
>>>> list.
>>>>
>>>> The whole point of this scheme is to rely on locks that you already are
>>>> supposed to be holding for various reasons and is simple to comprehend.
>>>
>>> I don't agree that we're supposed to hold the VM's resv lock anyways for
>>> functions like drm_gpuvm_bo_put() or drm_gpuva_unlink(), but I'm fine using it
>>> for that purpose nevertheless.
>>>
>>>>
>>>>>
>>>>> In order to at least place lockdep checks, the driver would need to
>>>>> supply the
>>>>> corresponding lock's lockdep_map, because the GPUVM otherwise doesn't
>>>>> know about
>>>>> the lock.
>>>>
>>>> Yes, that sounds reasonable. One lockdep map per list.
>>>
>>> I'd really like to avoid that, especially now that everything got simpler. We
>>> should define the actual locks to take instead.
>>>
>>>>
>>>>>
>>>>> Out of curiosity, what is the overhead of a spin_lock() that doesn't
>>>>> need to
>>>>> spin?
>>>>
>>>> I guess it's hard to tell exactly, but it is much lower on modern x86
>>>> than what it used to be. Not sure about ARM, which is the other
>>>> architecture important to us. I figure if there is little cache-line
>>>> bouncing the main overhead comes from the implied barriers.
>>>>
>>>>>
>>>>>>
>>>>>> A pretty simple way that would not add much code would be
>>>>>>
>>>>>> static void gpuvm_cond_spin_lock(const struct drm_gpuvm *gpuvm,
>>>>>>                                  spinlock_t *lock)
>>>>>> {
>>>>>>      if (!gpuvm->resv_protected_lists)
>>>>>>          spin_lock(lock);
>>>>>> }
>>>>>>
>>>>>>>> For such drivers, that would require anybody calling unlink to
>>>>>>>> hold the vm's
>>>>>>>> resv, though.
>>>>>>> In V4 I want to go back to having a dedicated lock for the GEMs
>>>>>>> gpuva list (or
>>>>>>> VM_BO list to be more precise). We can't just use the dma-resv
>>>>>>> lock for that
>>>>>>> with VM_BO abstractions, because on destruction of a VM_BO we
>>>>>>> otherwise wouldn't
>>>>>>> be allowed to already hold the dma-resv lock. That's the fix I
>>>>>>> was referring to
>>>>>>> earlier.
>>>>>>
>>>>>> Yeah, I can see the need for a dedicated lock for the GEM's gpuva
>>>>>> list, but
>>>>>> holding the vm's dma-resv lock across the unlink shouldn't be a
>>>>>> problem. We
>>>>>> may free the object and a pointer to the vm's resv during unlink
>>>>>> but we
>>>>>> don't free the vm's resv.  It'd be a matter of ensuring that any
>>>>>> calls to
>>>>>> unlink from *within* drm_gpuvm allows it to be held.
>>>>>
>>>>> Drivers calling unlink() from the fence signaling path can't use the
>>>>> VM's
>>>>> dma-resv lock.
>>>>
>>>> Yes, that made me a bit curious because in the current version the code
>>>> required the object's dma_resv for unlink() which can't be grabbed
>>>> either from the fence signaling path. So are there any drivers actually
>>>> wanting to do that? If so, they will either need to resort to the
>>>> current spinlock solution or they will need to call unlink from a
>>>> workqueue item.
>>>
>>> As Boris already mentioned we have the dma-resv lock by default or a driver
>>> specific GEM gpuva lock as opt-in. Now, we can get rid of the latter.
>>>
>>>>>
>>>>> Also, what if the object is an external object? We can't use the VM's
>>>>> dma-resv
>>>>> lock here.
>>>>
>>>> Why? Typically (sync) unlink is only ever called from an unbind-like
>>>> operation where it should be trivial to grab the vm's resv. Or, for
>>>> that matter any outer lock protecting the extobj list. Rule would be
>>>> the drm_gpuvm_bo::entry::extobj  and drm_gpuvm_bo::entry::evict would
>>>> be protected by either the vm's dma_resv (or possibly an outer lock in
>>>> the case of the extobj list).
>>>
>>> Outer lock wouldn't have been working for updates in the async path, but
>>> shouldn't be relevant anymore. We could use the VM's resv for that.
>>>
>>>>
>>>>>   And we can't have the GEM objs dma-resv lock held when calling
>>>>> unlink(), since unlink() calls drm_gpuvm_bo_put(), which if the
>>>>> refcount drops
>>>>> to zero calls drm_gpuvm_bo_destroy() and drm_gpuvm_bo_destroy() might
>>>>> drop the
>>>>> last reference of the GEM object.
>>>>
>>>> Yes, but this is a different problem as to what exactly protects
>>>> drm_gpuvm_bo::entry::gem. Either as you suggest an internal per bo list
>>>> lock, or if we want to keep the bo's dma_resv we need to ensure that
>>>> the caller of dma_resv_unlock(obj->resv) actually refcounts its obj
>>>> pointer, and doesn't implicitly rely on the gpuvm_bo's refcount (I know
>>>> Boris didn't like that, but requiring an explicit refcount for a
>>>> pointer you dereference unless you're under a lock that ensures keeping
>>>> the object alive is pretty much required?) But anyway for the
>>>> drm_gpuvm_bo::entry::gem list protection (bo resv or internal spinlock)
>>>> I don't have a strong preference.
>>>
>>> We can keep the GEM objects dma-resv lock, however as mentioned above
>>> drm_gpuva_unlink() and drm_gpuvm_bo_put() then requires both the VM's resv lock
>>> and the GEM's resv lock in case they differ.
>>>
>>
>>>>>>
>>
>
Boris Brezillon Sept. 14, 2023, 3:37 p.m. UTC | #37
On Thu, 14 Sep 2023 15:33:50 +0200
Thomas Hellström <thomas.hellstrom@linux.intel.com> wrote:

> Hi,
> 
> On 9/14/23 13:54, Boris Brezillon wrote:
> > On Thu, 14 Sep 2023 12:45:44 +0200
> > Thomas Hellström <thomas.hellstrom@linux.intel.com> wrote:
> >  
> >> On 9/14/23 10:20, Boris Brezillon wrote:  
> >>> On Wed, 13 Sep 2023 15:22:56 +0200
> >>> Thomas Hellström <thomas.hellstrom@linux.intel.com> wrote:
> >>>     
> >>>> On 9/13/23 13:33, Boris Brezillon wrote:  
> >>>>> On Wed, 13 Sep 2023 12:39:01 +0200
> >>>>> Thomas Hellström <thomas.hellstrom@linux.intel.com> wrote:
> >>>>>        
> >>>>>> Hi,
> >>>>>>
> >>>>>> On 9/13/23 09:19, Boris Brezillon wrote:  
> >>>>>>> On Wed, 13 Sep 2023 17:05:42 +1000
> >>>>>>> Dave Airlie <airlied@gmail.com> wrote:
> >>>>>>>           
> >>>>>>>> On Wed, 13 Sept 2023 at 17:03, Boris Brezillon
> >>>>>>>> <boris.brezillon@collabora.com> wrote:  
> >>>>>>>>> On Tue, 12 Sep 2023 18:20:32 +0200
> >>>>>>>>> Thomas Hellström <thomas.hellstrom@linux.intel.com> wrote:
> >>>>>>>>>              
> >>>>>>>>>>> +/**
> >>>>>>>>>>> + * get_next_vm_bo_from_list() - get the next vm_bo element
> >>>>>>>>>>> + * @__gpuvm: The GPU VM
> >>>>>>>>>>> + * @__list_name: The name of the list we're iterating on
> >>>>>>>>>>> + * @__local_list: A pointer to the local list used to store already iterated items
> >>>>>>>>>>> + * @__prev_vm_bo: The previous element we got from drm_gpuvm_get_next_cached_vm_bo()
> >>>>>>>>>>> + *
> >>>>>>>>>>> + * This helper is here to provide lockless list iteration. Lockless as in, the
> >>>>>>>>>>> + * iterator releases the lock immediately after picking the first element from
> >>>>>>>>>>> + * the list, so list insertion deletion can happen concurrently.  
> >>>>>>>>>> Are the list spinlocks needed for that async state update from within
> >>>>>>>>>> the dma-fence critical section we've discussed previously?  
> >>>>>>>>> Any driver calling _[un]link() from its drm_gpu_scheduler::run_job()
> >>>>>>>>> hook will be in this situation (Panthor at the moment, PowerVR soon). I
> >>>>>>>>> get that Xe and Nouveau don't need that because they update the VM
> >>>>>>>>> state early (in the ioctl path), but I keep thinking this will hurt us
> >>>>>>>>> if we don't think it through from the beginning, because once you've
> >>>>>>>>> set this logic to depend only on resv locks, it will be pretty hard to
> >>>>>>>>> get back to a solution which lets synchronous VM_BINDs take precedence
> >>>>>>>>> on asynchronous request, and, with vkQueueBindSparse() passing external
> >>>>>>>>> deps (plus the fact the VM_BIND queue might be pretty deep), it can
> >>>>>>>>> take a long time to get your synchronous VM_BIND executed...  
> >>>>>> So this would boil down to either (possibly opt-in) keeping the spinlock
> >>>>>> approach or pushing the unlink out to a wq then?  
> >>>>> Deferred _unlink() would not be an issue, since I already defer the
> >>>>> drm_gpuva destruction to a wq, it would just a be a matter of moving the
> >>>>> _unlink() call there as well. But _link() also takes the GEM gpuva list
> >>>>> lock, and that one is bit tricky, in that sm_map() can trigger 2 more
> >>>>> _link() calls for the prev/next mappings, which we can't guess until we
> >>>>> get to execute the VM update. If we mandate the use of the GEM resv
> >>>>> lock, that simply means async VM updates (AKA calling
> >>>>> drm_gpuvm_sm_[un]map()) are not an option. And if this is what everyone
> >>>>> agrees on, then I'd like the APIs that make this sort of async VM
> >>>>> update possible (drm_gpuvm_sm_[un]map(), the drm_gpuvm_ops::sm_step*
> >>>>> methods, and probably other things) to be dropped, so we don't make it
> >>>>> look like it's something we support.
> >>>>>        
> >>>>>> BTW, as also asked in a reply to Danilo, how do you call unlink from
> >>>>>> run_job() when it was requiring the obj->dma_resv lock, or was that a WIP?  
> >>>>> _unlink() makes sure the GEM gpuva list lock is taken, but this can be
> >>>>> a custom lock (see drm_gem_gpuva_set_lock()). In panthor we have
> >>>>> panthor_gem_object::gpuva_list_lock that's dedicated the gpuva list
> >>>>> protection. We make sure we never take this lock while allocating
> >>>>> memory to guarantee the dma-signalling path can't deadlock.
> >>>>>        
> >>>>>>>>>              
> >>>>>>>> btw what is the use case for this? do we have actual vulkan
> >>>>>>>> applications we know will have problems here?  
> >>>>>>> I don't, but I think that's a concern Faith raised at some point (dates
> >>>>>>> back from when I was reading threads describing how VM_BIND on i915
> >>>>>>> should work, and I was clearly discovering this whole VM_BIND thing at
> >>>>>>> that time, so maybe I misunderstood).
> >>>>>>>           
> >>>>>>>> it feels like a bit of premature optimisation, but maybe we have use cases.  
> >>>>>>> Might be, but that's the sort of thing that would put us in a corner if
> >>>>>>> we don't have a plan for when the needs arise. Besides, if we don't
> >>>>>>> want to support that case because it's too complicated, I'd recommend
> >>>>>>> dropping all the drm_gpuvm APIs that let people think this mode is
> >>>>>>> valid/supported (map/remap/unmap hooks in drm_gpuvm_ops,
> >>>>>>> drm_gpuvm_sm_[un]map helpers, etc). Keeping them around just adds to the
> >>>>>>> confusion.  
> >>>>>> Xe allows bypassing the bind-queue with another bind-queue, but to
> >>>>>> completely avoid dependencies between queues the Operations may not
> >>>>>> overlap.  
> >>>>> So, you check the VM state with some VM lock held (would be the VM resv
> >>>>> in my case), and if the mapping is new (no overlaps with pre-existing
> >>>>> mappings), you queue it to the fast-track/sync-VM_BIND queue. What would
> >>>>> be missing I guess is a way to know if the mapping is active (MMU has
> >>>>> been updated) or pending (MMU update queued to the bind-queue), so I can
> >>>>> fast-track mapping/unmapping of active mappings.  
> >>> Ok, so I started modifying the implementation, and quickly realized the
> >>> overlap test can't be done without your xe_range_fence tree because of
> >>> unmaps. Since we call drm_gpuva_unmap() early/in the IOCTL path (IOW,
> >>> before the mapping teardown is effective), we lose track of this
> >>> yet-to-be-executed-unmap operation, and if we do our
> >>> va_range_overlaps_with_existing_mappings() test after such an unmap has
> >>> been queued using just the drm_gpuvm tree, we might get false even if
> >>> the mapping still exists and is expected to be torn down when the
> >>> VM_BIND(unmap) job is executed on the bind-queue. As a result, this
> >>> might execute the VM_BIND(map,sync) immediately (because the dependency
> >>> went undetected), and then the vm_bind_run_job() function kicks in and
> >>> undoes what the synchronous VM_BIND(map) did. Am I missing something?
> >>>
> >>> If I'm correct, that means I'm back to having synchronous VM_BIND ops
> >>> queued after all asynchronous ones unless I use something like your
> >>> xe_range_fence solution (which I was hoping I could postpone until we
> >>> decide to expose multiple bind queues).  
> >> Yes, unfortunately fine-granular async range-tracking comes with a cost.
> >> Still, if you are doing page-table updates solely with the CPU, you
> >> could probably short-circuit the fence part of the fenced ranges?  
> > I'm doing it with the CPU, but asynchronously (bind-queue), so I'm
> > facing pretty much the same problems, I think.
> >  
> >>  
> >>> I'm still a bit skeptical about this 'update VM mappings tree early,
> >>> defer MMU page table updates' approach, where the VM state and the
> >>> actual page table tree are temporarily out of sync until all operations
> >>> have been flushed on all queues targeting a VM. This means any test we
> >>> do on the gpuvm, like, 'give me the BO mapped at VA xxx', is subject to
> >>> 'is this the current state or the future state?' questioning. Note that
> >>> we can't even get the current VM state anymore, because all the
> >>> drm_gpuvm::tree stores with this solution is the future state, and
> >>> to-be-unmapped mappings are lost during the transitioning period (when
> >>> vm_bind jobs are queued but not executed yet).  
> >> Understandable. But this is the way we historically have been doing
> >> things, (I think the whole async atomic page-flipping is using the same
> >> concept), but rather than refering to it as current state and future
> >> state, I'd like to think it as Synchronous CPU state (What an API user
> >> sees) vs GPU state (What the GPU sees where it's currently executing).  
> > Actually, the latency incurred by the fact the page table updates are
> > done by the GPU is one thing, and I guess I could agree with you if that
> > was the only difference between the GPU and CPU view. But the fact
> > VM_BIND jobs can have external dependencies makes things a lot more
> > confusing. I might be wrong, but I think atomic page-flip is simpler.
> > Yes you can have implicit deps on your scanout buffer, and yes the HW
> > will wait for these fences to signal before updating the plane pointer,
> > but that's still just a simple pipeline with one resource to deal with.
> > A VM is a whole range with virtual memory regions being attached
> > physical mem chunks, possibly with each range having its own lifecycle,
> > etc. It'd make more sense to me to have a way to know the current
> > state, and the future state.  
> 
> Yeah so in Xe we support async bind jobs solely to be able to do deep 
> pipelining and it's not only the pagetable jobs, You could have multiple 
> bind-evict-restore-exec-unbind-bind-evict-restore-exec all piplelined 
> and only the available memory resources sets the limit. In fact you can 
> even have physical VRAM assigned to a bo which won't be used until exec 
> #5 in the pipeline and released in exec #4 since TTM is aware of async 
> memory management.
> 
> So something needs to absorb the state discrepancy between what you 
> refer to as the current state and the future state. The question is what 
> should absorb it? Should it be the gpuvm or some associated driver state 
> tracking?

That's exactly what I'd like to sort out.

> 
> Now let's say that you have a deferred bind state-update pending and 
> track the *current* state in the gpuvm so that a number of vma unmaps 
> and maps aren't yet visible to gpuvm and then you submit an exec ioctl. 
> How does the exec ioctl know the gpuvm state?

A tree of pending VM ops, ordered per VA-range, with overlapping
allowed (to support pipelining), which are assigned fence objects? With
the fence + explicit deps passed to a job, we should know which extra
extobjs to add to the currently mapped extobjs. But I wasn't even
considering something as complex. Extobjs can be added early, even
before the mapping is active (that's what I was doing in my previous
PoC, using the async VM update model). Same goes for evicted BOs, they
can be re-pinned early. The only downside is that we might force BO
residency to last longer than strictly needed (because we'd be adding
our GPU job fence to extobjs we don't necessarily need if these extobjs
end up being mapped after the GPU job is executed, which can happen if
the deps passed to VM_BIND prevent its execution).
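
(For the record, the kind of thing I had in mind for that tracking, 
purely illustrative:)

struct pending_vm_op {
	struct interval_tree_node it;	/* [start, last] VA range, overlaps allowed */
	struct dma_fence *fence;	/* signals once the op hit the pgtables */
	enum { PENDING_VM_OP_MAP, PENDING_VM_OP_UNMAP } type;
	struct list_head link;		/* submission order on the bind-queue */
};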

> Like external bos to 
> validate or bos that become evicted, userptr vmas that have been 
> invalidated? Does the exec need to block waiting for the bind fence to 
> complete so that it can assess the VM state that UMD intended to be there?

I'd say no, given the GPU job added its fence to the VM resv and all
extobjs resvs. If a mapping update is queued, it should be waiting for
the job using the previous mapping to complete, thus making BO
retrieval from an exception path okay (and when I say exception path, I
intentionally exclude any allocation requests, because those would need
extra precautions, like non-blocking allocation, and I don't even want
to think about it at the moment).

> 
> >
> > Just one example, say you have a GPU job that triggers some fault
> > that's supposed to be handled by the kernel driver to unblock the
> > situation. In order to have some context, the kernel driver needs to
> > read a GPU buffer that's passed back as a virtual address by the GPU/FW,
> > so it calls drm_gpuvm_bo_find(), and now it might potentially get a BO
> > that's not the current BO being mapped at this address, but the future
> > BO after some asynchronous VM_BIND(map) has been executed, and of
> > course, the VM_BIND job leading to this future state, could have a
> > dependency on the GPU job, because this GPU job was using the old
> > mapping. It might sound completely hypothetical, but that's actually
> > the sort of things the Mali FW does in a few occasions.  
> 
> Recoverable faults are typically requiring some sort of memory operation 
> that requires the dma_resv or outer lock, like validation or 
> get_user_pages(), and can thus not be performed in the fence signalling 
> critical path and on Xe they are reserved for Long-Running VMs. On 
> those, pipelining is not really needed and is disallowed in Xe to avoid 
> having to deal with the state discrepancy.

I intentionally didn't take the map/alloc-on-fault example, because
that one is a bit complicated, and we don't need it (at least not yet).

> 
> But to the actual problem you mention, let's say its a fault that 
> triggers a need to dump bo contents, then yes in order to be able to do 
> deep pipelining in this way the driver needs to track some state 
> discrepancy, and that's an additional overhead.

Yes, more something like that.

> 
> >
> > So yeah, I'm still not convinced we can always get away with just the
> > future representation of the VM. Sometimes you have to know what's
> > mapped at the moment.
> >  
> >> To bring them in sync you need to wait for fences.  
> > Wouldn't solve the case I mentioned above, AFAICT.
> >  
> >> And ideally the async
> >> work should never fail.  
> > Sure, that I considered for granted. If async VM_BIND fails, we just
> > flag the VM as unusable, and cancel any GPU job submission happening on
> > the VM. The user then has to recreate the VM to take a fresh start
> > (DEVICE_LOST situation).
> >
> > It a bit tricky when we want to clean things up after a failure,
> > because we might have lost track of some of mappings (early
> > gpuva_unmap(), but the MMU page tables are still lying around). In our
> > case (Panthor) that's not really an issue though, because
> > free_io_pgtable_ops() will take care of that for us.
> >  
> >> If one wants to push async work out to be handled solely by the GPU,
> >> this is the way things must be done since the GPU can't take locks or
> >> allocate memory, but as part or all of async work is sometimes done
> >> using the CPU, it might make sense to challenge that to some extent.  
> > I think updating the VM state in the run_job() with drm_gpuva_[un]map()
> > would still account for the GPU-is-executing-pgtable-updates latency,
> > and that's not really the sort of desynchronization I'm worried about,
> > because when you get to submit your VM_BIND job, you know all the job
> > deps are met, and the VM update is about to happen. What I'm worried
> > about is the desynchronization incurred by complex VM_BIND job deps
> > that make it hard to know what's the diff between the drm_gpuvm state
> > (predicting the future) and the VM state a GPU job expects (the
> > present).  
> 
> Yes that sort of deep pipeling requires additional "current" state 
> tracking for some situations, but waiting in exec for the current state 
> to catch up with future state, which it seems is a consequence of async 
> state updates, isn't really an option for us.

I wasn't really considering waits as an option, and I do intend to
pipeline VM_BIND and GPU jobs, with deps taking care of further
ordering constraints. What I had in mind was more a way to retrieve the
future state from the current state + a list of diffs. Actually I don't
even mind doing it the other way around (retrieving the current state
from the future state plus a list of pending operations reverted),
because such exceptions are rare enough that we can accept the extra
cost. My point was, having just the future state doesn't always work,
and there's currently no way we can have a list of diffs to revert,
because we lose track of some operations, like unmaps.
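
Something as simple as a revert log could do (hypothetical sketch; each 
queued op records what it shadowed, so walking the log backwards from 
the future state reconstructs the present one):

struct vm_op_revert {
	struct list_head link;			/* newest first */
	u64 addr, range;			/* VA range the op touched */
	struct drm_gem_object *prev_obj;	/* NULL if nothing was mapped */
	u64 prev_offset;
};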

> 
> Now if you think the decision to remove those spinlocks from drm_gpuvm 
> was premature, I'm fully OK to have them in there again, but opt-in so 
> that we have helpers that fit all purposes.

Well, with the dual-mode APIs, at least the driver could decide what
the drm_gpuvm state was encoding (current if you call
drm_gpuva_[un]map() from the run_job() path, or future if you do it
from the IOCTL/submit path). With the new model, that's no longer
an option. But even the new model could work out if I have a way to get
the current state from the future state; I was just hoping we could
make this logic generic...
Danilo Krummrich Sept. 14, 2023, 4:36 p.m. UTC | #38
On 9/14/23 15:48, Thomas Hellström wrote:
> Hi, Danilo
> 
> Some additional minor comments as xe conversion progresses.
> 
> On 9/9/23 17:31, Danilo Krummrich wrote:
>> So far the DRM GPUVA manager offers common infrastructure to track GPU VA
>> allocations and mappings, generically connect GPU VA mappings to their
>> backing buffers and perform more complex mapping operations on the GPU VA
>> space.
>>
>> However, there are more design patterns commonly used by drivers, which
>> can potentially be generalized in order to make the DRM GPUVA manager
>> represent a basic GPU-VM implementation. In this context, this patch aims
>> at generalizing the following elements.
>>
>> 1) Provide a common dma-resv for GEM objects not being used outside of
>>     this GPU-VM.
>>
>> 2) Provide tracking of external GEM objects (GEM objects which are
>>     shared with other GPU-VMs).
>>
>> 3) Provide functions to efficiently lock all GEM objects dma-resv the
>>     GPU-VM contains mappings of.
>>
>> 4) Provide tracking of evicted GEM objects the GPU-VM contains mappings
>>     of, such that validation of evicted GEM objects is accelerated.
>>
>> 5) Provide some convenience functions for common patterns.
>>
>> Rather than being designed as a "framework", the target is to make all
>> features appear as a collection of optional helper functions, such that
>> drivers are free to make use of the DRM GPUVA managers basic
>> functionality and opt-in for other features without setting any feature
>> flags, just by making use of the corresponding functions.
>>
>> Big kudos to Boris Brezillon for his help to figure out locking for drivers
>> updating the GPU VA space within the fence signalling path.
>>
>> Suggested-by: Matthew Brost <matthew.brost@intel.com>
>> Signed-off-by: Danilo Krummrich <dakr@redhat.com>
>> ---
>>
>> +/**
>> + * drm_gpuvm_bo_evict() - add / remove a &drm_gem_object to / from a
>> + * &drm_gpuvms evicted list
>> + * @obj: the &drm_gem_object to add or remove
>> + * @evict: indicates whether the object is evicted
>> + *
>> + * Adds a &drm_gem_object to or removes it from all &drm_gpuvms evicted
>> + * list containing a mapping of this &drm_gem_object.
>> + */
>> +void
>> +drm_gpuvm_bo_evict(struct drm_gem_object *obj, bool evict)
>> +{
>> +    struct drm_gpuvm_bo *vm_bo;
>> +
>> +    drm_gem_for_each_gpuvm_bo(vm_bo, obj) {
>> +        if (evict)
>> +            drm_gpuvm_bo_list_add(vm_bo, evict);
>> +        else
>> +            drm_gpuvm_bo_list_del(vm_bo, evict);
>> +    }
>> +}
>> +EXPORT_SYMBOL_GPL(drm_gpuvm_bo_evict);
>> +
> 
> We need a drm_gpuvm_bo_evict(struct drm_gpuvm_bo *vm_bo, ...) that puts a single gpuvm_bo on the list, the above function could perhaps be renamed as drm_gpuvm_gem_obj_evict(obj, ....).

Makes sense - gonna change that.

> 
> Reason is some vm's are faulting vms which don't have an evict list, but validate from the pagefault handler. Also evict == false is dangerous because if called from within an exec, it might remove the obj from other vm's evict list before they've had a chance to rebind their VMAs.
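
For illustration, the split could then look roughly like this (the
per-vm_bo body is only a sketch on top of the internal list helpers
from this patch; the renamed GEM-wide helper keeps the old semantics):

    /* Put a single vm_bo on / off its VM's evicted list. */
    void
    drm_gpuvm_bo_evict(struct drm_gpuvm_bo *vm_bo, bool evict)
    {
        if (evict)
            drm_gpuvm_bo_list_add(vm_bo, evict);
        else
            drm_gpuvm_bo_list_del(vm_bo, evict);
    }

    /* The previous GEM-wide helper, renamed as suggested. */
    void
    drm_gpuvm_gem_obj_evict(struct drm_gem_object *obj, bool evict)
    {
        struct drm_gpuvm_bo *vm_bo;

        drm_gem_for_each_gpuvm_bo(vm_bo, obj)
            drm_gpuvm_bo_evict(vm_bo, evict);
    }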
> 
>>   static int
>>   __drm_gpuva_insert(struct drm_gpuvm *gpuvm,
>>              struct drm_gpuva *va)
>> diff --git a/include/drm/drm_gpuvm.h b/include/drm/drm_gpuvm.h
>> index afa50b9059a2..834bb6d6617e 100644
>> --- a/include/drm/drm_gpuvm.h
>> +++ b/include/drm/drm_gpuvm.h
>> @@ -26,10 +26,12 @@
>>    */
>>   #include <linux/list.h>
>> +#include <linux/dma-resv.h>
>>   #include <linux/rbtree.h>
>>   #include <linux/types.h>
>>   #include <drm/drm_gem.h>
>> +#include <drm/drm_exec.h>
>>   struct drm_gpuvm;
>>   struct drm_gpuvm_bo;
>> @@ -259,6 +261,38 @@ struct drm_gpuvm {
>>        * space
>>        */
>>       struct dma_resv *resv;
>> +
>> +    /**
>> +     * @extobj: structure holding the extobj list
>> +     */
>> +    struct {
>> +        /**
>> +         * @list: &list_head storing &drm_gpuvm_bos serving as
>> +         * external object
>> +         */
>> +        struct list_head list;
>> +
>> +        /**
>> +         * @lock: spinlock to protect the extobj list
>> +         */
>> +        spinlock_t lock;
>> +    } extobj;
>> +
>> +    /**
>> +     * @evict: structure holding the evict list and evict list lock
>> +     */
>> +    struct {
>> +        /**
>> +         * @list: &list_head storing &drm_gpuvm_bos currently being
>> +         * evicted
>> +         */
>> +        struct list_head list;
>> +
>> +        /**
>> +         * @lock: spinlock to protect the evict list
>> +         */
>> +        spinlock_t lock;
>> +    } evict;
>>   };
>>   void drm_gpuvm_init(struct drm_gpuvm *gpuvm, struct drm_device *drm,
>> @@ -268,6 +302,21 @@ void drm_gpuvm_init(struct drm_gpuvm *gpuvm, struct drm_device *drm,
>>               const struct drm_gpuvm_ops *ops);
>>   void drm_gpuvm_destroy(struct drm_gpuvm *gpuvm);
>> +/**
>> + * drm_gpuvm_is_extobj() - indicates whether the given &drm_gem_object is an
>> + * external object
>> + * @gpuvm: the &drm_gpuvm to check
>> + * @obj: the &drm_gem_object to check
>> + *
>> + * Returns: true if the &drm_gem_object &dma_resv differs from the
>> + * &drm_gpuvms &dma_resv, false otherwise
>> + */
>> +static inline bool drm_gpuvm_is_extobj(struct drm_gpuvm *gpuvm,
>> +                       struct drm_gem_object *obj)
>> +{
>> +    return obj && obj->resv != gpuvm->resv;
>> +}
>> +
>>   static inline struct drm_gpuva *
>>   __drm_gpuva_next(struct drm_gpuva *va)
>>   {
>> @@ -346,6 +395,128 @@ __drm_gpuva_next(struct drm_gpuva *va)
>>   #define drm_gpuvm_for_each_va_safe(va__, next__, gpuvm__) \
>>       list_for_each_entry_safe(va__, next__, &(gpuvm__)->rb.list, rb.entry)
>> +/**
>> + * struct drm_gpuvm_exec - &drm_gpuvm abstraction of &drm_exec
>> + *
>> + * This structure should be created on the stack as &drm_exec should be.
>> + *
>> + * Optionally, @extra can be set in order to lock additional &drm_gem_objects.
>> + */
>> +struct drm_gpuvm_exec {
>> +    /**
>> +     * @exec: the &drm_exec structure
>> +     */
>> +    struct drm_exec exec;
>> +
>> +    /**
>> +     * @vm: the &drm_gpuvm to lock its DMA reservations
>> +     */
>> +    struct drm_gpuvm *vm;
>> +
>> +    /**
>> +     * @extra: Callback and corresponding private data for the driver to
>> +     * lock arbitrary additional &drm_gem_objects.
>> +     */
>> +    struct {
>> +        /**
>> +         * @fn: The driver callback to lock additional &drm_gem_objects.
>> +         */
>> +        int (*fn)(struct drm_gpuvm_exec *vm_exec,
>> +              unsigned int num_fences);
>> +
>> +        /**
>> +         * @priv: driver private data for the @fn callback
>> +         */
>> +        void *priv;
>> +    } extra;
>> +};
>> +
>> +/**
>> + * drm_gpuvm_prepare_vm() - prepare the GPUVMs common dma-resv
>> + * @gpuvm: the &drm_gpuvm
>> + * @exec: the &drm_exec context
>> + * @num_fences: the amount of &dma_fences to reserve
>> + *
>> + * Calls drm_exec_prepare_obj() for the GPUVMs dummy &drm_gem_object.
>> + *
>> + * Using this function directly, it is the drivers responsibility to call
>> + * drm_exec_init() and drm_exec_fini() accordingly.
>> + *
>> + * Returns: 0 on success, negative error code on failure.
>> + */
>> +static inline int
>> +drm_gpuvm_prepare_vm(struct drm_gpuvm *gpuvm,
>> +             struct drm_exec *exec,
>> +             unsigned int num_fences)
>> +{
>> +    return drm_exec_prepare_obj(exec, &gpuvm->d_obj, num_fences);
>> +}
>> +
>> +int drm_gpuvm_prepare_objects(struct drm_gpuvm *gpuvm,
>> +                  struct drm_exec *exec,
>> +                  unsigned int num_fences);
>> +
>> +int drm_gpuvm_prepare_range(struct drm_gpuvm *gpuvm,
>> +                struct drm_exec *exec,
>> +                u64 addr, u64 range,
>> +                unsigned int num_fences);
>> +
>> +int drm_gpuvm_exec_lock(struct drm_gpuvm_exec *vm_exec,
>> +            unsigned int num_fences,
>> +            bool interruptible);
>> +
>> +int drm_gpuvm_exec_lock_array(struct drm_gpuvm_exec *vm_exec,
>> +                  struct drm_gem_object **objs,
>> +                  unsigned int num_objs,
>> +                  unsigned int num_fences,
>> +                  bool interruptible);
>> +
>> +int drm_gpuvm_exec_lock_range(struct drm_gpuvm_exec *vm_exec,
>> +                  u64 addr, u64 range,
>> +                  unsigned int num_fences,
>> +                  bool interruptible);
>> +
>> +/**
>> + * drm_gpuvm_exec_unlock() - unlock all dma-resv of all associated BOs
>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>> + *
>> + * Releases all dma-resv locks of all &drm_gem_objects previously acquired
>> + * through drm_gpuvm_exec_lock() or its variants.
>> + */
>> +static inline void
>> +drm_gpuvm_exec_unlock(struct drm_gpuvm_exec *vm_exec)
>> +{
>> +    drm_exec_fini(&vm_exec->exec);
>> +}
>> +
>> +int drm_gpuvm_validate(struct drm_gpuvm *gpuvm);
>> +void drm_gpuvm_resv_add_fence(struct drm_gpuvm *gpuvm,
>> +                  struct drm_exec *exec,
>> +                  struct dma_fence *fence,
>> +                  enum dma_resv_usage private_usage,
>> +                  enum dma_resv_usage extobj_usage);
>> +
>> +/**
>> + * drm_gpuvm_exec_resv_add_fence()
>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>> + * @fence: fence to add
>> + * @private_usage: private dma-resv usage
>> + * @extobj_usage: extobj dma-resv usage
>> + *
>> + * See drm_gpuvm_resv_add_fence().
>> + */
>> +static inline void
>> +drm_gpuvm_exec_resv_add_fence(struct drm_gpuvm_exec *vm_exec,
>> +                  struct dma_fence *fence,
>> +                  enum dma_resv_usage private_usage,
>> +                  enum dma_resv_usage extobj_usage)
>> +{
>> +    drm_gpuvm_resv_add_fence(vm_exec->vm, &vm_exec->exec, fence,
>> +                 private_usage, extobj_usage);
>> +}
>> +
>>   /**
>>    * struct drm_gpuvm_bo - structure representing a &drm_gpuvm and
>>    * &drm_gem_object combination
>> @@ -398,6 +569,18 @@ struct drm_gpuvm_bo {
>>                * gpuva list.
>>                */
>>               struct list_head gem;
>> +
>> +            /**
>> +             * @extobj: List entry to attach to the &drm_gpuvms
>> +             * extobj list.
>> +             */
>> +            struct list_head extobj;
>> +
>> +            /**
>> +             * @evict: List entry to attach to the &drm_gpuvms evict
>> +             * list.
>> +             */
>> +            struct list_head evict;
>>           } entry;
>>       } list;
>>   };
>> @@ -432,6 +615,9 @@ struct drm_gpuvm_bo *
>>   drm_gpuvm_bo_find(struct drm_gpuvm *gpuvm,
>>             struct drm_gem_object *obj);
>> +void drm_gpuvm_bo_evict(struct drm_gem_object *obj, bool evict);
>> +void drm_gpuvm_bo_extobj_add(struct drm_gpuvm_bo *vm_bo);
>> +
>>   /**
>>    * drm_gpuvm_bo_for_each_va() - iterator to walk over a list of &drm_gpuva
>>    * @va__: &drm_gpuva structure to assign to in each iteration step
>> @@ -837,6 +1023,17 @@ struct drm_gpuvm_ops {
>>        * used.
>>        */
>>       int (*sm_step_unmap)(struct drm_gpuva_op *op, void *priv);
>> +
>> +    /**
>> +     * @bo_validate: called from drm_gpuvm_validate()
>> +     *
>> +     * Drivers receive this callback for every evicted &drm_gem_object being
>> +     * mapped in the corresponding &drm_gpuvm.
>> +     *
>> +     * Typically, drivers would call their driver specific variant of
>> +     * ttm_bo_validate() from within this callback.
>> +     */
>> +    int (*bo_validate)(struct drm_gem_object *obj);
> 
> Same here. Could we have a vm_bo as an argument instead, so that the callback knows what gpuvm we're targeting and can mark all its gpu_vas for revalidation? Or is that intended to be done elsewhere?

Makes sense as well. I'll change that too.
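
A minimal sketch of how the adjusted callback and drm_gpuvm_validate()
could then fit together (evict list locking is omitted here, and the
gpuvm->ops pointer name is an assumption):

    /* In struct drm_gpuvm_ops: the callback now receives the vm_bo. */
    int (*bo_validate)(struct drm_gpuvm_bo *vm_bo);

    int
    drm_gpuvm_validate(struct drm_gpuvm *gpuvm)
    {
        const struct drm_gpuvm_ops *ops = gpuvm->ops;
        struct drm_gpuvm_bo *vm_bo, *next;
        int ret;

        list_for_each_entry_safe(vm_bo, next, &gpuvm->evict.list,
                                 list.entry.evict) {
            ret = ops->bo_validate(vm_bo);
            if (ret)
                return ret;

            /* Validated, thus no longer evicted. */
            drm_gpuvm_bo_list_del(vm_bo, evict);
        }

        return 0;
    }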

> 
>>   };
>>   int drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm, void *priv,
> 
> Thanks,
> 
> Thomas
> 
>
Thomas Hellstrom Sept. 14, 2023, 5:13 p.m. UTC | #39
On Thu, 2023-09-14 at 17:27 +0200, Danilo Krummrich wrote:
> On 9/14/23 13:32, Thomas Hellström wrote:
> > 
> > On 9/14/23 12:57, Danilo Krummrich wrote:
> > > On 9/13/23 14:16, Danilo Krummrich wrote:
> > > 
> > > <snip>
> > > 
> > > > > > And validate() can remove it while still holding all dma-
> > > > > > resv locks,
> > > > > > neat!
> > > > > > However, what if two tasks are trying to lock the VA space
> > > > > > concurrently? What
> > > > > > do we do when the drm_gpuvm_bo's refcount drops to zero in
> > > > > > drm_gpuva_unlink()?
> > > > > > Are we guaranteed that at this point of time the
> > > > > > drm_gpuvm_bo is not
> > > > > > on the
> > > > > > evicted list? Because otherwise we would call
> > > > > > drm_gpuvm_bo_destroy()
> > > > > > with the
> > > > > > dma-resv lock held, which wouldn't be allowed, since
> > > > > > drm_gpuvm_bo_destroy()
> > > > > > might drop the last reference to the drm_gem_object and
> > > > > > hence we'd
> > > > > > potentially
> > > > > > free the dma-resv lock while holding it, at least if it's
> > > > > > an external
> > > > > > object.
> > > > > 
> > > > > Easiest way in this scheme is to think of the lists as being
> > > > > protected
> > > > > by the vm's resv lock. That means anybody calling unlink()
> > > > > must also
> > > > > hold the vm's resv lock. (Which is OK from a UAF point of
> > > > > view, but
> > > > > perhaps not from a locking inversion POV from an async list
> > > > > update).
> > > > 
> > > > This would mean that on unlink() we'd need to hold the VM's
> > > > resv lock and the
> > > > corresponding GEM's resv lock (in case they're not the same
> > > > anyways) because the
> > > > VM's resv lock would protect the external / evicted object
> > > > lists and the GEM
> > > > objects resv lock protects the GEM's list of drm_gpuvm_bos and
> > > > the
> > > > drm_gpuvm_bo's list of drm_gpuvas.
> > > 
> > > As mentioned below the same applies for drm_gpuvm_bo_put() since
> > > it might
> > > destroy the vm_bo, which includes removing the vm_bo from
> > > external / evicted
> > > object lists and the GEMs list of vm_bos.
> > > 
> > > As mentioned, if the GEM's dma-resv is different from the VM's
> > > dma-resv we need
> > > to take both locks. Ultimately, this would mean we need a
> > > drm_exec loop, because
> > > we can't know the order in which to take these locks. Doing a
> > > full drm_exec loop
> > > just to put() a vm_bo doesn't sound reasonable to me.
> > > 
> > > Can we instead just have an internal mutex for locking the lists
> > > such that we
> > > avoid taking and dropping the spinlocks, which we use currently,
> > > in a loop?
> > 
> > You'd have the same locking inversion problem with a mutex, right?
> > Since in the eviction path you have resv->mutex, from exec you have
> > resv->mutex->resv because validate would attempt to grab resv.
> 
> Both lists, evict and extobj, would need to have a separate mutex,
> not a common one.
> We'd also need a dedicated GEM gpuva lock. Then the only rule would
> be that you can't
> hold the dma-resv lock when calling put(). Which I admit is not that
> nice.
> 
> With the current spinlock solution drivers wouldn't need to worry
> about anything locking
> related though. So maybe I come back to your proposal of having a
> switch for external
> locking with dma-resv locks entirely. Such that with external dma-
> resv locking I skip
> all the spinlocks and add lockdep checks instead.
> 
> I think that makes the most sense in terms of taking advantage of
> external dma-resv locking
> where possible and on the other hand having a self-contained solution
> if not. This should
> get all concerns out of the way, yours, Christian's and Boris'.

If we need additional locks, then yes, I'd prefer the opt-in/opt-out spinlock
solution, and check back after a while to see if we can remove either
option once most pitfalls are hit.
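
For the opt-out case something along the lines of the
gpuvm_cond_spin_lock() sketch quoted further up could do; the opt-in
case would then mostly reduce to lockdep asserts, roughly (helper and
field names made up):

    static void
    drm_gpuvm_list_lock(struct drm_gpuvm *gpuvm, spinlock_t *lock)
    {
        if (gpuvm->resv_protected_lists)
            dma_resv_assert_held(gpuvm->resv);
        else
            spin_lock(lock);
    }

    static void
    drm_gpuvm_list_unlock(struct drm_gpuvm *gpuvm, spinlock_t *lock)
    {
        if (!gpuvm->resv_protected_lists)
            spin_unlock(lock);
    }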

Thanks,
/Thomas


> 
> > 
> > That said, xe currently indeed does the vm+bo exec dance on vma
> > put.
> > 
> > One reason why that seemingly horrible construct is good, is that
> > when evicting an extobj and you need to access individual vmas to
> > Zap page table entries or TLB flush, those VMAs are not allowed to
> > go away (we're not refcounting them). Holding the bo resv on gpuva
> > put prevents that from happening. Possibly one could use another
> > mutex to protect the gem->vm_bo list to achieve the same, but we'd
> > need to hold it on gpuva put.
> > 
> > /Thomas
> > 
> > 
> > > 
> > > - Danilo
> > > 
> > > > 
> > > > > 
> > > > > > 
> > > > > > > > 
> > > > > > > > For extobjs an outer lock would be enough in case of
> > > > > > > > Xe, but I
> > > > > > > > really would not
> > > > > > > > like to add even more complexity just to get the
> > > > > > > > spinlock out of
> > > > > > > > the way in case
> > > > > > > > the driver already has an outer lock protecting this
> > > > > > > > path.
> > > > > > > 
> > > > > > > I must disagree here. These spinlocks and atomic
> > > > > > > operations are
> > > > > > > pretty
> > > > > > > costly and as discussed earlier this type of locking was
> > > > > > > the reason
> > > > > > > (at
> > > > > > > least according to the commit message) that made
> > > > > > > Christian drop the
> > > > > > > XArray
> > > > > > > use in drm_exec for the same set of objects: "The locking
> > > > > > > overhead
> > > > > > > is
> > > > > > > unecessary and measurable". IMHO the spinlock is the
> > > > > > > added
> > > > > > > complexity and a
> > > > > > > single wide lock following the drm locking guidelines set
> > > > > > > out by
> > > > > > > Daniel and
> > > > > > > David should really be the default choice with an opt-in
> > > > > > > for a
> > > > > > > spinlock if
> > > > > > > needed for async and pushing out to a wq is not an
> > > > > > > option.
> > > > > > 
> > > > > > For the external object list an outer lock would work as
> > > > > > long as it's
> > > > > > not the
> > > > > > dma-resv lock of the corresponding GEM object, since here
> > > > > > we actually
> > > > > > need to
> > > > > > remove the list entry from the external object list on
> > > > > > drm_gpuvm_bo_destroy().
> > > > > > It's just a bit weird design wise that drivers would need
> > > > > > to take
> > > > > > this outer
> > > > > > lock on:
> > > > > > 
> > > > > > - drm_gpuvm_bo_extobj_add()
> > > > > > - drm_gpuvm_bo_destroy()        (and hence also
> > > > > > drm_gpuvm_bo_put())
> > > > > > - drm_gpuva_unlink()            (because it needs to call
> > > > > > drm_gpuvm_bo_put())
> > > > > > - drm_gpuvm_exec_lock()
> > > > > > - drm_gpuvm_exec_lock_array()
> > > > > > - drm_gpuvm_prepare_range()
> > > > > > 
> > > > > > Given that it seems reasonable to do all the required
> > > > > > locking
> > > > > > internally.
> > > > > 
> > > > >  From a design POV, there has been a clear direction in XE to
> > > > > make
> > > > > things similar to mmap() / munmap(), so this outer lock,
> > > > > which in Xe is
> > > > > an rwsem, is used in a similar way as the mmap_lock. It's
> > > > > protecting
> > > > > the page-table structures and vma rb tree, the userptr
> > > > > structures and
> > > > > the extobj list. Basically it's taken early in the exec
> > > > > IOCTL, the
> > > > > VM_BIND ioctl, the compute rebind worker and the pagefault
> > > > > handler, so
> > > > > all of the above are just asserting that it is taken in the
> > > > > correct
> > > > > mode.
> > > > > 
> > > > > But strictly with this scheme one could also use the vm's
> > > > > dma_resv for
> > > > > the extobj list since with drm_exec, it's locked before
> > > > > traversing the
> > > > > list.
> > > > > 
> > > > > The whole point of this scheme is to rely on locks that you
> > > > > already are
> > > > > supposed to be holding for various reasons and is simple to
> > > > > comprehend.
> > > > 
> > > > I don't agree that we're supposed to hold the VM's resv lock
> > > > anyways for
> > > > functions like drm_gpuvm_bo_put() or drm_gpuva_unlink(), but
> > > > I'm fine using it
> > > > for that purpose nevertheless.
> > > > 
> > > > > 
> > > > > > 
> > > > > > In order to at least place lockdep checks, the driver would
> > > > > > need to
> > > > > > supply the
> > > > > > corresponding lock's lockdep_map, because the GPUVM
> > > > > > otherwise doesn't
> > > > > > know about
> > > > > > the lock.
> > > > > 
> > > > > Yes, that sounds reasonable. One lockdep map per list.
> > > > 
> > > > I'd really like to avoid that, especially now that everything
> > > > got simpler. We
> > > > should define the actual locks to take instead.
> > > > 
> > > > > 
> > > > > > 
> > > > > > Out of curiosity, what is the overhead of a spin_lock()
> > > > > > that doesn't
> > > > > > need to
> > > > > > spin?
> > > > > 
> > > > > I guess it's hard to tell exactly, but it is much lower on
> > > > > modern x86
> > > > > than what it used to be. Not sure about ARM, which is the
> > > > > other
> > > > > architecture important to us. I figure if there is little
> > > > > cache-line
> > > > > bouncing the main overhead comes from the implied barriers.
> > > > > 
> > > > > > 
> > > > > > > 
> > > > > > > A pretty simple way that would not add much code would be
> > > > > > > 
> > > > > > > static void gpuvm_cond_spin_lock(const struct drm_gpuvm *gpuvm,
> > > > > > >                                  spinlock_t *lock)
> > > > > > > {
> > > > > > >      if (!gpuvm->resv_protected_lists)
> > > > > > >          spin_lock(lock);
> > > > > > > }
> > > > > > > 
> > > > > > > > > For such drivers, that would require anybody calling
> > > > > > > > > unlink to
> > > > > > > > > hold the vm's
> > > > > > > > > resv, though.
> > > > > > > > In V4 I want to go back to having a dedicated lock for
> > > > > > > > the GEMs
> > > > > > > > gpuva list (or
> > > > > > > > VM_BO list to be more precise). We can't just use the
> > > > > > > > dma-resv
> > > > > > > > lock for that
> > > > > > > > with VM_BO abstractions, because on destruction of a
> > > > > > > > VM_BO we
> > > > > > > > otherwise wouldn't
> > > > > > > > be allowed to already hold the dma-resv lock. That's
> > > > > > > > the fix I
> > > > > > > > was referring to
> > > > > > > > earlier.
> > > > > > > 
> > > > > > > Yeah, I can see the need for a dedicated lock for the
> > > > > > > GEM's gpuva
> > > > > > > list, but
> > > > > > > holding the vm's dma-resv lock across the unlink
> > > > > > > shouldn't be a
> > > > > > > problem. We
> > > > > > > may free the object and a pointer to the vm's resv during
> > > > > > > unlink
> > > > > > > but we
> > > > > > > don't free the vm's resv.  It'd be a matter of ensuring
> > > > > > > that any
> > > > > > > calls to
> > > > > > > unlink from *within* drm_gpuvm allows it to be held.
> > > > > > 
> > > > > > Drivers calling unlink() from the fence signaling path
> > > > > > can't use the
> > > > > > VM's
> > > > > > dma-resv lock.
> > > > > 
> > > > > Yes, that made me a bit curious because in the current
> > > > > version the code
> > > > > required the object's dma_resv for unlink() which can't be
> > > > > grabbed
> > > > > either from the fence signaling path. So are there any
> > > > > drivers actually
> > > > > wanting to do that? If so, they will either need to resort to
> > > > > the
> > > > > current spinlock solution or they will need to call unlink
> > > > > from a
> > > > > workqueue item.
> > > > 
> > > > As Boris already mentioned we have the dma-resv lock by default
> > > > or a driver
> > > > specific GEM gpuva lock as opt-in. Now, we can get rid of the
> > > > latter.
> > > > 
> > > > > > 
> > > > > > Also, what if the object is an external object? We can't
> > > > > > use the VM's
> > > > > > dma-resv
> > > > > > lock here.
> > > > > 
> > > > > Why? Typically (sync) unlink is only ever called from an
> > > > > unbind-like
> > > > > operation where it should be trivial to grab the vm's resv.
> > > > > Or, for
> > > > > that matter any outer lock protecting the extobj list. Rule
> > > > > would be
> > > > > the drm_gpuvm_bo::entry::extobj  and
> > > > > drm_gpuvm_bo::entry::evict would
> > > > > be protected by either the vm's dma_resv (or possibly an
> > > > > outer lock in
> > > > > the case of the extobj list).
> > > > 
> > > > Outer lock wouldn't have been working for updates in the async
> > > > path, but
> > > > shouldn't be relevant anymore. We could use the VM's resv for
> > > > that.
> > > > 
> > > > > 
> > > > > >   And we can't have the GEM objs dma-resv lock held when
> > > > > > calling
> > > > > > unlink(), since unlink() calls drm_gpuvm_bo_put(), which if
> > > > > > the
> > > > > > refcount drops
> > > > > > to zero calls drm_gpuvm_bo_destroy() and
> > > > > > drm_gpuvm_bo_destroy() might
> > > > > > drop the
> > > > > > last reference of the GEM object.
> > > > > 
> > > > > Yes, but this is a different problem as to what exactly
> > > > > protects
> > > > > drm_gpuvm_bo::entry::gem. Either as you suggest an internal
> > > > > per bo list
> > > > > lock, or if we want to keep the bo's dma_resv we need to
> > > > > ensure that
> > > > > the caller of dma_resv_unlock(obj->resv) actually refcounts
> > > > > its obj
> > > > > pointer, and doesn't implicitly rely on the gpuvm_bo's
> > > > > refcount (I know
> > > > > Boris didn't like that, but requiring an explicit refcount
> > > > > for a
> > > > > pointer you dereference unless you're under a lock that
> > > > > ensures keeping
> > > > > the object alive is pretty much required?) But anyway for the
> > > > > drm_gpuvm_bo::entry::gem list protection (bo resv or internal
> > > > > spinlock)
> > > > > I don't have a strong preference.
> > > > 
> > > > We can keep the GEM objects dma-resv lock, however as mentioned
> > > > above
> > > > drm_gpuva_unlink() and drm_gpuvm_bo_put() then requires both
> > > > the VM's resv lock
> > > > and the GEM's resv lock in case they differ.
> > > > 
> > > 
> > > > > > > 
> > > 
> > 
>
Danilo Krummrich Sept. 14, 2023, 5:15 p.m. UTC | #40
On 9/14/23 19:13, Thomas Hellström wrote:
> On Thu, 2023-09-14 at 17:27 +0200, Danilo Krummrich wrote:
>> On 9/14/23 13:32, Thomas Hellström wrote:
>>>
>>> On 9/14/23 12:57, Danilo Krummrich wrote:
>>>> On 9/13/23 14:16, Danilo Krummrich wrote:
>>>>
>>>> <snip>
>>>>
>>>>>>> And validate() can remove it while still holding all dma-
>>>>>>> resv locks,
>>>>>>> neat!
>>>>>>> However, what if two tasks are trying to lock the VA space
>>>>>>> concurrently? What
>>>>>>> do we do when the drm_gpuvm_bo's refcount drops to zero in
>>>>>>> drm_gpuva_unlink()?
>>>>>>> Are we guaranteed that at this point of time the
>>>>>>> drm_gpuvm_bo is not
>>>>>>> on the
>>>>>>> evicted list? Because otherwise we would call
>>>>>>> drm_gpuvm_bo_destroy()
>>>>>>> with the
>>>>>>> dma-resv lock held, which wouldn't be allowed, since
>>>>>>> drm_gpuvm_bo_destroy()
>>>>>>> might drop the last reference to the drm_gem_object and
>>>>>>> hence we'd
>>>>>>> potentially
>>>>>>> free the dma-resv lock while holding it, at least if it's
>>>>>>> an external
>>>>>>> object.
>>>>>>
>>>>>> Easiest way in this scheme is to think of the lists as being
>>>>>> protected
>>>>>> by the vm's resv lock. That means anybody calling unlink()
>>>>>> must also
>>>>>> hold the vm's resv lock. (Which is OK from an UAF point of
>>>>>> view, but
>>>>>> perhaps not from a locking inversion POW from an async list
>>>>>> update).
>>>>>
>>>>> This would mean that on unlink() we'd need to hold the VM's
>>>>> resv lock and the
>>>>> corresponding GEM's resv lock (in case they're not the same
>>>>> anyways) because the
>>>>> VM's resv lock would protect the external / evicted object
>>>>> lists and the GEM
>>>>> objects resv lock protects the GEM's list of drm_gpuvm_bos and
>>>>> the
>>>>> drm_gpuvm_bo's list of drm_gpuvas.
>>>>
>>>> As mentioned below the same applies for drm_gpuvm_bo_put() since
>>>> it might
>>>> destroy the vm_bo, which includes removing the vm_bo from
>>>> external / evicted
>>>> object lists and the GEMs list of vm_bos.
>>>>
>>>> As mentioned, if the GEM's dma-resv is different from the VM's
>>>> dma-resv we need
>>>> to take both locks. Ultimately, this would mean we need a
>>>> drm_exec loop, because
>>>> we can't know the order in which to take these locks. Doing a
>>>> full drm_exec loop
>>>> just to put() a vm_bo doesn't sound reasonable to me.
>>>>
>>>> Can we instead just have an internal mutex for locking the lists
>>>> such that we
>>>> avoid taking and dropping the spinlocks, which we use currently,
>>>> in a loop?
>>>
>>> You'd have the same locking inversion problem with a mutex, right?
>>> Since in the eviction path you have resv->mutex, from exec you have
>>> resv->mutex->resv because validate would attempt to grab resv.
>>
>> Both lists, evict and extobj, would need to have a separate mutex,
>> not a common one.
>> We'd also need a dedicated GEM gpuva lock. Then the only rule would
>> be that you can't
>> hold the dma-resv lock when calling put(). Which I admit is not that
>> nice.
>>
>> With the current spinlock solution drivers wouldn't need to worry
>> about anything locking
>> related though. So maybe I come back to your proposal of having a
>> switch for external
>> locking with dma-resv locks entirely. Such that with external dma-
>> resv locking I skip
>> all the spinlocks and add lockdep checks instead.
>>
>> I think that makes the most sense in terms of taking advantage of
>> external dma-resv locking
>> where possible and on the other hand having a self-contained solution
>> if not. This should
>> get all concerns out of the way, yours, Christian's and Boris'.
> 
> If we need additional locks, then yes, I'd prefer the opt-in/opt-out spinlock
> solution, and check back after a while to see if we can remove either
> option once most pitfalls are hit.

Sounds good, I'll prepare this for a V4.

- Danilo

Thomas Hellstrom Sept. 14, 2023, 5:21 p.m. UTC | #41
On Thu, 2023-09-14 at 18:36 +0200, Danilo Krummrich wrote:
> On 9/14/23 15:48, Thomas Hellström wrote:
> > Hi, Danilo
> > 
> > Some additional minor comments as xe conversion progresses.
> > 
> > On 9/9/23 17:31, Danilo Krummrich wrote:
> > > So far the DRM GPUVA manager offers common infrastructure to
> > > track GPU VA
> > > allocations and mappings, generically connect GPU VA mappings to
> > > their
> > > backing buffers and perform more complex mapping operations on
> > > the GPU VA
> > > space.
> > > 
> > > However, there are more design patterns commonly used by drivers,
> > > which
> > > can potentially be generalized in order to make the DRM GPUVA
> > > manager
> > > represent a basic GPU-VM implementation. In this context, this
> > > patch aims
> > > at generalizing the following elements.
> > > 
> > > 1) Provide a common dma-resv for GEM objects not being used
> > > outside of
> > >     this GPU-VM.
> > > 
> > > 2) Provide tracking of external GEM objects (GEM objects which
> > > are
> > >     shared with other GPU-VMs).
> > > 
> > > 3) Provide functions to efficiently lock all GEM objects dma-resv
> > > the
> > >     GPU-VM contains mappings of.
> > > 
> > > 4) Provide tracking of evicted GEM objects the GPU-VM contains
> > > mappings
> > >     of, such that validation of evicted GEM objects is
> > > accelerated.
> > > 
> > > 5) Provide some convenience functions for common patterns.
> > > 
> > > Rather than being designed as a "framework", the target is to
> > > make all
> > > features appear as a collection of optional helper functions,
> > > such that
> > > drivers are free to make use of the DRM GPUVA managers basic
> > > functionality and opt-in for other features without setting any
> > > feature
> > > flags, just by making use of the corresponding functions.
> > > 
> > > Big kudos to Boris Brezillon for his help to figure out locking
> > > for drivers
> > > updating the GPU VA space within the fence signalling path.
> > > 
> > > Suggested-by: Matthew Brost <matthew.brost@intel.com>
> > > Signed-off-by: Danilo Krummrich <dakr@redhat.com>
> > > ---
> > > 
> > > +/**
> > > + * drm_gpuvm_bo_evict() - add / remove a &drm_gem_object to /
> > > from a
> > > + * &drm_gpuvms evicted list
> > > + * @obj: the &drm_gem_object to add or remove
> > > + * @evict: indicates whether the object is evicted
> > > + *
> > > + * Adds a &drm_gem_object to or removes it from all &drm_gpuvms
> > > evicted
> > > + * list containing a mapping of this &drm_gem_object.
> > > + */
> > > +void
> > > +drm_gpuvm_bo_evict(struct drm_gem_object *obj, bool evict)
> > > +{
> > > +    struct drm_gpuvm_bo *vm_bo;
> > > +
> > > +    drm_gem_for_each_gpuvm_bo(vm_bo, obj) {
> > > +        if (evict)
> > > +            drm_gpuvm_bo_list_add(vm_bo, evict);
> > > +        else
> > > +            drm_gpuvm_bo_list_del(vm_bo, evict);
> > > +    }
> > > +}
> > > +EXPORT_SYMBOL_GPL(drm_gpuvm_bo_evict);
> > > +
> > 
> > We need a drm_gpuvm_bo_evict(struct drm_gpuvm_bo *vm_bo, ...) that
> > puts a single gpuvm_bo on the list, the above function could
> > perhaps be renamed as drm_gpuvm_gem_obj_evict(obj, ....).
> 
> Makes sense - gonna change that.
> 
> > 
> > Reason is some vm's are faulting vms which don't have an evict
> > list, but validate from the pagefault handler. Also evict == false
> > is dangerous because if called from within an exec, it might remove
> > the obj from other vm's evict list before they've had a chance to
> > rebind their VMAs.
> > 
> > >   static int
> > >   __drm_gpuva_insert(struct drm_gpuvm *gpuvm,
> > >              struct drm_gpuva *va)
> > > diff --git a/include/drm/drm_gpuvm.h b/include/drm/drm_gpuvm.h
> > > index afa50b9059a2..834bb6d6617e 100644
> > > --- a/include/drm/drm_gpuvm.h
> > > +++ b/include/drm/drm_gpuvm.h
> > > @@ -26,10 +26,12 @@
> > >    */
> > >   #include <linux/list.h>
> > > +#include <linux/dma-resv.h>
> > >   #include <linux/rbtree.h>
> > >   #include <linux/types.h>
> > >   #include <drm/drm_gem.h>
> > > +#include <drm/drm_exec.h>
> > >   struct drm_gpuvm;
> > >   struct drm_gpuvm_bo;
> > > @@ -259,6 +261,38 @@ struct drm_gpuvm {
> > >        * space
> > >        */
> > >       struct dma_resv *resv;
> > > +
> > > +    /**
> > > +     * @extobj: structure holding the extobj list
> > > +     */
> > > +    struct {
> > > +        /**
> > > +         * @list: &list_head storing &drm_gpuvm_bos serving as
> > > +         * external object
> > > +         */
> > > +        struct list_head list;
> > > +
> > > +        /**
> > > +         * @lock: spinlock to protect the extobj list
> > > +         */
> > > +        spinlock_t lock;
> > > +    } extobj;
> > > +
> > > +    /**
> > > +     * @evict: structure holding the evict list and evict list
> > > lock
> > > +     */
> > > +    struct {
> > > +        /**
> > > +         * @list: &list_head storing &drm_gpuvm_bos currently
> > > being
> > > +         * evicted
> > > +         */
> > > +        struct list_head list;
> > > +
> > > +        /**
> > > +         * @lock: spinlock to protect the evict list
> > > +         */
> > > +        spinlock_t lock;
> > > +    } evict;
> > >   };
> > >   void drm_gpuvm_init(struct drm_gpuvm *gpuvm, struct drm_device
> > > *drm,
> > > @@ -268,6 +302,21 @@ void drm_gpuvm_init(struct drm_gpuvm *gpuvm,
> > > struct drm_device *drm,
> > >               const struct drm_gpuvm_ops *ops);
> > >   void drm_gpuvm_destroy(struct drm_gpuvm *gpuvm);
> > > +/**
> > > + * drm_gpuvm_is_extobj() - indicates whether the given
> > > &drm_gem_object is an
> > > + * external object
> > > + * @gpuvm: the &drm_gpuvm to check
> > > + * @obj: the &drm_gem_object to check
> > > + *
> > > + * Returns: true if the &drm_gem_object &dma_resv differs from
> > > the
> > > + * &drm_gpuvms &dma_resv, false otherwise
> > > + */
> > > +static inline bool drm_gpuvm_is_extobj(struct drm_gpuvm *gpuvm,
> > > +                       struct drm_gem_object *obj)
> > > +{
> > > +    return obj && obj->resv != gpuvm->resv;
> > > +}
> > > +
> > >   static inline struct drm_gpuva *
> > >   __drm_gpuva_next(struct drm_gpuva *va)
> > >   {
> > > @@ -346,6 +395,128 @@ __drm_gpuva_next(struct drm_gpuva *va)
> > >   #define drm_gpuvm_for_each_va_safe(va__, next__, gpuvm__) \
> > >       list_for_each_entry_safe(va__, next__, &(gpuvm__)->rb.list,
> > > rb.entry)
> > > +/**
> > > + * struct drm_gpuvm_exec - &drm_gpuvm abstraction of &drm_exec
> > > + *
> > > + * This structure should be created on the stack as &drm_exec
> > > should be.
> > > + *
> > > + * Optionally, @extra can be set in order to lock additional
> > > &drm_gem_objects.
> > > + */
> > > +struct drm_gpuvm_exec {
> > > +    /**
> > > +     * @exec: the &drm_exec structure
> > > +     */
> > > +    struct drm_exec exec;
> > > +
> > > +    /**
> > > +     * @vm: the &drm_gpuvm to lock its DMA reservations
> > > +     */
> > > +    struct drm_gpuvm *vm;
> > > +
> > > +    /**
> > > +     * @extra: Callback and corresponding private data for the
> > > driver to
> > > +     * lock arbitrary additional &drm_gem_objects.
> > > +     */
> > > +    struct {
> > > +        /**
> > > +         * @fn: The driver callback to lock additional
> > > &drm_gem_objects.
> > > +         */
> > > +        int (*fn)(struct drm_gpuvm_exec *vm_exec,
> > > +              unsigned int num_fences);
> > > +
> > > +        /**
> > > +         * @priv: driver private data for the @fn callback
> > > +         */
> > > +        void *priv;
> > > +    } extra;
> > > +};
> > > +
> > > +/**
> > > + * drm_gpuvm_prepare_vm() - prepare the GPUVMs common dma-resv
> > > + * @gpuvm: the &drm_gpuvm
> > > + * @exec: the &drm_exec context
> > > + * @num_fences: the amount of &dma_fences to reserve
> > > + *
> > > + * Calls drm_exec_prepare_obj() for the GPUVMs dummy
> > > &drm_gem_object.
> > > + *
> > > + * Using this function directly, it is the drivers
> > > responsibility to call
> > > + * drm_exec_init() and drm_exec_fini() accordingly.
> > > + *
> > > + * Returns: 0 on success, negative error code on failure.
> > > + */
> > > +static inline int
> > > +drm_gpuvm_prepare_vm(struct drm_gpuvm *gpuvm,
> > > +             struct drm_exec *exec,
> > > +             unsigned int num_fences)
> > > +{
> > > +    return drm_exec_prepare_obj(exec, &gpuvm->d_obj,
> > > num_fences);
> > > +}
> > > +
> > > +int drm_gpuvm_prepare_objects(struct drm_gpuvm *gpuvm,
> > > +                  struct drm_exec *exec,
> > > +                  unsigned int num_fences);
> > > +
> > > +int drm_gpuvm_prepare_range(struct drm_gpuvm *gpuvm,
> > > +                struct drm_exec *exec,
> > > +                u64 addr, u64 range,
> > > +                unsigned int num_fences);
> > > +
> > > +int drm_gpuvm_exec_lock(struct drm_gpuvm_exec *vm_exec,
> > > +            unsigned int num_fences,
> > > +            bool interruptible);
> > > +
> > > +int drm_gpuvm_exec_lock_array(struct drm_gpuvm_exec *vm_exec,
> > > +                  struct drm_gem_object **objs,
> > > +                  unsigned int num_objs,
> > > +                  unsigned int num_fences,
> > > +                  bool interruptible);
> > > +
> > > +int drm_gpuvm_exec_lock_range(struct drm_gpuvm_exec *vm_exec,
> > > +                  u64 addr, u64 range,
> > > +                  unsigned int num_fences,
> > > +                  bool interruptible);
> > > +
> > > +/**
> > > + * drm_gpuvm_exec_unlock() - unlock all dma-resv of all associated BOs
> > > + * @vm_exec: the &drm_gpuvm_exec abstraction
> > > + *
> > > + * Releases all dma-resv locks of all &drm_gem_objects previously
> > > + * acquired through drm_gpuvm_exec_lock() or its variants.
> > > + */
> > > +static inline void
> > > +drm_gpuvm_exec_unlock(struct drm_gpuvm_exec *vm_exec)
> > > +{
> > > +    drm_exec_fini(&vm_exec->exec);
> > > +}
> > > +
> > > +int drm_gpuvm_validate(struct drm_gpuvm *gpuvm);
> > > +void drm_gpuvm_resv_add_fence(struct drm_gpuvm *gpuvm,
> > > +                  struct drm_exec *exec,
> > > +                  struct dma_fence *fence,
> > > +                  enum dma_resv_usage private_usage,
> > > +                  enum dma_resv_usage extobj_usage);
> > > +
> > > +/**
> > > + * drm_gpuvm_exec_resv_add_fence()
> > > + * @vm_exec: the &drm_gpuvm_exec abstraction
> > > + * @fence: fence to add
> > > + * @private_usage: private dma-resv usage
> > > + * @extobj_usage: extobj dma-resv usage
> > > + *
> > > + * See drm_gpuvm_resv_add_fence().
> > > + */
> > > +static inline void
> > > +drm_gpuvm_exec_resv_add_fence(struct drm_gpuvm_exec *vm_exec,
> > > +                  struct dma_fence *fence,
> > > +                  enum dma_resv_usage private_usage,
> > > +                  enum dma_resv_usage extobj_usage)
> > > +{
> > > +    drm_gpuvm_resv_add_fence(vm_exec->vm, &vm_exec->exec, fence,
> > > +                 private_usage, extobj_usage);
> > > +}
> > > +
> > >   /**
> > >    * struct drm_gpuvm_bo - structure representing a &drm_gpuvm
> > > and
> > >    * &drm_gem_object combination
> > > @@ -398,6 +569,18 @@ struct drm_gpuvm_bo {
> > >                * gpuva list.
> > >                */
> > >               struct list_head gem;
> > > +
> > > +            /**
> > > +             * @extobj: List entry to attach to the &drm_gpuvms
> > > +             * extobj list.
> > > +             */
> > > +            struct list_head extobj;
> > > +
> > > +            /**
> > > +             * @evict: List entry to attach to the &drm_gpuvms
> > > evict
> > > +             * list.
> > > +             */
> > > +            struct list_head evict;
> > >           } entry;
> > >       } list;
> > >   };
> > > @@ -432,6 +615,9 @@ struct drm_gpuvm_bo *
> > >   drm_gpuvm_bo_find(struct drm_gpuvm *gpuvm,
> > >             struct drm_gem_object *obj);
> > > +void drm_gpuvm_bo_evict(struct drm_gem_object *obj, bool evict);
> > > +void drm_gpuvm_bo_extobj_add(struct drm_gpuvm_bo *vm_bo);
> > > +
> > >   /**
> > >    * drm_gpuvm_bo_for_each_va() - iterator to walk over a list of
> > > &drm_gpuva
> > >    * @va__: &drm_gpuva structure to assign to in each iteration
> > > step
> > > @@ -837,6 +1023,17 @@ struct drm_gpuvm_ops {
> > >        * used.
> > >        */
> > >       int (*sm_step_unmap)(struct drm_gpuva_op *op, void *priv);
> > > +
> > > +    /**
> > > +     * @bo_validate: called from drm_gpuvm_validate()
> > > +     *
> > > +     * Drivers receive this callback for every evicted
> > > &drm_gem_object being
> > > +     * mapped in the corresponding &drm_gpuvm.
> > > +     *
> > > +     * Typically, drivers would call their driver specific
> > > variant of
> > > +     * ttm_bo_validate() from within this callback.
> > > +     */
> > > +    int (*bo_validate)(struct drm_gem_object *obj);
> > 
> > Same here. Could we have a vm_bo as an argument instead, so that
> > the callback knows what gpuvm we're targeting and can mark all its
> > gpu_vas for revalidation? Or is that intended to be done elsewhere?
> 
> Makes sense as well. I'll change that too.

I forgot, drm_gpuvm_validate() would preferably take a drm_gpuvm_exec
argument because we need it in the validate callback. It's also easy
for the driver to subclass further if needed, to pass even more
arguments to its validate callback.
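
Roughly like this (the signature and the subclassing are only a
sketch):

    int drm_gpuvm_validate(struct drm_gpuvm *gpuvm,
                           struct drm_gpuvm_exec *vm_exec);

    /* A driver could subclass drm_gpuvm_exec to pass more context: */
    struct driver_vm_exec {
        struct drm_gpuvm_exec base;
        void *validate_ctx;    /* whatever the bo_validate cb needs */
    };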

/Thomas


> 
> > 
> > >   };
> > >   int drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm, void *priv,
> > 
> > Thanks,
> > 
> > Thomas
> > 
> > 
>
Danilo Krummrich Sept. 14, 2023, 5:25 p.m. UTC | #42
On 9/14/23 19:21, Thomas Hellström wrote:
> On Thu, 2023-09-14 at 18:36 +0200, Danilo Krummrich wrote:
>> On 9/14/23 15:48, Thomas Hellström wrote:
>>> Hi, Danilo
>>>
>>> Some additional minor comments as xe conversion progresses.
>>>
>>> On 9/9/23 17:31, Danilo Krummrich wrote:
>>>> So far the DRM GPUVA manager offers common infrastructure to
>>>> track GPU VA
>>>> allocations and mappings, generically connect GPU VA mappings to
>>>> their
>>>> backing buffers and perform more complex mapping operations on
>>>> the GPU VA
>>>> space.
>>>>
>>>> However, there are more design patterns commonly used by drivers,
>>>> which
>>>> can potentially be generalized in order to make the DRM GPUVA
>>>> manager
>>>> represent a basic GPU-VM implementation. In this context, this
>>>> patch aims
>>>> at generalizing the following elements.
>>>>
>>>> 1) Provide a common dma-resv for GEM objects not being used
>>>> outside of
>>>>      this GPU-VM.
>>>>
>>>> 2) Provide tracking of external GEM objects (GEM objects which
>>>> are
>>>>      shared with other GPU-VMs).
>>>>
>>>> 3) Provide functions to efficiently lock all GEM objects dma-resv
>>>> the
>>>>      GPU-VM contains mappings of.
>>>>
>>>> 4) Provide tracking of evicted GEM objects the GPU-VM contains
>>>> mappings
>>>>      of, such that validation of evicted GEM objects is
>>>> accelerated.
>>>>
>>>> 5) Provide some convenience functions for common patterns.
>>>>
>>>> Rather than being designed as a "framework", the target is to
>>>> make all
>>>> features appear as a collection of optional helper functions,
>>>> such that
>>>> drivers are free to make use of the DRM GPUVA managers basic
>>>> functionality and opt-in for other features without setting any
>>>> feature
>>>> flags, just by making use of the corresponding functions.
>>>>
>>>> Big kudos to Boris Brezillon for his help to figure out locking
>>>> for drivers
>>>> updating the GPU VA space within the fence signalling path.
>>>>
>>>> Suggested-by: Matthew Brost <matthew.brost@intel.com>
>>>> Signed-off-by: Danilo Krummrich <dakr@redhat.com>
>>>> ---
>>>>
>>>> +/**
>>>> + * drm_gpuvm_bo_evict() - add / remove a &drm_gem_object to /
>>>> from a
>>>> + * &drm_gpuvms evicted list
>>>> + * @obj: the &drm_gem_object to add or remove
>>>> + * @evict: indicates whether the object is evicted
>>>> + *
>>>> + * Adds a &drm_gem_object to or removes it from all &drm_gpuvms
>>>> evicted
>>>> + * list containing a mapping of this &drm_gem_object.
>>>> + */
>>>> +void
>>>> +drm_gpuvm_bo_evict(struct drm_gem_object *obj, bool evict)
>>>> +{
>>>> +    struct drm_gpuvm_bo *vm_bo;
>>>> +
>>>> +    drm_gem_for_each_gpuvm_bo(vm_bo, obj) {
>>>> +        if (evict)
>>>> +            drm_gpuvm_bo_list_add(vm_bo, evict);
>>>> +        else
>>>> +            drm_gpuvm_bo_list_del(vm_bo, evict);
>>>> +    }
>>>> +}
>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_bo_evict);
>>>> +
>>>
>>> We need a drm_gpuvm_bo_evict(struct drm_gpuvm_bo *vm_bo, ...) that
>>> puts a single gpuvm_bo on the list, the above function could
>>> perhaps be renamed as drm_gpuvm_gem_obj_evict(obj, ....).
>>
>> Makes sense - gonna change that.
>>
>>>
>>> Reason is some vm's are faulting vms which don't have an evict
>>> list, but validate from the pagefault handler. Also evict == false
>>> is dangerous because if called from within an exec, it might remove
>>> the obj from other vm's evict list before they've had a chance to
>>> rebind their VMAs.
>>>
>>>>    static int
>>>>    __drm_gpuva_insert(struct drm_gpuvm *gpuvm,
>>>>               struct drm_gpuva *va)
>>>> diff --git a/include/drm/drm_gpuvm.h b/include/drm/drm_gpuvm.h
>>>> index afa50b9059a2..834bb6d6617e 100644
>>>> --- a/include/drm/drm_gpuvm.h
>>>> +++ b/include/drm/drm_gpuvm.h
>>>> @@ -26,10 +26,12 @@
>>>>     */
>>>>    #include <linux/list.h>
>>>> +#include <linux/dma-resv.h>
>>>>    #include <linux/rbtree.h>
>>>>    #include <linux/types.h>
>>>>    #include <drm/drm_gem.h>
>>>> +#include <drm/drm_exec.h>
>>>>    struct drm_gpuvm;
>>>>    struct drm_gpuvm_bo;
>>>> @@ -259,6 +261,38 @@ struct drm_gpuvm {
>>>>         * space
>>>>         */
>>>>        struct dma_resv *resv;
>>>> +
>>>> +    /**
>>>> +     * @extobj: structure holding the extobj list
>>>> +     */
>>>> +    struct {
>>>> +        /**
>>>> +         * @list: &list_head storing &drm_gpuvm_bos serving as
>>>> +         * external object
>>>> +         */
>>>> +        struct list_head list;
>>>> +
>>>> +        /**
>>>> +         * @lock: spinlock to protect the extobj list
>>>> +         */
>>>> +        spinlock_t lock;
>>>> +    } extobj;
>>>> +
>>>> +    /**
>>>> +     * @evict: structure holding the evict list and evict list
>>>> lock
>>>> +     */
>>>> +    struct {
>>>> +        /**
>>>> +         * @list: &list_head storing &drm_gpuvm_bos currently
>>>> being
>>>> +         * evicted
>>>> +         */
>>>> +        struct list_head list;
>>>> +
>>>> +        /**
>>>> +         * @lock: spinlock to protect the evict list
>>>> +         */
>>>> +        spinlock_t lock;
>>>> +    } evict;
>>>>    };
>>>>    void drm_gpuvm_init(struct drm_gpuvm *gpuvm, struct drm_device
>>>> *drm,
>>>> @@ -268,6 +302,21 @@ void drm_gpuvm_init(struct drm_gpuvm *gpuvm,
>>>> struct drm_device *drm,
>>>>                const struct drm_gpuvm_ops *ops);
>>>>    void drm_gpuvm_destroy(struct drm_gpuvm *gpuvm);
>>>> +/**
>>>> + * drm_gpuvm_is_extobj() - indicates whether the given
>>>> &drm_gem_object is an
>>>> + * external object
>>>> + * @gpuvm: the &drm_gpuvm to check
>>>> + * @obj: the &drm_gem_object to check
>>>> + *
>>>> + * Returns: true if the &drm_gem_object &dma_resv differs from
>>>> the
>>>> + * &drm_gpuvms &dma_resv, false otherwise
>>>> + */
>>>> +static inline bool drm_gpuvm_is_extobj(struct drm_gpuvm *gpuvm,
>>>> +                       struct drm_gem_object *obj)
>>>> +{
>>>> +    return obj && obj->resv != gpuvm->resv;
>>>> +}
>>>> +
>>>>    static inline struct drm_gpuva *
>>>>    __drm_gpuva_next(struct drm_gpuva *va)
>>>>    {
>>>> @@ -346,6 +395,128 @@ __drm_gpuva_next(struct drm_gpuva *va)
>>>>    #define drm_gpuvm_for_each_va_safe(va__, next__, gpuvm__) \
>>>>        list_for_each_entry_safe(va__, next__, &(gpuvm__)->rb.list,
>>>> rb.entry)
>>>> +/**
>>>> + * struct drm_gpuvm_exec - &drm_gpuvm abstraction of &drm_exec
>>>> + *
>>>> + * This structure should be created on the stack as &drm_exec
>>>> should be.
>>>> + *
>>>> + * Optionally, @extra can be set in order to lock additional
>>>> &drm_gem_objects.
>>>> + */
>>>> +struct drm_gpuvm_exec {
>>>> +    /**
>>>> +     * @exec: the &drm_exec structure
>>>> +     */
>>>> +    struct drm_exec exec;
>>>> +
>>>> +    /**
>>>> +     * @vm: the &drm_gpuvm whose DMA reservations to lock
>>>> +     */
>>>> +    struct drm_gpuvm *vm;
>>>> +
>>>> +    /**
>>>> +     * @extra: Callback and corresponding private data for the
>>>> driver to
>>>> +     * lock arbitrary additional &drm_gem_objects.
>>>> +     */
>>>> +    struct {
>>>> +        /**
>>>> +         * @fn: The driver callback to lock additional
>>>> &drm_gem_objects.
>>>> +         */
>>>> +        int (*fn)(struct drm_gpuvm_exec *vm_exec,
>>>> +              unsigned int num_fences);
>>>> +
>>>> +        /**
>>>> +         * @priv: driver private data for the @fn callback
>>>> +         */
>>>> +        void *priv;
>>>> +    } extra;
>>>> +};
>>>> +
>>>> +/**
>>>> + * drm_gpuvm_prepare_vm() - prepare the GPUVMs common dma-resv
>>>> + * @gpuvm: the &drm_gpuvm
>>>> + * @exec: the &drm_exec context
>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>> + *
>>>> + * Calls drm_exec_prepare_obj() for the GPUVMs dummy
>>>> &drm_gem_object.
>>>> + *
>>>> + * Using this function directly, it is the driver's
>>>> responsibility to call
>>>> + * drm_exec_init() and drm_exec_fini() accordingly.
>>>> + *
>>>> + * Returns: 0 on success, negative error code on failure.
>>>> + */
>>>> +static inline int
>>>> +drm_gpuvm_prepare_vm(struct drm_gpuvm *gpuvm,
>>>> +             struct drm_exec *exec,
>>>> +             unsigned int num_fences)
>>>> +{
>>>> +    return drm_exec_prepare_obj(exec, &gpuvm->d_obj,
>>>> num_fences);
>>>> +}
>>>> +
>>>> +int drm_gpuvm_prepare_objects(struct drm_gpuvm *gpuvm,
>>>> +                  struct drm_exec *exec,
>>>> +                  unsigned int num_fences);
>>>> +
>>>> +int drm_gpuvm_prepare_range(struct drm_gpuvm *gpuvm,
>>>> +                struct drm_exec *exec,
>>>> +                u64 addr, u64 range,
>>>> +                unsigned int num_fences);
>>>> +
>>>> +int drm_gpuvm_exec_lock(struct drm_gpuvm_exec *vm_exec,
>>>> +            unsigned int num_fences,
>>>> +            bool interruptible);
>>>> +
>>>> +int drm_gpuvm_exec_lock_array(struct drm_gpuvm_exec *vm_exec,
>>>> +                  struct drm_gem_object **objs,
>>>> +                  unsigned int num_objs,
>>>> +                  unsigned int num_fences,
>>>> +                  bool interruptible);
>>>> +
>>>> +int drm_gpuvm_exec_lock_range(struct drm_gpuvm_exec *vm_exec,
>>>> +                  u64 addr, u64 range,
>>>> +                  unsigned int num_fences,
>>>> +                  bool interruptible);
>>>> +
>>>> +/**
>>>> + * drm_gpuvm_exec_unlock() - unlock all dma-resv of all associated BOs
>>>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>>>> + *
>>>> + * Releases all dma-resv locks of all &drm_gem_objects
>>>> previously acquired
>>>> + * through drm_gpuvm_exec_lock() or its variants.
>>>> + */
>>>> +static inline void
>>>> +drm_gpuvm_exec_unlock(struct drm_gpuvm_exec *vm_exec)
>>>> +{
>>>> +    drm_exec_fini(&vm_exec->exec);
>>>> +}
>>>> +
>>>> +int drm_gpuvm_validate(struct drm_gpuvm *gpuvm);
>>>> +void drm_gpuvm_resv_add_fence(struct drm_gpuvm *gpuvm,
>>>> +                  struct drm_exec *exec,
>>>> +                  struct dma_fence *fence,
>>>> +                  enum dma_resv_usage private_usage,
>>>> +                  enum dma_resv_usage extobj_usage);
>>>> +
>>>> +/**
>>>> + * drm_gpuvm_exec_resv_add_fence()
>>>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>>>> + * @fence: fence to add
>>>> + * @private_usage: private dma-resv usage
>>>> + * @extobj_usage: extobj dma-resv usage
>>>> + *
>>>> + * See drm_gpuvm_resv_add_fence().
>>>> + */
>>>> +static inline void
>>>> +drm_gpuvm_exec_resv_add_fence(struct drm_gpuvm_exec *vm_exec,
>>>> +                  struct dma_fence *fence,
>>>> +                  enum dma_resv_usage private_usage,
>>>> +                  enum dma_resv_usage extobj_usage)
>>>> +{
>>>> +    drm_gpuvm_resv_add_fence(vm_exec->vm, &vm_exec->exec, fence,
>>>> +                 private_usage, extobj_usage);
>>>> +}
>>>> +
>>>>    /**
>>>>     * struct drm_gpuvm_bo - structure representing a &drm_gpuvm
>>>> and
>>>>     * &drm_gem_object combination
>>>> @@ -398,6 +569,18 @@ struct drm_gpuvm_bo {
>>>>                 * gpuva list.
>>>>                 */
>>>>                struct list_head gem;
>>>> +
>>>> +            /**
>>>> +             * @extobj: List entry to attach to the &drm_gpuvms
>>>> +             * extobj list.
>>>> +             */
>>>> +            struct list_head extobj;
>>>> +
>>>> +            /**
>>>> +             * @evict: List entry to attach to the &drm_gpuvms
>>>> evict
>>>> +             * list.
>>>> +             */
>>>> +            struct list_head evict;
>>>>            } entry;
>>>>        } list;
>>>>    };
>>>> @@ -432,6 +615,9 @@ struct drm_gpuvm_bo *
>>>>    drm_gpuvm_bo_find(struct drm_gpuvm *gpuvm,
>>>>              struct drm_gem_object *obj);
>>>> +void drm_gpuvm_bo_evict(struct drm_gem_object *obj, bool evict);
>>>> +void drm_gpuvm_bo_extobj_add(struct drm_gpuvm_bo *vm_bo);
>>>> +
>>>>    /**
>>>>     * drm_gpuvm_bo_for_each_va() - iterator to walk over a list of
>>>> &drm_gpuva
>>>>     * @va__: &drm_gpuva structure to assign to in each iteration
>>>> step
>>>> @@ -837,6 +1023,17 @@ struct drm_gpuvm_ops {
>>>>         * used.
>>>>         */
>>>>        int (*sm_step_unmap)(struct drm_gpuva_op *op, void *priv);
>>>> +
>>>> +    /**
>>>> +     * @bo_validate: called from drm_gpuvm_validate()
>>>> +     *
>>>> +     * Drivers receive this callback for every evicted
>>>> &drm_gem_object being
>>>> +     * mapped in the corresponding &drm_gpuvm.
>>>> +     *
>>>> +     * Typically, drivers would call their driver specific
>>>> variant of
>>>> +     * ttm_bo_validate() from within this callback.
>>>> +     */
>>>> +    int (*bo_validate)(struct drm_gem_object *obj);
>>>
>>> Same here. Could we have a vm_bo as an argument instead, so that
>>> the callback knows what gpuvm we're targeting and can mark all its
>>> gpu_vas for revalidation? Or is that intended to be done elsewhere?
>>
>> Makes sense as well. I'll change that too.
> 
> I forgot, drm_gpuvm_validate() would preferably take a drm_gpuvm_exec
> argument because we need it in the validate callback. It's also easy
> for the driver to subclass further if needed, to pass even more
> arguments to its validate callback.

Hm.. that implies that a driver open-coding the drm_exec loop still needs
to use a struct drm_gpuvm_exec rather than just a struct drm_exec. What is
this needed for in Xe? Do we expect other drivers to need it? Might a void
priv pointer make more sense?
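
For illustration, roughly what that subclassing could look like (driver
struct, names, and the vm_bo-based callback signature are all hypothetical
at this point):

    struct my_vm_exec {
        struct drm_gpuvm_exec base;
        void *priv; /* extra driver state for the validate callback */
    };

    static int my_bo_validate(struct drm_gpuvm_bo *vm_bo,
                              struct drm_exec *exec)
    {
        struct my_vm_exec *vm_exec =
            container_of(exec, struct my_vm_exec, base.exec);

        /* ... driver specific validation using vm_exec->priv ... */
        return 0;
    }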

> 
> /Thomas
> 
> 
>>
>>>
>>>>    };
>>>>    int drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm, void *priv,
>>>
>>> Thanks,
>>>
>>> Thomas
>>>
>>>
>>
>
Thomas Hellstrom Sept. 14, 2023, 7:14 p.m. UTC | #43
On Thu, 2023-09-14 at 19:25 +0200, Danilo Krummrich wrote:
> On 9/14/23 19:21, Thomas Hellström wrote:
> > On Thu, 2023-09-14 at 18:36 +0200, Danilo Krummrich wrote:
> > > On 9/14/23 15:48, Thomas Hellström wrote:
> > > > Hi, Danilo
> > > > 
> > > > Some additional minor comments as xe conversion progresses.
> > > > 
> > > > On 9/9/23 17:31, Danilo Krummrich wrote:
> > > > > So far the DRM GPUVA manager offers common infrastructure to
> > > > > track GPU VA
> > > > > allocations and mappings, generically connect GPU VA mappings
> > > > > to
> > > > > their
> > > > > backing buffers and perform more complex mapping operations
> > > > > on
> > > > > the GPU VA
> > > > > space.
> > > > > 
> > > > > However, there are more design patterns commonly used by
> > > > > drivers,
> > > > > which
> > > > > can potentially be generalized in order to make the DRM GPUVA
> > > > > manager
> > > > > represent a basic GPU-VM implementation. In this context,
> > > > > this
> > > > > patch aims
> > > > > at generalizing the following elements.
> > > > > 
> > > > > 1) Provide a common dma-resv for GEM objects not being used
> > > > > outside of
> > > > >      this GPU-VM.
> > > > > 
> > > > > 2) Provide tracking of external GEM objects (GEM objects
> > > > > which
> > > > > are
> > > > >      shared with other GPU-VMs).
> > > > > 
> > > > > 3) Provide functions to efficiently lock all GEM objects dma-
> > > > > resv
> > > > > the
> > > > >      GPU-VM contains mappings of.
> > > > > 
> > > > > 4) Provide tracking of evicted GEM objects the GPU-VM
> > > > > contains
> > > > > mappings
> > > > >      of, such that validation of evicted GEM objects is
> > > > > accelerated.
> > > > > 
> > > > > 5) Provide some convenience functions for common patterns.
> > > > > 
> > > > > Rather than being designed as a "framework", the target is to
> > > > > make all
> > > > > features appear as a collection of optional helper functions,
> > > > > such that
> > > > > drivers are free to make use of the DRM GPUVA managers basic
> > > > > functionality and opt-in for other features without setting
> > > > > any
> > > > > feature
> > > > > flags, just by making use of the corresponding functions.
> > > > > 
> > > > > Big kudos to Boris Brezillon for his help to figure out
> > > > > locking
> > > > > for drivers
> > > > > updating the GPU VA space within the fence signalling path.
> > > > > 
> > > > > Suggested-by: Matthew Brost <matthew.brost@intel.com>
> > > > > Signed-off-by: Danilo Krummrich <dakr@redhat.com>
> > > > > ---
> > > > > 
> > > > > +/**
> > > > > + * drm_gpuvm_bo_evict() - add / remove a &drm_gem_object to
> > > > > /
> > > > > from a
> > > > > + * &drm_gpuvms evicted list
> > > > > + * @obj: the &drm_gem_object to add or remove
> > > > > + * @evict: indicates whether the object is evicted
> > > > > + *
> > > > > + * Adds a &drm_gem_object to or removes it from all
> > > > > &drm_gpuvms
> > > > > evicted
> > > > > + * lists containing a mapping of this &drm_gem_object.
> > > > > + */
> > > > > +void
> > > > > +drm_gpuvm_bo_evict(struct drm_gem_object *obj, bool evict)
> > > > > +{
> > > > > +    struct drm_gpuvm_bo *vm_bo;
> > > > > +
> > > > > +    drm_gem_for_each_gpuvm_bo(vm_bo, obj) {
> > > > > +        if (evict)
> > > > > +            drm_gpuvm_bo_list_add(vm_bo, evict);
> > > > > +        else
> > > > > +            drm_gpuvm_bo_list_del(vm_bo, evict);
> > > > > +    }
> > > > > +}
> > > > > +EXPORT_SYMBOL_GPL(drm_gpuvm_bo_evict);
> > > > > +
> > > > 
> > > > We need a drm_gpuvm_bo_evict(struct drm_gpuvm_bo *vm_bo, ...)
> > > > that
> > > > puts a single gpuvm_bo on the list, the above function could
> > > > perhaps be renamed as drm_gpuvm_gem_obj_evict(obj, ....).
> > > 
> > > Makes sense - gonna change that.
> > > 
> > > > 
> > > > Reason is some vm's are faulting vms which don't have an evict
> > > > list, but validate from the pagefault handler. Also evict ==
> > > > false
> > > > is dangerous because if called from within an exec, it might
> > > > remove
> > > > the obj from other vm's evict list before they've had a chance
> > > > to
> > > > rebind their VMAs.
> > > > 
> > > > >    static int
> > > > >    __drm_gpuva_insert(struct drm_gpuvm *gpuvm,
> > > > >               struct drm_gpuva *va)
> > > > > diff --git a/include/drm/drm_gpuvm.h
> > > > > b/include/drm/drm_gpuvm.h
> > > > > index afa50b9059a2..834bb6d6617e 100644
> > > > > --- a/include/drm/drm_gpuvm.h
> > > > > +++ b/include/drm/drm_gpuvm.h
> > > > > @@ -26,10 +26,12 @@
> > > > >     */
> > > > >    #include <linux/list.h>
> > > > > +#include <linux/dma-resv.h>
> > > > >    #include <linux/rbtree.h>
> > > > >    #include <linux/types.h>
> > > > >    #include <drm/drm_gem.h>
> > > > > +#include <drm/drm_exec.h>
> > > > >    struct drm_gpuvm;
> > > > >    struct drm_gpuvm_bo;
> > > > > @@ -259,6 +261,38 @@ struct drm_gpuvm {
> > > > >         * space
> > > > >         */
> > > > >        struct dma_resv *resv;
> > > > > +
> > > > > +    /**
> > > > > +     * @extobj: structure holding the extobj list
> > > > > +     */
> > > > > +    struct {
> > > > > +        /**
> > > > > +         * @list: &list_head storing &drm_gpuvm_bos serving
> > > > > as
> > > > > +         * external object
> > > > > +         */
> > > > > +        struct list_head list;
> > > > > +
> > > > > +        /**
> > > > > +         * @lock: spinlock to protect the extobj list
> > > > > +         */
> > > > > +        spinlock_t lock;
> > > > > +    } extobj;
> > > > > +
> > > > > +    /**
> > > > > +     * @evict: structure holding the evict list and evict
> > > > > list
> > > > > lock
> > > > > +     */
> > > > > +    struct {
> > > > > +        /**
> > > > > +         * @list: &list_head storing &drm_gpuvm_bos
> > > > > currently
> > > > > being
> > > > > +         * evicted
> > > > > +         */
> > > > > +        struct list_head list;
> > > > > +
> > > > > +        /**
> > > > > +         * @lock: spinlock to protect the evict list
> > > > > +         */
> > > > > +        spinlock_t lock;
> > > > > +    } evict;
> > > > >    };
> > > > >    void drm_gpuvm_init(struct drm_gpuvm *gpuvm, struct
> > > > > drm_device
> > > > > *drm,
> > > > > @@ -268,6 +302,21 @@ void drm_gpuvm_init(struct drm_gpuvm
> > > > > *gpuvm,
> > > > > struct drm_device *drm,
> > > > >                const struct drm_gpuvm_ops *ops);
> > > > >    void drm_gpuvm_destroy(struct drm_gpuvm *gpuvm);
> > > > > +/**
> > > > > + * drm_gpuvm_is_extobj() - indicates whether the given
> > > > > &drm_gem_object is an
> > > > > + * external object
> > > > > + * @gpuvm: the &drm_gpuvm to check
> > > > > + * @obj: the &drm_gem_object to check
> > > > > + *
> > > > > + * Returns: true if the &drm_gem_object &dma_resv differs
> > > > > from
> > > > > the
> > > > > + * &drm_gpuvms &dma_resv, false otherwise
> > > > > + */
> > > > > +static inline bool drm_gpuvm_is_extobj(struct drm_gpuvm
> > > > > *gpuvm,
> > > > > +                       struct drm_gem_object *obj)
> > > > > +{
> > > > > +    return obj && obj->resv != gpuvm->resv;
> > > > > +}
> > > > > +
> > > > >    static inline struct drm_gpuva *
> > > > >    __drm_gpuva_next(struct drm_gpuva *va)
> > > > >    {
> > > > > @@ -346,6 +395,128 @@ __drm_gpuva_next(struct drm_gpuva *va)
> > > > >    #define drm_gpuvm_for_each_va_safe(va__, next__, gpuvm__)
> > > > > \
> > > > >        list_for_each_entry_safe(va__, next__, &(gpuvm__)-
> > > > > >rb.list,
> > > > > rb.entry)
> > > > > +/**
> > > > > + * struct drm_gpuvm_exec - &drm_gpuvm abstraction of
> > > > > &drm_exec
> > > > > + *
> > > > > + * This structure should be created on the stack as
> > > > > &drm_exec
> > > > > should be.
> > > > > + *
> > > > > + * Optionally, @extra can be set in order to lock additional
> > > > > &drm_gem_objects.
> > > > > + */
> > > > > +struct drm_gpuvm_exec {
> > > > > +    /**
> > > > > +     * @exec: the &drm_exec structure
> > > > > +     */
> > > > > +    struct drm_exec exec;
> > > > > +
> > > > > +    /**
> > > > > +     * @vm: the &drm_gpuvm whose DMA reservations to lock
> > > > > +     */
> > > > > +    struct drm_gpuvm *vm;
> > > > > +
> > > > > +    /**
> > > > > +     * @extra: Callback and corresponding private data for
> > > > > the
> > > > > driver to
> > > > > +     * lock arbitrary additional &drm_gem_objects.
> > > > > +     */
> > > > > +    struct {
> > > > > +        /**
> > > > > +         * @fn: The driver callback to lock additional
> > > > > &drm_gem_objects.
> > > > > +         */
> > > > > +        int (*fn)(struct drm_gpuvm_exec *vm_exec,
> > > > > +              unsigned int num_fences);
> > > > > +
> > > > > +        /**
> > > > > +         * @priv: driver private data for the @fn callback
> > > > > +         */
> > > > > +        void *priv;
> > > > > +    } extra;
> > > > > +};
> > > > > +
> > > > > +/**
> > > > > + * drm_gpuvm_prepare_vm() - prepare the GPUVMs common dma-
> > > > > resv
> > > > > + * @gpuvm: the &drm_gpuvm
> > > > > + * @exec: the &drm_exec context
> > > > > + * @num_fences: the amount of &dma_fences to reserve
> > > > > + *
> > > > > + * Calls drm_exec_prepare_obj() for the GPUVMs dummy
> > > > > &drm_gem_object.
> > > > > + *
>>>>> + * Using this function directly, it is the driver's
> > > > > responsibility to call
> > > > > + * drm_exec_init() and drm_exec_fini() accordingly.
> > > > > + *
> > > > > + * Returns: 0 on success, negative error code on failure.
> > > > > + */
> > > > > +static inline int
> > > > > +drm_gpuvm_prepare_vm(struct drm_gpuvm *gpuvm,
> > > > > +             struct drm_exec *exec,
> > > > > +             unsigned int num_fences)
> > > > > +{
> > > > > +    return drm_exec_prepare_obj(exec, &gpuvm->d_obj,
> > > > > num_fences);
> > > > > +}
> > > > > +
> > > > > +int drm_gpuvm_prepare_objects(struct drm_gpuvm *gpuvm,
> > > > > +                  struct drm_exec *exec,
> > > > > +                  unsigned int num_fences);
> > > > > +
> > > > > +int drm_gpuvm_prepare_range(struct drm_gpuvm *gpuvm,
> > > > > +                struct drm_exec *exec,
> > > > > +                u64 addr, u64 range,
> > > > > +                unsigned int num_fences);
> > > > > +
> > > > > +int drm_gpuvm_exec_lock(struct drm_gpuvm_exec *vm_exec,
> > > > > +            unsigned int num_fences,
> > > > > +            bool interruptible);
> > > > > +
> > > > > +int drm_gpuvm_exec_lock_array(struct drm_gpuvm_exec
> > > > > *vm_exec,
> > > > > +                  struct drm_gem_object **objs,
> > > > > +                  unsigned int num_objs,
> > > > > +                  unsigned int num_fences,
> > > > > +                  bool interruptible);
> > > > > +
> > > > > +int drm_gpuvm_exec_lock_range(struct drm_gpuvm_exec
> > > > > *vm_exec,
> > > > > +                  u64 addr, u64 range,
> > > > > +                  unsigned int num_fences,
> > > > > +                  bool interruptible);
> > > > > +
> > > > > +/**
> > > > > + * drm_gpuvm_exec_unlock() - unlock all dma-resv of all
> > > > > associated BOs
> > > > > + * @vm_exec: the &drm_gpuvm_exec abstraction
> > > > > + *
> > > > > + * Releases all dma-resv locks of all &drm_gem_objects
> > > > > previously acquired
> > > > > + * through drm_gpuvm_exec_lock() or its variants.
> > > > > + */
> > > > > +static inline void
> > > > > +drm_gpuvm_exec_unlock(struct drm_gpuvm_exec *vm_exec)
> > > > > +{
> > > > > +    drm_exec_fini(&vm_exec->exec);
> > > > > +}
> > > > > +
> > > > > +int drm_gpuvm_validate(struct drm_gpuvm *gpuvm);
> > > > > +void drm_gpuvm_resv_add_fence(struct drm_gpuvm *gpuvm,
> > > > > +                  struct drm_exec *exec,
> > > > > +                  struct dma_fence *fence,
> > > > > +                  enum dma_resv_usage private_usage,
> > > > > +                  enum dma_resv_usage extobj_usage);
> > > > > +
> > > > > +/**
> > > > > + * drm_gpuvm_exec_resv_add_fence()
> > > > > + * @vm_exec: the &drm_gpuvm_exec abstraction
> > > > > + * @fence: fence to add
> > > > > + * @private_usage: private dma-resv usage
> > > > > + * @extobj_usage: extobj dma-resv usage
> > > > > + *
> > > > > + * See drm_gpuvm_resv_add_fence().
> > > > > + */
> > > > > +static inline void
> > > > > +drm_gpuvm_exec_resv_add_fence(struct drm_gpuvm_exec
> > > > > *vm_exec,
> > > > > +                  struct dma_fence *fence,
> > > > > +                  enum dma_resv_usage private_usage,
> > > > > +                  enum dma_resv_usage extobj_usage)
> > > > > +{
> > > > > +    drm_gpuvm_resv_add_fence(vm_exec->vm, &vm_exec->exec,
> > > > > fence,
> > > > > +                 private_usage, extobj_usage);
> > > > > +}
> > > > > +
> > > > >    /**
> > > > >     * struct drm_gpuvm_bo - structure representing a
> > > > > &drm_gpuvm
> > > > > and
> > > > >     * &drm_gem_object combination
> > > > > @@ -398,6 +569,18 @@ struct drm_gpuvm_bo {
> > > > >                 * gpuva list.
> > > > >                 */
> > > > >                struct list_head gem;
> > > > > +
> > > > > +            /**
> > > > > +             * @extobj: List entry to attach to the
> > > > > &drm_gpuvms
> > > > > +             * extobj list.
> > > > > +             */
> > > > > +            struct list_head extobj;
> > > > > +
> > > > > +            /**
> > > > > +             * @evict: List entry to attach to the
> > > > > &drm_gpuvms
> > > > > evict
> > > > > +             * list.
> > > > > +             */
> > > > > +            struct list_head evict;
> > > > >            } entry;
> > > > >        } list;
> > > > >    };
> > > > > @@ -432,6 +615,9 @@ struct drm_gpuvm_bo *
> > > > >    drm_gpuvm_bo_find(struct drm_gpuvm *gpuvm,
> > > > >              struct drm_gem_object *obj);
> > > > > +void drm_gpuvm_bo_evict(struct drm_gem_object *obj, bool
> > > > > evict);
> > > > > +void drm_gpuvm_bo_extobj_add(struct drm_gpuvm_bo *vm_bo);
> > > > > +
> > > > >    /**
> > > > >     * drm_gpuvm_bo_for_each_va() - iterator to walk over a
> > > > > list of
> > > > > &drm_gpuva
> > > > >     * @va__: &drm_gpuva structure to assign to in each
> > > > > iteration
> > > > > step
> > > > > @@ -837,6 +1023,17 @@ struct drm_gpuvm_ops {
> > > > >         * used.
> > > > >         */
> > > > >        int (*sm_step_unmap)(struct drm_gpuva_op *op, void
> > > > > *priv);
> > > > > +
> > > > > +    /**
> > > > > +     * @bo_validate: called from drm_gpuvm_validate()
> > > > > +     *
> > > > > +     * Drivers receive this callback for every evicted
> > > > > &drm_gem_object being
> > > > > +     * mapped in the corresponding &drm_gpuvm.
> > > > > +     *
> > > > > +     * Typically, drivers would call their driver specific
> > > > > variant of
> > > > > +     * ttm_bo_validate() from within this callback.
> > > > > +     */
> > > > > +    int (*bo_validate)(struct drm_gem_object *obj);
> > > > 
> > > > Same here. Could we have a vm_bo as an argument instead, so
> > > > that
> > > > the callback knows what gpuvm we're targeting and can mark all
> > > > its
> > > > gpu_vas for revalidation? Or is that intended to be done
> > > > elsewhere?
> > > 
> > > Makes sense as well. I'll change that too.
> > 
> > I forgot, drm_gpuvm_validate() would preferably take a
> > drm_gpuvm_exec
> > argument because we need it in the validate callback. It's also
> > easy
> > for the driver to subclass further if needed, to pass even more
> > arguments to its validate callback.
> 
> Hm.. that implies that a driver open-coding the drm_exec loop still
> needs to use a struct drm_gpuvm_exec rather than just a struct drm_exec.
> What is this needed for in Xe? Do we expect other drivers to need it?
> Might a void priv pointer make more sense?

It's for sleeping locks during eviction rather than trylocks. TTM
currently fishes out the struct ww_acquire_context used for locking
from the lock itself, but I'd expect that to become more explicit in
the near future with a variant of ttm_bo_validate() that explicitly
takes a drm_exec as argument.

So we would probably also like to try to find a way to encourage
drivers to include the validate() in the until_all_locked() loop,
because if TTM resorts to a sleeping lock *after* that loop, the
following warning will be hit:

https://elixir.bootlin.com/linux/latest/source/kernel/locking/ww_mutex.h#L195

So not sure what's best, but perhaps then a struct drm_exec * or a 
(void *)
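
For reference, a sketch of what keeping validate() inside the
until_all_locked() loop could look like (assuming a drm_gpuvm_validate()
variant taking a struct drm_exec *, which is only being discussed here):

    struct drm_exec exec;
    int ret;

    drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT);
    drm_exec_until_all_locked(&exec) {
        ret = drm_gpuvm_prepare_objects(gpuvm, &exec, num_fences);
        drm_exec_retry_on_contention(&exec);
        if (ret)
            break;

        /* May take further sleeping dma-resv locks through TTM. */
        ret = drm_gpuvm_validate(gpuvm, &exec);
        drm_exec_retry_on_contention(&exec);
        if (ret)
            break;
    }
    drm_exec_fini(&exec);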

/Thomas


> 
> > 
> > /Thomas
> > 
> > 
> > > 
> > > > 
> > > > >    };
> > > > >    int drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm, void *priv,
> > > > 
> > > > Thanks,
> > > > 
> > > > Thomas
> > > > 
> > > > 
> > > 
> > 
>
Danilo Krummrich Sept. 18, 2023, 11:21 a.m. UTC | #44
On 9/14/23 19:15, Danilo Krummrich wrote:
> On 9/14/23 19:13, Thomas Hellström wrote:
>> On Thu, 2023-09-14 at 17:27 +0200, Danilo Krummrich wrote:
>>> On 9/14/23 13:32, Thomas Hellström wrote:
>>>>
>>>> On 9/14/23 12:57, Danilo Krummrich wrote:
>>>>> On 9/13/23 14:16, Danilo Krummrich wrote:
>>>>>
>>>>> <snip>
>>>>>
>>>>>>>> And validate() can remove it while still holding all dma-
>>>>>>>> resv locks,
>>>>>>>> neat!
>>>>>>>> However, what if two tasks are trying to lock the VA space
>>>>>>>> concurrently? What
>>>>>>>> do we do when the drm_gpuvm_bo's refcount drops to zero in
>>>>>>>> drm_gpuva_unlink()?
>>>>>>>> Are we guaranteed that at this point of time the
>>>>>>>> drm_gpuvm_bo is not
>>>>>>>> on the
>>>>>>>> evicted list? Because otherwise we would call
>>>>>>>> drm_gpuvm_bo_destroy()
>>>>>>>> with the
>>>>>>>> dma-resv lock held, which wouldn't be allowed, since
>>>>>>>> drm_gpuvm_bo_destroy()
>>>>>>>> might drop the last reference to the drm_gem_object and
>>>>>>>> hence we'd
>>>>>>>> potentially
>>>>>>>> free the dma-resv lock while holding it, at least if it's
>>>>>>>> an external
>>>>>>>> object.
>>>>>>>
>>>>>>> Easiest way in this scheme is to think of the lists as being
>>>>>>> protected
>>>>>>> by the vm's resv lock. That means anybody calling unlink()
>>>>>>> must also
>>>>>>> hold the vm's resv lock. (Which is OK from an UAF point of
>>>>>>> view, but
>>>>>>> perhaps not from a locking inversion POW from an async list
>>>>>>> update).
>>>>>>
>>>>>> This would mean that on unlink() we'd need to hold the VM's
>>>>>> resv lock and the
>>>>>> corresponding GEM's resv lock (in case they're not the same
>>>>>> anyways) because the
>>>>>> VM's resv lock would protect the external / evicted object
>>>>>> lists and the GEM
>>>>>> objects resv lock protects the GEM's list of drm_gpuvm_bos and
>>>>>> the
>>>>>> drm_gpuvm_bo's list of drm_gpuvas.
>>>>>
>>>>> As mentioned below the same applies for drm_gpuvm_bo_put() since
>>>>> it might
>>>>> destroy the vm_bo, which includes removing the vm_bo from
>>>>> external / evicted
>>>>> object lists and the GEMs list of vm_bos.
>>>>>
>>>>> As mentioned, if the GEM's dma-resv is different from the VM's
>>>>> dma-resv we need
>>>>> to take both locks. Ultimately, this would mean we need a
>>>>> drm_exec loop, because
>>>>> we can't know the order in which to take these locks. Doing a
>>>>> full drm_exec loop
>>>>> just to put() a vm_bo doesn't sound reasonable to me.
>>>>>
>>>>> Can we instead just have an internal mutex for locking the lists
>>>>> such that we
>>>>> avoid taking and dropping the spinlocks, which we use currently,
>>>>> in a loop?
>>>>
>>>> You'd have the same locking inversion problem with a mutex, right?
>>>> Since in the eviction path you have resv->mutex, from exec you have
>>>> resv->mutex->resv because validate would attempt to grab resv.
>>>
>>> Both lists, evict and extobj, would need to have a separate mutex,
>>> not a common one.
>>> We'd also need a dedicated GEM gpuva lock. Then the only rule would
>>> be that you can't
>>> hold the dma-resv lock when calling put(). Which I admit is not that
>>> nice.
>>>
>>> With the current spinlock solution drivers wouldn't need to worry
>>> about anything locking
>>> related though. So maybe I come back to your proposal of having a
>>> switch for external
>>> locking with dma-resv locks entirely. Such that with external dma-
>>> resv locking I skip
>>> all the spinlocks and add lockdep checks instead.
>>>
>>> I think that makes the most sense in terms of taking advantage of
>>> external dma-resv locking
>>> where possible and on the other hand having a self-contained solution
>>> if not. This should
>>> get all concerns out of the way, yours, Christian's and Boris'.
>>
>> If we need additional locks yes, I'd prefer the opt-in/opt-out spinlock
>> solution, and check back after a while to see if we can remove either
>> option once most pitfalls are hit.
> 
> Sounds good, I'll prepare this for a V4.

I was considering getting rid of the spinlocks using SRCU for both external and
evicted objects instead. This would get rid of taking/dropping the spinlock in
every iteration step of the lists, limiting it to a single srcu_read_{lock,unlock}
call per list walk. Plus, obviously, the list_add_rcu() and list_del_rcu() variants
as accessors. The accessors would probably still need a spinlock to protect against
concurrent list_add_rcu()/list_del_rcu() calls, but I think those are not a concern.

Any concerns from your side with this variant?
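
For reference, the read side of that SRCU variant could look roughly like
this (the srcu field is hypothetical):

    struct drm_gpuvm_bo *vm_bo;
    int idx, ret = 0;

    idx = srcu_read_lock(&gpuvm->extobj.srcu);
    list_for_each_entry_rcu(vm_bo, &gpuvm->extobj.list,
                            list.entry.extobj) {
        /* SRCU read sections may sleep, so taking dma-resv locks
         * in here, e.g. via drm_exec_prepare_obj(), would be fine.
         */
        ret = drm_exec_prepare_obj(exec, vm_bo->obj, num_fences);
        if (ret)
            break;
    }
    srcu_read_unlock(&gpuvm->extobj.srcu, idx);

Writers, e.g. drm_gpuvm_bo_extobj_add(), would still serialize their
list_add_rcu()/list_del_rcu() calls with the existing spinlock.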

> 
> - Danilo
> 
>>
>> Thanks,
>> /Thomas
>>
>>
>>>
>>>>
>>>> That said, xe currently indeed does the vm+bo exec dance on vma
>>>> put.
>>>>
>>>> One reason why that seemingly horrible construct is good, is that
>>>> when evicting an extobj and you need to access individual vmas to
>>>> Zap page table entries or TLB flush, those VMAs are not allowed to
>>>> go away (we're not refcounting them). Holding the bo resv on gpuva
>>>> put prevents that from happening. Possibly one could use another
>>>> mutex to protect the gem->vm_bo list to achieve the same, but we'd
>>>> need to hold it on gpuva put.
>>>>
>>>> /Thomas
>>>>
>>>>
>>>>>
>>>>> - Danilo
>>>>>
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> For extobjs an outer lock would be enough in case of
>>>>>>>>>> Xe, but I
>>>>>>>>>> really would not
>>>>>>>>>> like to add even more complexity just to get the
>>>>>>>>>> spinlock out of
>>>>>>>>>> the way in case
>>>>>>>>>> the driver already has an outer lock protecting this
>>>>>>>>>> path.
>>>>>>>>>
>>>>>>>>> I must disagree here. These spinlocks and atomic
>>>>>>>>> operations are
>>>>>>>>> pretty
>>>>>>>>> costly and as discussed earlier this type of locking was
>>>>>>>>> the reason
>>>>>>>>> (at
>>>>>>>>> least according to the commit message) that made
>>>>>>>>> Christian drop the
>>>>>>>>> XArray
>>>>>>>>> use in drm_exec for the same set of objects: "The locking
>>>>>>>>> overhead
>>>>>>>>> is
>>>>>>>>> unecessary and measurable". IMHO the spinlock is the
>>>>>>>>> added
>>>>>>>>> complexity and a
>>>>>>>>> single wide lock following the drm locking guidelines set
>>>>>>>>> out by
>>>>>>>>> Daniel and
>>>>>>>>> David should really be the default choice with an opt-in
>>>>>>>>> for a
>>>>>>>>> spinlock if
>>>>>>>>> needed for async and pushing out to a wq is not an
>>>>>>>>> option.
>>>>>>>>
>>>>>>>> For the external object list an outer lock would work as
>>>>>>>> long as it's
>>>>>>>> not the
>>>>>>>> dma-resv lock of the corresponding GEM object, since here
>>>>>>>> we actually
>>>>>>>> need to
>>>>>>>> remove the list entry from the external object list on
>>>>>>>> drm_gpuvm_bo_destroy().
>>>>>>>> It's just a bit weird design wise that drivers would need
>>>>>>>> to take
>>>>>>>> this outer
>>>>>>>> lock on:
>>>>>>>>
>>>>>>>> - drm_gpuvm_bo_extobj_add()
>>>>>>>> - drm_gpuvm_bo_destroy()        (and hence also
>>>>>>>> drm_gpuvm_bo_put())
>>>>>>>> - drm_gpuva_unlink()            (because it needs to call
>>>>>>>> drm_gpuvm_bo_put())
>>>>>>>> - drm_gpuvm_exec_lock()
>>>>>>>> - drm_gpuvm_exec_lock_array()
>>>>>>>> - drm_gpuvm_prepare_range()
>>>>>>>>
>>>>>>>> Given that it seems reasonable to do all the required
>>>>>>>> locking
>>>>>>>> internally.
>>>>>>>
>>>>>>>   From a design POV, there has been a clear direction in XE to
>>>>>>> make
>>>>>>> things similar to mmap() / munmap(), so this outer lock,
>>>>>>> which in Xe is
>>>>>>> an rwsem, is used in a similar way as the mmap_lock. It's
>>>>>>> protecting
>>>>>>> the page-table structures and vma rb tree, the userptr
>>>>>>> structures and
>>>>>>> the extobj list. Basically it's taken early in the exec
>>>>>>> IOCTL, the
>>>>>>> VM_BIND ioctl, the compute rebind worker and the pagefault
>>>>>>> handler, so
>>>>>>> all of the above are just asserting that it is taken in the
>>>>>>> correct
>>>>>>> mode.
>>>>>>>
>>>>>>> But strictly with this scheme one could also use the vm's
>>>>>>> dma_resv for
>>>>>>> the extobj list since with drm_exec, it's locked before
>>>>>>> traversing the
>>>>>>> list.
>>>>>>>
>>>>>>> The whole point of this scheme is to rely on locks that you
>>>>>>> already are
>>>>>>> supposed to be holding for various reasons and is simple to
>>>>>>> comprehend.
>>>>>>
>>>>>> I don't agree that we're supposed to hold the VM's resv lock
>>>>>> anyways for
>>>>>> functions like drm_gpuvm_bo_put() or drm_gpuva_unlink(), but
>>>>>> I'm fine using it
>>>>>> for that purpose nevertheless.
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> In order to at least place lockdep checks, the driver would
>>>>>>>> need to
>>>>>>>> supply the
>>>>>>>> corresponding lock's lockdep_map, because the GPUVM
>>>>>>>> otherwise doesn't
>>>>>>>> know about
>>>>>>>> the lock.
>>>>>>>
>>>>>>> Yes, that sounds reasonable. One lockdep map per list.
>>>>>>
>>>>>> I'd really like to avoid that, especially now that everything
>>>>>> got simpler. We
>>>>>> should define the actual locks to take instead.
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Out of curiosity, what is the overhead of a spin_lock()
>>>>>>>> that doesn't
>>>>>>>> need to
>>>>>>>> spin?
>>>>>>>
>>>>>>> I guess it's hard to tell exactly, but it is much lower on
>>>>>>> modern x86
>>>>>>> than what it used to be. Not sure about ARM, which is the
>>>>>>> other
>>>>>>> architecture important to us. I figure if there is little
>>>>>>> cache-line
>>>>>>> bouncing the main overhead comes from the implied barriers.
>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> A pretty simple way that would not add much code would be
>>>>>>>>>
>>>>>>>>> static void gpuvm_cond_spin_lock(const struct drm_gpuvm
>>>>>>>>> *gpuvm,
>>>>>>>>> spinlock_t
>>>>>>>>> *lock)
>>>>>>>>>
>>>>>>>>> {
>>>>>>>>>
>>>>>>>>>       if (!gpuvm->resv_protected_lists)
>>>>>>>>>           spin_lock(lock);
>>>>>>>>>
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>>>> For such drivers, that would require anybody calling
>>>>>>>>>>> unlink to
>>>>>>>>>>> hold the vm's
>>>>>>>>>>> resv, though.
>>>>>>>>>> In V4 I want to go back to having a dedicated lock for
>>>>>>>>>> the GEMs
>>>>>>>>>> gpuva list (or
>>>>>>>>>> VM_BO list to be more precise). We can't just use the
>>>>>>>>>> dma-resv
>>>>>>>>>> lock for that
>>>>>>>>>> with VM_BO abstractions, because on destruction of a
>>>>>>>>>> VM_BO we
>>>>>>>>>> otherwise wouldn't
>>>>>>>>>> be allowed to already hold the dma-resv lock. That's
>>>>>>>>>> the fix I
>>>>>>>>>> was referring to
>>>>>>>>>> earlier.
>>>>>>>>>
>>>>>>>>> Yeah, I can see the need for a dedicated lock for the
>>>>>>>>> GEM's gpuva
>>>>>>>>> list, but
>>>>>>>>> holding the vm's dma-resv lock across the unlink
>>>>>>>>> shouldn't be a
>>>>>>>>> problem. We
>>>>>>>>> may free the object and a pointer to the vm's resv during
>>>>>>>>> unlink
>>>>>>>>> but we
>>>>>>>>> don't free the vm's resv.  It'd be a matter of ensuring
>>>>>>>>> that any
>>>>>>>>> calls to
>>>>>>>>> unlink from *within* drm_gpuvm allows it to be held.
>>>>>>>>
>>>>>>>> Drivers calling unlink() from the fence signaling path
>>>>>>>> can't use the
>>>>>>>> VM's
>>>>>>>> dma-resv lock.
>>>>>>>
>>>>>>> Yes, that made me a bit curious because in the current
>>>>>>> version the code
>>>>>>> required the object's dma_resv for unlink() which can't be
>>>>>>> grabbed
>>>>>>> either from the fence signaling path. So are there any
>>>>>>> drivers actually
>>>>>>> wanting to do that? If so, they will either need to resort to
>>>>>>> the
>>>>>>> current spinlock solution or they will need to call unlink
>>>>>>> from a
>>>>>>> workqueue item.
>>>>>>
>>>>>> As Boris already mentioned we have the dma-resv lock by default
>>>>>> or a driver
>>>>>> specific GEM gpuva lock as opt-in. Now, we can get rid of the
>>>>>> latter.
>>>>>>
>>>>>>>>
>>>>>>>> Also, what if the object is an external object? We can't
>>>>>>>> use the VM's
>>>>>>>> dma-resv
>>>>>>>> lock here.
>>>>>>>
>>>>>>> Why? Typically (sync) unlink is only ever called from an
>>>>>>> unbind-like
>>>>>>> operation where it should be trivial to grab the vm's resv.
>>>>>>> Or, for
>>>>>>> that matter any outer lock protecting the extobj list. Rule
>>>>>>> would be
>>>>>>> the drm_gpuvm_bo::entry::extobj  and
>>>>>>> drm_gpuvm_bo::entry::evict would
>>>>>>> be protected by either the vm's dma_resv (or possibly an
>>>>>>> outer lock in
>>>>>>> the case of the extobj list).
>>>>>>
>>>>>> Outer lock wouldn't have been working for updates in the async
>>>>>> path, but
>>>>>> shouldn't be relevant anymore. We could use the VM's resv for
>>>>>> that.
>>>>>>
>>>>>>>
>>>>>>>>    And we can't have the GEM objs dma-resv lock held when
>>>>>>>> calling
>>>>>>>> unlink(), since unlink() calls drm_gpuvm_bo_put(), which if
>>>>>>>> the
>>>>>>>> refcount drops
>>>>>>>> to zero calls drm_gpuvm_bo_destroy() and
>>>>>>>> drm_gpuvm_bo_destroy() might
>>>>>>>> drop the
>>>>>>>> last reference of the GEM object.
>>>>>>>
>>>>>>> Yes, but this is a different problem as to what exactly
>>>>>>> protects
>>>>>>> drm_gpuvm_bo::entry::gem. Either as you suggest an internal
>>>>>>> per bo list
>>>>>>> lock, or if we want to keep the bo's dma_resv we need to
>>>>>>> ensure that
>>>>>>> the caller of dma_resv_unlock(obj->resv) actually refcounts
>>>>>>> its obj
>>>>>>> pointer, and doesn't implicitly rely on the gpuvm_bo's
>>>>>>> refcount (I know
>>>>>>> Boris didn't like that, but requiring an explicit refcount
>>>>>>> for a
>>>>>>> pointer you dereference unless you're under a lock that
>>>>>>> ensures keeping
>>>>>>> the object alive is pretty much required?) But anyway for the
>>>>>>> drm_gpuvm_bo::entry::gem list protection (bo resv or internal
>>>>>>> spinlock)
>>>>>>> I don't have a strong preference.
>>>>>>
>>>>>> We can keep the GEM objects dma-resv lock, however as mentioned
>>>>>> above
>>>>>> drm_gpuva_unlink() and drm_gpuvm_bo_put() then requires both
>>>>>> the VM's resv lock
>>>>>> and the GEM's resv lock in case they differ.
>>>>>>
>>>>>
>>>>>>>>>
>>>>>
>>>>
>>>
>>
Christian König Sept. 19, 2023, 12:07 p.m. UTC | #45
On 9/13/23 17:46, Danilo Krummrich wrote:
> On 9/13/23 17:33, Christian König wrote:
>> On 9/13/23 17:15, Danilo Krummrich wrote:
>>> On 9/13/23 16:26, Christian König wrote:
>>>> On 9/13/23 14:16, Danilo Krummrich wrote:
>>>>> As mentioned in a different mail thread, the reply is based on the
>>>>> assumption that we don't support anything other than GPUVM updates
>>>>> from the IOCTL.
>>>>
>>>> I think that this assumption is incorrect.
>>>
>>> Well, more precisely I should have said "don't support GPUVM updates
>>> from within fence signaling critical sections". And looking at the code,
>>> that doesn't seem to be what you're doing there.
>>>
>>>>
>>>> Vulkan is just one specific use case, but this here should
>>>> probably be able to handle other use cases as well.
>>>>
>>>> Especially with HMM you get the requirement that you need to be 
>>>> able to invalidate GPUVM mappings without grabbing a reservation lock.
>>>
>>> What do you mean with "invalidate GPUVM mappings" in this context? 
>>> drm_gpuvm_bo_evict()
>>> should only be called from a ttm_device_funcs::move callback, we 
>>> should hold the dma-resv
>>> lock there.
>>
>> Well the question is which dma-resv lock do we hold?
>>
>> In the move callback we only hold the dma-resv lock of the BO which 
>> is moved, but when that is a shared BO then that's not the same as 
>> the one for the VM.
>
> Correct, Thomas' idea was to use the GEM's dma_resv lock to protect 
> drm_gpuvm_bo::evicted
> and then actually move the drm_gpuvm_bo to the VM's evicted list once 
> we grabbed all
> dma-resv locks when locking the VM's BOs using drm_exec. We can remove 
> them from the evicted
> list on validate(). This way we never touch the evicted list without 
> holding at least the VM's
> dma-resv lock.
>
> Do you have any concerns about that?
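
For reference, sketched roughly with a hypothetical "evicted" flag, the
scheme described above reads:

    /* Move path: only the moved BO's own dma-resv is held. */
    void drm_gpuvm_bo_evict(struct drm_gpuvm_bo *vm_bo, bool evict)
    {
        dma_resv_assert_held(vm_bo->obj->resv);
        vm_bo->evicted = evict;
    }

    /* Exec path: all dma-resv locks are held via drm_exec, so the
     * VM's evicted list may be updated safely.
     */
    static void vm_bo_sync_evicted(struct drm_gpuvm *gpuvm,
                                   struct drm_gpuvm_bo *vm_bo)
    {
        dma_resv_assert_held(gpuvm->resv);
        if (vm_bo->evicted)
            list_add_tail(&vm_bo->list.entry.evict, &gpuvm->evict.list);
    }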

Scratching my head a bit how that is supposed to work.

This implies that you go over all the evicted BOs during validation and 
not just the one mentioned in the CS.

That might work for Vulkan, but is pretty much a no-go for OpenGL.

>
>>
>>>
>>>>
>>>> See what the eviction lock in amdgpu is doing for example.
>>>
>>> The eviction_lock seems to protect a VM state "evicting", i.e. whether
>>> any BO that is associated with the VM is currently evicting. At the same
>>> time amdgpu protects the evicted list of the VM with a different lock. So
>>> this seems to be entirely unrelated. Tracking a "currently evicting" state
>>> is not part of the GPUVM implementation currently and hence nothing would
>>> change for amdgpu there.
>>
>> Sorry for the confusion, we use different terminology in amdgpu.
>>
>> The eviction lock and evicted state is for the VM page tables, e.g. 
>> if the whole VM is currently not used and swapped out or even 
>> de-allocated.
>>
>> This is necessary because we have cases where we need to access the 
>> VM data without holding the dma-resv lock of this VM. Especially 
>> figuring out which parts of an address space contain mappings and 
>> which don't.
>
> I think this is fine, this has nothing to do with lists of evicted GEM 
> objects or external GEM
> objects, right? Marking mappings (drm_gpuva) as invalidated 
> (DRM_GPUVA_INVALIDATED) or accessing
> the VA space does not require any dma-resv locks.

I hope so, but I'm not 100% sure.

>
>>
>> This is a requirement which comes with HMM handling, you won't see 
>> this with Vulkan (or OpenGL, VAAPI etc..).
>>
>>
>> The invalidation lock on the other hand is what in this discussion is 
>> called eviction lock. This one is needed because what I wrote above, 
>> during the move callback only the dma-resv of the BO which is moved 
>> is locked, but not necessarily the dma-resv of the VM.
>
> That's yet another thing, right? This is used to track whether *any* 
> BO that belongs to the VM is
> currently being evicted, correct? As mentioned, as of now this is not 
> supported in GPUVM and hence 
> would be the same driver-specific code with the same driver-specific lock.

That is most likely a show stopper for using this with OpenGL based 
workloads, as far as I can see. For those you need to be able to figure 
out which non-VM BOs have been evicted and which parts of the VM need updates.

BTW: Did I get it right that you put the dma_resv object into the VM and 
not into the first GEM object associated with the VM? If yes, then that 
would be a circular dependency.
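
For reference, the quoted series answers this by embedding a dummy
&drm_gem_object in the &drm_gpuvm itself (see &gpuvm->d_obj in
drm_gpuvm_prepare_vm() above), roughly:

    struct drm_gpuvm {
        /* ... */
        struct drm_gem_object d_obj; /* dummy GEM object, never mapped */
        struct dma_resv *resv; /* assumed to point at d_obj.resv */
    };

which would avoid a refcount cycle, since the dummy object is not a real,
independently refcounted BO.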

Regards,
Christian.



>
>>
>> Regards,
>> Christian.
>>
>>>
>>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>>>
>>>>> On Wed, Sep 13, 2023 at 11:14:46AM +0200, Thomas Hellström wrote:
>>>>>> Hi!
>>>>>>
>>>>>> On Wed, 2023-09-13 at 01:36 +0200, Danilo Krummrich wrote:
>>>>>>> On Tue, Sep 12, 2023 at 09:23:08PM +0200, Thomas Hellström wrote:
>>>>>>>> On 9/12/23 18:50, Danilo Krummrich wrote:
>>>>>>>>> On Tue, Sep 12, 2023 at 06:20:32PM +0200, Thomas Hellström wrote:
>>>>>>>>>> Hi, Danilo,
>>>>>>>>>>
>>>>>>>>>> On 9/9/23 17:31, Danilo Krummrich wrote:
>>>>>>>>>>> So far the DRM GPUVA manager offers common infrastructure to
>>>>>>>>>>> track GPU VA
>>>>>>>>>>> allocations and mappings, generically connect GPU VA mappings
>>>>>>>>>>> to their
>>>>>>>>>>> backing buffers and perform more complex mapping operations
>>>>>>>>>>> on the GPU VA
>>>>>>>>>>> space.
>>>>>>>>>>>
>>>>>>>>>>> However, there are more design patterns commonly used by
>>>>>>>>>>> drivers, which
>>>>>>>>>>> can potentially be generalized in order to make the DRM GPUVA
>>>>>>>>>>> manager
>>>>>>>>>>> represent a basic GPU-VM implementation. In this context,
>>>>>>>>>>> this patch aims
>>>>>>>>>>> at generalizing the following elements.
>>>>>>>>>>>
>>>>>>>>>>> 1) Provide a common dma-resv for GEM objects not being used
>>>>>>>>>>> outside of
>>>>>>>>>>>       this GPU-VM.
>>>>>>>>>>>
>>>>>>>>>>> 2) Provide tracking of external GEM objects (GEM objects
>>>>>>>>>>> which are
>>>>>>>>>>>       shared with other GPU-VMs).
>>>>>>>>>>>
>>>>>>>>>>> 3) Provide functions to efficiently lock all GEM objects dma-
>>>>>>>>>>> resv the
>>>>>>>>>>>       GPU-VM contains mappings of.
>>>>>>>>>>>
>>>>>>>>>>> 4) Provide tracking of evicted GEM objects the GPU-VM
>>>>>>>>>>> contains mappings
>>>>>>>>>>>       of, such that validation of evicted GEM objects is
>>>>>>>>>>> accelerated.
>>>>>>>>>>>
>>>>>>>>>>> 5) Provide some convenience functions for common patterns.
>>>>>>>>>>>
>>>>>>>>>>> Rather than being designed as a "framework", the target is to
>>>>>>>>>>> make all
>>>>>>>>>>> features appear as a collection of optional helper functions,
>>>>>>>>>>> such that
>>>>>>>>>>> drivers are free to make use of the DRM GPUVA managers basic
>>>>>>>>>>> functionality and opt-in for other features without setting
>>>>>>>>>>> any feature
>>>>>>>>>>> flags, just by making use of the corresponding functions.
>>>>>>>>>>>
>>>>>>>>>>> Big kudos to Boris Brezillon for his help to figure out
>>>>>>>>>>> locking for drivers
>>>>>>>>>>> updating the GPU VA space within the fence signalling path.
>>>>>>>>>>>
>>>>>>>>>>> Suggested-by: Matthew Brost <matthew.brost@intel.com>
>>>>>>>>>>> Signed-off-by: Danilo Krummrich <dakr@redhat.com>
>>>>>>>>>>> ---
>>>>>>>>>>>     drivers/gpu/drm/drm_gpuvm.c | 516
>>>>>>>>>>> ++++++++++++++++++++++++++++++++++++
>>>>>>>>>>>     include/drm/drm_gpuvm.h     | 197 ++++++++++++++
>>>>>>>>>>>     2 files changed, 713 insertions(+)
>>>>>>>>>>>
>>>>>>>>>>> diff --git a/drivers/gpu/drm/drm_gpuvm.c
>>>>>>>>>>> b/drivers/gpu/drm/drm_gpuvm.c
>>>>>>>>>>> index f4411047dbb3..8e62a043f719 100644
>>>>>>>>>>> --- a/drivers/gpu/drm/drm_gpuvm.c
>>>>>>>>>>> +++ b/drivers/gpu/drm/drm_gpuvm.c
>>>>>>>>>>> @@ -73,6 +73,21 @@
>>>>>>>>>>>      * &drm_gem_object list of &drm_gpuvm_bos for an existing
>>>>>>>>>>> instance of this
>>>>>>>>>>>      * particular combination. If not existent a new instance
>>>>>>>>>>> is created and linked
>>>>>>>>>>>      * to the &drm_gem_object.
>>>>>>>>>>> + *
>>>>>>>>>>> + * &drm_gpuvm_bo structures, since unique for a given
>>>>>>>>>>> &drm_gpuvm, are also used
>>>>>>>>>>> + * as entry for the &drm_gpuvm's lists of external and
>>>>>>>>>>> evicted objects. Those
>>>>>>>>>>> + * list are maintained in order to accelerate locking of
>>>>>>>>>>> dma-resv locks and
>>>>>>>>>>> + * validation of evicted objects bound in a &drm_gpuvm. For
>>>>>>>>>>> instance the all
>>>>>>>>>>> + * &drm_gem_object's &dma_resv of a given &drm_gpuvm can be
>>>>>>>>>>> locked by calling
>>>>>>>>>>> + * drm_gpuvm_exec_lock(). Once locked drivers can call
>>>>>>>>>>> drm_gpuvm_validate() in
>>>>>>>>>>> + * order to validate all evicted &drm_gem_objects. It is
>>>>>>>>>>> also possible to lock
>>>>>>>>>>> + * additional &drm_gem_objects by providing the
>>>>>>>>>>> corresponding parameters to
>>>>>>>>>>> + * drm_gpuvm_exec_lock() as well as open code the &drm_exec
>>>>>>>>>>> loop while making
>>>>>>>>>>> + * use of helper functions such as drm_gpuvm_prepare_range()
>>>>>>>>>>> or
>>>>>>>>>>> + * drm_gpuvm_prepare_objects().
>>>>>>>>>>> + *
>>>>>>>>>>> + * Every bound &drm_gem_object is treated as external object
>>>>>>>>>>> when its &dma_resv
>>>>>>>>>>> + * structure is different than the &drm_gpuvm's common
>>>>>>>>>>> &dma_resv structure.
>>>>>>>>>>>      */
>>>>>>>>>>>     /**
>>>>>>>>>>> @@ -420,6 +435,20 @@
>>>>>>>>>>>      * Subsequent calls to drm_gpuvm_bo_obtain() for the same
>>>>>>>>>>> &drm_gpuvm and
>>>>>>>>>>>      * &drm_gem_object must be able to observe previous
>>>>>>>>>>> creations and destructions
>>>>>>>>>>>      * of &drm_gpuvm_bos in order to keep instances unique.
>>>>>>>>>>> + *
>>>>>>>>>>> + * The &drm_gpuvm's lists for keeping track of external and
>>>>>>>>>>> evicted objects are
>>>>>>>>>>> + * protected against concurrent insertion / removal and
>>>>>>>>>>> iteration internally.
>>>>>>>>>>> + *
>>>>>>>>>>> + * However, drivers still need to protect concurrent
>>>>>>>>>>> calls to functions
>>>>>>>>>>> + * iterating those lists, such as drm_gpuvm_validate() and
>>>>>>>>>>> + * drm_gpuvm_prepare_objects(). Every such function contains
>>>>>>>>>>> a particular
>>>>>>>>>>> + * comment and lockdep checks if possible.
>>>>>>>>>>> + *
>>>>>>>>>>> + * Functions adding or removing entries from those lists,
>>>>>>>>>>> such as
>>>>>>>>>>> + * drm_gpuvm_bo_evict() or drm_gpuvm_bo_extobj_add() may be
>>>>>>>>>>> called with external
>>>>>>>>>>> + * locks being held, e.g. in order to avoid the
>>>>>>>>>>> corresponding list being
>>>>>>>>>>> + * (safely) modified while potentially being iterated by
>>>>>>>>>>> other API functions.
>>>>>>>>>>> + * However, this is entirely optional.
>>>>>>>>>>>      */
>>>>>>>>>>>     /**
>>>>>>>>>>> @@ -632,6 +661,131 @@
>>>>>>>>>>>      *   }
>>>>>>>>>>>      */
>>>>>>>>>>> +/**
>>>>>>>>>>> + * get_next_vm_bo_from_list() - get the next vm_bo element
>>>>>>>>>>> + * @__gpuvm: The GPU VM
>>>>>>>>>>> + * @__list_name: The name of the list we're iterating on
>>>>>>>>>>> + * @__local_list: A pointer to the local list used to store
>>>>>>>>>>> already iterated items
>>>>>>>>>>> + * @__prev_vm_bo: The previous element we got from
>>>>>>>>>>> drm_gpuvm_get_next_cached_vm_bo()
>>>>>>>>>>> + *
>>>>>>>>>>> + * This helper is here to provide lockless list iteration.
>>>>>>>>>>> Lockless as in, the
>>>>>>>>>>> + * iterator releases the lock immediately after picking the
>>>>>>>>>>> first element from
>>>>>>>>>>> + * the list, so list insertion / deletion can happen
>>>>>>>>>>> concurrently.
>>>>>>>>>> Are the list spinlocks needed for that async state update from
>>>>>>>>>> within the
>>>>>>>>>> dma-fence critical section we've discussed previously?
>>>>>>>>> Yes, but also for other reasons, see below.
>>>>>>>>>
>>>>>>>>>> Otherwise it should be sufficient to protect the lists with the
>>>>>>>>>> gpuvm's resv
>>>>>>>>>> (or for the extobj list with an outer lock).
>>>>>>>>>>
>>>>>>>>>> If those spinlocks are still needed in some situations, perhaps
>>>>>>>>>> could we
>>>>>>>>>> have an option to set them to NULL (Like IIRC the maple tree
>>>>>>>>>> allows for)?
>>>>>>>>> The evict spinlock is needed in any case, since in
>>>>>>>>> drm_gpuvm_bo_evict() we're
>>>>>>>>> holding only the dma-resv lock from the BO this function gets
>>>>>>>>> called for. Hence,
>>>>>>>>> the spinlock protects concurrent drm_gpuvm_bo_evict() calls with
>>>>>>>>> different BOs.
>>>>>>>> No. Only if you try to add external objects to the vm's evict list
>>>>>>>> from
>>>>>>>> within the evict code. That's not necessary since you loop through
>>>>>>>> all
>>>>>>>> external objects anyway when locking them so an "evicted" bool in
>>>>>>>> the vm_bo,
>>>>>>>> protected by the bo resv would be sufficient. The extobj locking
>>>>>>>> loop can
>>>>>>>> then add the bo to the evicted list.
>>>>>>> And validate() can remove it while still holding all dma-resv 
>>>>>>> locks,
>>>>>>> neat!
>>>>>>> However, what if two tasks are trying to lock the VA space
>>>>>>> concurrently? What
>>>>>>> do we do when the drm_gpuvm_bo's refcount drops to zero in
>>>>>>> drm_gpuva_unlink()?
>>>>>>> Are we guaranteed that at this point of time the drm_gpuvm_bo is 
>>>>>>> not
>>>>>>> on the
>>>>>>> evicted list? Because otherwise we would call 
>>>>>>> drm_gpuvm_bo_destroy()
>>>>>>> with the
>>>>>>> dma-resv lock held, which wouldn't be allowed, since
>>>>>>> drm_gpuvm_bo_destroy()
>>>>>>> might drop the last reference to the drm_gem_object and hence we'd
>>>>>>> potentially
>>>>>>> free the dma-resv lock while holding it, at least if it's an 
>>>>>>> external
>>>>>>> object.
>>>>>> Easiest way in this scheme is to think of the lists as being 
>>>>>> protected
>>>>>> by the vm's resv lock. That means anybody calling unlink() must also
>>>>>> hold the vm's resv lock. (Which is OK from an UAF point of view, but
>>>>>> perhaps not from a locking inversion POV from an async list update).
>>>>> This would mean that on unlink() we'd need to hold the VM's resv 
>>>>> lock and the
>>>>> corresponding GEM's resv lock (in case they're not the same 
>>>>> anyways) because the
>>>>> VM's resv lock would protect the external / evicted object lists 
>>>>> and the GEM
>>>>> objects resv lock protects the GEM's list of drm_gpuvm_bos and the
>>>>> drm_gpuvm_bo's list of drm_gpuvas.
>>>>>
>>>>>>>>> For extobjs an outer lock would be enough in case of Xe, but I
>>>>>>>>> really would not
>>>>>>>>> like to add even more complexity just to get the spinlock out of
>>>>>>>>> the way in case
>>>>>>>>> the driver already has an outer lock protecting this path.
>>>>>>>> I must disagree here. These spinlocks and atomic operations are
>>>>>>>> pretty
>>>>>>>> costly and as discussed earlier this type of locking was the 
>>>>>>>> reason
>>>>>>>> (at
>>>>>>>> least according to the commit message) that made Christian drop 
>>>>>>>> the
>>>>>>>> XArray
>>>>>>>> use in drm_exec for the same set of objects: "The locking overhead
>>>>>>>> is
>>>>>>>> unecessary and measurable". IMHO the spinlock is the added
>>>>>>>> complexity and a
>>>>>>>> single wide lock following the drm locking guidelines set out by
>>>>>>>> Daniel and
>>>>>>>> David should really be the default choice with an opt-in for a
>>>>>>>> spinlock if
>>>>>>>> needed for async and pushing out to a wq is not an option.
>>>>>>> For the external object list an outer lock would work as long as 
>>>>>>> it's
>>>>>>> not the
>>>>>>> dma-resv lock of the corresponding GEM object, since here we 
>>>>>>> actually
>>>>>>> need to
>>>>>>> remove the list entry from the external object list on
>>>>>>> drm_gpuvm_bo_destroy().
>>>>>>> It's just a bit weird design wise that drivers would need to take
>>>>>>> this outer
>>>>>>> lock on:
>>>>>>>
>>>>>>> - drm_gpuvm_bo_extobj_add()
>>>>>>> - drm_gpuvm_bo_destroy()        (and hence also drm_gpuvm_bo_put())
>>>>>>> - drm_gpuva_unlink()            (because it needs to call
>>>>>>> drm_gpuvm_bo_put())
>>>>>>> - drm_gpuvm_exec_lock()
>>>>>>> - drm_gpuvm_exec_lock_array()
>>>>>>> - drm_gpuvm_prepare_range()
>>>>>>>
>>>>>>> Given that it seems reasonable to do all the required locking
>>>>>>> internally.
>>>>>>  From a design POV, there has been a clear direction in XE to make
>>>>>> things similar to mmap() / munmap(), so this outer lock, which in 
>>>>>> Xe is
>>>>>> an rwsem, is used in a similar way as the mmap_lock. It's protecting
>>>>>> the page-table structures and vma rb tree, the userptr structures 
>>>>>> and
>>>>>> the extobj list. Basically it's taken early in the exec IOCTL, the
>>>>>> VM_BIND ioctl, the compute rebind worker and the pagefault 
>>>>>> handler, so
>>>>>> all of the above are just asserting that it is taken in the correct
>>>>>> mode.
>>>>>>
>>>>>> But strictly with this scheme one could also use the vm's 
>>>>>> dma_resv for
>>>>>> the extobj list since with drm_exec, it's locked before 
>>>>>> traversing the
>>>>>> list.
>>>>>>
>>>>>> The whole point of this scheme is to rely on locks that you 
>>>>>> already are
>>>>>> supposed to be holding for various reasons and is simple to 
>>>>>> comprehend.
>>>>> I don't agree that we're supposed to hold the VM's resv lock 
>>>>> anyways for
>>>>> functions like drm_gpuvm_bo_put() or drm_gpuva_unlink(), but I'm 
>>>>> fine using it
>>>>> for that purpose nevertheless.
>>>>>
>>>>>>> In order to at least place lockdep checks, the driver would need to
>>>>>>> supply the
>>>>>>> corresponding lock's lockdep_map, because the GPUVM otherwise 
>>>>>>> doesn't
>>>>>>> know about
>>>>>>> the lock.
>>>>>> Yes, that sounds reasonable. One lockdep map per list.
>>>>> I'd really like to avoid that, especially now that everything got 
>>>>> simpler. We
>>>>> should define the actual locks to take instead.
>>>>>
>>>>>>> Out of curiosity, what is the overhead of a spin_lock() that 
>>>>>>> doesn't
>>>>>>> need to
>>>>>>> spin?
>>>>>> I guess it's hard to tell exactly, but it is much lower on modern 
>>>>>> x86
>>>>>> than what it used to be. Not sure about ARM, which is the other
>>>>>> architecture important to us. I figure if there is little cache-line
>>>>>> bouncing the main overhead comes from the implied barriers.
>>>>>>
>>>>>>>> A pretty simple way that would not add much code would be
>>>>>>>>
>>>>>>>> static void gpuvm_cond_spin_lock(const struct drm_gpuvm *gpuvm,
>>>>>>>> spinlock_t
>>>>>>>> *lock)
>>>>>>>>
>>>>>>>> {
>>>>>>>>
>>>>>>>>      if (!gpuvm->resv_protected_lists)
>>>>>>>>          spin_lock(lock);
>>>>>>>>
>>>>>>>> }
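>>>>>>>>
>>>>>>>> plus a matching gpuvm_cond_spin_unlock(), so that e.g.
>>>>>>>> drm_gpuvm_bo_list_add() would just become (sketch):
>>>>>>>>
>>>>>>>>     gpuvm_cond_spin_lock(vm_bo->vm, &vm_bo->vm->extobj.lock);
>>>>>>>>     if (list_empty(&vm_bo->list.entry.extobj))
>>>>>>>>             list_add_tail(&vm_bo->list.entry.extobj,
>>>>>>>>                           &vm_bo->vm->extobj.list);
>>>>>>>>     gpuvm_cond_spin_unlock(vm_bo->vm, &vm_bo->vm->extobj.lock);
>>>>>>>>
>>>>>>>> with resv-protected drivers relying on the vm's resv being held
>>>>>>>> instead of taking the spinlock.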
>>>>>>>>
>>>>>>>>>> For such drivers, that would require anybody calling unlink to
>>>>>>>>>> hold the vm's
>>>>>>>>>> resv, though.
>>>>>>>>> In V4 I want to go back to having a dedicated lock for the GEMs
>>>>>>>>> gpuva list (or
>>>>>>>>> VM_BO list to be more precise). We can't just use the dma-resv
>>>>>>>>> lock for that
>>>>>>>>> with VM_BO abstractions, because on destruction of a VM_BO we
>>>>>>>>> otherwise wouldn't
>>>>>>>>> be allowed to already hold the dma-resv lock. That's the fix I
>>>>>>>>> was referring to
>>>>>>>>> earlier.
>>>>>>>> Yeah, I can see the need for a dedicated lock for the GEM's gpuva
>>>>>>>> list, but
>>>>>>>> holding the vm's dma-resv lock across the unlink shouldn't be a
>>>>>>>> problem. We
>>>>>>>> may free the object and a pointer to the vm's resv during unlink
>>>>>>>> but we
>>>>>>>> don't free the vm's resv.  It'd be a matter of ensuring that any
>>>>>>>> calls to
>>>>>>>> unlink from *within* drm_gpuvm allow it to be held.
>>>>>>> Drivers calling unlink() from the fence signaling path can't use 
>>>>>>> the
>>>>>>> VM's
>>>>>>> dma-resv lock.
>>>>>> Yes, that made me a bit curious because in the current version 
>>>>>> the code
>>>>>> required the object's dma_resv for unlink() which can't be grabbed
>>>>>> either from the fence signaling path. So are there any drivers 
>>>>>> actually
>>>>>> wanting to do that? If so, they will either need to resort to the
>>>>>> current spinlock solution or they will need to call unlink from a
>>>>>> workqueue item.
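>>>>>>
>>>>>> The deferred variant would be something like the following sketch
>>>>>> (hypothetical driver struct embedding a work item next to the va):
>>>>>>
>>>>>>     static void unlink_work_fn(struct work_struct *work)
>>>>>>     {
>>>>>>             struct driver_vma *vma =
>>>>>>                     container_of(work, typeof(*vma), unlink_work);
>>>>>>             struct drm_gpuvm *gpuvm = vma->va.vm;
>>>>>>
>>>>>>             dma_resv_lock(gpuvm->resv, NULL);
>>>>>>             drm_gpuva_unlink(&vma->va);
>>>>>>             dma_resv_unlock(gpuvm->resv);
>>>>>>     }
>>>>>>
>>>>>> scheduled from the fence signalling path instead of calling unlink()
>>>>>> there directly.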
>>>>> As Boris already mentioned we have the dma-resv lock by default or 
>>>>> a driver
>>>>> specific GEM gpuva lock as opt-in. Now, we can get rid of the latter.
>>>>>
>>>>>>> Also, what if the object is an external object? We can't use the 
>>>>>>> VM's
>>>>>>> dma-resv
>>>>>>> lock here.
>>>>>> Why? Typically (sync) unlink is only ever called from an unbind-like
>>>>>> operation where it should be trivial to grab the vm's resv. Or, for
>>>>>> that matter any outer lock protecting the extobj list. Rule would be
>>>>>> the drm_gpuvm_bo::entry::extobj  and drm_gpuvm_bo::entry::evict 
>>>>>> would
>>>>>> be protected by either the vm's dma_resv (or possibly an outer 
>>>>>> lock in
>>>>>> the case of the extobj list).
>>>>> Outer lock wouldn't have worked for updates in the async 
>>>>> path, but
>>>>> shouldn't be relevant anymore. We could use the VM's resv for that.
>>>>>
>>>>>>>   And we can't have the GEM objs dma-resv lock held when calling
>>>>>>> unlink(), since unlink() calls drm_gpuvm_bo_put(), which if the
>>>>>>> refcount drops
>>>>>>> to zero calls drm_gpuvm_bo_destroy() and drm_gpuvm_bo_destroy() 
>>>>>>> might
>>>>>>> drop the
>>>>>>> last reference of the GEM object.
>>>>>> Yes, but this is a different problem as to what exactly protects
>>>>>> drm_gpuvm_bo::entry::gem. Either as you suggest an internal per 
>>>>>> bo list
>>>>>> lock, or if we want to keep the bo's dma_resv we need to ensure that
>>>>>> the caller of dma_resv_unlock(obj->resv) actually refcounts its obj
>>>>>> pointer, and doesn't implicitly rely on the gpuvm_bo's refcount 
>>>>>> (I know
>>>>>> Boris didn't like that, but requiring an explicit refcount for a
>>>>>> pointer you dereference unless you're under a lock that ensures 
>>>>>> keeping
>>>>>> the object alive is pretty much required?) But anyway for the
>>>>>> drm_gpuvm_bo::entry::gem list protection (bo resv or internal 
>>>>>> spinlock)
>>>>>> I don't have a strong preference.
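>>>>>>
>>>>>> I.e. roughly (sketch):
>>>>>>
>>>>>>     drm_gem_object_get(obj);
>>>>>>     dma_resv_lock(obj->resv, NULL);
>>>>>>     drm_gpuvm_bo_put(vm_bo); /* may destroy the vm_bo */
>>>>>>     dma_resv_unlock(obj->resv);
>>>>>>     /* obj, and thus obj->resv, stays valid until here */
>>>>>>     drm_gem_object_put(obj);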
>>>>> We can keep the GEM objects dma-resv lock, however as mentioned above
>>>>> drm_gpuva_unlink() and drm_gpuvm_bo_put() then requires both the 
>>>>> VM's resv lock
>>>>> and the GEM's resv lock in case they differ.
>>>>>
>>>>>>>   All those problems go away with a dedicated
>>>>>>> GEM gpuva list lock.
>>>>>> I don't think these are real problems.
>>>>>> With the exception of the eviction list "trick" where we currently 
>>>>>> have
>>>>>> slightly different approach to collect external bos needing 
>>>>>> rebinding,
>>>>>> we have this working fine.
>>>>>>
>>>>>> TBH I think pretty much the only situation where the spinlock is 
>>>>>> needed
>>>>>> is for async updates of these lists, unless a wq item can be used 
>>>>>> for
>>>>>> that, but it doesn't really seem like the current code allows for 
>>>>>> such
>>>>>> updates anyway? It complicates the code a lot, adds overhead and 
>>>>>> also
>>>>>> adds the requirement for refcounting during list traversal.
>>>>>>
>>>>>> /Thomas
>>>>>>
>>>>>>>> /Thomas
>>>>>>>>
>>>>>>>>
>>>>>>>>>> It seems that with that also the refcount could be made non-
>>>>>>>>>> atomic.
>>>>>>>>>>
>>>>>>>>>> All in the spirit of the drm locking guidelines "use big locks
>>>>>>>>>> when
>>>>>>>>>> possible".
>>>>>>>>>> Lower level locks only when necessary for performance or
>>>>>>>>>> locking inversion?
>>>>>>>>>>
>>>>>>>>>> /Thomas
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> + *
>>>>>>>>>>> + * Elements popped from the original list are kept in a
>>>>>>>>>>> local list, so removal
>>>>>>>>>>> + * and is_empty checks can still happen while we're
>>>>>>>>>>> iterating the list.
>>>>>>>>>>> + */
>>>>>>>>>>> +#define get_next_vm_bo_from_list(__gpuvm, __list_name, __local_list, __prev_vm_bo) \
>>>>>>>>>>> +       ({                                                                      \
>>>>>>>>>>> +               struct drm_gpuvm_bo *__vm_bo;                                   \
>>>>>>>>>>> +                                                                               \
>>>>>>>>>>> +               drm_gpuvm_bo_put(__prev_vm_bo);                                 \
>>>>>>>>>>> +                                                                               \
>>>>>>>>>>> +               spin_lock(&(__gpuvm)->__list_name.lock);                        \
>>>>>>>>>>> +               while (!list_empty(&(__gpuvm)->__list_name.list)) {             \
>>>>>>>>>>> +                       __vm_bo = list_first_entry(&(__gpuvm)->__list_name.list, \
>>>>>>>>>>> +                                                  struct drm_gpuvm_bo,         \
>>>>>>>>>>> +                                                  list.entry.__list_name);     \
>>>>>>>>>>> +                       if (drm_gpuvm_bo_get_unless_zero(__vm_bo)) {            \
>>>>>>>>>>> +                               list_move_tail(&(__vm_bo)->list.entry.__list_name, \
>>>>>>>>>>> +                                              __local_list);                   \
>>>>>>>>>>> +                               break;                                          \
>>>>>>>>>>> +                       } else {                                                \
>>>>>>>>>>> +                               list_del_init(&(__vm_bo)->list.entry.__list_name); \
>>>>>>>>>>> +                               __vm_bo = NULL;                                 \
>>>>>>>>>>> +                       }                                                       \
>>>>>>>>>>> +               }                                                               \
>>>>>>>>>>> +               spin_unlock(&(__gpuvm)->__list_name.lock);                      \
>>>>>>>>>>> +                                                                               \
>>>>>>>>>>> +               __vm_bo;                                                        \
>>>>>>>>>>> +       })
>>>>>>>>>>> +
>>>>>>>>>>> +/**
>>>>>>>>>>> + * for_each_vm_bo_in_list() - internal vm_bo list iterator
>>>>>>>>>>> + * @__gpuvm: The GPU VM
>>>>>>>>>>> + * @__list_name: The name of the list we're iterating on
>>>>>>>>>>> + * @__local_list: A pointer to the local list used to store already
>>>>>>>>>>> + * iterated items
>>>>>>>>>>> + * @__vm_bo: The current &drm_gpuvm_bo
>>>>>>>>>>> + *
>>>>>>>>>>> + * This helper is here to provide lockless list iteration.
>>>>>>>>>>> Lockless as in, the
>>>>>>>>>>> + * iterator releases the lock immediately after picking the
>>>>>>>>>>> first element from the
>>>>>>>>>>> + * list, so list insertion and deletion can happen
>>>>>>>>>>> concurrently.
>>>>>>>>>>> + *
>>>>>>>>>>> + * Typical use:
>>>>>>>>>>> + *
>>>>>>>>>>> + *     struct drm_gpuvm_bo *vm_bo;
>>>>>>>>>>> + *     LIST_HEAD(my_local_list);
>>>>>>>>>>> + *
>>>>>>>>>>> + *     ret = 0;
>>>>>>>>>>> + *     for_each_vm_bo_in_list(gpuvm, <list_name>, &my_local_list, vm_bo) {
>>>>>>>>>>> + *             ret = do_something_with_vm_bo(..., vm_bo);
>>>>>>>>>>> + *             if (ret)
>>>>>>>>>>> + *                     break;
>>>>>>>>>>> + *     }
>>>>>>>>>>> + *     drm_gpuvm_bo_put(vm_bo);
>>>>>>>>>>> + *     restore_vm_bo_list(gpuvm, <list_name>, &my_local_list);
>>>>>>>>>>> + *
>>>>>>>>>>> + *
>>>>>>>>>>> + * Only used for internal list iterations, not meant to be
>>>>>>>>>>> exposed to the outside
>>>>>>>>>>> + * world.
>>>>>>>>>>> + */
>>>>>>>>>>> +#define for_each_vm_bo_in_list(__gpuvm, __list_name, __local_list, __vm_bo) \
>>>>>>>>>>> +       for (__vm_bo = get_next_vm_bo_from_list(__gpuvm, __list_name,   \
>>>>>>>>>>> +                                               __local_list, NULL);    \
>>>>>>>>>>> +            __vm_bo;                                                   \
>>>>>>>>>>> +            __vm_bo = get_next_vm_bo_from_list(__gpuvm, __list_name,   \
>>>>>>>>>>> +                                               __local_list, __vm_bo))
>>>>>>>>>>> +
>>>>>>>>>>> +/**
>>>>>>>>>>> + * restore_vm_bo_list() - move vm_bo elements back to their
>>>>>>>>>>> original list
>>>>>>>>>>> + * @__gpuvm: The GPU VM
>>>>>>>>>>> + * @__list_name: The name of the list we're iterating on
>>>>>>>>>>> + * @__local_list: A pointer to the local list used to store
>>>>>>>>>>> already iterated items
>>>>>>>>>>> + *
>>>>>>>>>>> + * When we're done iterating a vm_bo list, we should call
>>>>>>>>>>> restore_vm_bo_list()
>>>>>>>>>>> + * to restore the original state and let new iterations take
>>>>>>>>>>> place.
>>>>>>>>>>> + */
>>>>>>>>>>> +#define restore_vm_bo_list(__gpuvm, __list_name, __local_list)         \
>>>>>>>>>>> +       do {                                                            \
>>>>>>>>>>> +               /* Merge back the two lists, moving local list elements \
>>>>>>>>>>> +                * to the head to preserve previous ordering, in case   \
>>>>>>>>>>> +                * it matters.                                          \
>>>>>>>>>>> +                */                                                     \
>>>>>>>>>>> +               spin_lock(&(__gpuvm)->__list_name.lock);                \
>>>>>>>>>>> +               list_splice(__local_list, &(__gpuvm)->__list_name.list); \
>>>>>>>>>>> +               spin_unlock(&(__gpuvm)->__list_name.lock);              \
>>>>>>>>>>> +       } while (0)
>>>>>>>>>>> +
>>>>>>>>>>> +/**
>>>>>>>>>>> + * drm_gpuvm_bo_list_add() - insert a vm_bo into the given
>>>>>>>>>>> list
>>>>>>>>>>> + * @__vm_bo: the &drm_gpuvm_bo
>>>>>>>>>>> + * @__list_name: the name of the list to insert into
>>>>>>>>>>> + *
>>>>>>>>>>> + * Inserts the given @__vm_bo into the list specified by
>>>>>>>>>>> @__list_name and
>>>>>>>>>>> + * increases the vm_bo's reference count.
>>>>>>>>>>> + */
>>>>>>>>>>> +#define drm_gpuvm_bo_list_add(__vm_bo, __list_name)                    \
>>>>>>>>>>> +       do {                                                            \
>>>>>>>>>>> +               spin_lock(&(__vm_bo)->vm->__list_name.lock);            \
>>>>>>>>>>> +               if (list_empty(&(__vm_bo)->list.entry.__list_name))     \
>>>>>>>>>>> +                       list_add_tail(&(__vm_bo)->list.entry.__list_name, \
>>>>>>>>>>> +                                     &(__vm_bo)->vm->__list_name.list); \
>>>>>>>>>>> +               spin_unlock(&(__vm_bo)->vm->__list_name.lock);          \
>>>>>>>>>>> +       } while (0)
>>>>>>>>>>> +
>>>>>>>>>>> +/**
>>>>>>>>>>> + * drm_gpuvm_bo_list_del() - remove a vm_bo from the given
>>>>>>>>>>> list
>>>>>>>>>>> + * @__vm_bo: the &drm_gpuvm_bo
>>>>>>>>>>> + * @__list_name: the name of the list to insert into
>>>>>>>>>>> + *
>>>>>>>>>>> + * Removes the given @__vm_bo from the list specified by
>>>>>>>>>>> @__list_name and
>>>>>>>>>>> + * decreases the vm_bo's reference count.
>>>>>>>>>>> + */
>>>>>>>>>>> +#define drm_gpuvm_bo_list_del(__vm_bo, __list_name)                    \
>>>>>>>>>>> +       do {                                                            \
>>>>>>>>>>> +               spin_lock(&(__vm_bo)->vm->__list_name.lock);            \
>>>>>>>>>>> +               if (!list_empty(&(__vm_bo)->list.entry.__list_name))    \
>>>>>>>>>>> +                       list_del_init(&(__vm_bo)->list.entry.__list_name); \
>>>>>>>>>>> +               spin_unlock(&(__vm_bo)->vm->__list_name.lock);          \
>>>>>>>>>>> +       } while (0)
>>>>>>>>>>> +
>>>>>>>>>>> +static int __must_check
>>>>>>>>>>> +drm_gpuvm_bo_get_unless_zero(struct drm_gpuvm_bo *vm_bo);
>>>>>>>>>>> +
>>>>>>>>>>>     #define to_drm_gpuva(__node) container_of((__node), struct drm_gpuva, rb.node)
>>>>>>>>>>>     #define GPUVA_START(node) ((node)->va.addr)
>>>>>>>>>>> @@ -713,6 +867,12 @@ drm_gpuvm_init(struct drm_gpuvm *gpuvm,
>>>>>>>>>>> struct drm_device *drm,
>>>>>>>>>>>          gpuvm->rb.tree = RB_ROOT_CACHED;
>>>>>>>>>>>          INIT_LIST_HEAD(&gpuvm->rb.list);
>>>>>>>>>>> +       INIT_LIST_HEAD(&gpuvm->extobj.list);
>>>>>>>>>>> +       spin_lock_init(&gpuvm->extobj.lock);
>>>>>>>>>>> +
>>>>>>>>>>> +       INIT_LIST_HEAD(&gpuvm->evict.list);
>>>>>>>>>>> +       spin_lock_init(&gpuvm->evict.lock);
>>>>>>>>>>> +
>>>>>>>>>>>          drm_gpuva_check_overflow(start_offset, range);
>>>>>>>>>>>          gpuvm->mm_start = start_offset;
>>>>>>>>>>>          gpuvm->mm_range = range;
>>>>>>>>>>> @@ -754,10 +914,302 @@ drm_gpuvm_destroy(struct drm_gpuvm
>>>>>>>>>>> *gpuvm)
>>>>>>>>>>>          WARN(!RB_EMPTY_ROOT(&gpuvm->rb.tree.rb_root),
>>>>>>>>>>>               "GPUVA tree is not empty, potentially leaking
>>>>>>>>>>> memory.\n");
>>>>>>>>>>> +       WARN(!list_empty(&gpuvm->extobj.list), "Extobj list
>>>>>>>>>>> should be empty.\n");
>>>>>>>>>>> +       WARN(!list_empty(&gpuvm->evict.list), "Evict list
>>>>>>>>>>> should be empty.\n");
>>>>>>>>>>> +
>>>>>>>>>>>          drm_gem_private_object_fini(&gpuvm->d_obj);
>>>>>>>>>>>     }
>>>>>>>>>>>     EXPORT_SYMBOL_GPL(drm_gpuvm_destroy);
>>>>>>>>>>> +/**
>>>>>>>>>>> + * drm_gpuvm_prepare_objects() - prepare all associated BOs
>>>>>>>>>>> + * @gpuvm: the &drm_gpuvm
>>>>>>>>>>> + * @exec: the &drm_exec locking context
>>>>>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>>>>>> + *
>>>>>>>>>>> + * Calls drm_exec_prepare_obj() for all &drm_gem_objects the
>>>>>>>>>>> given
>>>>>>>>>>> + * &drm_gpuvm contains mappings of.
>>>>>>>>>>> + *
>>>>>>>>>>> + * Using this function directly, it is the driver's
>>>>>>>>>>> responsibility to call
>>>>>>>>>>> + * drm_exec_init() and drm_exec_fini() accordingly.
>>>>>>>>>>> + *
>>>>>>>>>>> + * Note: This function is safe against concurrent insertion
>>>>>>>>>>> and removal of
>>>>>>>>>>> + * external objects, however it is not safe against
>>>>>>>>>>> concurrent usage itself.
>>>>>>>>>>> + *
>>>>>>>>>>> + * Drivers need to make sure to protect this case with
>>>>>>>>>>> either an outer VM lock
>>>>>>>>>>> + * or by calling drm_gpuvm_prepare_vm() before this function
>>>>>>>>>>> within the
>>>>>>>>>>> + * drm_exec_until_all_locked() loop, such that the GPUVM's
>>>>>>>>>>> dma-resv lock ensures
>>>>>>>>>>> + * mutual exclusion.
>>>>>>>>>>> + *
>>>>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>>>>> + */
>>>>>>>>>>> +int
>>>>>>>>>>> +drm_gpuvm_prepare_objects(struct drm_gpuvm *gpuvm,
>>>>>>>>>>> +                         struct drm_exec *exec,
>>>>>>>>>>> +                         unsigned int num_fences)
>>>>>>>>>>> +{
>>>>>>>>>>> +       struct drm_gpuvm_bo *vm_bo;
>>>>>>>>>>> +       LIST_HEAD(extobjs);
>>>>>>>>>>> +       int ret = 0;
>>>>>>>>>>> +
>>>>>>>>>>> +       for_each_vm_bo_in_list(gpuvm, extobj, &extobjs, vm_bo) {
>>>>>>>>>>> +               ret = drm_exec_prepare_obj(exec, vm_bo->obj, num_fences);
>>>>>>>>>>> +               if (ret)
>>>>>>>>>>> +                       break;
>>>>>>>>>>> +       }
>>>>>>>>>>> +       /* Drop ref in case we break out of the loop. */
>>>>>>>>>>> +       drm_gpuvm_bo_put(vm_bo);
>>>>>>>>>>> +       restore_vm_bo_list(gpuvm, extobj, &extobjs);
>>>>>>>>>>> +
>>>>>>>>>>> +       return ret;
>>>>>>>>>>> +}
>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_prepare_objects);
>>>>>>>>>>> +
>>>>>>>>>>> +/**
>>>>>>>>>>> + * drm_gpuvm_prepare_range() - prepare all BOs mapped within
>>>>>>>>>>> a given range
>>>>>>>>>>> + * @gpuvm: the &drm_gpuvm
>>>>>>>>>>> + * @exec: the &drm_exec locking context
>>>>>>>>>>> + * @addr: the start address within the VA space
>>>>>>>>>>> + * @range: the range to iterate within the VA space
>>>>>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>>>>>> + *
>>>>>>>>>>> + * Calls drm_exec_prepare_obj() for all &drm_gem_objects
>>>>>>>>>>> mapped between @addr
>>>>>>>>>>> + * and @addr + @range.
>>>>>>>>>>> + *
>>>>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>>>>> + */
>>>>>>>>>>> +int
>>>>>>>>>>> +drm_gpuvm_prepare_range(struct drm_gpuvm *gpuvm, struct drm_exec *exec,
>>>>>>>>>>> +                       u64 addr, u64 range, unsigned int num_fences)
>>>>>>>>>>> +{
>>>>>>>>>>> +       struct drm_gpuva *va;
>>>>>>>>>>> +       u64 end = addr + range;
>>>>>>>>>>> +       int ret;
>>>>>>>>>>> +
>>>>>>>>>>> +       drm_gpuvm_for_each_va_range(va, gpuvm, addr, end) {
>>>>>>>>>>> +               struct drm_gem_object *obj = va->gem.obj;
>>>>>>>>>>> +
>>>>>>>>>>> +               ret = drm_exec_prepare_obj(exec, obj, num_fences);
>>>>>>>>>>> +               if (ret)
>>>>>>>>>>> +                       return ret;
>>>>>>>>>>> +       }
>>>>>>>>>>> +
>>>>>>>>>>> +       return 0;
>>>>>>>>>>> +}
>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_prepare_range);
>>>>>>>>>>> +
>>>>>>>>>>> +/**
>>>>>>>>>>> + * drm_gpuvm_exec_lock() - lock all dma-resv of all
>>>>>>>>>>> associated BOs
>>>>>>>>>>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>>>>>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>>>>>> + * @interruptible: sleep interruptible if waiting
>>>>>>>>>>> + *
>>>>>>>>>>> + * Acquires all dma-resv locks of all &drm_gem_objects the
>>>>>>>>>>> given
>>>>>>>>>>> + * &drm_gpuvm contains mappings of.
>>>>>>>>>>> + *
>>>>>>>>>>> + * Additionally, when calling this function with struct
>>>>>>>>>>> drm_gpuvm_exec::extra
>>>>>>>>>>> + * being set the driver receives the given @fn callback to
>>>>>>>>>>> lock additional
>>>>>>>>>>> + * dma-resv in the context of the &drm_gpuvm_exec instance.
>>>>>>>>>>> Typically, drivers
>>>>>>>>>>> + * would call drm_exec_prepare_obj() from within this
>>>>>>>>>>> callback.
>>>>>>>>>>> + *
>>>>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>>>>> + */
>>>>>>>>>>> +int
>>>>>>>>>>> +drm_gpuvm_exec_lock(struct drm_gpuvm_exec *vm_exec,
>>>>>>>>>>> +                   unsigned int num_fences,
>>>>>>>>>>> +                   bool interruptible)
>>>>>>>>>>> +{
>>>>>>>>>>> +       struct drm_gpuvm *gpuvm = vm_exec->vm;
>>>>>>>>>>> +       struct drm_exec *exec = &vm_exec->exec;
>>>>>>>>>>> +       uint32_t flags;
>>>>>>>>>>> +       int ret;
>>>>>>>>>>> +
>>>>>>>>>>> +       flags = (interruptible ? DRM_EXEC_INTERRUPTIBLE_WAIT : 0) |
>>>>>>>>>>> +               DRM_EXEC_IGNORE_DUPLICATES;
>>>>>>>>>>> +
>>>>>>>>>>> +       drm_exec_init(exec, flags);
>>>>>>>>>>> +
>>>>>>>>>>> +       drm_exec_until_all_locked(exec) {
>>>>>>>>>>> +               ret = drm_gpuvm_prepare_vm(gpuvm, exec, num_fences);
>>>>>>>>>>> +               drm_exec_retry_on_contention(exec);
>>>>>>>>>>> +               if (ret)
>>>>>>>>>>> +                       goto err;
>>>>>>>>>>> +
>>>>>>>>>>> +               ret = drm_gpuvm_prepare_objects(gpuvm, exec, num_fences);
>>>>>>>>>>> +               drm_exec_retry_on_contention(exec);
>>>>>>>>>>> +               if (ret)
>>>>>>>>>>> +                       goto err;
>>>>>>>>>>> +
>>>>>>>>>>> +               if (vm_exec->extra.fn) {
>>>>>>>>>>> +                       ret = vm_exec->extra.fn(vm_exec, num_fences);
>>>>>>>>>>> +                       drm_exec_retry_on_contention(exec);
>>>>>>>>>>> +                       if (ret)
>>>>>>>>>>> +                               goto err;
>>>>>>>>>>> +               }
>>>>>>>>>>> +       }
>>>>>>>>>>> +
>>>>>>>>>>> +       return 0;
>>>>>>>>>>> +
>>>>>>>>>>> +err:
>>>>>>>>>>> +       drm_exec_fini(exec);
>>>>>>>>>>> +       return ret;
>>>>>>>>>>> +}
>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_exec_lock);
>>>>>>>>>>> +
>>>>>>>>>>> +static int
>>>>>>>>>>> +fn_lock_array(struct drm_gpuvm_exec *vm_exec, unsigned int
>>>>>>>>>>> num_fences)
>>>>>>>>>>> +{
>>>>>>>>>>> +       struct {
>>>>>>>>>>> +               struct drm_gem_object **objs;
>>>>>>>>>>> +               unsigned int num_objs;
>>>>>>>>>>> +       } *args = vm_exec->extra.priv;
>>>>>>>>>>> +
>>>>>>>>>>> +       return drm_exec_prepare_array(&vm_exec->exec, args->objs,
>>>>>>>>>>> +                                     args->num_objs, num_fences);
>>>>>>>>>>> +}
>>>>>>>>>>> +
>>>>>>>>>>> +/**
>>>>>>>>>>> + * drm_gpuvm_exec_lock_array() - lock all dma-resv of all
>>>>>>>>>>> associated BOs
>>>>>>>>>>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>>>>>>>>>>> + * @objs: additional &drm_gem_objects to lock
>>>>>>>>>>> + * @num_objs: the number of additional &drm_gem_objects to
>>>>>>>>>>> lock
>>>>>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>>>>>> + * @interruptible: sleep interruptible if waiting
>>>>>>>>>>> + *
>>>>>>>>>>> + * Acquires all dma-resv locks of all &drm_gem_objects the
>>>>>>>>>>> given &drm_gpuvm
>>>>>>>>>>> + * contains mappings of, plus the ones given through @objs.
>>>>>>>>>>> + *
>>>>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>>>>> + */
>>>>>>>>>>> +int
>>>>>>>>>>> +drm_gpuvm_exec_lock_array(struct drm_gpuvm_exec *vm_exec,
>>>>>>>>>>> +                         struct drm_gem_object **objs,
>>>>>>>>>>> +                         unsigned int num_objs,
>>>>>>>>>>> +                         unsigned int num_fences,
>>>>>>>>>>> +                         bool interruptible)
>>>>>>>>>>> +{
>>>>>>>>>>> +       struct {
>>>>>>>>>>> +               struct drm_gem_object **objs;
>>>>>>>>>>> +               unsigned int num_objs;
>>>>>>>>>>> +       } args;
>>>>>>>>>>> +
>>>>>>>>>>> +       args.objs = objs;
>>>>>>>>>>> +       args.num_objs = num_objs;
>>>>>>>>>>> +
>>>>>>>>>>> +       vm_exec->extra.fn = fn_lock_array;
>>>>>>>>>>> +       vm_exec->extra.priv = &args;
>>>>>>>>>>> +
>>>>>>>>>>> +       return drm_gpuvm_exec_lock(vm_exec, num_fences,
>>>>>>>>>>> interruptible);
>>>>>>>>>>> +}
>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_exec_lock_array);
>>>>>>>>>>> +
>>>>>>>>>>> +/**
>>>>>>>>>>> + * drm_gpuvm_exec_lock_range() - prepare all BOs mapped
>>>>>>>>>>> within a given range
>>>>>>>>>>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>>>>>>>>>>> + * @addr: the start address within the VA space
>>>>>>>>>>> + * @range: the range to iterate within the VA space
>>>>>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>>>>>> + * @interruptible: sleep interruptible if waiting
>>>>>>>>>>> + *
>>>>>>>>>>> + * Acquires all dma-resv locks of all &drm_gem_objects
>>>>>>>>>>> mapped between @addr and
>>>>>>>>>>> + * @addr + @range.
>>>>>>>>>>> + *
>>>>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>>>>> + */
>>>>>>>>>>> +int
>>>>>>>>>>> +drm_gpuvm_exec_lock_range(struct drm_gpuvm_exec *vm_exec,
>>>>>>>>>>> +                         u64 addr, u64 range,
>>>>>>>>>>> +                         unsigned int num_fences,
>>>>>>>>>>> +                         bool interruptible)
>>>>>>>>>>> +{
>>>>>>>>>>> +       struct drm_gpuvm *gpuvm = vm_exec->vm;
>>>>>>>>>>> +       struct drm_exec *exec = &vm_exec->exec;
>>>>>>>>>>> +       uint32_t flags;
>>>>>>>>>>> +       int ret;
>>>>>>>>>>> +
>>>>>>>>>>> +       flags = (interruptible ? DRM_EXEC_INTERRUPTIBLE_WAIT : 0) |
>>>>>>>>>>> +               DRM_EXEC_IGNORE_DUPLICATES;
>>>>>>>>>>> +
>>>>>>>>>>> +       drm_exec_init(exec, flags);
>>>>>>>>>>> +
>>>>>>>>>>> +       drm_exec_until_all_locked(exec) {
>>>>>>>>>>> +               ret = drm_gpuvm_prepare_range(gpuvm, exec, addr, range,
>>>>>>>>>>> +                                             num_fences);
>>>>>>>>>>> +               drm_exec_retry_on_contention(exec);
>>>>>>>>>>> +               if (ret)
>>>>>>>>>>> +                       goto err;
>>>>>>>>>>> +       }
>>>>>>>>>>> +
>>>>>>>>>>> +       return ret;
>>>>>>>>>>> +
>>>>>>>>>>> +err:
>>>>>>>>>>> +       drm_exec_fini(exec);
>>>>>>>>>>> +       return ret;
>>>>>>>>>>> +}
>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_exec_lock_range);
>>>>>>>>>>> +
>>>>>>>>>>> +/**
>>>>>>>>>>> + * drm_gpuvm_validate() - validate all BOs marked as evicted
>>>>>>>>>>> + * @gpuvm: the &drm_gpuvm to validate evicted BOs
>>>>>>>>>>> + *
>>>>>>>>>>> + * Calls the &drm_gpuvm_ops.bo_validate callback for all
>>>>>>>>>>> evicted buffer
>>>>>>>>>>> + * objects being mapped in the given &drm_gpuvm.
>>>>>>>>>>> + *
>>>>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>>>>> + */
>>>>>>>>>>> +int
>>>>>>>>>>> +drm_gpuvm_validate(struct drm_gpuvm *gpuvm)
>>>>>>>>>>> +{
>>>>>>>>>>> +       const struct drm_gpuvm_ops *ops = gpuvm->ops;
>>>>>>>>>>> +       struct drm_gpuvm_bo *vm_bo;
>>>>>>>>>>> +       LIST_HEAD(evict);
>>>>>>>>>>> +       int ret = 0;
>>>>>>>>>>> +
>>>>>>>>>>> +       if (unlikely(!ops || !ops->bo_validate))
>>>>>>>>>>> +               return -ENOTSUPP;
>>>>>>>>>>> +
>>>>>>>>>>> +       for_each_vm_bo_in_list(gpuvm, evict, &evict, vm_bo) {
>>>>>>>>>>> +               dma_resv_assert_held(vm_bo->obj->resv);
>>>>>>>>>>> +               ret = ops->bo_validate(vm_bo->obj);
>>>>>>>>>>> +               if (ret)
>>>>>>>>>>> +                       break;
>>>>>>>>>>> +       }
>>>>>>>>>>> +       /* Drop ref in case we break out of the loop. */
>>>>>>>>>>> +       drm_gpuvm_bo_put(vm_bo);
>>>>>>>>>>> +       restore_vm_bo_list(gpuvm, evict, &evict);
>>>>>>>>>>> +
>>>>>>>>>>> +       return ret;
>>>>>>>>>>> +}
>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_validate);
>>>>>>>>>>> +
>>>>>>>>>>> +/**
>>>>>>>>>>> + * drm_gpuvm_resv_add_fence - add fence to private and all
>>>>>>>>>>> extobj
>>>>>>>>>>> + * dma-resv
>>>>>>>>>>> + * @gpuvm: the &drm_gpuvm to add a fence to
>>>>>>>>>>> + * @exec: the &drm_exec locking context
>>>>>>>>>>> + * @fence: fence to add
>>>>>>>>>>> + * @private_usage: private dma-resv usage
>>>>>>>>>>> + * @extobj_usage: extobj dma-resv usage
>>>>>>>>>>> + */
>>>>>>>>>>> +void
>>>>>>>>>>> +drm_gpuvm_resv_add_fence(struct drm_gpuvm *gpuvm,
>>>>>>>>>>> +                        struct drm_exec *exec,
>>>>>>>>>>> +                        struct dma_fence *fence,
>>>>>>>>>>> +                        enum dma_resv_usage private_usage,
>>>>>>>>>>> +                        enum dma_resv_usage extobj_usage)
>>>>>>>>>>> +{
>>>>>>>>>>> +       struct drm_gem_object *obj;
>>>>>>>>>>> +       unsigned long index;
>>>>>>>>>>> +
>>>>>>>>>>> +       drm_exec_for_each_locked_object(exec, index, obj) {
>>>>>>>>>>> +               dma_resv_assert_held(obj->resv);
>>>>>>>>>>> +               dma_resv_add_fence(obj->resv, fence,
>>>>>>>>>>> +                                  drm_gpuvm_is_extobj(gpuvm, obj) ?
>>>>>>>>>>> +                                  extobj_usage : private_usage);
>>>>>>>>>>> +       }
>>>>>>>>>>> +}
>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_resv_add_fence);
>>>>>>>>>>> +
>>>>>>>>>>>     /**
>>>>>>>>>>>      * drm_gpuvm_bo_create() - create a new instance of struct
>>>>>>>>>>> drm_gpuvm_bo
>>>>>>>>>>>      * @gpuvm: The &drm_gpuvm the @obj is mapped in.
>>>>>>>>>>> @@ -790,6 +1242,9 @@ drm_gpuvm_bo_create(struct drm_gpuvm
>>>>>>>>>>> *gpuvm,
>>>>>>>>>>>          INIT_LIST_HEAD(&vm_bo->list.gpuva);
>>>>>>>>>>>          INIT_LIST_HEAD(&vm_bo->list.entry.gem);
>>>>>>>>>>> +       INIT_LIST_HEAD(&vm_bo->list.entry.extobj);
>>>>>>>>>>> +       INIT_LIST_HEAD(&vm_bo->list.entry.evict);
>>>>>>>>>>> +
>>>>>>>>>>>          drm_gem_object_get(obj);
>>>>>>>>>>>          return vm_bo;
>>>>>>>>>>> @@ -807,6 +1262,14 @@ drm_gpuvm_bo_destroy(struct kref *kref)
>>>>>>>>>>>          drm_gem_gpuva_assert_lock_held(vm_bo->obj);
>>>>>>>>>>> +       spin_lock(&gpuvm->extobj.lock);
>>>>>>>>>>> +       list_del(&vm_bo->list.entry.extobj);
>>>>>>>>>>> +       spin_unlock(&gpuvm->extobj.lock);
>>>>>>>>>>> +
>>>>>>>>>>> +       spin_lock(&gpuvm->evict.lock);
>>>>>>>>>>> +       list_del(&vm_bo->list.entry.evict);
>>>>>>>>>>> +       spin_unlock(&gpuvm->evict.lock);
>>>>>>>>>>> +
>>>>>>>>>>>          list_del(&vm_bo->list.entry.gem);
>>>>>>>>>>>          drm_gem_object_put(obj);
>>>>>>>>>>> @@ -822,6 +1285,11 @@ drm_gpuvm_bo_destroy(struct kref *kref)
>>>>>>>>>>>      * @vm_bo: the &drm_gpuvm_bo to release the reference of
>>>>>>>>>>>      *
>>>>>>>>>>>      * This releases a reference to @vm_bo.
>>>>>>>>>>> + *
>>>>>>>>>>> + * If the reference count drops to zero, the &gpuvm_bo is
>>>>>>>>>>> destroyed, which
>>>>>>>>>>> + * includes removing it from the GEMs gpuva list. Hence, if
>>>>>>>>>>> a call to this
>>>>>>>>>>> + * function can potentially let the reference count drop to zero,
>>>>>>>>>>> the caller must
>>>>>>>>>>> + * hold the dma-resv or driver specific GEM gpuva lock.
>>>>>>>>>>>      */
>>>>>>>>>>>     void
>>>>>>>>>>>     drm_gpuvm_bo_put(struct drm_gpuvm_bo *vm_bo)
>>>>>>>>>>> @@ -831,6 +1299,12 @@ drm_gpuvm_bo_put(struct drm_gpuvm_bo
>>>>>>>>>>> *vm_bo)
>>>>>>>>>>>     }
>>>>>>>>>>>     EXPORT_SYMBOL_GPL(drm_gpuvm_bo_put);
>>>>>>>>>>> +static int __must_check
>>>>>>>>>>> +drm_gpuvm_bo_get_unless_zero(struct drm_gpuvm_bo *vm_bo)
>>>>>>>>>>> +{
>>>>>>>>>>> +       return kref_get_unless_zero(&vm_bo->kref);
>>>>>>>>>>> +}
>>>>>>>>>>> +
>>>>>>>>>>>     static struct drm_gpuvm_bo *
>>>>>>>>>>>     __drm_gpuvm_bo_find(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>                      struct drm_gem_object *obj)
>>>>>>>>>>> @@ -938,6 +1412,48 @@ drm_gpuvm_bo_obtain_prealloc(struct
>>>>>>>>>>> drm_gpuvm_bo *__vm_bo)
>>>>>>>>>>>     }
>>>>>>>>>>> EXPORT_SYMBOL_GPL(drm_gpuvm_bo_obtain_prealloc);
>>>>>>>>>>> +/**
>>>>>>>>>>> + * drm_gpuvm_bo_extobj_add() - adds the &drm_gpuvm_bo to its
>>>>>>>>>>> &drm_gpuvm's
>>>>>>>>>>> + * extobj list
>>>>>>>>>>> + * @vm_bo: The &drm_gpuvm_bo to add to its &drm_gpuvm's extobj list.
>>>>>>>>>>> + *
>>>>>>>>>>> + * Adds the given @vm_bo to its &drm_gpuvm's extobj list if not on
>>>>>>>>>>> + * the list already, and only if the corresponding &drm_gem_object
>>>>>>>>>>> + * actually is an external object.
>>>>>>>>>>> + */
>>>>>>>>>>> +void
>>>>>>>>>>> +drm_gpuvm_bo_extobj_add(struct drm_gpuvm_bo *vm_bo)
>>>>>>>>>>> +{
>>>>>>>>>>> +       struct drm_gpuvm *gpuvm = vm_bo->vm;
>>>>>>>>>>> +
>>>>>>>>>>> +       if (drm_gpuvm_is_extobj(gpuvm, vm_bo->obj))
>>>>>>>>>>> +               drm_gpuvm_bo_list_add(vm_bo, extobj);
>>>>>>>>>>> +}
>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_bo_extobj_add);
>>>>>>>>>>> +
>>>>>>>>>>> +/**
>>>>>>>>>>> + * drm_gpuvm_bo_evict() - add / remove a &drm_gem_object to
>>>>>>>>>>> / from a
>>>>>>>>>>> + * &drm_gpuvms evicted list
>>>>>>>>>>> + * @obj: the &drm_gem_object to add or remove
>>>>>>>>>>> + * @evict: indicates whether the object is evicted
>>>>>>>>>>> + *
>>>>>>>>>>> + * Adds a &drm_gem_object to or removes it from the evicted list of
>>>>>>>>>>> + * all &drm_gpuvms containing a mapping of this &drm_gem_object.
>>>>>>>>>>> + */
>>>>>>>>>>> +void
>>>>>>>>>>> +drm_gpuvm_bo_evict(struct drm_gem_object *obj, bool evict)
>>>>>>>>>>> +{
>>>>>>>>>>> +       struct drm_gpuvm_bo *vm_bo;
>>>>>>>>>>> +
>>>>>>>>>>> +       drm_gem_for_each_gpuvm_bo(vm_bo, obj) {
>>>>>>>>>>> +               if (evict)
>>>>>>>>>>> +                       drm_gpuvm_bo_list_add(vm_bo, evict);
>>>>>>>>>>> +               else
>>>>>>>>>>> +                       drm_gpuvm_bo_list_del(vm_bo, evict);
>>>>>>>>>>> +       }
>>>>>>>>>>> +}
>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_bo_evict);
>>>>>>>>>>> +
>>>>>>>>>>>     static int
>>>>>>>>>>>     __drm_gpuva_insert(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>                     struct drm_gpuva *va)
>>>>>>>>>>> diff --git a/include/drm/drm_gpuvm.h
>>>>>>>>>>> b/include/drm/drm_gpuvm.h
>>>>>>>>>>> index afa50b9059a2..834bb6d6617e 100644
>>>>>>>>>>> --- a/include/drm/drm_gpuvm.h
>>>>>>>>>>> +++ b/include/drm/drm_gpuvm.h
>>>>>>>>>>> @@ -26,10 +26,12 @@
>>>>>>>>>>>      */
>>>>>>>>>>>     #include <linux/list.h>
>>>>>>>>>>> +#include <linux/dma-resv.h>
>>>>>>>>>>>     #include <linux/rbtree.h>
>>>>>>>>>>>     #include <linux/types.h>
>>>>>>>>>>>     #include <drm/drm_gem.h>
>>>>>>>>>>> +#include <drm/drm_exec.h>
>>>>>>>>>>>     struct drm_gpuvm;
>>>>>>>>>>>     struct drm_gpuvm_bo;
>>>>>>>>>>> @@ -259,6 +261,38 @@ struct drm_gpuvm {
>>>>>>>>>>>           * space
>>>>>>>>>>>           */
>>>>>>>>>>>          struct dma_resv *resv;
>>>>>>>>>>> +
>>>>>>>>>>> +       /**
>>>>>>>>>>> +        * @extobj: structure holding the extobj list
>>>>>>>>>>> +        */
>>>>>>>>>>> +       struct {
>>>>>>>>>>> +               /**
>>>>>>>>>>> +                * @list: &list_head storing &drm_gpuvm_bos
>>>>>>>>>>> serving as
>>>>>>>>>>> +                * external object
>>>>>>>>>>> +                */
>>>>>>>>>>> +               struct list_head list;
>>>>>>>>>>> +
>>>>>>>>>>> +               /**
>>>>>>>>>>> +                * @lock: spinlock to protect the extobj list
>>>>>>>>>>> +                */
>>>>>>>>>>> +               spinlock_t lock;
>>>>>>>>>>> +       } extobj;
>>>>>>>>>>> +
>>>>>>>>>>> +       /**
>>>>>>>>>>> +        * @evict: structure holding the evict list and evict
>>>>>>>>>>> list lock
>>>>>>>>>>> +        */
>>>>>>>>>>> +       struct {
>>>>>>>>>>> +               /**
>>>>>>>>>>> +                * @list: &list_head storing &drm_gpuvm_bos
>>>>>>>>>>> currently being
>>>>>>>>>>> +                * evicted
>>>>>>>>>>> +                */
>>>>>>>>>>> +               struct list_head list;
>>>>>>>>>>> +
>>>>>>>>>>> +               /**
>>>>>>>>>>> +                * @lock: spinlock to protect the evict list
>>>>>>>>>>> +                */
>>>>>>>>>>> +               spinlock_t lock;
>>>>>>>>>>> +       } evict;
>>>>>>>>>>>     };
>>>>>>>>>>>     void drm_gpuvm_init(struct drm_gpuvm *gpuvm, struct
>>>>>>>>>>> drm_device *drm,
>>>>>>>>>>> @@ -268,6 +302,21 @@ void drm_gpuvm_init(struct drm_gpuvm
>>>>>>>>>>> *gpuvm, struct drm_device *drm,
>>>>>>>>>>>                      const struct drm_gpuvm_ops *ops);
>>>>>>>>>>>     void drm_gpuvm_destroy(struct drm_gpuvm *gpuvm);
>>>>>>>>>>> +/**
>>>>>>>>>>> + * drm_gpuvm_is_extobj() - indicates whether the given
>>>>>>>>>>> &drm_gem_object is an
>>>>>>>>>>> + * external object
>>>>>>>>>>> + * @gpuvm: the &drm_gpuvm to check
>>>>>>>>>>> + * @obj: the &drm_gem_object to check
>>>>>>>>>>> + *
>>>>>>>>>>> + * Returns: true if the &drm_gem_object &dma_resv differs
>>>>>>>>>>> from the
>>>>>>>>>>> + * &drm_gpuvms &dma_resv, false otherwise
>>>>>>>>>>> + */
>>>>>>>>>>> +static inline bool drm_gpuvm_is_extobj(struct drm_gpuvm *gpuvm,
>>>>>>>>>>> +                                      struct drm_gem_object *obj)
>>>>>>>>>>> +{
>>>>>>>>>>> +       return obj && obj->resv != gpuvm->resv;
>>>>>>>>>>> +}
>>>>>>>>>>> +
>>>>>>>>>>>     static inline struct drm_gpuva *
>>>>>>>>>>>     __drm_gpuva_next(struct drm_gpuva *va)
>>>>>>>>>>>     {
>>>>>>>>>>> @@ -346,6 +395,128 @@ __drm_gpuva_next(struct drm_gpuva *va)
>>>>>>>>>>>     #define drm_gpuvm_for_each_va_safe(va__, next__, gpuvm__) \
>>>>>>>>>>>          list_for_each_entry_safe(va__, next__, &(gpuvm__)->rb.list, rb.entry)
>>>>>>>>>>> +/**
>>>>>>>>>>> + * struct drm_gpuvm_exec - &drm_gpuvm abstraction of
>>>>>>>>>>> &drm_exec
>>>>>>>>>>> + *
>>>>>>>>>>> + * This structure should be created on the stack as
>>>>>>>>>>> &drm_exec should be.
>>>>>>>>>>> + *
>>>>>>>>>>> + * Optionally, @extra can be set in order to lock additional
>>>>>>>>>>> &drm_gem_objects.
>>>>>>>>>>> + */
>>>>>>>>>>> +struct drm_gpuvm_exec {
>>>>>>>>>>> +       /**
>>>>>>>>>>> +        * @exec: the &drm_exec structure
>>>>>>>>>>> +        */
>>>>>>>>>>> +       struct drm_exec exec;
>>>>>>>>>>> +
>>>>>>>>>>> +       /**
>>>>>>>>>>> +        * @vm: the &drm_gpuvm to lock its DMA reservations
>>>>>>>>>>> +        */
>>>>>>>>>>> +       struct drm_gpuvm *vm;
>>>>>>>>>>> +
>>>>>>>>>>> +       /**
>>>>>>>>>>> +        * @extra: Callback and corresponding private data
>>>>>>>>>>> for the driver to
>>>>>>>>>>> +        * lock arbitrary additional &drm_gem_objects.
>>>>>>>>>>> +        */
>>>>>>>>>>> +       struct {
>>>>>>>>>>> +               /**
>>>>>>>>>>> +                * @fn: The driver callback to lock
>>>>>>>>>>> additional &drm_gem_objects.
>>>>>>>>>>> +                */
>>>>>>>>>>> +               int (*fn)(struct drm_gpuvm_exec *vm_exec,
>>>>>>>>>>> +                         unsigned int num_fences);
>>>>>>>>>>> +
>>>>>>>>>>> +               /**
>>>>>>>>>>> +                * @priv: driver private data for the @fn
>>>>>>>>>>> callback
>>>>>>>>>>> +                */
>>>>>>>>>>> +               void *priv;
>>>>>>>>>>> +       } extra;
>>>>>>>>>>> +};
>>>>>>>>>>> +
>>>>>>>>>>> +/**
>>>>>>>>>>> + * drm_gpuvm_prepare_vm() - prepare the GPUVMs common dma-
>>>>>>>>>>> resv
>>>>>>>>>>> + * @gpuvm: the &drm_gpuvm
>>>>>>>>>>> + * @exec: the &drm_exec context
>>>>>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>>>>>> + *
>>>>>>>>>>> + * Calls drm_exec_prepare_obj() for the GPUVMs dummy
>>>>>>>>>>> &drm_gem_object.
>>>>>>>>>>> + *
>>>>>>>>>>> + * Using this function directly, it is the driver's
>>>>>>>>>>> responsibility to call
>>>>>>>>>>> + * drm_exec_init() and drm_exec_fini() accordingly.
>>>>>>>>>>> + *
>>>>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>>>>> + */
>>>>>>>>>>> +static inline int
>>>>>>>>>>> +drm_gpuvm_prepare_vm(struct drm_gpuvm *gpuvm,
>>>>>>>>>>> +                    struct drm_exec *exec,
>>>>>>>>>>> +                    unsigned int num_fences)
>>>>>>>>>>> +{
>>>>>>>>>>> +       return drm_exec_prepare_obj(exec, &gpuvm->d_obj,
>>>>>>>>>>> num_fences);
>>>>>>>>>>> +}
>>>>>>>>>>> +
>>>>>>>>>>> +int drm_gpuvm_prepare_objects(struct drm_gpuvm *gpuvm,
>>>>>>>>>>> +                             struct drm_exec *exec,
>>>>>>>>>>> +                             unsigned int num_fences);
>>>>>>>>>>> +
>>>>>>>>>>> +int drm_gpuvm_prepare_range(struct drm_gpuvm *gpuvm,
>>>>>>>>>>> +                           struct drm_exec *exec,
>>>>>>>>>>> +                           u64 addr, u64 range,
>>>>>>>>>>> +                           unsigned int num_fences);
>>>>>>>>>>> +
>>>>>>>>>>> +int drm_gpuvm_exec_lock(struct drm_gpuvm_exec *vm_exec,
>>>>>>>>>>> +                       unsigned int num_fences,
>>>>>>>>>>> +                       bool interruptible);
>>>>>>>>>>> +
>>>>>>>>>>> +int drm_gpuvm_exec_lock_array(struct drm_gpuvm_exec *vm_exec,
>>>>>>>>>>> +                             struct drm_gem_object **objs,
>>>>>>>>>>> +                             unsigned int num_objs,
>>>>>>>>>>> +                             unsigned int num_fences,
>>>>>>>>>>> +                             bool interruptible);
>>>>>>>>>>> +
>>>>>>>>>>> +int drm_gpuvm_exec_lock_range(struct drm_gpuvm_exec *vm_exec,
>>>>>>>>>>> +                             u64 addr, u64 range,
>>>>>>>>>>> +                             unsigned int num_fences,
>>>>>>>>>>> +                             bool interruptible);
>>>>>>>>>>> +
>>>>>>>>>>> +/**
>>>>>>>>>>> + * drm_gpuvm_exec_unlock() - unlock all dma-resv of all associated BOs
>>>>>>>>>>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>>>>>>>>>>> + *
>>>>>>>>>>> + * Releases all dma-resv locks of all &drm_gem_objects previously
>>>>>>>>>>> + * acquired through drm_gpuvm_exec_lock() or its variants.
>>>>>>>>>>> + */
>>>>>>>>>>> +static inline void
>>>>>>>>>>> +drm_gpuvm_exec_unlock(struct drm_gpuvm_exec *vm_exec)
>>>>>>>>>>> +{
>>>>>>>>>>> +       drm_exec_fini(&vm_exec->exec);
>>>>>>>>>>> +}
>>>>>>>>>>> +
>>>>>>>>>>> +int drm_gpuvm_validate(struct drm_gpuvm *gpuvm);
>>>>>>>>>>> +void drm_gpuvm_resv_add_fence(struct drm_gpuvm *gpuvm,
>>>>>>>>>>> +                             struct drm_exec *exec,
>>>>>>>>>>> +                             struct dma_fence *fence,
>>>>>>>>>>> +                             enum dma_resv_usage
>>>>>>>>>>> private_usage,
>>>>>>>>>>> +                             enum dma_resv_usage
>>>>>>>>>>> extobj_usage);
>>>>>>>>>>> +
>>>>>>>>>>> +/**
>>>>>>>>>>> + * drm_gpuvm_exec_resv_add_fence() - add fence to private and all extobj dma-resv
>>>>>>>>>>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>>>>>>>>>>> + * @fence: fence to add
>>>>>>>>>>> + * @private_usage: private dma-resv usage
>>>>>>>>>>> + * @extobj_usage: extobj dma-resv usage
>>>>>>>>>>> + *
>>>>>>>>>>> + * See drm_gpuvm_resv_add_fence().
>>>>>>>>>>> + */
>>>>>>>>>>> +static inline void
>>>>>>>>>>> +drm_gpuvm_exec_resv_add_fence(struct drm_gpuvm_exec *vm_exec,
>>>>>>>>>>> +                             struct dma_fence *fence,
>>>>>>>>>>> +                             enum dma_resv_usage private_usage,
>>>>>>>>>>> +                             enum dma_resv_usage extobj_usage)
>>>>>>>>>>> +{
>>>>>>>>>>> +       drm_gpuvm_resv_add_fence(vm_exec->vm, &vm_exec->exec, fence,
>>>>>>>>>>> +                                private_usage, extobj_usage);
>>>>>>>>>>> +}
>>>>>>>>>>> +
>>>>>>>>>>>     /**
>>>>>>>>>>>      * struct drm_gpuvm_bo - structure representing a
>>>>>>>>>>> &drm_gpuvm and
>>>>>>>>>>>      * &drm_gem_object combination
>>>>>>>>>>> @@ -398,6 +569,18 @@ struct drm_gpuvm_bo {
>>>>>>>>>>>                           * gpuva list.
>>>>>>>>>>>                           */
>>>>>>>>>>>                          struct list_head gem;
>>>>>>>>>>> +
>>>>>>>>>>> +                       /**
>>>>>>>>>>> +                        * @extobj: List entry to attach to
>>>>>>>>>>> the &drm_gpuvms
>>>>>>>>>>> +                        * extobj list.
>>>>>>>>>>> +                        */
>>>>>>>>>>> +                       struct list_head extobj;
>>>>>>>>>>> +
>>>>>>>>>>> +                       /**
>>>>>>>>>>> +                        * @evict: List entry to attach to
>>>>>>>>>>> the &drm_gpuvms evict
>>>>>>>>>>> +                        * list.
>>>>>>>>>>> +                        */
>>>>>>>>>>> +                       struct list_head evict;
>>>>>>>>>>>                  } entry;
>>>>>>>>>>>          } list;
>>>>>>>>>>>     };
>>>>>>>>>>> @@ -432,6 +615,9 @@ struct drm_gpuvm_bo *
>>>>>>>>>>>     drm_gpuvm_bo_find(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>                    struct drm_gem_object *obj);
>>>>>>>>>>> +void drm_gpuvm_bo_evict(struct drm_gem_object *obj, bool
>>>>>>>>>>> evict);
>>>>>>>>>>> +void drm_gpuvm_bo_extobj_add(struct drm_gpuvm_bo *vm_bo);
>>>>>>>>>>> +
>>>>>>>>>>>     /**
>>>>>>>>>>>      * drm_gpuvm_bo_for_each_va() - iterator to walk over a
>>>>>>>>>>> list of &drm_gpuva
>>>>>>>>>>>      * @va__: &drm_gpuva structure to assign to in each
>>>>>>>>>>> iteration step
>>>>>>>>>>> @@ -837,6 +1023,17 @@ struct drm_gpuvm_ops {
>>>>>>>>>>>           * used.
>>>>>>>>>>>           */
>>>>>>>>>>>          int (*sm_step_unmap)(struct drm_gpuva_op *op, void
>>>>>>>>>>> *priv);
>>>>>>>>>>> +
>>>>>>>>>>> +       /**
>>>>>>>>>>> +        * @bo_validate: called from drm_gpuvm_validate()
>>>>>>>>>>> +        *
>>>>>>>>>>> +        * Drivers receive this callback for every evicted
>>>>>>>>>>> &drm_gem_object being
>>>>>>>>>>> +        * mapped in the corresponding &drm_gpuvm.
>>>>>>>>>>> +        *
>>>>>>>>>>> +        * Typically, drivers would call their driver
>>>>>>>>>>> specific variant of
>>>>>>>>>>> +        * ttm_bo_validate() from within this callback.
>>>>>>>>>>> +        */
>>>>>>>>>>> +       int (*bo_validate)(struct drm_gem_object *obj);
>>>>>>>>>>>     };
>>>>>>>>>>>     int drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm, void *priv,
>>>>
>>>
>>
>
Thomas Hellstrom Sept. 19, 2023, 12:21 p.m. UTC | #46
Hi Christian

On 9/19/23 14:07, Christian König wrote:
> Am 13.09.23 um 17:46 schrieb Danilo Krummrich:
>> On 9/13/23 17:33, Christian König wrote:
>>> Am 13.09.23 um 17:15 schrieb Danilo Krummrich:
>>>> On 9/13/23 16:26, Christian König wrote:
>>>>> Am 13.09.23 um 14:16 schrieb Danilo Krummrich:
>>>>>> As mentioned in a different mail thread, the reply is based on 
>>>>>> the assumption
>>>>>> that we don't support anything other than GPUVM updates from the 
>>>>>> IOCTL.
>>>>>
>>>>> I think that this assumption is incorrect.
>>>>
>>>> Well, more precisely I should have said "don't support GPUVM 
>>>> updates from within
>>>> fence signaling critical sections". And looking at the code, that 
>>>> doesn't seem what
>>>> you're doing there.
>>>>
>>>>>
>>>>> Vulkan is just one specific use case, but this here should 
>>>>> probably be able to handle other use cases as well.
>>>>>
>>>>> Especially with HMM you get the requirement that you need to be 
>>>>> able to invalidate GPUVM mappings without grabbing a reservation 
>>>>> lock.
>>>>
>>>> What do you mean by "invalidate GPUVM mappings" in this context? 
>>>> drm_gpuvm_bo_evict()
>>>> should only be called from a ttm_device_funcs::move callback, we 
>>>> should hold the dma-resv
>>>> lock there.
>>>
>>> Well the question is which dma-resv lock do we hold?
>>>
>>> In the move callback we only hold the dma-resv lock of the BO which 
>>> is moved, but when that is a shared BO then that's not the same as 
>>> the one for the VM.
>>
>> Correct, Thomas' idea was to use the GEM's dma_resv lock to protect 
>> drm_gpuvm_bo::evicted
>> and then actually move the drm_gpuvm_bo to the VM's evicted list once 
>> we grabbed all
>> dma-resv locks when locking the VM's BOs using drm_exec. We can 
>> remove them from the evicted
>> list on validate(). This way we never touch the evicted list without 
>> holding at least the VM's
>> dma-resv lock.
>>
>> Do you have any concerns about that?
>
> Scratching my head a bit how that is supposed to work.
>
> This implies that you go over all the evicted BOs during validation 
> and not just the ones mentioned in the CS.
>
> That might work for Vulkan, but is pretty much a no-go for OpenGL.
>
>>
>>>
>>>>
>>>>>
>>>>> See what the eviction lock in amdgpu is doing for example.
>>>>
>>>> The eviction_lock seems to protect a VM state "evicting", i.e. whether 
>>>> any BO that
>>>> is associated with the VM is currently being evicted. At the same time 
>>>> amdgpu protects
>>>> the evicted list of the VM with a different lock. So this seems to 
>>>> be entirely
>>>> unrelated. Tracking a "currently evicting" state is not part of the 
>>>> GPUVM
>>>> implementation currently and hence nothing would change for amdgpu 
>>>> there.
>>>
>>> Sorry for the confusion we use different terminology in amdgpu.
>>>
>>> The eviction lock and evicted state are for the VM page tables, e.g. 
>>> if the whole VM is currently not used and swapped out or even 
>>> de-allocated.
>>>
>>> This is necessary because we have cases where we need to access the 
>>> VM data without holding the dma-resv lock of this VM. Especially 
>>> figuring out which parts of an address space contain mappings and 
>>> which doesn't.
>>> which don't.
>> I think this is fine, this has nothing to do with lists of evicted 
>> GEM objects or external GEM
>> objects, right? Marking mappings (drm_gpuva) as invalidated 
>> (DRM_GPUVA_INVALIDATED) or accessing
>> the VA space does not require any dma-resv locks.
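>>
>> E.g. roughly (sketch, using the vm_bo iterator from this series and the
>> pre-existing drm_gpuva_invalidate() helper):
>>
>>     drm_gpuvm_bo_for_each_va(va, vm_bo)
>>             drm_gpuva_invalidate(va, true);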
>
> I hope so, but I'm not 100% sure.
>
>>
>>>
>>> This is a requirement which comes with HMM handling, you won't see 
>>> this with Vulkan (or OpenGL, VAAPI etc..).
>>>
>>>
>>> The invalidation lock on the other hand is what in this discussion 
>>> is called eviction lock. This one is needed because of what I wrote 
>>> above: during the move callback only the dma-resv of the BO which is 
>>> moved is locked, but not necessarily the dma-resv of the VM.
>>
>> That's yet another thing, right? This is used to track whether *any* 
>> BO that belongs to the VM is
>> currently being evicted, correct? As mentioned, as of now this is not 
>> supported in GPUVM and hence
>> would be the same driver specific code with the same driver specific 
>> lock.
>
> That is most likely a show stopper using this for OpenGL based 
> workloads as far as I can see. For those you need to be able to figure 
> out which non-VM BOs have been evicted and which parts of the VM need 
> updates.

We identify those with a bool in the gpuvm_bo, and that bool is 
protected by the bo_resv. In essence, the "evicted" list must be made 
up-to-date with all relevant locks held before traversing in the next exec.
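
Rough sketch of that scheme (the "evicted" bool and the
drm_gpuvm_for_each_extobj() iterator are made up for illustration):

    /* Eviction path, holding only the evicted bo's resv: */
    drm_gem_for_each_gpuvm_bo(vm_bo, obj)
            vm_bo->evicted = true;

    /* Exec path, all relevant resv locks held via drm_exec: */
    drm_gpuvm_for_each_extobj(gpuvm, vm_bo) {
            if (vm_bo->evicted) {
                    list_move_tail(&vm_bo->list.entry.evict,
                                   &gpuvm->evict.list);
                    vm_bo->evicted = false;
            }
    }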

If you mean that we need to unbind all vmas of all vms of evicted bos 
before evicting, we don't do that, at least not in Xe, since on eviction 
we wait for VM idle, and it can't access anything through the stale vmas 
until they have been revalidated and rebound.

/Thomas



>>
>>>
>>> Regards,
>>> Christian.
>>>
>>>>
>>>>>
>>>>> Regards,
>>>>> Christian.
>>>>>
>>>>>>
>>>>>> On Wed, Sep 13, 2023 at 11:14:46AM +0200, Thomas Hellström wrote:
>>>>>>> Hi!
>>>>>>>
>>>>>>> On Wed, 2023-09-13 at 01:36 +0200, Danilo Krummrich wrote:
>>>>>>>> On Tue, Sep 12, 2023 at 09:23:08PM +0200, Thomas Hellström wrote:
>>>>>>>>> On 9/12/23 18:50, Danilo Krummrich wrote:
>>>>>>>>>> On Tue, Sep 12, 2023 at 06:20:32PM +0200, Thomas Hellström 
>>>>>>>>>> wrote:
>>>>>>>>>>> Hi, Danilo,
>>>>>>>>>>>
>>>>>>>>>>> On 9/9/23 17:31, Danilo Krummrich wrote:
>>>>>>>>>>>> So far the DRM GPUVA manager offers common infrastructure to
>>>>>>>>>>>> track GPU VA
>>>>>>>>>>>> allocations and mappings, generically connect GPU VA mappings
>>>>>>>>>>>> to their
>>>>>>>>>>>> backing buffers and perform more complex mapping operations
>>>>>>>>>>>> on the GPU VA
>>>>>>>>>>>> space.
>>>>>>>>>>>>
>>>>>>>>>>>> However, there are more design patterns commonly used by
>>>>>>>>>>>> drivers, which
>>>>>>>>>>>> can potentially be generalized in order to make the DRM GPUVA
>>>>>>>>>>>> manager
>>>>>>>>>>>> represent a basic GPU-VM implementation. In this context,
>>>>>>>>>>>> this patch aims
>>>>>>>>>>>> at generalizing the following elements.
>>>>>>>>>>>>
>>>>>>>>>>>> 1) Provide a common dma-resv for GEM objects not being used
>>>>>>>>>>>> outside of
>>>>>>>>>>>>       this GPU-VM.
>>>>>>>>>>>>
>>>>>>>>>>>> 2) Provide tracking of external GEM objects (GEM objects
>>>>>>>>>>>> which are
>>>>>>>>>>>>       shared with other GPU-VMs).
>>>>>>>>>>>>
>>>>>>>>>>>> 3) Provide functions to efficiently lock all GEM objects dma-
>>>>>>>>>>>> resv the
>>>>>>>>>>>>       GPU-VM contains mappings of.
>>>>>>>>>>>>
>>>>>>>>>>>> 4) Provide tracking of evicted GEM objects the GPU-VM
>>>>>>>>>>>> contains mappings
>>>>>>>>>>>>       of, such that validation of evicted GEM objects is
>>>>>>>>>>>> accelerated.
>>>>>>>>>>>>
>>>>>>>>>>>> 5) Provide some convenience functions for common patterns.
>>>>>>>>>>>>
>>>>>>>>>>>> Rather than being designed as a "framework", the target is to
>>>>>>>>>>>> make all
>>>>>>>>>>>> features appear as a collection of optional helper functions,
>>>>>>>>>>>> such that
>>>>>>>>>>>> drivers are free to make use of the DRM GPUVA managers basic
>>>>>>>>>>>> functionality and opt-in for other features without setting
>>>>>>>>>>>> any feature
>>>>>>>>>>>> flags, just by making use of the corresponding functions.
>>>>>>>>>>>>
>>>>>>>>>>>> Big kudos to Boris Brezillon for his help to figure out
>>>>>>>>>>>> locking for drivers
>>>>>>>>>>>> updating the GPU VA space within the fence signalling path.
>>>>>>>>>>>>
>>>>>>>>>>>> Suggested-by: Matthew Brost <matthew.brost@intel.com>
>>>>>>>>>>>> Signed-off-by: Danilo Krummrich <dakr@redhat.com>
>>>>>>>>>>>> ---
>>>>>>>>>>>>     drivers/gpu/drm/drm_gpuvm.c | 516
>>>>>>>>>>>> ++++++++++++++++++++++++++++++++++++
>>>>>>>>>>>>     include/drm/drm_gpuvm.h     | 197 ++++++++++++++
>>>>>>>>>>>>     2 files changed, 713 insertions(+)
>>>>>>>>>>>>
>>>>>>>>>>>> diff --git a/drivers/gpu/drm/drm_gpuvm.c
>>>>>>>>>>>> b/drivers/gpu/drm/drm_gpuvm.c
>>>>>>>>>>>> index f4411047dbb3..8e62a043f719 100644
>>>>>>>>>>>> --- a/drivers/gpu/drm/drm_gpuvm.c
>>>>>>>>>>>> +++ b/drivers/gpu/drm/drm_gpuvm.c
>>>>>>>>>>>> @@ -73,6 +73,21 @@
>>>>>>>>>>>>      * &drm_gem_object list of &drm_gpuvm_bos for an existing
>>>>>>>>>>>> instance of this
>>>>>>>>>>>>      * particular combination. If not existent a new instance
>>>>>>>>>>>> is created and linked
>>>>>>>>>>>>      * to the &drm_gem_object.
>>>>>>>>>>>> + *
>>>>>>>>>>>> + * &drm_gpuvm_bo structures, since unique for a given
>>>>>>>>>>>> &drm_gpuvm, are also used
>>>>>>>>>>>> + * as entry for the &drm_gpuvm's lists of external and
>>>>>>>>>>>> evicted objects. Those
>>>>>>>>>>>> + * lists are maintained in order to accelerate locking of
>>>>>>>>>>>> dma-resv locks and
>>>>>>>>>>>> + * validation of evicted objects bound in a &drm_gpuvm. For
>>>>>>>>>>>> instance, all
>>>>>>>>>>>> + * &drm_gem_object's &dma_resv of a given &drm_gpuvm can be
>>>>>>>>>>>> locked by calling
>>>>>>>>>>>> + * drm_gpuvm_exec_lock(). Once locked drivers can call
>>>>>>>>>>>> drm_gpuvm_validate() in
>>>>>>>>>>>> + * order to validate all evicted &drm_gem_objects. It is
>>>>>>>>>>>> also possible to lock
>>>>>>>>>>>> + * additional &drm_gem_objects by providing the
>>>>>>>>>>>> corresponding parameters to
>>>>>>>>>>>> + * drm_gpuvm_exec_lock() as well as open code the &drm_exec
>>>>>>>>>>>> loop while making
>>>>>>>>>>>> + * use of helper functions such as drm_gpuvm_prepare_range()
>>>>>>>>>>>> or
>>>>>>>>>>>> + * drm_gpuvm_prepare_objects().
>>>>>>>>>>>> + *
>>>>>>>>>>>> + * Every bound &drm_gem_object is treated as an external object
>>>>>>>>>>>> when its &dma_resv
>>>>>>>>>>>> + * structure is different than the &drm_gpuvm's common
>>>>>>>>>>>> &dma_resv structure.
>>>>>>>>>>>>      */
>>>>>>>>>>>>     /**
>>>>>>>>>>>> @@ -420,6 +435,20 @@
>>>>>>>>>>>>      * Subsequent calls to drm_gpuvm_bo_obtain() for the same
>>>>>>>>>>>> &drm_gpuvm and
>>>>>>>>>>>>      * &drm_gem_object must be able to observe previous
>>>>>>>>>>>> creations and destructions
>>>>>>>>>>>>      * of &drm_gpuvm_bos in order to keep instances unique.
>>>>>>>>>>>> + *
>>>>>>>>>>>> + * The &drm_gpuvm's lists for keeping track of external and
>>>>>>>>>>>> evicted objects are
>>>>>>>>>>>> + * protected against concurrent insertion / removal and
>>>>>>>>>>>> iteration internally.
>>>>>>>>>>>> + *
>>>>>>>>>>>> + * However, drivers still need to protect concurrent
>>>>>>>>>>>> calls to functions
>>>>>>>>>>>> + * iterating those lists, such as drm_gpuvm_validate() and
>>>>>>>>>>>> + * drm_gpuvm_prepare_objects(). Every such function contains
>>>>>>>>>>>> a particular
>>>>>>>>>>>> + * comment and lockdep checks if possible.
>>>>>>>>>>>> + *
>>>>>>>>>>>> + * Functions adding or removing entries from those lists,
>>>>>>>>>>>> such as
>>>>>>>>>>>> + * drm_gpuvm_bo_evict() or drm_gpuvm_bo_extobj_add() may be
>>>>>>>>>>>> called with external
>>>>>>>>>>>> + * locks being held, e.g. in order to avoid the
>>>>>>>>>>>> corresponding list being
>>>>>>>>>>>> + * (safely) modified while potentially being iterated by
>>>>>>>>>>>> other API functions.
>>>>>>>>>>>> + * However, this is entirely optional.
>>>>>>>>>>>>      */
>>>>>>>>>>>>     /**
>>>>>>>>>>>> @@ -632,6 +661,131 @@
>>>>>>>>>>>>      *   }
>>>>>>>>>>>>      */
>>>>>>>>>>>> +/**
>>>>>>>>>>>> + * get_next_vm_bo_from_list() - get the next vm_bo element
>>>>>>>>>>>> + * @__gpuvm: The GPU VM
>>>>>>>>>>>> + * @__list_name: The name of the list we're iterating on
>>>>>>>>>>>> + * @__local_list: A pointer to the local list used to store
>>>>>>>>>>>> already iterated items
>>>>>>>>>>>> + * @__prev_vm_bo: The previous element we got from
>>>>>>>>>>>> get_next_vm_bo_from_list()
>>>>>>>>>>>> + *
>>>>>>>>>>>> + * This helper is here to provide lockless list iteration.
>>>>>>>>>>>> Lockless as in, the
>>>>>>>>>>>> + * iterator releases the lock immediately after picking the
>>>>>>>>>>>> first element from
>>>>>>>>>>>> + * the list, so list insertion and deletion can happen
>>>>>>>>>>>> concurrently.
>>>>>>>>>>> Are the list spinlocks needed for that async state update from
>>>>>>>>>>> within the
>>>>>>>>>>> dma-fence critical section we've discussed previously?
>>>>>>>>>> Yes, but also for other reasons, see below.
>>>>>>>>>>
>>>>>>>>>>> Otherwise it should be sufficient to protect the lists with the
>>>>>>>>>>> gpuvm's resv
>>>>>>>>>>> (or for the extobj list with an outer lock).
>>>>>>>>>>>
>>>>>>>>>>> If those spinlocks are still needed in some situations, perhaps
>>>>>>>>>>> could we
>>>>>>>>>>> have an option to set them to NULL (Like IIRC the maple tree
>>>>>>>>>>> allows for)?
>>>>>>>>>> The evict spinlock is needed in any case, since in
>>>>>>>>>> drm_gpuvm_bo_evict() we're
>>>>>>>>>> holding only the dma-resv lock from the BO this function gets
>>>>>>>>>> called for. Hence,
>>>>>>>>>> the spinlock protects concurrent drm_gpuvm_bo_evict() calls with
>>>>>>>>>> different BOs.
>>>>>>>>> No. Only if you try to add external objects to the vm's evict 
>>>>>>>>> list
>>>>>>>>> from
>>>>>>>>> within the evict code. That's not necessary since you loop 
>>>>>>>>> through
>>>>>>>>> all
>>>>>>>>> external objects anyway when locking them so an "evicted" bool in
>>>>>>>>> the vm_bo,
>>>>>>>>> protected by the bo resv would be sufficient. The extobj locking
>>>>>>>>> loop can
>>>>>>>>> then add the bo to the evicted list.
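
A minimal sketch of the scheme described above, assuming a hypothetical
"evicted" bool in struct drm_gpuvm_bo protected by the BO's dma-resv
(neither the flag nor these helper names exist in the patch as posted):

static void driver_gpuvm_bo_mark_evicted(struct drm_gpuvm_bo *vm_bo)
{
	/* TTM move callback path: only this BO's resv is held. */
	dma_resv_assert_held(vm_bo->obj->resv);
	vm_bo->evicted = true; /* hypothetical flag, bo-resv protected */
}

static void driver_gpuvm_bo_collect_evicted(struct drm_gpuvm_bo *vm_bo)
{
	/* Extobj locking loop: all resv locks are held by drm_exec, so
	 * moving the vm_bo onto the VM's evict list is safe here.
	 */
	dma_resv_assert_held(vm_bo->obj->resv);
	if (vm_bo->evicted)
		list_move_tail(&vm_bo->list.entry.evict,
			       &vm_bo->vm->evict.list);
}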
>>>>>>>> And validate() can remove it while still holding all dma-resv 
>>>>>>>> locks,
>>>>>>>> neat!
>>>>>>>> However, what if two tasks are trying to lock the VA space
>>>>>>>> concurrently? What
>>>>>>>> do we do when the drm_gpuvm_bo's refcount drops to zero in
>>>>>>>> drm_gpuva_unlink()?
>>>>>>>> Are we guaranteed that at this point of time the drm_gpuvm_bo 
>>>>>>>> is not
>>>>>>>> on the
>>>>>>>> evicted list? Because otherwise we would call 
>>>>>>>> drm_gpuvm_bo_destroy()
>>>>>>>> with the
>>>>>>>> dma-resv lock held, which wouldn't be allowed, since
>>>>>>>> drm_gpuvm_bo_destroy()
>>>>>>>> might drop the last reference to the drm_gem_object and hence we'd
>>>>>>>> potentially
>>>>>>>> free the dma-resv lock while holding it, at least if it's an 
>>>>>>>> external
>>>>>>>> object.
>>>>>>> Easiest way in this scheme is to think of the lists as being 
>>>>>>> protected
>>>>>>> by the vm's resv lock. That means anybody calling unlink() must 
>>>>>>> also
>>>>>>> hold the vm's resv lock. (Which is OK from an UAF point of view, 
>>>>>>> but
>>>>>>> perhaps not from a locking inversion POV from an async list 
>>>>>>> update).
>>>>>> This would mean that on unlink() we'd need to hold the VM's resv 
>>>>>> lock and the
>>>>>> corresponding GEM's resv lock (in case they're not the same 
>>>>>> anyways) because the
>>>>>> VM's resv lock would protect the external / evicted object lists 
>>>>>> and the GEM
>>>>>> objects resv lock protects the GEM's list of drm_gpuvm_bos and the
>>>>>> drm_gpuvm_bo's list of drm_gpuvas.
>>>>>>
>>>>>>>>>> For extobjs an outer lock would be enough in case of Xe, but I
>>>>>>>>>> really would not
>>>>>>>>>> like to add even more complexity just to get the spinlock out of
>>>>>>>>>> the way in case
>>>>>>>>>> the driver already has an outer lock protecting this path.
>>>>>>>>> I must disagree here. These spinlocks and atomic operations are
>>>>>>>>> pretty
>>>>>>>>> costly and as discussed earlier this type of locking was the 
>>>>>>>>> reason
>>>>>>>>> (at
>>>>>>>>> least according to the commit message) that made Christian 
>>>>>>>>> drop the
>>>>>>>>> XArray
>>>>>>>>> use in drm_exec for the same set of objects: "The locking 
>>>>>>>>> overhead
>>>>>>>>> is
>>>>>>>>> unecessary and measurable". IMHO the spinlock is the added
>>>>>>>>> complexity and a
>>>>>>>>> single wide lock following the drm locking guidelines set out by
>>>>>>>>> Daniel and
>>>>>>>>> David should really be the default choice with an opt-in for a
>>>>>>>>> spinlock if
>>>>>>>>> needed for async and pushing out to a wq is not an option.
>>>>>>>> For the external object list an outer lock would work as long 
>>>>>>>> as it's
>>>>>>>> not the
>>>>>>>> dma-resv lock of the corresponding GEM object, since here we 
>>>>>>>> actually
>>>>>>>> need to
>>>>>>>> remove the list entry from the external object list on
>>>>>>>> drm_gpuvm_bo_destroy().
>>>>>>>> It's just a bit weird design wise that drivers would need to take
>>>>>>>> this outer
>>>>>>>> lock on:
>>>>>>>>
>>>>>>>> - drm_gpuvm_bo_extobj_add()
>>>>>>>> - drm_gpuvm_bo_destroy()        (and hence also 
>>>>>>>> drm_gpuvm_bo_put())
>>>>>>>> - drm_gpuva_unlink()            (because it needs to call
>>>>>>>> drm_gpuvm_bo_put())
>>>>>>>> - drm_gpuvm_exec_lock()
>>>>>>>> - drm_gpuvm_exec_lock_array()
>>>>>>>> - drm_gpuvm_prepare_range()
>>>>>>>>
>>>>>>>> Given that it seems reasonable to do all the required locking
>>>>>>>> internally.
>>>>>>>  From a design POV, there has been a clear direction in XE to make
>>>>>>> things similar to mmap() / munmap(), so this outer lock, which 
>>>>>>> in Xe is
>>>>>>> an rwsem, is used in a similar way as the mmap_lock. It's 
>>>>>>> protecting
>>>>>>> the page-table structures and vma rb tree, the userptr 
>>>>>>> structures and
>>>>>>> the extobj list. Basically it's taken early in the exec IOCTL, the
>>>>>>> VM_BIND ioctl, the compute rebind worker and the pagefault 
>>>>>>> handler, so
>>>>>>> all of the above are just asserting that it is taken in the correct
>>>>>>> mode.
>>>>>>>
>>>>>>> But strictly with this scheme one could also use the vm's 
>>>>>>> dma_resv for
>>>>>>> the extobj list since with drm_exec, it's locked before 
>>>>>>> traversing the
>>>>>>> list.
>>>>>>>
>>>>>>> The whole point of this scheme is to rely on locks that you 
>>>>>>> already are
>>>>>>> supposed to be holding for various reasons and is simple to 
>>>>>>> comprehend.
>>>>>> I don't agree that we're supposed to hold the VM's resv lock 
>>>>>> anyways for
>>>>>> functions like drm_gpuvm_bo_put() or drm_gpuva_unlink(), but I'm 
>>>>>> fine using it
>>>>>> for that purpose nevertheless.
>>>>>>
>>>>>>>> In order to at least place lockdep checks, the driver would 
>>>>>>>> need to
>>>>>>>> supply the
>>>>>>>> corresponding lock's lockdep_map, because the GPUVM otherwise 
>>>>>>>> doesn't
>>>>>>>> know about
>>>>>>>> the lock.
>>>>>>> Yes, that sounds reasonable. One lockdep map per list.
>>>>>> I'd really like to avoid that, especially now that everything got 
>>>>>> simpler. We
>>>>>> should define the actual locks to take instead.
>>>>>>
>>>>>>>> Out of curiosity, what is the overhead of a spin_lock() that 
>>>>>>>> doesn't
>>>>>>>> need to
>>>>>>>> spin?
>>>>>>> I guess it's hard to tell exactly, but it is much lower on 
>>>>>>> modern x86
>>>>>>> than what it used to be. Not sure about ARM, which is the other
>>>>>>> architecture important to us. I figure if there is little 
>>>>>>> cache-line
>>>>>>> bouncing the main overhead comes from the implied barriers.
>>>>>>>
>>>>>>>>> A pretty simple way that would not add much code would be
>>>>>>>>>
>>>>>>>>> static void gpuvm_cond_spin_lock(const struct drm_gpuvm *gpuvm,
>>>>>>>>>                                  spinlock_t *lock)
>>>>>>>>> {
>>>>>>>>>      if (!gpuvm->resv_protected_lists)
>>>>>>>>>          spin_lock(lock);
>>>>>>>>> }
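
For symmetry, the matching unlock helper under the same assumption (the
resv_protected_lists flag is hypothetical and not part of the current
struct drm_gpuvm):

static void gpuvm_cond_spin_unlock(const struct drm_gpuvm *gpuvm,
                                   spinlock_t *lock)
{
     if (!gpuvm->resv_protected_lists)
         spin_unlock(lock);
}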
>>>>>>>>>
>>>>>>>>>>> For such drivers, that would require anybody calling unlink to
>>>>>>>>>>> hold the vm's
>>>>>>>>>>> resv, though.
>>>>>>>>>> In V4 I want to go back to having a dedicated lock for the GEMs
>>>>>>>>>> gpuva list (or
>>>>>>>>>> VM_BO list to be more precise). We can't just use the dma-resv
>>>>>>>>>> lock for that
>>>>>>>>>> with VM_BO abstractions, because on destruction of a VM_BO we
>>>>>>>>>> otherwise wouldn't
>>>>>>>>>> be allowed to already hold the dma-resv lock. That's the fix I
>>>>>>>>>> was referring to
>>>>>>>>>> earlier.
>>>>>>>>> Yeah, I can see the need for a dedicated lock for the GEM's gpuva
>>>>>>>>> list, but
>>>>>>>>> holding the vm's dma-resv lock across the unlink shouldn't be a
>>>>>>>>> problem. We
>>>>>>>>> may free the object and a pointer to the vm's resv during unlink
>>>>>>>>> but we
>>>>>>>>> don't free the vm's resv.  It'd be a matter of ensuring that any
>>>>>>>>> calls to
>>>>>>>>> unlink from *within* drm_gpuvm allows it to be held.
>>>>>>>> Drivers calling unlink() from the fence signaling path can't 
>>>>>>>> use the
>>>>>>>> VM's
>>>>>>>> dma-resv lock.
>>>>>>> Yes, that made me a bit curious because in the current version 
>>>>>>> the code
>>>>>>> required the object's dma_resv for unlink() which can't be grabbed
>>>>>>> either from the fence signaling path. So are there any drivers 
>>>>>>> actually
>>>>>>> wanting to do that? If so, they will either need to resort to the
>>>>>>> current spinlock solution or they will need to call unlink from a
>>>>>>> workqueue item.
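
A rough sketch of the workqueue variant (struct driver_gpuvm, the
deferred_unlink list and the unlink_entry member are all hypothetical;
the fence signalling path would only queue entries and schedule the
work, assuming the VM's resv is what protects unlink per the scheme
discussed here):

static void driver_deferred_unlink_work(struct work_struct *work)
{
	struct driver_gpuvm *uvm = container_of(work, struct driver_gpuvm,
						unlink_work);
	struct driver_vma *vma, *next;

	/* Safe to take dma-resv here, we're out of the signalling path. */
	if (dma_resv_lock(uvm->base.resv, NULL))
		return;
	list_for_each_entry_safe(vma, next, &uvm->deferred_unlink,
				 unlink_entry) {
		list_del(&vma->unlink_entry);
		drm_gpuva_unlink(&vma->va);
	}
	dma_resv_unlock(uvm->base.resv);
}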
>>>>>> As Boris already mentioned we have the dma-resv lock by default 
>>>>>> or a driver
>>>>>> specific GEM gpuva lock as opt-in. Now, we can get rid of the 
>>>>>> latter.
>>>>>>
>>>>>>>> Also, what if the object is an external object? We can't use 
>>>>>>>> the VM's
>>>>>>>> dma-resv
>>>>>>>> lock here.
>>>>>>> Why? Typically (sync) unlink is only ever called from an 
>>>>>>> unbind-like
>>>>>>> operation where it should be trivial to grab the vm's resv. Or, for
>>>>>>> that matter any outer lock protecting the extobj list. Rule 
>>>>>>> would be
>>>>>>> the drm_gpuvm_bo::entry::extobj  and drm_gpuvm_bo::entry::evict 
>>>>>>> would
>>>>>>> be protected by either the vm's dma_resv (or possibly an outer 
>>>>>>> lock in
>>>>>>> the case of the extobj list).
>>>>>> Outer lock wouldn't have been working for updates in the async 
>>>>>> path, but
>>>>>> shouldn't be relevant anymore. We could use the VM's resv for that.
>>>>>>
>>>>>>>>   And we can't have the GEM objs dma-resv lock held when calling
>>>>>>>> unlink(), since unlink() calls drm_gpuvm_bo_put(), which if the
>>>>>>>> refcount drops
>>>>>>>> to zero calls drm_gpuvm_bo_destroy() and drm_gpuvm_bo_destroy() 
>>>>>>>> might
>>>>>>>> drop the
>>>>>>>> last reference of the GEM object.
>>>>>>> Yes, but this is a different problem as to what exactly protects
>>>>>>> drm_gpuvm_bo::entry::gem. Either as you suggest an internal per 
>>>>>>> bo list
>>>>>>> lock, or if we want to keep the bo's dma_resv we need to ensure 
>>>>>>> that
>>>>>>> the caller of dma_resv_unlock(obj->resv) actually refcounts its obj
>>>>>>> pointer, and doesn't implicitly rely on the gpuvm_bo's refcount 
>>>>>>> (I know
>>>>>>> Boris didn't like that, but requiring an explicit refcount for a
>>>>>>> pointer you dereference unless you're under a lock that ensures 
>>>>>>> keeping
>>>>>>> the object alive is pretty much required?) But anyway for the
>>>>>>> drm_gpuvm_bo::entry::gem list protection (bo resv or internal 
>>>>>>> spinlock)
>>>>>>> I don't have a strong preference.
>>>>>> We can keep the GEM objects dma-resv lock, however as mentioned 
>>>>>> above
>>>>>> drm_gpuva_unlink() and drm_gpuvm_bo_put() then require both the 
>>>>>> VM's resv lock
>>>>>> and the GEM's resv lock in case they differ.
>>>>>>
>>>>>>>>   All those problems go away with a dedicated
>>>>>>>> GEM gpuva list lock.
>>>>>>> I don't think these are real problems.
>>>>>>> With the excepton of the eviction list "trick" where we 
>>>>>>> currently have
>>>>>>> slightly different approach to collect external bos needing 
>>>>>>> rebinding,
>>>>>>> we have this working fine.
>>>>>>>
>>>>>>> TBH I think pretty much the only situation where the spinlock is 
>>>>>>> needed
>>>>>>> is for async updates of these lists, unless a wq item can be 
>>>>>>> used for
>>>>>>> that, but it doesn't really seem like the current code allows 
>>>>>>> for such
>>>>>>> updates anyway? It complicates the code a lot, adds overhead and 
>>>>>>> also
>>>>>>> adds the requirement for refcounting during list traversal.
>>>>>>>
>>>>>>> /Thomas
>>>>>>>
>>>>>>>>> /Thomas
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>> It seems that with that also the refcount could be make non-
>>>>>>>>>>> atomic.
>>>>>>>>>>>
>>>>>>>>>>> All in the spirit of the drm locking guidelines "use big locks
>>>>>>>>>>> when
>>>>>>>>>>> possible".
>>>>>>>>>>> Lower level locks only when necessary for performance or
>>>>>>>>>>> locking inversion?
>>>>>>>>>>>
>>>>>>>>>>> /Thomas
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> + *
>>>>>>>>>>>> + * Elements popped from the original list are kept in a
>>>>>>>>>>>> local list, so removal
>>>>>>>>>>>> + * and is_empty checks can still happen while we're
>>>>>>>>>>>> iterating the list.
>>>>>>>>>>>> + */
>>>>>>>>>>>> +#define get_next_vm_bo_from_list(__gpuvm, __list_name, __local_list, __prev_vm_bo) \
>>>>>>>>>>>> +       ({                                                                          \
>>>>>>>>>>>> +               struct drm_gpuvm_bo *__vm_bo = NULL;                                \
>>>>>>>>>>>> +                                                                                   \
>>>>>>>>>>>> +               drm_gpuvm_bo_put(__prev_vm_bo);                                     \
>>>>>>>>>>>> +                                                                                   \
>>>>>>>>>>>> +               spin_lock(&(__gpuvm)->__list_name.lock);                            \
>>>>>>>>>>>> +               while (!list_empty(&(__gpuvm)->__list_name.list)) {                 \
>>>>>>>>>>>> +                       __vm_bo = list_first_entry(&(__gpuvm)->__list_name.list,    \
>>>>>>>>>>>> +                                                  struct drm_gpuvm_bo,             \
>>>>>>>>>>>> +                                                  list.entry.__list_name);         \
>>>>>>>>>>>> +                       if (drm_gpuvm_bo_get_unless_zero(__vm_bo)) {                \
>>>>>>>>>>>> +                               list_move_tail(&(__vm_bo)->list.entry.__list_name,  \
>>>>>>>>>>>> +                                              __local_list);                       \
>>>>>>>>>>>> +                               break;                                              \
>>>>>>>>>>>> +                       } else {                                                    \
>>>>>>>>>>>> +                               list_del_init(&(__vm_bo)->list.entry.__list_name);  \
>>>>>>>>>>>> +                               __vm_bo = NULL;                                     \
>>>>>>>>>>>> +                       }                                                           \
>>>>>>>>>>>> +               }                                                                   \
>>>>>>>>>>>> +               spin_unlock(&(__gpuvm)->__list_name.lock);                          \
>>>>>>>>>>>> +                                                                                   \
>>>>>>>>>>>> +               __vm_bo;                                                            \
>>>>>>>>>>>> +       })
>>>>>>>>>>>> +
>>>>>>>>>>>> +/**
>>>>>>>>>>>> + * for_each_vm_bo_in_list() - internal vm_bo list iterator
>>>>>>>>>>>> + * @__gpuvm: The GPU VM
>>>>>>>>>>>> + * @__list_name: The name of the list we're iterating on
>>>>>>>>>>>> + * @__local_list: A pointer to the local list used to store
>>>>>>>>>>>> already iterated items
>>>>>>>>>>>> + * @__vm_bo: The current &drm_gpuvm_bo element
>>>>>>>>>>>> + *
>>>>>>>>>>>> + * This helper is here to provide lockless list iteration.
>>>>>>>>>>>> Lockless as in, the
>>>>>>>>>>>> + * iterator releases the lock immediately after picking the
>>>>>>>>>>>> first element from the
>>>>>>>>>>>> + * list, so list insertion and deletion can happen
>>>>>>>>>>>> concurrently.
>>>>>>>>>>>> + *
>>>>>>>>>>>> + * Typical use:
>>>>>>>>>>>> + *
>>>>>>>>>>>> + *     struct drm_gpuvm_bo *vm_bo;
>>>>>>>>>>>> + *     LIST_HEAD(my_local_list);
>>>>>>>>>>>> + *
>>>>>>>>>>>> + *     ret = 0;
>>>>>>>>>>>> + *     for_each_vm_bo_in_list(gpuvm, <list_name>,
>>>>>>>>>>>> &my_local_list, vm_bo) {
>>>>>>>>>>>> + *             ret = do_something_with_vm_bo(..., vm_bo);
>>>>>>>>>>>> + *             if (ret)
>>>>>>>>>>>> + *                     break;
>>>>>>>>>>>> + *     }
>>>>>>>>>>>> + *     drm_gpuvm_bo_put(vm_bo);
>>>>>>>>>>>> + *     restore_vm_bo_list(gpuvm, <list_name>,
>>>>>>>>>>>> &my_local_list);
>>>>>>>>>>>> + *
>>>>>>>>>>>> + *
>>>>>>>>>>>> + * Only used for internal list iterations, not meant to be
>>>>>>>>>>>> exposed to the outside
>>>>>>>>>>>> + * world.
>>>>>>>>>>>> + */
>>>>>>>>>>>> +#define for_each_vm_bo_in_list(__gpuvm, __list_name, __local_list, __vm_bo) \
>>>>>>>>>>>> +       for (__vm_bo = get_next_vm_bo_from_list(__gpuvm, __list_name,        \
>>>>>>>>>>>> +                                               __local_list, NULL);         \
>>>>>>>>>>>> +            __vm_bo;                                                        \
>>>>>>>>>>>> +            __vm_bo = get_next_vm_bo_from_list(__gpuvm, __list_name,        \
>>>>>>>>>>>> +                                               __local_list, __vm_bo))
>>>>>>>>>>>> +
>>>>>>>>>>>> +/**
>>>>>>>>>>>> + * restore_vm_bo_list() - move vm_bo elements back to their
>>>>>>>>>>>> original list
>>>>>>>>>>>> + * @__gpuvm: The GPU VM
>>>>>>>>>>>> + * @__list_name: The name of the list we're iterating on
>>>>>>>>>>>> + * @__local_list: A pointer to the local list used to store
>>>>>>>>>>>> already iterated items
>>>>>>>>>>>> + *
>>>>>>>>>>>> + * When we're done iterating a vm_bo list, we should call
>>>>>>>>>>>> restore_vm_bo_list()
>>>>>>>>>>>> + * to restore the original state and let new iterations take
>>>>>>>>>>>> place.
>>>>>>>>>>>> + */
>>>>>>>>>>>> +#define restore_vm_bo_list(__gpuvm, __list_name, __local_list)              \
>>>>>>>>>>>> +       do {                                                                 \
>>>>>>>>>>>> +               /* Merge back the two lists, moving local list elements to   \
>>>>>>>>>>>> +                * the head to preserve previous ordering, in case it        \
>>>>>>>>>>>> +                * matters.                                                  \
>>>>>>>>>>>> +                */                                                          \
>>>>>>>>>>>> +               spin_lock(&(__gpuvm)->__list_name.lock);                     \
>>>>>>>>>>>> +               list_splice(__local_list, &(__gpuvm)->__list_name.list);     \
>>>>>>>>>>>> +               spin_unlock(&(__gpuvm)->__list_name.lock);                   \
>>>>>>>>>>>> +       } while (0)
>>>>>>>>>>>> +
>>>>>>>>>>>> +/**
>>>>>>>>>>>> + * drm_gpuvm_bo_list_add() - insert a vm_bo into the given
>>>>>>>>>>>> list
>>>>>>>>>>>> + * @__vm_bo: the &drm_gpuvm_bo
>>>>>>>>>>>> + * @__list_name: the name of the list to insert into
>>>>>>>>>>>> + *
>>>>>>>>>>>> + * Inserts the given @__vm_bo into the list specified by
>>>>>>>>>>>> @__list_name and
>>>>>>>>>>>> + * increases the vm_bo's reference count.
>>>>>>>>>>>> + */
>>>>>>>>>>>> +#define drm_gpuvm_bo_list_add(__vm_bo, __list_name)                         \
>>>>>>>>>>>> +       do {                                                                 \
>>>>>>>>>>>> +               spin_lock(&(__vm_bo)->vm->__list_name.lock);                 \
>>>>>>>>>>>> +               if (list_empty(&(__vm_bo)->list.entry.__list_name))          \
>>>>>>>>>>>> +                       list_add_tail(&(__vm_bo)->list.entry.__list_name,    \
>>>>>>>>>>>> +                                     &(__vm_bo)->vm->__list_name.list);     \
>>>>>>>>>>>> +               spin_unlock(&(__vm_bo)->vm->__list_name.lock);               \
>>>>>>>>>>>> +       } while (0)
>>>>>>>>>>>> +
>>>>>>>>>>>> +/**
>>>>>>>>>>>> + * drm_gpuvm_bo_list_del() - remove a vm_bo from the given
>>>>>>>>>>>> list
>>>>>>>>>>>> + * @__vm_bo: the &drm_gpuvm_bo
>>>>>>>>>>>> + * @__list_name: the name of the list to remove from
>>>>>>>>>>>> + *
>>>>>>>>>>>> + * Removes the given @__vm_bo from the list specified by
>>>>>>>>>>>> @__list_name and
>>>>>>>>>>>> + * decreases the vm_bo's reference count.
>>>>>>>>>>>> + */
>>>>>>>>>>>> +#define drm_gpuvm_bo_list_del(__vm_bo, __list_name)                         \
>>>>>>>>>>>> +       do {                                                                 \
>>>>>>>>>>>> +               spin_lock(&(__vm_bo)->vm->__list_name.lock);                 \
>>>>>>>>>>>> +               if (!list_empty(&(__vm_bo)->list.entry.__list_name))         \
>>>>>>>>>>>> +                       list_del_init(&(__vm_bo)->list.entry.__list_name);   \
>>>>>>>>>>>> +               spin_unlock(&(__vm_bo)->vm->__list_name.lock);               \
>>>>>>>>>>>> +       } while (0)
>>>>>>>>>>>> +
>>>>>>>>>>>> +static int __must_check
>>>>>>>>>>>> +drm_gpuvm_bo_get_unless_zero(struct drm_gpuvm_bo *vm_bo);
>>>>>>>>>>>> +
>>>>>>>>>>>>     #define to_drm_gpuva(__node) container_of((__node), struct
>>>>>>>>>>>> drm_gpuva, rb.node)
>>>>>>>>>>>>     #define GPUVA_START(node) ((node)->va.addr)
>>>>>>>>>>>> @@ -713,6 +867,12 @@ drm_gpuvm_init(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>> struct drm_device *drm,
>>>>>>>>>>>>          gpuvm->rb.tree = RB_ROOT_CACHED;
>>>>>>>>>>>>          INIT_LIST_HEAD(&gpuvm->rb.list);
>>>>>>>>>>>> +       INIT_LIST_HEAD(&gpuvm->extobj.list);
>>>>>>>>>>>> +       spin_lock_init(&gpuvm->extobj.lock);
>>>>>>>>>>>> +
>>>>>>>>>>>> +       INIT_LIST_HEAD(&gpuvm->evict.list);
>>>>>>>>>>>> +       spin_lock_init(&gpuvm->evict.lock);
>>>>>>>>>>>> +
>>>>>>>>>>>>          drm_gpuva_check_overflow(start_offset, range);
>>>>>>>>>>>>          gpuvm->mm_start = start_offset;
>>>>>>>>>>>>          gpuvm->mm_range = range;
>>>>>>>>>>>> @@ -754,10 +914,302 @@ drm_gpuvm_destroy(struct drm_gpuvm
>>>>>>>>>>>> *gpuvm)
>>>>>>>>>>>>          WARN(!RB_EMPTY_ROOT(&gpuvm->rb.tree.rb_root),
>>>>>>>>>>>>               "GPUVA tree is not empty, potentially leaking
>>>>>>>>>>>> memory.\n");
>>>>>>>>>>>> +       WARN(!list_empty(&gpuvm->extobj.list), "Extobj list
>>>>>>>>>>>> should be empty.\n");
>>>>>>>>>>>> +       WARN(!list_empty(&gpuvm->evict.list), "Evict list
>>>>>>>>>>>> should be empty.\n");
>>>>>>>>>>>> +
>>>>>>>>>>>>          drm_gem_private_object_fini(&gpuvm->d_obj);
>>>>>>>>>>>>     }
>>>>>>>>>>>>     EXPORT_SYMBOL_GPL(drm_gpuvm_destroy);
>>>>>>>>>>>> +/**
>>>>>>>>>>>> + * drm_gpuvm_prepare_objects() - prepare all associated BOs
>>>>>>>>>>>> + * @gpuvm: the &drm_gpuvm
>>>>>>>>>>>> + * @exec: the &drm_exec locking context
>>>>>>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>>>>>>> + *
>>>>>>>>>>>> + * Calls drm_exec_prepare_obj() for all &drm_gem_objects the
>>>>>>>>>>>> given
>>>>>>>>>>>> + * &drm_gpuvm contains mappings of.
>>>>>>>>>>>> + *
>>>>>>>>>>>> + * Using this function directly, it is the driver's
>>>>>>>>>>>> responsibility to call
>>>>>>>>>>>> + * drm_exec_init() and drm_exec_fini() accordingly.
>>>>>>>>>>>> + *
>>>>>>>>>>>> + * Note: This function is safe against concurrent insertion
>>>>>>>>>>>> and removal of
>>>>>>>>>>>> + * external objects, however it is not safe against
>>>>>>>>>>>> concurrent usage itself.
>>>>>>>>>>>> + *
>>>>>>>>>>>> + * Drivers need to make sure to protect this case with
>>>>>>>>>>>> either an outer VM lock
>>>>>>>>>>>> + * or by calling drm_gpuvm_prepare_vm() before this function
>>>>>>>>>>>> within the
>>>>>>>>>>>> + * drm_exec_until_all_locked() loop, such that the GPUVM's
>>>>>>>>>>>> dma-resv lock ensures
>>>>>>>>>>>> + * mutual exclusion.
>>>>>>>>>>>> + *
>>>>>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>>>>>> + */
>>>>>>>>>>>> +int
>>>>>>>>>>>> +drm_gpuvm_prepare_objects(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>> +                         struct drm_exec *exec,
>>>>>>>>>>>> +                         unsigned int num_fences)
>>>>>>>>>>>> +{
>>>>>>>>>>>> +       struct drm_gpuvm_bo *vm_bo;
>>>>>>>>>>>> +       LIST_HEAD(extobjs);
>>>>>>>>>>>> +       int ret = 0;
>>>>>>>>>>>> +
>>>>>>>>>>>> +       for_each_vm_bo_in_list(gpuvm, extobj, &extobjs,
>>>>>>>>>>>> vm_bo) {
>>>>>>>>>>>> +               ret = drm_exec_prepare_obj(exec, vm_bo->obj,
>>>>>>>>>>>> num_fences);
>>>>>>>>>>>> +               if (ret)
>>>>>>>>>>>> +                       break;
>>>>>>>>>>>> +       }
>>>>>>>>>>>> +       /* Drop ref in case we break out of the loop. */
>>>>>>>>>>>> +       drm_gpuvm_bo_put(vm_bo);
>>>>>>>>>>>> +       restore_vm_bo_list(gpuvm, extobj, &extobjs);
>>>>>>>>>>>> +
>>>>>>>>>>>> +       return ret;
>>>>>>>>>>>> +}
>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_prepare_objects);
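
A sketch of the open-coded drm_exec loop the comment above alludes to,
with the VM's resv locked first so that concurrent callers of
drm_gpuvm_prepare_objects() are mutually excluded (driver_lock_vm() is a
made-up name; a single fence slot is reserved for illustration):

static int driver_lock_vm(struct drm_gpuvm *gpuvm, struct drm_exec *exec)
{
	int ret;

	drm_exec_init(exec, DRM_EXEC_INTERRUPTIBLE_WAIT);

	drm_exec_until_all_locked(exec) {
		/* VM resv first: serializes concurrent callers. */
		ret = drm_gpuvm_prepare_vm(gpuvm, exec, 1);
		drm_exec_retry_on_contention(exec);
		if (ret)
			goto err;

		ret = drm_gpuvm_prepare_objects(gpuvm, exec, 1);
		drm_exec_retry_on_contention(exec);
		if (ret)
			goto err;
	}

	return 0;

err:
	drm_exec_fini(exec);
	return ret;
}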
>>>>>>>>>>>> +
>>>>>>>>>>>> +/**
>>>>>>>>>>>> + * drm_gpuvm_prepare_range() - prepare all BOs mapped within
>>>>>>>>>>>> a given range
>>>>>>>>>>>> + * @gpuvm: the &drm_gpuvm
>>>>>>>>>>>> + * @exec: the &drm_exec locking context
>>>>>>>>>>>> + * @addr: the start address within the VA space
>>>>>>>>>>>> + * @range: the range to iterate within the VA space
>>>>>>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>>>>>>> + *
>>>>>>>>>>>> + * Calls drm_exec_prepare_obj() for all &drm_gem_objects
>>>>>>>>>>>> mapped between @addr
>>>>>>>>>>>> + * and @addr + @range.
>>>>>>>>>>>> + *
>>>>>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>>>>>> + */
>>>>>>>>>>>> +int
>>>>>>>>>>>> +drm_gpuvm_prepare_range(struct drm_gpuvm *gpuvm, struct
>>>>>>>>>>>> drm_exec *exec,
>>>>>>>>>>>> +                       u64 addr, u64 range, unsigned int
>>>>>>>>>>>> num_fences)
>>>>>>>>>>>> +{
>>>>>>>>>>>> +       struct drm_gpuva *va;
>>>>>>>>>>>> +       u64 end = addr + range;
>>>>>>>>>>>> +       int ret;
>>>>>>>>>>>> +
>>>>>>>>>>>> +       drm_gpuvm_for_each_va_range(va, gpuvm, addr, end) {
>>>>>>>>>>>> +               struct drm_gem_object *obj = va->gem.obj;
>>>>>>>>>>>> +
>>>>>>>>>>>> +               ret = drm_exec_prepare_obj(exec, obj,
>>>>>>>>>>>> num_fences);
>>>>>>>>>>>> +               if (ret)
>>>>>>>>>>>> +                       return ret;
>>>>>>>>>>>> +       }
>>>>>>>>>>>> +
>>>>>>>>>>>> +       return 0;
>>>>>>>>>>>> +}
>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_prepare_range);
>>>>>>>>>>>> +
>>>>>>>>>>>> +/**
>>>>>>>>>>>> + * drm_gpuvm_exec_lock() - lock all dma-resv of all
>>>>>>>>>>>> associated BOs
>>>>>>>>>>>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>>>>>>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>>>>>>> + * @interruptible: sleep interruptible if waiting
>>>>>>>>>>>> + *
>>>>>>>>>>>> + * Acquires all dma-resv locks of all &drm_gem_objects the
>>>>>>>>>>>> given
>>>>>>>>>>>> + * &drm_gpuvm contains mappings of.
>>>>>>>>>>>> + *
>>>>>>>>>>>> + * Additionally, when calling this function with struct
>>>>>>>>>>>> drm_gpuvm_exec::extra
>>>>>>>>>>>> + * being set the driver receives the given @fn callback to
>>>>>>>>>>>> lock additional
>>>>>>>>>>>> + * dma-resv in the context of the &drm_gpuvm_exec instance.
>>>>>>>>>>>> Typically, drivers
>>>>>>>>>>>> + * would call drm_exec_prepare_obj() from within this
>>>>>>>>>>>> callback.
>>>>>>>>>>>> + *
>>>>>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>>>>>> + */
>>>>>>>>>>>> +int
>>>>>>>>>>>> +drm_gpuvm_exec_lock(struct drm_gpuvm_exec *vm_exec,
>>>>>>>>>>>> +                   unsigned int num_fences,
>>>>>>>>>>>> +                   bool interruptible)
>>>>>>>>>>>> +{
>>>>>>>>>>>> +       struct drm_gpuvm *gpuvm = vm_exec->vm;
>>>>>>>>>>>> +       struct drm_exec *exec = &vm_exec->exec;
>>>>>>>>>>>> +       uint32_t flags;
>>>>>>>>>>>> +       int ret;
>>>>>>>>>>>> +
>>>>>>>>>>>> +       flags = (interruptible ? DRM_EXEC_INTERRUPTIBLE_WAIT : 0) |
>>>>>>>>>>>> +               DRM_EXEC_IGNORE_DUPLICATES;
>>>>>>>>>>>> +
>>>>>>>>>>>> +       drm_exec_init(exec, flags);
>>>>>>>>>>>> +
>>>>>>>>>>>> +       drm_exec_until_all_locked(exec) {
>>>>>>>>>>>> +               ret = drm_gpuvm_prepare_vm(gpuvm, exec,
>>>>>>>>>>>> num_fences);
>>>>>>>>>>>> +               drm_exec_retry_on_contention(exec);
>>>>>>>>>>>> +               if (ret)
>>>>>>>>>>>> +                       goto err;
>>>>>>>>>>>> +
>>>>>>>>>>>> +               ret = drm_gpuvm_prepare_objects(gpuvm, exec,
>>>>>>>>>>>> num_fences);
>>>>>>>>>>>> +               drm_exec_retry_on_contention(exec);
>>>>>>>>>>>> +               if (ret)
>>>>>>>>>>>> +                       goto err;
>>>>>>>>>>>> +
>>>>>>>>>>>> +               if (vm_exec->extra.fn) {
>>>>>>>>>>>> +                       ret = vm_exec->extra.fn(vm_exec,
>>>>>>>>>>>> num_fences);
>>>>>>>>>>>> +                       drm_exec_retry_on_contention(exec);
>>>>>>>>>>>> +                       if (ret)
>>>>>>>>>>>> +                               goto err;
>>>>>>>>>>>> +               }
>>>>>>>>>>>> +       }
>>>>>>>>>>>> +
>>>>>>>>>>>> +       return 0;
>>>>>>>>>>>> +
>>>>>>>>>>>> +err:
>>>>>>>>>>>> +       drm_exec_fini(exec);
>>>>>>>>>>>> +       return ret;
>>>>>>>>>>>> +}
>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_exec_lock);
>>>>>>>>>>>> +
>>>>>>>>>>>> +static int
>>>>>>>>>>>> +fn_lock_array(struct drm_gpuvm_exec *vm_exec, unsigned int
>>>>>>>>>>>> num_fences)
>>>>>>>>>>>> +{
>>>>>>>>>>>> +       struct {
>>>>>>>>>>>> +               struct drm_gem_object **objs;
>>>>>>>>>>>> +               unsigned int num_objs;
>>>>>>>>>>>> +       } *args = vm_exec->extra.priv;
>>>>>>>>>>>> +
>>>>>>>>>>>> +       return drm_exec_prepare_array(&vm_exec->exec, args->objs,
>>>>>>>>>>>> +                                     args->num_objs, num_fences);
>>>>>>>>>>>> +}
>>>>>>>>>>>> +
>>>>>>>>>>>> +/**
>>>>>>>>>>>> + * drm_gpuvm_exec_lock_array() - lock all dma-resv of all
>>>>>>>>>>>> associated BOs
>>>>>>>>>>>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>>>>>>>>>>>> + * @objs: additional &drm_gem_objects to lock
>>>>>>>>>>>> + * @num_objs: the number of additional &drm_gem_objects to
>>>>>>>>>>>> lock
>>>>>>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>>>>>>> + * @interruptible: sleep interruptible if waiting
>>>>>>>>>>>> + *
>>>>>>>>>>>> + * Acquires all dma-resv locks of all &drm_gem_objects the
>>>>>>>>>>>> given &drm_gpuvm
>>>>>>>>>>>> + * contains mappings of, plus the ones given through @objs.
>>>>>>>>>>>> + *
>>>>>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>>>>>> + */
>>>>>>>>>>>> +int
>>>>>>>>>>>> +drm_gpuvm_exec_lock_array(struct drm_gpuvm_exec *vm_exec,
>>>>>>>>>>>> +                         struct drm_gem_object **objs,
>>>>>>>>>>>> +                         unsigned int num_objs,
>>>>>>>>>>>> +                         unsigned int num_fences,
>>>>>>>>>>>> +                         bool interruptible)
>>>>>>>>>>>> +{
>>>>>>>>>>>> +       struct {
>>>>>>>>>>>> +               struct drm_gem_object **objs;
>>>>>>>>>>>> +               unsigned int num_objs;
>>>>>>>>>>>> +       } args;
>>>>>>>>>>>> +
>>>>>>>>>>>> +       args.objs = objs;
>>>>>>>>>>>> +       args.num_objs = num_objs;
>>>>>>>>>>>> +
>>>>>>>>>>>> +       vm_exec->extra.fn = fn_lock_array;
>>>>>>>>>>>> +       vm_exec->extra.priv = &args;
>>>>>>>>>>>> +
>>>>>>>>>>>> +       return drm_gpuvm_exec_lock(vm_exec, num_fences,
>>>>>>>>>>>> interruptible);
>>>>>>>>>>>> +}
>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_exec_lock_array);
>>>>>>>>>>>> +
>>>>>>>>>>>> +/**
>>>>>>>>>>>> + * drm_gpuvm_exec_lock_range() - prepare all BOs mapped
>>>>>>>>>>>> within a given range
>>>>>>>>>>>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>>>>>>>>>>>> + * @addr: the start address within the VA space
>>>>>>>>>>>> + * @range: the range to iterate within the VA space
>>>>>>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>>>>>>> + * @interruptible: sleep interruptible if waiting
>>>>>>>>>>>> + *
>>>>>>>>>>>> + * Acquires all dma-resv locks of all &drm_gem_objects
>>>>>>>>>>>> mapped between @addr and
>>>>>>>>>>>> + * @addr + @range.
>>>>>>>>>>>> + *
>>>>>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>>>>>> + */
>>>>>>>>>>>> +int
>>>>>>>>>>>> +drm_gpuvm_exec_lock_range(struct drm_gpuvm_exec *vm_exec,
>>>>>>>>>>>> +                         u64 addr, u64 range,
>>>>>>>>>>>> +                         unsigned int num_fences,
>>>>>>>>>>>> +                         bool interruptible)
>>>>>>>>>>>> +{
>>>>>>>>>>>> +       struct drm_gpuvm *gpuvm = vm_exec->vm;
>>>>>>>>>>>> +       struct drm_exec *exec = &vm_exec->exec;
>>>>>>>>>>>> +       uint32_t flags;
>>>>>>>>>>>> +       int ret;
>>>>>>>>>>>> +
>>>>>>>>>>>> +       flags = (interruptible ? DRM_EXEC_INTERRUPTIBLE_WAIT : 0) |
>>>>>>>>>>>> +               DRM_EXEC_IGNORE_DUPLICATES;
>>>>>>>>>>>> +
>>>>>>>>>>>> +       drm_exec_init(exec, flags);
>>>>>>>>>>>> +
>>>>>>>>>>>> +       drm_exec_until_all_locked(exec) {
>>>>>>>>>>>> +               ret = drm_gpuvm_prepare_range(gpuvm, exec, addr,
>>>>>>>>>>>> +                                             range, num_fences);
>>>>>>>>>>>> +               drm_exec_retry_on_contention(exec);
>>>>>>>>>>>> +               if (ret)
>>>>>>>>>>>> +                       goto err;
>>>>>>>>>>>> +       }
>>>>>>>>>>>> +
>>>>>>>>>>>> +       return 0;
>>>>>>>>>>>> +
>>>>>>>>>>>> +err:
>>>>>>>>>>>> +       drm_exec_fini(exec);
>>>>>>>>>>>> +       return ret;
>>>>>>>>>>>> +}
>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_exec_lock_range);
>>>>>>>>>>>> +
>>>>>>>>>>>> +/**
>>>>>>>>>>>> + * drm_gpuvm_validate() - validate all BOs marked as evicted
>>>>>>>>>>>> + * @gpuvm: the &drm_gpuvm to validate evicted BOs
>>>>>>>>>>>> + *
>>>>>>>>>>>> + * Calls the &drm_gpuvm_ops.bo_validate callback for all
>>>>>>>>>>>> evicted buffer
>>>>>>>>>>>> + * objects being mapped in the given &drm_gpuvm.
>>>>>>>>>>>> + *
>>>>>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>>>>>> + */
>>>>>>>>>>>> +int
>>>>>>>>>>>> +drm_gpuvm_validate(struct drm_gpuvm *gpuvm)
>>>>>>>>>>>> +{
>>>>>>>>>>>> +       const struct drm_gpuvm_ops *ops = gpuvm->ops;
>>>>>>>>>>>> +       struct drm_gpuvm_bo *vm_bo;
>>>>>>>>>>>> +       LIST_HEAD(evict);
>>>>>>>>>>>> +       int ret = 0;
>>>>>>>>>>>> +
>>>>>>>>>>>> +       if (unlikely(!ops || !ops->bo_validate))
>>>>>>>>>>>> +               return -ENOTSUPP;
>>>>>>>>>>>> +
>>>>>>>>>>>> +       for_each_vm_bo_in_list(gpuvm, evict, &evict, vm_bo) {
>>>>>>>>>>>> +               dma_resv_assert_held(vm_bo->obj->resv);
>>>>>>>>>>>> +               ret = ops->bo_validate(vm_bo->obj);
>>>>>>>>>>>> +               if (ret)
>>>>>>>>>>>> +                       break;
>>>>>>>>>>>> +       }
>>>>>>>>>>>> +       /* Drop ref in case we break out of the loop. */
>>>>>>>>>>>> +       drm_gpuvm_bo_put(vm_bo);
>>>>>>>>>>>> +       restore_vm_bo_list(gpuvm, evict, &evict);
>>>>>>>>>>>> +
>>>>>>>>>>>> +       return ret;
>>>>>>>>>>>> +}
>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_validate);
>>>>>>>>>>>> +
>>>>>>>>>>>> +/**
>>>>>>>>>>>> + * drm_gpuvm_resv_add_fence - add fence to private and all
>>>>>>>>>>>> extobj
>>>>>>>>>>>> + * dma-resv
>>>>>>>>>>>> + * @gpuvm: the &drm_gpuvm to add a fence to
>>>>>>>>>>>> + * @exec: the &drm_exec locking context
>>>>>>>>>>>> + * @fence: fence to add
>>>>>>>>>>>> + * @private_usage: private dma-resv usage
>>>>>>>>>>>> + * @extobj_usage: extobj dma-resv usage
>>>>>>>>>>>> + */
>>>>>>>>>>>> +void
>>>>>>>>>>>> +drm_gpuvm_resv_add_fence(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>> +                        struct drm_exec *exec,
>>>>>>>>>>>> +                        struct dma_fence *fence,
>>>>>>>>>>>> +                        enum dma_resv_usage private_usage,
>>>>>>>>>>>> +                        enum dma_resv_usage extobj_usage)
>>>>>>>>>>>> +{
>>>>>>>>>>>> +       struct drm_gem_object *obj;
>>>>>>>>>>>> +       unsigned long index;
>>>>>>>>>>>> +
>>>>>>>>>>>> +       drm_exec_for_each_locked_object(exec, index, obj) {
>>>>>>>>>>>> +               dma_resv_assert_held(obj->resv);
>>>>>>>>>>>> +               dma_resv_add_fence(obj->resv, fence,
>>>>>>>>>>>> +                                  drm_gpuvm_is_extobj(gpuvm, obj) ?
>>>>>>>>>>>> +                                  extobj_usage : private_usage);
>>>>>>>>>>>> +       }
>>>>>>>>>>>> +}
>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_resv_add_fence);
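
Taken together, a driver's exec path built on these helpers might look
roughly as follows (struct driver_job and driver_submit() are
placeholders; drm_gpuvm_exec_resv_add_fence() and
drm_gpuvm_exec_unlock() are the inline wrappers added to the header
further down):

static int driver_exec(struct drm_gpuvm *gpuvm, struct driver_job *job)
{
	struct drm_gpuvm_exec vm_exec = { .vm = gpuvm };
	struct dma_fence *fence;
	int ret;

	ret = drm_gpuvm_exec_lock(&vm_exec, 1, true);
	if (ret)
		return ret;

	/* All resv locks are held, validate evicted BOs. */
	ret = drm_gpuvm_validate(gpuvm);
	if (ret)
		goto out_unlock;

	fence = driver_submit(job); /* placeholder for job submission */
	drm_gpuvm_exec_resv_add_fence(&vm_exec, fence,
				      DMA_RESV_USAGE_BOOKKEEP,
				      DMA_RESV_USAGE_BOOKKEEP);
	dma_fence_put(fence);

out_unlock:
	drm_gpuvm_exec_unlock(&vm_exec);
	return ret;
}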
>>>>>>>>>>>> +
>>>>>>>>>>>>     /**
>>>>>>>>>>>>      * drm_gpuvm_bo_create() - create a new instance of struct
>>>>>>>>>>>> drm_gpuvm_bo
>>>>>>>>>>>>      * @gpuvm: The &drm_gpuvm the @obj is mapped in.
>>>>>>>>>>>> @@ -790,6 +1242,9 @@ drm_gpuvm_bo_create(struct drm_gpuvm
>>>>>>>>>>>> *gpuvm,
>>>>>>>>>>>>          INIT_LIST_HEAD(&vm_bo->list.gpuva);
>>>>>>>>>>>>          INIT_LIST_HEAD(&vm_bo->list.entry.gem);
>>>>>>>>>>>> +       INIT_LIST_HEAD(&vm_bo->list.entry.extobj);
>>>>>>>>>>>> +       INIT_LIST_HEAD(&vm_bo->list.entry.evict);
>>>>>>>>>>>> +
>>>>>>>>>>>>          drm_gem_object_get(obj);
>>>>>>>>>>>>          return vm_bo;
>>>>>>>>>>>> @@ -807,6 +1262,14 @@ drm_gpuvm_bo_destroy(struct kref *kref)
>>>>>>>>>>>>          drm_gem_gpuva_assert_lock_held(vm_bo->obj);
>>>>>>>>>>>> +       spin_lock(&gpuvm->extobj.lock);
>>>>>>>>>>>> +       list_del(&vm_bo->list.entry.extobj);
>>>>>>>>>>>> +       spin_unlock(&gpuvm->extobj.lock);
>>>>>>>>>>>> +
>>>>>>>>>>>> +       spin_lock(&gpuvm->evict.lock);
>>>>>>>>>>>> +       list_del(&vm_bo->list.entry.evict);
>>>>>>>>>>>> +       spin_unlock(&gpuvm->evict.lock);
>>>>>>>>>>>> +
>>>>>>>>>>>>          list_del(&vm_bo->list.entry.gem);
>>>>>>>>>>>>          drm_gem_object_put(obj);
>>>>>>>>>>>> @@ -822,6 +1285,11 @@ drm_gpuvm_bo_destroy(struct kref *kref)
>>>>>>>>>>>>      * @vm_bo: the &drm_gpuvm_bo to release the reference of
>>>>>>>>>>>>      *
>>>>>>>>>>>>      * This releases a reference to @vm_bo.
>>>>>>>>>>>> + *
>>>>>>>>>>>> + * If the reference count drops to zero, the &gpuvm_bo is
>>>>>>>>>>>> destroyed, which
>>>>>>>>>>>> + * includes removing it from the GEM's gpuva list. Hence, if
>>>>>>>>>>>> a call to this
>>>>>>>>>>>> + * function can potentially let the reference count drop to
>>>>>>>>>>>> zero, the caller must
>>>>>>>>>>>> + * hold the dma-resv or driver specific GEM gpuva lock.
>>>>>>>>>>>>      */
>>>>>>>>>>>>     void
>>>>>>>>>>>>     drm_gpuvm_bo_put(struct drm_gpuvm_bo *vm_bo)
>>>>>>>>>>>> @@ -831,6 +1299,12 @@ drm_gpuvm_bo_put(struct drm_gpuvm_bo
>>>>>>>>>>>> *vm_bo)
>>>>>>>>>>>>     }
>>>>>>>>>>>>     EXPORT_SYMBOL_GPL(drm_gpuvm_bo_put);
>>>>>>>>>>>> +static int __must_check
>>>>>>>>>>>> +drm_gpuvm_bo_get_unless_zero(struct drm_gpuvm_bo *vm_bo)
>>>>>>>>>>>> +{
>>>>>>>>>>>> +       return kref_get_unless_zero(&vm_bo->kref);
>>>>>>>>>>>> +}
>>>>>>>>>>>> +
>>>>>>>>>>>>     static struct drm_gpuvm_bo *
>>>>>>>>>>>>     __drm_gpuvm_bo_find(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>                      struct drm_gem_object *obj)
>>>>>>>>>>>> @@ -938,6 +1412,48 @@ drm_gpuvm_bo_obtain_prealloc(struct
>>>>>>>>>>>> drm_gpuvm_bo *__vm_bo)
>>>>>>>>>>>>     }
>>>>>>>>>>>> EXPORT_SYMBOL_GPL(drm_gpuvm_bo_obtain_prealloc);
>>>>>>>>>>>> +/**
>>>>>>>>>>>> + * drm_gpuvm_bo_extobj_add() - adds the &drm_gpuvm_bo to its
>>>>>>>>>>>> &drm_gpuvm's
>>>>>>>>>>>> + * extobj list
>>>>>>>>>>>> + * @vm_bo: The &drm_gpuvm_bo to add to its &drm_gpuvm's the
>>>>>>>>>>>> extobj list.
>>>>>>>>>>>> + *
>>>>>>>>>>>> + * Adds the given @vm_bo to its &drm_gpuvm's extobj list if
>>>>>>>>>>>> not on the list
>>>>>>>>>>>> + * already and if the corresponding &drm_gem_object actually
>>>>>>>>>>>> is an external
>>>>>>>>>>>> + * object.
>>>>>>>>>>>> + */
>>>>>>>>>>>> +void
>>>>>>>>>>>> +drm_gpuvm_bo_extobj_add(struct drm_gpuvm_bo *vm_bo)
>>>>>>>>>>>> +{
>>>>>>>>>>>> +       struct drm_gpuvm *gpuvm = vm_bo->vm;
>>>>>>>>>>>> +
>>>>>>>>>>>> +       if (drm_gpuvm_is_extobj(gpuvm, vm_bo->obj))
>>>>>>>>>>>> +               drm_gpuvm_bo_list_add(vm_bo, extobj);
>>>>>>>>>>>> +}
>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_bo_extobj_add);
>>>>>>>>>>>> +
>>>>>>>>>>>> +/**
>>>>>>>>>>>> + * drm_gpuvm_bo_evict() - add / remove a &drm_gem_object to
>>>>>>>>>>>> / from a
>>>>>>>>>>>> + * &drm_gpuvm's evicted list
>>>>>>>>>>>> + * @obj: the &drm_gem_object to add or remove
>>>>>>>>>>>> + * @evict: indicates whether the object is evicted
>>>>>>>>>>>> + *
>>>>>>>>>>>> + * Adds a &drm_gem_object to or removes it from the
>>>>>>>>>>>> evicted lists of all
>>>>>>>>>>>> + * &drm_gpuvms containing a mapping of this &drm_gem_object.
>>>>>>>>>>>> + */
>>>>>>>>>>>> +void
>>>>>>>>>>>> +drm_gpuvm_bo_evict(struct drm_gem_object *obj, bool evict)
>>>>>>>>>>>> +{
>>>>>>>>>>>> +       struct drm_gpuvm_bo *vm_bo;
>>>>>>>>>>>> +
>>>>>>>>>>>> +       drm_gem_for_each_gpuvm_bo(vm_bo, obj) {
>>>>>>>>>>>> +               if (evict)
>>>>>>>>>>>> +                       drm_gpuvm_bo_list_add(vm_bo, evict);
>>>>>>>>>>>> +               else
>>>>>>>>>>>> +                       drm_gpuvm_bo_list_del(vm_bo, evict);
>>>>>>>>>>>> +       }
>>>>>>>>>>>> +}
>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_bo_evict);
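
A minimal sketch of how a TTM-based driver might call this from its
ttm_device_funcs::move path, where only the moved BO's dma-resv is held
(the helper name is made up):

static void driver_bo_move_notify(struct drm_gem_object *obj, bool evicted)
{
	/* The BO's resv protects the GEM's list of vm_bos we iterate. */
	dma_resv_assert_held(obj->resv);
	drm_gpuvm_bo_evict(obj, evicted);
}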
>>>>>>>>>>>> +
>>>>>>>>>>>>     static int
>>>>>>>>>>>>     __drm_gpuva_insert(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>                     struct drm_gpuva *va)
>>>>>>>>>>>> diff --git a/include/drm/drm_gpuvm.h
>>>>>>>>>>>> b/include/drm/drm_gpuvm.h
>>>>>>>>>>>> index afa50b9059a2..834bb6d6617e 100644
>>>>>>>>>>>> --- a/include/drm/drm_gpuvm.h
>>>>>>>>>>>> +++ b/include/drm/drm_gpuvm.h
>>>>>>>>>>>> @@ -26,10 +26,12 @@
>>>>>>>>>>>>      */
>>>>>>>>>>>>     #include <linux/list.h>
>>>>>>>>>>>> +#include <linux/dma-resv.h>
>>>>>>>>>>>>     #include <linux/rbtree.h>
>>>>>>>>>>>>     #include <linux/types.h>
>>>>>>>>>>>>     #include <drm/drm_gem.h>
>>>>>>>>>>>> +#include <drm/drm_exec.h>
>>>>>>>>>>>>     struct drm_gpuvm;
>>>>>>>>>>>>     struct drm_gpuvm_bo;
>>>>>>>>>>>> @@ -259,6 +261,38 @@ struct drm_gpuvm {
>>>>>>>>>>>>           * space
>>>>>>>>>>>>           */
>>>>>>>>>>>>          struct dma_resv *resv;
>>>>>>>>>>>> +
>>>>>>>>>>>> +       /**
>>>>>>>>>>>> +        * @extobj: structure holding the extobj list
>>>>>>>>>>>> +        */
>>>>>>>>>>>> +       struct {
>>>>>>>>>>>> +               /**
>>>>>>>>>>>> +                * @list: &list_head storing &drm_gpuvm_bos
>>>>>>>>>>>> serving as
>>>>>>>>>>>> +                * external object
>>>>>>>>>>>> +                */
>>>>>>>>>>>> +               struct list_head list;
>>>>>>>>>>>> +
>>>>>>>>>>>> +               /**
>>>>>>>>>>>> +                * @lock: spinlock to protect the extobj list
>>>>>>>>>>>> +                */
>>>>>>>>>>>> +               spinlock_t lock;
>>>>>>>>>>>> +       } extobj;
>>>>>>>>>>>> +
>>>>>>>>>>>> +       /**
>>>>>>>>>>>> +        * @evict: structure holding the evict list and evict
>>>>>>>>>>>> list lock
>>>>>>>>>>>> +        */
>>>>>>>>>>>> +       struct {
>>>>>>>>>>>> +               /**
>>>>>>>>>>>> +                * @list: &list_head storing &drm_gpuvm_bos
>>>>>>>>>>>> currently being
>>>>>>>>>>>> +                * evicted
>>>>>>>>>>>> +                */
>>>>>>>>>>>> +               struct list_head list;
>>>>>>>>>>>> +
>>>>>>>>>>>> +               /**
>>>>>>>>>>>> +                * @lock: spinlock to protect the evict list
>>>>>>>>>>>> +                */
>>>>>>>>>>>> +               spinlock_t lock;
>>>>>>>>>>>> +       } evict;
>>>>>>>>>>>>     };
>>>>>>>>>>>>     void drm_gpuvm_init(struct drm_gpuvm *gpuvm, struct
>>>>>>>>>>>> drm_device *drm,
>>>>>>>>>>>> @@ -268,6 +302,21 @@ void drm_gpuvm_init(struct drm_gpuvm
>>>>>>>>>>>> *gpuvm, struct drm_device *drm,
>>>>>>>>>>>>                      const struct drm_gpuvm_ops *ops);
>>>>>>>>>>>>     void drm_gpuvm_destroy(struct drm_gpuvm *gpuvm);
>>>>>>>>>>>> +/**
>>>>>>>>>>>> + * drm_gpuvm_is_extobj() - indicates whether the given
>>>>>>>>>>>> &drm_gem_object is an
>>>>>>>>>>>> + * external object
>>>>>>>>>>>> + * @gpuvm: the &drm_gpuvm to check
>>>>>>>>>>>> + * @obj: the &drm_gem_object to check
>>>>>>>>>>>> + *
>>>>>>>>>>>> + * Returns: true if the &drm_gem_object's &dma_resv differs
>>>>>>>>>>>> from the
>>>>>>>>>>>> + * &drm_gpuvm's &dma_resv, false otherwise
>>>>>>>>>>>> + */
>>>>>>>>>>>> +static inline bool drm_gpuvm_is_extobj(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>> +                                       struct drm_gem_object *obj)
>>>>>>>>>>>> +{
>>>>>>>>>>>> +       return obj && obj->resv != gpuvm->resv;
>>>>>>>>>>>> +}
>>>>>>>>>>>> +
>>>>>>>>>>>>     static inline struct drm_gpuva *
>>>>>>>>>>>>     __drm_gpuva_next(struct drm_gpuva *va)
>>>>>>>>>>>>     {
>>>>>>>>>>>> @@ -346,6 +395,128 @@ __drm_gpuva_next(struct drm_gpuva *va)
>>>>>>>>>>>>     #define drm_gpuvm_for_each_va_safe(va__, next__, gpuvm__) \
>>>>>>>>>>>>          list_for_each_entry_safe(va__, next__, &(gpuvm__)->rb.list, rb.entry)
>>>>>>>>>>>> +/**
>>>>>>>>>>>> + * struct drm_gpuvm_exec - &drm_gpuvm abstraction of
>>>>>>>>>>>> &drm_exec
>>>>>>>>>>>> + *
>>>>>>>>>>>> + * This structure should be created on the stack as
>>>>>>>>>>>> &drm_exec should be.
>>>>>>>>>>>> + *
>>>>>>>>>>>> + * Optionally, @extra can be set in order to lock additional
>>>>>>>>>>>> &drm_gem_objects.
>>>>>>>>>>>> + */
>>>>>>>>>>>> +struct drm_gpuvm_exec {
>>>>>>>>>>>> +       /**
>>>>>>>>>>>> +        * @exec: the &drm_exec structure
>>>>>>>>>>>> +        */
>>>>>>>>>>>> +       struct drm_exec exec;
>>>>>>>>>>>> +
>>>>>>>>>>>> +       /**
>>>>>>>>>>>> +        * @vm: the &drm_gpuvm to lock its DMA reservations
>>>>>>>>>>>> +        */
>>>>>>>>>>>> +       struct drm_gpuvm *vm;
>>>>>>>>>>>> +
>>>>>>>>>>>> +       /**
>>>>>>>>>>>> +        * @extra: Callback and corresponding private data
>>>>>>>>>>>> for the driver to
>>>>>>>>>>>> +        * lock arbitrary additional &drm_gem_objects.
>>>>>>>>>>>> +        */
>>>>>>>>>>>> +       struct {
>>>>>>>>>>>> +               /**
>>>>>>>>>>>> +                * @fn: The driver callback to lock
>>>>>>>>>>>> additional &drm_gem_objects.
>>>>>>>>>>>> +                */
>>>>>>>>>>>> +               int (*fn)(struct drm_gpuvm_exec *vm_exec,
>>>>>>>>>>>> +                         unsigned int num_fences);
>>>>>>>>>>>> +
>>>>>>>>>>>> +               /**
>>>>>>>>>>>> +                * @priv: driver private data for the @fn
>>>>>>>>>>>> callback
>>>>>>>>>>>> +                */
>>>>>>>>>>>> +               void *priv;
>>>>>>>>>>>> +       } extra;
>>>>>>>>>>>> +};
>>>>>>>>>>>> +
>>>>>>>>>>>> +/**
>>>>>>>>>>>> + * drm_gpuvm_prepare_vm() - prepare the GPUVMs common dma-
>>>>>>>>>>>> resv
>>>>>>>>>>>> + * @gpuvm: the &drm_gpuvm
>>>>>>>>>>>> + * @exec: the &drm_exec context
>>>>>>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>>>>>>> + *
>>>>>>>>>>>> + * Calls drm_exec_prepare_obj() for the GPUVMs dummy
>>>>>>>>>>>> &drm_gem_object.
>>>>>>>>>>>> + *
>>>>>>>>>>>> + * Using this function directly, it is the driver's
>>>>>>>>>>>> responsibility to call
>>>>>>>>>>>> + * drm_exec_init() and drm_exec_fini() accordingly.
>>>>>>>>>>>> + *
>>>>>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>>>>>> + */
>>>>>>>>>>>> +static inline int
>>>>>>>>>>>> +drm_gpuvm_prepare_vm(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>> +                    struct drm_exec *exec,
>>>>>>>>>>>> +                    unsigned int num_fences)
>>>>>>>>>>>> +{
>>>>>>>>>>>> +       return drm_exec_prepare_obj(exec, &gpuvm->d_obj,
>>>>>>>>>>>> num_fences);
>>>>>>>>>>>> +}
>>>>>>>>>>>> +
>>>>>>>>>>>> +int drm_gpuvm_prepare_objects(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>> +                             struct drm_exec *exec,
>>>>>>>>>>>> +                             unsigned int num_fences);
>>>>>>>>>>>> +
>>>>>>>>>>>> +int drm_gpuvm_prepare_range(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>> +                           struct drm_exec *exec,
>>>>>>>>>>>> +                           u64 addr, u64 range,
>>>>>>>>>>>> +                           unsigned int num_fences);
>>>>>>>>>>>> +
>>>>>>>>>>>> +int drm_gpuvm_exec_lock(struct drm_gpuvm_exec *vm_exec,
>>>>>>>>>>>> +                       unsigned int num_fences,
>>>>>>>>>>>> +                       bool interruptible);
>>>>>>>>>>>> +
>>>>>>>>>>>> +int drm_gpuvm_exec_lock_array(struct drm_gpuvm_exec
>>>>>>>>>>>> *vm_exec,
>>>>>>>>>>>> +                             struct drm_gem_object **objs,
>>>>>>>>>>>> +                             unsigned int num_objs,
>>>>>>>>>>>> +                             unsigned int num_fences,
>>>>>>>>>>>> +                             bool interruptible);
>>>>>>>>>>>> +
>>>>>>>>>>>> +int drm_gpuvm_exec_lock_range(struct drm_gpuvm_exec
>>>>>>>>>>>> *vm_exec,
>>>>>>>>>>>> +                             u64 addr, u64 range,
>>>>>>>>>>>> +                             unsigned int num_fences,
>>>>>>>>>>>> +                             bool interruptible);
>>>>>>>>>>>> +
>>>>>>>>>>>> +/**
>>>>>>>>>>>> + * drm_gpuvm_exec_unlock() - unlock all dma-resv of all
>>>>>>>>>>>> associated BOs
>>>>>>>>>>>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>>>>>>>>>>>> + *
>>>>>>>>>>>> + * Releases all dma-resv locks of all &drm_gem_objects
>>>>>>>>>>>> previously acquired
>>>>>>>>>>>> + * through drm_gpuvm_exec_lock() or its variants.
>>>>>>>>>>>> + */
>>>>>>>>>>>> +static inline void
>>>>>>>>>>>> +drm_gpuvm_exec_unlock(struct drm_gpuvm_exec *vm_exec)
>>>>>>>>>>>> +{
>>>>>>>>>>>> +       drm_exec_fini(&vm_exec->exec);
>>>>>>>>>>>> +}
>>>>>>>>>>>> +
>>>>>>>>>>>> +int drm_gpuvm_validate(struct drm_gpuvm *gpuvm);
>>>>>>>>>>>> +void drm_gpuvm_resv_add_fence(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>> +                             struct drm_exec *exec,
>>>>>>>>>>>> +                             struct dma_fence *fence,
>>>>>>>>>>>> +                             enum dma_resv_usage
>>>>>>>>>>>> private_usage,
>>>>>>>>>>>> +                             enum dma_resv_usage
>>>>>>>>>>>> extobj_usage);
>>>>>>>>>>>> +
>>>>>>>>>>>> +/**
>>>>>>>>>>>> + * drm_gpuvm_exec_resv_add_fence() - add fence to private and
>>>>>>>>>>>> all extobj dma-resv
>>>>>>>>>>>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>>>>>>>>>>>> + * @fence: fence to add
>>>>>>>>>>>> + * @private_usage: private dma-resv usage
>>>>>>>>>>>> + * @extobj_usage: extobj dma-resv usage
>>>>>>>>>>>> + *
>>>>>>>>>>>> + * See drm_gpuvm_resv_add_fence().
>>>>>>>>>>>> + */
>>>>>>>>>>>> +static inline void
>>>>>>>>>>>> +drm_gpuvm_exec_resv_add_fence(struct drm_gpuvm_exec
>>>>>>>>>>>> *vm_exec,
>>>>>>>>>>>> +                             struct dma_fence *fence,
>>>>>>>>>>>> +                             enum dma_resv_usage
>>>>>>>>>>>> private_usage,
>>>>>>>>>>>> +                             enum dma_resv_usage
>>>>>>>>>>>> extobj_usage)
>>>>>>>>>>>> +{
>>>>>>>>>>>> +       drm_gpuvm_resv_add_fence(vm_exec->vm, &vm_exec->exec, fence,
>>>>>>>>>>>> +                                private_usage, extobj_usage);
>>>>>>>>>>>> +}
>>>>>>>>>>>> +
>>>>>>>>>>>>     /**
>>>>>>>>>>>>      * struct drm_gpuvm_bo - structure representing a
>>>>>>>>>>>> &drm_gpuvm and
>>>>>>>>>>>>      * &drm_gem_object combination
>>>>>>>>>>>> @@ -398,6 +569,18 @@ struct drm_gpuvm_bo {
>>>>>>>>>>>>                           * gpuva list.
>>>>>>>>>>>>                           */
>>>>>>>>>>>>                          struct list_head gem;
>>>>>>>>>>>> +
>>>>>>>>>>>> +                       /**
>>>>>>>>>>>> +                        * @extobj: List entry to attach to
>>>>>>>>>>>> the &drm_gpuvm's
>>>>>>>>>>>> +                        * extobj list.
>>>>>>>>>>>> +                        */
>>>>>>>>>>>> +                       struct list_head extobj;
>>>>>>>>>>>> +
>>>>>>>>>>>> +                       /**
>>>>>>>>>>>> +                        * @evict: List entry to attach to
>>>>>>>>>>>> the &drm_gpuvm's evict
>>>>>>>>>>>> +                        * list.
>>>>>>>>>>>> +                        */
>>>>>>>>>>>> +                       struct list_head evict;
>>>>>>>>>>>>                  } entry;
>>>>>>>>>>>>          } list;
>>>>>>>>>>>>     };
>>>>>>>>>>>> @@ -432,6 +615,9 @@ struct drm_gpuvm_bo *
>>>>>>>>>>>>     drm_gpuvm_bo_find(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>                    struct drm_gem_object *obj);
>>>>>>>>>>>> +void drm_gpuvm_bo_evict(struct drm_gem_object *obj, bool
>>>>>>>>>>>> evict);
>>>>>>>>>>>> +void drm_gpuvm_bo_extobj_add(struct drm_gpuvm_bo *vm_bo);
>>>>>>>>>>>> +
>>>>>>>>>>>>     /**
>>>>>>>>>>>>      * drm_gpuvm_bo_for_each_va() - iterator to walk over a
>>>>>>>>>>>> list of &drm_gpuva
>>>>>>>>>>>>      * @va__: &drm_gpuva structure to assign to in each
>>>>>>>>>>>> iteration step
>>>>>>>>>>>> @@ -837,6 +1023,17 @@ struct drm_gpuvm_ops {
>>>>>>>>>>>>           * used.
>>>>>>>>>>>>           */
>>>>>>>>>>>>          int (*sm_step_unmap)(struct drm_gpuva_op *op, void
>>>>>>>>>>>> *priv);
>>>>>>>>>>>> +
>>>>>>>>>>>> +       /**
>>>>>>>>>>>> +        * @bo_validate: called from drm_gpuvm_validate()
>>>>>>>>>>>> +        *
>>>>>>>>>>>> +        * Drivers receive this callback for every evicted
>>>>>>>>>>>> &drm_gem_object being
>>>>>>>>>>>> +        * mapped in the corresponding &drm_gpuvm.
>>>>>>>>>>>> +        *
>>>>>>>>>>>> +        * Typically, drivers would call their driver
>>>>>>>>>>>> specific variant of
>>>>>>>>>>>> +        * ttm_bo_validate() from within this callback.
>>>>>>>>>>>> +        */
>>>>>>>>>>>> +       int (*bo_validate)(struct drm_gem_object *obj);
>>>>>>>>>>>>     };
>>>>>>>>>>>>     int drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm, void *priv,
>>>>>
>>>>
>>>
>>
>
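
For reference, a sketch of what the bo_validate hook added above might
look like in a TTM-based driver (struct driver_bo, its ttm and placement
members and to_driver_bo() are assumptions):

static int driver_gpuvm_bo_validate(struct drm_gem_object *obj)
{
	struct driver_bo *bo = to_driver_bo(obj); /* hypothetical */
	struct ttm_operation_ctx ctx = {
		.interruptible = true,
		.no_wait_gpu = false,
	};

	/* Called with the BO's dma-resv held, see drm_gpuvm_validate(). */
	return ttm_bo_validate(&bo->ttm, &bo->placement, &ctx);
}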
Danilo Krummrich Sept. 19, 2023, 3:16 p.m. UTC | #47
On 9/19/23 14:21, Thomas Hellström wrote:
> Hi Christian
> 
> On 9/19/23 14:07, Christian König wrote:
>> Am 13.09.23 um 17:46 schrieb Danilo Krummrich:
>>> On 9/13/23 17:33, Christian König wrote:
>>>> Am 13.09.23 um 17:15 schrieb Danilo Krummrich:
>>>>> On 9/13/23 16:26, Christian König wrote:
>>>>>> Am 13.09.23 um 14:16 schrieb Danilo Krummrich:
>>>>>>> As mentioned in a different mail thread, the reply is based on the assumption
>>>>>>> that we don't support anything else than GPUVM updates from the IOCTL.
>>>>>>
>>>>>> I think that this assumption is incorrect.
>>>>>
>>>>> Well, more precisely I should have said "don't support GPUVM updates from within
>>>>> fence signaling critical sections". And looking at the code, that doesn't seem what
>>>>> you're doing there.
>>>>>
>>>>>>
>>>>>> Vulkan is just one specific use case, but this here should probably be able to handle other use cases as well.
>>>>>>
>>>>>> Especially with HMM you get the requirement that you need to be able to invalidate GPUVM mappings without grabbing a reservation lock.
>>>>>
>>>>> What do you mean with "invalidate GPUVM mappings" in this context? drm_gpuvm_bo_evict()
>>>>> should only be called from a ttm_device_funcs::move callback, we should hold the dma-resv
>>>>> lock there.
>>>>
>>>> Well the question is which dma-resv lock do we hold?
>>>>
>>>> In the move callback we only hold the dma-resv lock of the BO which is moved, but when that is a shared BO then that's not the same as the one for the VM.
>>>
>>> Correct, Thomas' idea was to use the GEM's dma_resv lock to protect drm_gpuvm_bo::evicted
>>> and then actually move the drm_gpuvm_bo to the VM's evicted list once we grabbed all
>>> dma-resv locks when locking the VM's BOs using drm_exec. We can remove them from the evicted
>>> list on validate(). This way we never touch the evicted list without holding at least the VM's
>>> dma-resv lock.
>>>
>>> Do you have any concerns about that?
>>
>> Scratching my head a bit how that is supposed to work.
>>
>> This implies that you go over all the evicted BOs during validation and not just the one mentioned in the CS.
>>
>> That might work for Vulkan, but is pretty much a no-go for OpenGL.
>>
>>>
>>>>
>>>>>
>>>>>>
>>>>>> See what the eviction lock in amdgpu is doing for example.
>>>>>
>>>>> The eviction_lock seems to protect a VM state "evicting" of whether any BO that
>>>>> is associated with the VM is currently evicting. At the same time amdgpu protects
>>>>> the evicted list of the VM with a different lock. So this seems to be entirely
>>>>> unrelated. Tracking a "currently evicting" state is not part of the GPUVM
>>>>> implementation currently and hence nothing would change for amdgpu there.
>>>>
>>>> Sorry for the confusion we use different terminology in amdgpu.
>>>>
>>>> The eviction lock and evicted state is for the VM page tables, e.g. if the whole VM is currently not used and swapped out or even de-allocated.
>>>>
>>>> This is necessary because we have cases where we need to access the VM data without holding the dma-resv lock of this VM. Especially figuring out which parts of an address space contain mappings and which don't.
>>>
>>> I think this is fine, this has nothing to do with lists of evicted GEM objects or external GEM
>>> objects, right? Marking mappings (drm_gpuva) as invalidated (DRM_GPUVA_INVALIDATED) or accessing
>>> the VA space does not require any dma-resv locks.
>>
>> I hope so, but I'm not 100% sure.
>>
>>>
>>>>
>>>> This is a requirement which comes with HMM handling, you won't see this with Vulkan (or OpenGL, VAAPI etc..).
>>>>
>>>>
>>>> The invalidation lock on the other hand is what in this discussion is called eviction lock. This one is needed because what I wrote above, during the move callback only the dma-resv of the BO which is moved is locked, but not necessarily the dma-resv of the VM.
>>>
>>> That's yet another thing, right? This is used to track whether *any* BO that belongs to the VM is
>>> currently being evicted, correct? As mentioned, as by now this is not supported in GPUVM and hence
>>> would be the same driver-specific code with the same driver-specific lock.
>>
>> That is most likely a show stopper for using this for OpenGL based workloads as far as I can see. For those you need to be able to figure out which non-VM BOs have been evicted and which parts of the VM need updates.
> 
> We identify those with a bool in the gpuvm_bo, and that bool is protected by the bo_resv. In essence, the "evicted" list must be made up-to-date with all relevant locks held before traversing in the next exec.

What I still miss with this idea is how do we find all the drm_gpuvm_bo structures with the evicted bool set to true? When doing the drm_exec dance we come across all external ones and can add them to the list if needed, but what about the BOs having the VM's dma-resv?

> 
> If you mean that we need to unbind all vmas of all vms of evicted bos before evicting, we don't do that, at least not in Xe, since on evicting we wait for VM idle, and it can't access anything through the stale vmas until they have been revalidated and rebound.
> 
> /Thomas
> 
> 
> 
>>>
>>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>>>
>>>>>>
>>>>>> Regards,
>>>>>> Christian.
>>>>>>
>>>>>>>
>>>>>>> On Wed, Sep 13, 2023 at 11:14:46AM +0200, Thomas Hellström wrote:
>>>>>>>> Hi!
>>>>>>>>
>>>>>>>> On Wed, 2023-09-13 at 01:36 +0200, Danilo Krummrich wrote:
>>>>>>>>> On Tue, Sep 12, 2023 at 09:23:08PM +0200, Thomas Hellström wrote:
>>>>>>>>>> On 9/12/23 18:50, Danilo Krummrich wrote:
>>>>>>>>>>> On Tue, Sep 12, 2023 at 06:20:32PM +0200, Thomas Hellström wrote:
>>>>>>>>>>>> Hi, Danilo,
>>>>>>>>>>>>
>>>>>>>>>>>> On 9/9/23 17:31, Danilo Krummrich wrote:
>>>>>>>>>>>>> So far the DRM GPUVA manager offers common infrastructure to
>>>>>>>>>>>>> track GPU VA
>>>>>>>>>>>>> allocations and mappings, generically connect GPU VA mappings
>>>>>>>>>>>>> to their
>>>>>>>>>>>>> backing buffers and perform more complex mapping operations
>>>>>>>>>>>>> on the GPU VA
>>>>>>>>>>>>> space.
>>>>>>>>>>>>>
>>>>>>>>>>>>> However, there are more design patterns commonly used by
>>>>>>>>>>>>> drivers, which
>>>>>>>>>>>>> can potentially be generalized in order to make the DRM GPUVA
>>>>>>>>>>>>> manager
>>>>>>>>>>>>> represent a basic GPU-VM implementation. In this context,
>>>>>>>>>>>>> this patch aims
>>>>>>>>>>>>> at generalizing the following elements.
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1) Provide a common dma-resv for GEM objects not being used
>>>>>>>>>>>>> outside of
>>>>>>>>>>>>>       this GPU-VM.
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2) Provide tracking of external GEM objects (GEM objects
>>>>>>>>>>>>> which are
>>>>>>>>>>>>>       shared with other GPU-VMs).
>>>>>>>>>>>>>
>>>>>>>>>>>>> 3) Provide functions to efficiently lock all GEM objects dma-
>>>>>>>>>>>>> resv the
>>>>>>>>>>>>>       GPU-VM contains mappings of.
>>>>>>>>>>>>>
>>>>>>>>>>>>> 4) Provide tracking of evicted GEM objects the GPU-VM
>>>>>>>>>>>>> contains mappings
>>>>>>>>>>>>>       of, such that validation of evicted GEM objects is
>>>>>>>>>>>>> accelerated.
>>>>>>>>>>>>>
>>>>>>>>>>>>> 5) Provide some convenience functions for common patterns.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Rather than being designed as a "framework", the target is to
>>>>>>>>>>>>> make all
>>>>>>>>>>>>> features appear as a collection of optional helper functions,
>>>>>>>>>>>>> such that
>>>>>>>>>>>>> drivers are free to make use of the DRM GPUVA manager's basic
>>>>>>>>>>>>> functionality and opt-in for other features without setting
>>>>>>>>>>>>> any feature
>>>>>>>>>>>>> flags, just by making use of the corresponding functions.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Big kudos to Boris Brezillon for his help to figure out
>>>>>>>>>>>>> locking for drivers
>>>>>>>>>>>>> updating the GPU VA space within the fence signalling path.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Suggested-by: Matthew Brost <matthew.brost@intel.com>
>>>>>>>>>>>>> Signed-off-by: Danilo Krummrich <dakr@redhat.com>
>>>>>>>>>>>>> ---
>>>>>>>>>>>>>     drivers/gpu/drm/drm_gpuvm.c | 516 ++++++++++++++++++++++++++++++++++++
>>>>>>>>>>>>>     include/drm/drm_gpuvm.h     | 197 ++++++++++++++
>>>>>>>>>>>>>     2 files changed, 713 insertions(+)
>>>>>>>>>>>>>
>>>>>>>>>>>>> diff --git a/drivers/gpu/drm/drm_gpuvm.c b/drivers/gpu/drm/drm_gpuvm.c
>>>>>>>>>>>>> index f4411047dbb3..8e62a043f719 100644
>>>>>>>>>>>>> --- a/drivers/gpu/drm/drm_gpuvm.c
>>>>>>>>>>>>> +++ b/drivers/gpu/drm/drm_gpuvm.c
>>>>>>>>>>>>> @@ -73,6 +73,21 @@
>>>>>>>>>>>>>      * &drm_gem_object list of &drm_gpuvm_bos for an existing
>>>>>>>>>>>>> instance of this
>>>>>>>>>>>>>      * particular combination. If not existent a new instance
>>>>>>>>>>>>> is created and linked
>>>>>>>>>>>>>      * to the &drm_gem_object.
>>>>>>>>>>>>> + *
>>>>>>>>>>>>> + * &drm_gpuvm_bo structures, since unique for a given &drm_gpuvm, are also used
>>>>>>>>>>>>> + * as entry for the &drm_gpuvm's lists of external and evicted objects. Those
>>>>>>>>>>>>> + * lists are maintained in order to accelerate locking of dma-resv locks and
>>>>>>>>>>>>> + * validation of evicted objects bound in a &drm_gpuvm. For instance, all
>>>>>>>>>>>>> + * &drm_gem_object's &dma_resv of a given &drm_gpuvm can be locked by calling
>>>>>>>>>>>>> + * drm_gpuvm_exec_lock(). Once locked, drivers can call drm_gpuvm_validate() in
>>>>>>>>>>>>> + * order to validate all evicted &drm_gem_objects. It is also possible to lock
>>>>>>>>>>>>> + * additional &drm_gem_objects by providing the corresponding parameters to
>>>>>>>>>>>>> + * drm_gpuvm_exec_lock() as well as open code the &drm_exec loop while making
>>>>>>>>>>>>> + * use of helper functions such as drm_gpuvm_prepare_range() or
>>>>>>>>>>>>> + * drm_gpuvm_prepare_objects().
>>>>>>>>>>>>> + *
>>>>>>>>>>>>> + * Every bound &drm_gem_object is treated as external object when its &dma_resv
>>>>>>>>>>>>> + * structure is different from the &drm_gpuvm's common &dma_resv structure.
>>>>>>>>>>>>>      */
>>>>>>>>>>>>>     /**
>>>>>>>>>>>>> @@ -420,6 +435,20 @@
>>>>>>>>>>>>>      * Subsequent calls to drm_gpuvm_bo_obtain() for the same
>>>>>>>>>>>>> &drm_gpuvm and
>>>>>>>>>>>>>      * &drm_gem_object must be able to observe previous
>>>>>>>>>>>>> creations and destructions
>>>>>>>>>>>>>      * of &drm_gpuvm_bos in order to keep instances unique.
>>>>>>>>>>>>> + *
>>>>>>>>>>>>> + * The &drm_gpuvm's lists for keeping track of external and
>>>>>>>>>>>>> evicted objects are
>>>>>>>>>>>>> + * protected against concurrent insertion / removal and
>>>>>>>>>>>>> iteration internally.
>>>>>>>>>>>>> + *
>>>>>>>>>>>>> + * However, drivers still need to protect concurrent calls to functions
>>>>>>>>>>>>> + * iterating those lists, such as drm_gpuvm_validate() and
>>>>>>>>>>>>> + * drm_gpuvm_prepare_objects(). Every such function contains a particular
>>>>>>>>>>>>> + * comment and lockdep checks if possible.
>>>>>>>>>>>>> + *
>>>>>>>>>>>>> + * Functions adding or removing entries from those lists, such as
>>>>>>>>>>>>> + * drm_gpuvm_bo_evict() or drm_gpuvm_bo_extobj_add() may be called with
>>>>>>>>>>>>> + * external locks being held, e.g. in order to avoid the corresponding list
>>>>>>>>>>>>> + * being (safely) modified while potentially being iterated by other API
>>>>>>>>>>>>> + * functions.
>>>>>>>>>>>>> + * However, this is entirely optional.
>>>>>>>>>>>>>      */
>>>>>>>>>>>>>     /**
>>>>>>>>>>>>> @@ -632,6 +661,131 @@
>>>>>>>>>>>>>      *   }
>>>>>>>>>>>>>      */
>>>>>>>>>>>>> +/**
>>>>>>>>>>>>> + * get_next_vm_bo_from_list() - get the next vm_bo element
>>>>>>>>>>>>> + * @__gpuvm: The GPU VM
>>>>>>>>>>>>> + * @__list_name: The name of the list we're iterating on
>>>>>>>>>>>>> + * @__local_list: A pointer to the local list used to store already iterated items
>>>>>>>>>>>>> + * @__prev_vm_bo: The previous element we got from get_next_vm_bo_from_list()
>>>>>>>>>>>>> + *
>>>>>>>>>>>>> + * This helper is here to provide lockless list iteration. Lockless as in, the
>>>>>>>>>>>>> + * iterator releases the lock immediately after picking the first element from
>>>>>>>>>>>>> + * the list, so list insertion and deletion can happen concurrently.
>>>>>>>>>>>> Are the list spinlocks needed for that async state update from
>>>>>>>>>>>> within the
>>>>>>>>>>>> dma-fence critical section we've discussed previously?
>>>>>>>>>>> Yes, but also for other reasons, see below.
>>>>>>>>>>>
>>>>>>>>>>>> Otherwise it should be sufficient to protect the lists with the
>>>>>>>>>>>> gpuvm's resv
>>>>>>>>>>>> (or for the extobj list with an outer lock).
>>>>>>>>>>>>
>>>>>>>>>>>> If those spinlocks are still needed in some situations, perhaps
>>>>>>>>>>>> could we
>>>>>>>>>>>> have an option to set them to NULL (Like IIRC the maple tree
>>>>>>>>>>>> allows for)?
>>>>>>>>>>> The evict spinlock is needed in any case, since in
>>>>>>>>>>> drm_gpuvm_bo_evict() we're
>>>>>>>>>>> holding only the dma-resv lock from the BO this function gets
>>>>>>>>>>> called for. Hence,
>>>>>>>>>>> the spinlock protects concurrent drm_gpuvm_bo_evict() calls with
>>>>>>>>>>> different BOs.
>>>>>>>>>> No. Only if you try to add external objects to the vm's evict list
>>>>>>>>>> from
>>>>>>>>>> within the evict code. That's not necessary since you loop through
>>>>>>>>>> all
>>>>>>>>>> external objects anyway when locking them so an "evicted" bool in
>>>>>>>>>> the vm_bo,
>>>>>>>>>> protected by the bo resv would be sufficient. The extobj locking
>>>>>>>>>> loop can
>>>>>>>>>> then add the bo to the evicted list.
>>>>>>>>> And validate() can remove it while still holding all dma-resv locks,
>>>>>>>>> neat!
>>>>>>>>> However, what if two tasks are trying to lock the VA space
>>>>>>>>> concurrently? What
>>>>>>>>> do we do when the drm_gpuvm_bo's refcount drops to zero in
>>>>>>>>> drm_gpuva_unlink()?
>>>>>>>>> Are we guaranteed that at this point of time the drm_gpuvm_bo is not
>>>>>>>>> on the
>>>>>>>>> evicted list? Because otherwise we would call drm_gpuvm_bo_destroy()
>>>>>>>>> with the
>>>>>>>>> dma-resv lock held, which wouldn't be allowed, since
>>>>>>>>> drm_gpuvm_bo_destroy()
>>>>>>>>> might drop the last reference to the drm_gem_object and hence we'd
>>>>>>>>> potentially
>>>>>>>>> free the dma-resv lock while holding it, at least if it's an external
>>>>>>>>> object.
>>>>>>>> Easiest way in this scheme is to think of the lists as being protected
>>>>>>>> by the vm's resv lock. That means anybody calling unlink() must also
>>>>>>>> hold the vm's resv lock. (Which is OK from an UAF point of view, but
>>>>>>>> perhaps not from a locking inversion POV with an async list update).
>>>>>>> This would mean that on unlink() we'd need to hold the VM's resv lock and the
>>>>>>> corresponding GEM's resv lock (in case they're not the same anyways) because the
>>>>>>> VM's resv lock would protect the external / evicted object lists and the GEM
>>>>>>> objects resv lock protects the GEM's list of drm_gpuvm_bos and the
>>>>>>> drm_gpuvm_bo's list of drm_gpuvas.
>>>>>>>
>>>>>>>>>>> For extobjs an outer lock would be enough in case of Xe, but I
>>>>>>>>>>> really would not
>>>>>>>>>>> like to add even more complexity just to get the spinlock out of
>>>>>>>>>>> the way in case
>>>>>>>>>>> the driver already has an outer lock protecting this path.
>>>>>>>>>> I must disagree here. These spinlocks and atomic operations are
>>>>>>>>>> pretty
>>>>>>>>>> costly and as discussed earlier this type of locking was the reason
>>>>>>>>>> (at
>>>>>>>>>> least according to the commit message) that made Christian drop the
>>>>>>>>>> XArray
>>>>>>>>>> use in drm_exec for the same set of objects: "The locking overhead
>>>>>>>>>> is
>>>>>>>>>> unecessary and measurable". IMHO the spinlock is the added
>>>>>>>>>> complexity and a
>>>>>>>>>> single wide lock following the drm locking guidelines set out by
>>>>>>>>>> Daniel and
>>>>>>>>>> David should really be the default choice with an opt-in for a
>>>>>>>>>> spinlock if
>>>>>>>>>> needed for async and pushing out to a wq is not an option.
>>>>>>>>> For the external object list an outer lock would work as long as it's
>>>>>>>>> not the
>>>>>>>>> dma-resv lock of the corresponding GEM object, since here we actually
>>>>>>>>> need to
>>>>>>>>> remove the list entry from the external object list on
>>>>>>>>> drm_gpuvm_bo_destroy().
>>>>>>>>> It's just a bit weird design wise that drivers would need to take
>>>>>>>>> this outer
>>>>>>>>> lock on:
>>>>>>>>>
>>>>>>>>> - drm_gpuvm_bo_extobj_add()
>>>>>>>>> - drm_gpuvm_bo_destroy()        (and hence also drm_gpuvm_bo_put())
>>>>>>>>> - drm_gpuva_unlink()            (because it needs to call
>>>>>>>>> drm_gpuvm_bo_put())
>>>>>>>>> - drm_gpuvm_exec_lock()
>>>>>>>>> - drm_gpuvm_exec_lock_array()
>>>>>>>>> - drm_gpuvm_prepare_range()
>>>>>>>>>
>>>>>>>>> Given that it seems reasonable to do all the required locking
>>>>>>>>> internally.
>>>>>>>> From a design POV, there has been a clear direction in Xe to make
>>>>>>>> things similar to mmap() / munmap(), so this outer lock, which in Xe is
>>>>>>>> an rwsem, is used in a similar way as the mmap_lock. It's protecting
>>>>>>>> the page-table structures and vma rb tree, the userptr structures and
>>>>>>>> the extobj list. Basically it's taken early in the exec IOCTL, the
>>>>>>>> VM_BIND ioctl, the compute rebind worker and the pagefault handler, so
>>>>>>>> all of the above are just asserting that it is taken in the correct
>>>>>>>> mode.
>>>>>>>>
>>>>>>>> But strictly with this scheme one could also use the vm's dma_resv for
>>>>>>>> the extobj list since with drm_exec, it's locked before traversing the
>>>>>>>> list.
>>>>>>>>
>>>>>>>> The whole point of this scheme is to rely on locks that you already are
>>>>>>>> supposed to be holding for various reasons, and it is simple to comprehend.
>>>>>>> I don't agree that we're supposed to hold the VM's resv lock anyways for
>>>>>>> functions like drm_gpuvm_bo_put() or drm_gpuva_unlink(), but I'm fine using it
>>>>>>> for that purpose nevertheless.
>>>>>>>
>>>>>>>>> In order to at least place lockdep checks, the driver would need to supply
>>>>>>>>> the corresponding lock's lockdep_map, because the GPUVM otherwise doesn't
>>>>>>>>> know about the lock.
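>>>>>>>>>
>>>>>>>>> Just to illustrate what I mean (a sketch only; it assumes the driver
>>>>>>>>> passes a struct lockdep_map pointer per list at init time, names made up):
>>>>>>>>>
>>>>>>>>> static void drm_gpuvm_assert_list_held(struct drm_gpuvm *gpuvm,
>>>>>>>>>                                        struct lockdep_map *map)
>>>>>>>>> {
>>>>>>>>>     if (map)
>>>>>>>>>         lockdep_assert(lock_is_held(map));
>>>>>>>>> }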
>>>>>>>> Yes, that sounds reasonable. One lockdep map per list.
>>>>>>> I'd really like to avoid that, especially now that everything got simpler. We
>>>>>>> should define the actual locks to take instead.
>>>>>>>
>>>>>>>>> Out of curiosity, what is the overhead of a spin_lock() that doesn't
>>>>>>>>> need to
>>>>>>>>> spin?
>>>>>>>> I guess it's hard to tell exactly, but it is much lower on modern x86
>>>>>>>> than what it used to be. Not sure about ARM, which is the other
>>>>>>>> architecture important to us. I figure if there is little cache-line
>>>>>>>> bouncing the main overhead comes from the implied barriers.
>>>>>>>>
>>>>>>>>>> A pretty simple way that would not add much code would be
>>>>>>>>>>
>>>>>>>>>> static void gpuvm_cond_spin_lock(const struct drm_gpuvm *gpuvm,
>>>>>>>>>>                                  spinlock_t *lock)
>>>>>>>>>> {
>>>>>>>>>>      if (!gpuvm->resv_protected_lists)
>>>>>>>>>>          spin_lock(lock);
>>>>>>>>>> }
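>>>>>>>>>>
>>>>>>>>>> (Sketch; assumes a matching gpuvm_cond_spin_unlock() and a
>>>>>>>>>> resv_protected_lists flag set at drm_gpuvm_init() time.) Callers would
>>>>>>>>>> then look like
>>>>>>>>>>
>>>>>>>>>>      gpuvm_cond_spin_lock(gpuvm, &gpuvm->extobj.lock);
>>>>>>>>>>      list_add_tail(&vm_bo->list.entry.extobj, &gpuvm->extobj.list);
>>>>>>>>>>      gpuvm_cond_spin_unlock(gpuvm, &gpuvm->extobj.lock);
>>>>>>>>>>
>>>>>>>>>> with resv-protected drivers relying on dma_resv_assert_held() instead.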
>>>>>>>>>>
>>>>>>>>>>>> For such drivers, that would require anybody calling unlink to
>>>>>>>>>>>> hold the vm's
>>>>>>>>>>>> resv, though.
>>>>>>>>>>> In V4 I want to go back to having a dedicated lock for the GEMs
>>>>>>>>>>> gpuva list (or
>>>>>>>>>>> VM_BO list to be more precise). We can't just use the dma-resv
>>>>>>>>>>> lock for that
>>>>>>>>>>> with VM_BO abstractions, because on destruction of a VM_BO we
>>>>>>>>>>> otherwise wouldn't
>>>>>>>>>>> be allowed to already hold the dma-resv lock. That's the fix I
>>>>>>>>>>> was referring to
>>>>>>>>>>> earlier.
>>>>>>>>>> Yeah, I can see the need for a dedicated lock for the GEM's gpuva
>>>>>>>>>> list, but
>>>>>>>>>> holding the vm's dma-resv lock across the unlink shouldn't be a
>>>>>>>>>> problem. We
>>>>>>>>>> may free the object and a pointer to the vm's resv during unlink
>>>>>>>>>> but we
>>>>>>>>>> don't free the vm's resv.  It'd be a matter of ensuring that any
>>>>>>>>>> calls to
>>>>>>>>>> unlink from *within* drm_gpuvm allow it to be held.
>>>>>>>>> Drivers calling unlink() from the fence signaling path can't use the
>>>>>>>>> VM's
>>>>>>>>> dma-resv lock.
>>>>>>>> Yes, that made me a bit curious because in the current version the code
>>>>>>>> required the object's dma_resv for unlink() which can't be grabbed
>>>>>>>> either from the fence signaling path. So are there any drivers actually
>>>>>>>> wanting to do that? If so, they will either need to resort to the
>>>>>>>> current spinlock solution or they will need to call unlink from a
>>>>>>>> workqueue item.
>>>>>>> As Boris already mentioned we have the dma-resv lock by default or a driver
>>>>>>> specific GEM gpuva lock as opt-in. Now, we can get rid of the latter.
>>>>>>>
>>>>>>>>> Also, what if the object is an external object? We can't use the VM's
>>>>>>>>> dma-resv
>>>>>>>>> lock here.
>>>>>>>> Why? Typically (sync) unlink is only ever called from an unbind-like
>>>>>>>> operation where it should be trivial to grab the vm's resv. Or, for
>>>>>>>> that matter, any outer lock protecting the extobj list. The rule would be
>>>>>>>> that drm_gpuvm_bo::entry::extobj and drm_gpuvm_bo::entry::evict would
>>>>>>>> be protected by either the vm's dma_resv (or possibly an outer lock in
>>>>>>>> the case of the extobj list).
>>>>>>> An outer lock wouldn't have worked for updates in the async path, but that
>>>>>>> shouldn't be relevant anymore. We could use the VM's resv for that.
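>>>>>>>
>>>>>>> I.e. (sketch):
>>>>>>>
>>>>>>>     dma_resv_assert_held(gpuvm->resv);
>>>>>>>     list_add_tail(&vm_bo->list.entry.extobj, &gpuvm->extobj.list);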
>>>>>>>
>>>>>>>>>   And we can't have the GEM objs dma-resv lock held when calling
>>>>>>>>> unlink(), since unlink() calls drm_gpuvm_bo_put(), which if the
>>>>>>>>> refcount drops
>>>>>>>>> to zero calls drm_gpuvm_bo_destroy() and drm_gpuvm_bo_destroy() might
>>>>>>>>> drop the
>>>>>>>>> last reference of the GEM object.
>>>>>>>> Yes, but this is a different problem as to what exactly protects
>>>>>>>> drm_gpuvm_bo::entry::gem. Either as you suggest an internal per bo list
>>>>>>>> lock, or if we want to keep the bo's dma_resv we need to ensure that
>>>>>>>> the caller of dma_resv_unlock(obj->resv) actually refcounts its obj
>>>>>>>> pointer, and doesn't implicitly rely on the gpuvm_bo's refcount (I know
>>>>>>>> Boris didn't like that, but requiring an explicit refcount for a
>>>>>>>> pointer you dereference unless you're under a lock that ensures keeping
>>>>>>>> the object alive is pretty much required?) But anyway for the
>>>>>>>> drm_gpuvm_bo::entry::gem list protection (bo resv or internal spinlock)
>>>>>>>> I don't have a strong preference.
>>>>>>> We can keep the GEM object's dma-resv lock, however as mentioned above
>>>>>>> drm_gpuva_unlink() and drm_gpuvm_bo_put() then require both the VM's resv lock
>>>>>>> and the GEM's resv lock in case they differ.
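>>>>>>>
>>>>>>> On an unbind path that would look roughly like (a sketch; error handling
>>>>>>> omitted):
>>>>>>>
>>>>>>>     drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT);
>>>>>>>     drm_exec_until_all_locked(&exec) {
>>>>>>>             ret = drm_gpuvm_prepare_vm(gpuvm, &exec, 1);
>>>>>>>             drm_exec_retry_on_contention(&exec);
>>>>>>>
>>>>>>>             ret = drm_exec_prepare_obj(&exec, vm_bo->obj, 1);
>>>>>>>             drm_exec_retry_on_contention(&exec);
>>>>>>>     }
>>>>>>>
>>>>>>>     drm_gpuva_unlink(va);
>>>>>>>     drm_exec_fini(&exec);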
>>>>>>>
>>>>>>>>>   All those problems go away with a dedicated
>>>>>>>>> GEM gpuva list lock.
>>>>>>>> I don't think these are real problems.
>>>>>>>> With the exception of the eviction list "trick" where we currently have
>>>>>>>> slightly different approach to collect external bos needing rebinding,
>>>>>>>> we have this working fine.
>>>>>>>>
>>>>>>>> TBH I think pretty much the only situation where the spinlock is needed
>>>>>>>> is for async updates of these lists, unless a wq item can be used for
>>>>>>>> that, but it doesn't really seem like the current code allows for such
>>>>>>>> updates anyway? It complicates the code a lot, adds overhead and also
>>>>>>>> adds the requirement for refcounting during list traversal.
>>>>>>>>
>>>>>>>> /Thomas
>>>>>>>>
>>>>>>>>>> /Thomas
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>> It seems that with that also the refcount could be make non-
>>>>>>>>>>>> atomic.
>>>>>>>>>>>>
>>>>>>>>>>>> All in the spirit of the drm locking guidelines "use big locks
>>>>>>>>>>>> when
>>>>>>>>>>>> possible".
>>>>>>>>>>>> Lower level locks only when necessary for performance or
>>>>>>>>>>>> locking inversion?
>>>>>>>>>>>>
>>>>>>>>>>>> /Thomas
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> + *
>>>>>>>>>>>>> + * Elements popped from the original list are kept in a local list, so
>>>>>>>>>>>>> + * removal and is_empty checks can still happen while we're iterating
>>>>>>>>>>>>> + * the list.
>>>>>>>>>>>>> + */
>>>>>>>>>>>>> +#define get_next_vm_bo_from_list(__gpuvm, __list_name, __local_list, __prev_vm_bo) \
>>>>>>>>>>>>> +       ({ \
>>>>>>>>>>>>> +               struct drm_gpuvm_bo *__vm_bo; \
>>>>>>>>>>>>> + \
>>>>>>>>>>>>> +               drm_gpuvm_bo_put(__prev_vm_bo); \
>>>>>>>>>>>>> + \
>>>>>>>>>>>>> +               spin_lock(&(__gpuvm)->__list_name.lock); \
>>>>>>>>>>>>> +               while (!list_empty(&(__gpuvm)->__list_name.list)) { \
>>>>>>>>>>>>> +                       __vm_bo = list_first_entry(&(__gpuvm)->__list_name.list, \
>>>>>>>>>>>>> +                                                  struct drm_gpuvm_bo, \
>>>>>>>>>>>>> +                                                  list.entry.__list_name); \
>>>>>>>>>>>>> +                       if (drm_gpuvm_bo_get_unless_zero(__vm_bo)) { \
>>>>>>>>>>>>> +                               list_move_tail(&(__vm_bo)->list.entry.__list_name, \
>>>>>>>>>>>>> +                                              __local_list); \
>>>>>>>>>>>>> +                               break; \
>>>>>>>>>>>>> +                       } else { \
>>>>>>>>>>>>> +                               list_del_init(&(__vm_bo)->list.entry.__list_name); \
>>>>>>>>>>>>> +                               __vm_bo = NULL; \
>>>>>>>>>>>>> +                       } \
>>>>>>>>>>>>> +               } \
>>>>>>>>>>>>> +               spin_unlock(&(__gpuvm)->__list_name.lock); \
>>>>>>>>>>>>> + \
>>>>>>>>>>>>> +               __vm_bo; \
>>>>>>>>>>>>> +       })
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +/**
>>>>>>>>>>>>> + * for_each_vm_bo_in_list() - internal vm_bo list iterator
>>>>>>>>>>>>> + *
>>>>>>>>>>>>> + * This helper is here to provide lockless list iteration.
>>>>>>>>>>>>> Lockless as in, the
>>>>>>>>>>>>> + * iterator releases the lock immediately after picking the
>>>>>>>>>>>>> first element from the
>>>>>>>>>>>>> + * list, so list insertion and deletion can happen
>>>>>>>>>>>>> concurrently.
>>>>>>>>>>>>> + *
>>>>>>>>>>>>> + * Typical use:
>>>>>>>>>>>>> + *
>>>>>>>>>>>>> + *     struct drm_gpuvm_bo *vm_bo;
>>>>>>>>>>>>> + *     LIST_HEAD(my_local_list);
>>>>>>>>>>>>> + *
>>>>>>>>>>>>> + *     ret = 0;
>>>>>>>>>>>>> + *     for_each_vm_bo_in_list(gpuvm, <list_name>, &my_local_list, vm_bo) {
>>>>>>>>>>>>> + *             ret = do_something_with_vm_bo(..., vm_bo);
>>>>>>>>>>>>> + *             if (ret)
>>>>>>>>>>>>> + *                     break;
>>>>>>>>>>>>> + *     }
>>>>>>>>>>>>> + *     drm_gpuvm_bo_put(vm_bo);
>>>>>>>>>>>>> + *     restore_vm_bo_list(gpuvm, <list_name>, &my_local_list);
>>>>>>>>>>>>> + *
>>>>>>>>>>>>> + *
>>>>>>>>>>>>> + * Only used for internal list iterations, not meant to be
>>>>>>>>>>>>> exposed to the outside
>>>>>>>>>>>>> + * world.
>>>>>>>>>>>>> + */
>>>>>>>>>>>>> +#define for_each_vm_bo_in_list(__gpuvm, __list_name, __local_list, __vm_bo) \
>>>>>>>>>>>>> +       for (__vm_bo = get_next_vm_bo_from_list(__gpuvm, __list_name, \
>>>>>>>>>>>>> +                                               __local_list, NULL); \
>>>>>>>>>>>>> +            __vm_bo; \
>>>>>>>>>>>>> +            __vm_bo = get_next_vm_bo_from_list(__gpuvm, __list_name, \
>>>>>>>>>>>>> +                                               __local_list, __vm_bo)) \
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +/**
>>>>>>>>>>>>> + * restore_vm_bo_list() - move vm_bo elements back to their
>>>>>>>>>>>>> original list
>>>>>>>>>>>>> + * @__gpuvm: The GPU VM
>>>>>>>>>>>>> + * @__list_name: The name of the list we're iterating on
>>>>>>>>>>>>> + * @__local_list: A pointer to the local list used to store
>>>>>>>>>>>>> already iterated items
>>>>>>>>>>>>> + *
>>>>>>>>>>>>> + * When we're done iterating a vm_bo list, we should call
>>>>>>>>>>>>> restore_vm_bo_list()
>>>>>>>>>>>>> + * to restore the original state and let new iterations take
>>>>>>>>>>>>> place.
>>>>>>>>>>>>> + */
>>>>>>>>>>>>> +#define restore_vm_bo_list(__gpuvm, __list_name, __local_list) \
>>>>>>>>>>>>> +       do { \
>>>>>>>>>>>>> +               /* Merge back the two lists, moving local list elements to the \
>>>>>>>>>>>>> +                * head to preserve previous ordering, in case it matters. \
>>>>>>>>>>>>> +                */ \
>>>>>>>>>>>>> +               spin_lock(&(__gpuvm)->__list_name.lock); \
>>>>>>>>>>>>> +               list_splice(__local_list, &(__gpuvm)->__list_name.list); \
>>>>>>>>>>>>> +               spin_unlock(&(__gpuvm)->__list_name.lock); \
>>>>>>>>>>>>> +       } while (0)
>>>>>>>>>>>>> +/**
>>>>>>>>>>>>> + * drm_gpuvm_bo_list_add() - insert a vm_bo into the given
>>>>>>>>>>>>> list
>>>>>>>>>>>>> + * @__vm_bo: the &drm_gpuvm_bo
>>>>>>>>>>>>> + * @__list_name: the name of the list to insert into
>>>>>>>>>>>>> + *
>>>>>>>>>>>>> + * Inserts the given @__vm_bo into the list specified by
>>>>>>>>>>>>> @__list_name and
>>>>>>>>>>>>> + * increases the vm_bo's reference count.
>>>>>>>>>>>>> + */
>>>>>>>>>>>>> +#define drm_gpuvm_bo_list_add(__vm_bo, __list_name) \
>>>>>>>>>>>>> +       do { \
>>>>>>>>>>>>> +               spin_lock(&(__vm_bo)->vm->__list_name.lock); \
>>>>>>>>>>>>> +               if (list_empty(&(__vm_bo)->list.entry.__list_name)) \
>>>>>>>>>>>>> +                       list_add_tail(&(__vm_bo)->list.entry.__list_name, \
>>>>>>>>>>>>> +                                     &(__vm_bo)->vm->__list_name.list); \
>>>>>>>>>>>>> +               spin_unlock(&(__vm_bo)->vm->__list_name.lock); \
>>>>>>>>>>>>> +       } while (0)
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +/**
>>>>>>>>>>>>> + * drm_gpuvm_bo_list_del() - remove a vm_bo from the given list
>>>>>>>>>>>>> + * @__vm_bo: the &drm_gpuvm_bo
>>>>>>>>>>>>> + * @__list_name: the name of the list to remove from
>>>>>>>>>>>>> + *
>>>>>>>>>>>>> + * Removes the given @__vm_bo from the list specified by @__list_name and
>>>>>>>>>>>>> + * decreases the vm_bo's reference count.
>>>>>>>>>>>>> + */
>>>>>>>>>>>>> +#define drm_gpuvm_bo_list_del(__vm_bo, __list_name) \
>>>>>>>>>>>>> +       do { \
>>>>>>>>>>>>> +               spin_lock(&(__vm_bo)->vm->__list_name.lock); \
>>>>>>>>>>>>> +               if (!list_empty(&(__vm_bo)->list.entry.__list_name)) \
>>>>>>>>>>>>> +                       list_del_init(&(__vm_bo)->list.entry.__list_name); \
>>>>>>>>>>>>> +               spin_unlock(&(__vm_bo)->vm->__list_name.lock); \
>>>>>>>>>>>>> +       } while (0)
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +static int __must_check
>>>>>>>>>>>>> +drm_gpuvm_bo_get_unless_zero(struct drm_gpuvm_bo *vm_bo);
>>>>>>>>>>>>> +
>>>>>>>>>>>>>     #define to_drm_gpuva(__node) container_of((__node), struct
>>>>>>>>>>>>> drm_gpuva, rb.node)
>>>>>>>>>>>>>     #define GPUVA_START(node) ((node)->va.addr)
>>>>>>>>>>>>> @@ -713,6 +867,12 @@ drm_gpuvm_init(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>> struct drm_device *drm,
>>>>>>>>>>>>>          gpuvm->rb.tree = RB_ROOT_CACHED;
>>>>>>>>>>>>>          INIT_LIST_HEAD(&gpuvm->rb.list);
>>>>>>>>>>>>> +       INIT_LIST_HEAD(&gpuvm->extobj.list);
>>>>>>>>>>>>> +       spin_lock_init(&gpuvm->extobj.lock);
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +       INIT_LIST_HEAD(&gpuvm->evict.list);
>>>>>>>>>>>>> +       spin_lock_init(&gpuvm->evict.lock);
>>>>>>>>>>>>> +
>>>>>>>>>>>>>          drm_gpuva_check_overflow(start_offset, range);
>>>>>>>>>>>>>          gpuvm->mm_start = start_offset;
>>>>>>>>>>>>>          gpuvm->mm_range = range;
>>>>>>>>>>>>> @@ -754,10 +914,302 @@ drm_gpuvm_destroy(struct drm_gpuvm
>>>>>>>>>>>>> *gpuvm)
>>>>>>>>>>>>>          WARN(!RB_EMPTY_ROOT(&gpuvm->rb.tree.rb_root),
>>>>>>>>>>>>>               "GPUVA tree is not empty, potentially leaking memory.\n");
>>>>>>>>>>>>> +       WARN(!list_empty(&gpuvm->extobj.list), "Extobj list should be empty.\n");
>>>>>>>>>>>>> +       WARN(!list_empty(&gpuvm->evict.list), "Evict list should be empty.\n");
>>>>>>>>>>>>> +
>>>>>>>>>>>>>          drm_gem_private_object_fini(&gpuvm->d_obj);
>>>>>>>>>>>>>     }
>>>>>>>>>>>>>     EXPORT_SYMBOL_GPL(drm_gpuvm_destroy);
>>>>>>>>>>>>> +/**
>>>>>>>>>>>>> + * drm_gpuvm_prepare_objects() - prepare all associated BOs
>>>>>>>>>>>>> + * @gpuvm: the &drm_gpuvm
>>>>>>>>>>>>> + * @exec: the &drm_exec locking context
>>>>>>>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>>>>>>>> + *
>>>>>>>>>>>>> + * Calls drm_exec_prepare_obj() for all &drm_gem_objects the
>>>>>>>>>>>>> given
>>>>>>>>>>>>> + * &drm_gpuvm contains mappings of.
>>>>>>>>>>>>> + *
>>>>>>>>>>>>> + * Using this function directly, it is the driver's responsibility to call
>>>>>>>>>>>>> + * drm_exec_init() and drm_exec_fini() accordingly.
>>>>>>>>>>>>> + *
>>>>>>>>>>>>> + * Note: This function is safe against concurrent insertion
>>>>>>>>>>>>> and removal of
>>>>>>>>>>>>> + * external objects, however it is not safe against
>>>>>>>>>>>>> concurrent usage itself.
>>>>>>>>>>>>> + *
>>>>>>>>>>>>> + * Drivers need to make sure to protect this case with
>>>>>>>>>>>>> either an outer VM lock
>>>>>>>>>>>>> + * or by calling drm_gpuvm_prepare_vm() before this function
>>>>>>>>>>>>> within the
>>>>>>>>>>>>> + * drm_exec_until_all_locked() loop, such that the GPUVM's
>>>>>>>>>>>>> dma-resv lock ensures
>>>>>>>>>>>>> + * mutual exclusion.
>>>>>>>>>>>>> + *
>>>>>>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>>>>>>> + */
>>>>>>>>>>>>> +int
>>>>>>>>>>>>> +drm_gpuvm_prepare_objects(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>> +                         struct drm_exec *exec,
>>>>>>>>>>>>> +                         unsigned int num_fences)
>>>>>>>>>>>>> +{
>>>>>>>>>>>>> +       struct drm_gpuvm_bo *vm_bo;
>>>>>>>>>>>>> +       LIST_HEAD(extobjs);
>>>>>>>>>>>>> +       int ret = 0;
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +       for_each_vm_bo_in_list(gpuvm, extobj, &extobjs,
>>>>>>>>>>>>> vm_bo) {
>>>>>>>>>>>>> +               ret = drm_exec_prepare_obj(exec, vm_bo->obj, num_fences);
>>>>>>>>>>>>> +               if (ret)
>>>>>>>>>>>>> +                       break;
>>>>>>>>>>>>> +       }
>>>>>>>>>>>>> +       /* Drop ref in case we break out of the loop. */
>>>>>>>>>>>>> +       drm_gpuvm_bo_put(vm_bo);
>>>>>>>>>>>>> +       restore_vm_bo_list(gpuvm, extobj, &extobjs);
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +       return ret;
>>>>>>>>>>>>> +}
>>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_prepare_objects);
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +/**
>>>>>>>>>>>>> + * drm_gpuvm_prepare_range() - prepare all BOs mapped within
>>>>>>>>>>>>> a given range
>>>>>>>>>>>>> + * @gpuvm: the &drm_gpuvm
>>>>>>>>>>>>> + * @exec: the &drm_exec locking context
>>>>>>>>>>>>> + * @addr: the start address within the VA space
>>>>>>>>>>>>> + * @range: the range to iterate within the VA space
>>>>>>>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>>>>>>>> + *
>>>>>>>>>>>>> + * Calls drm_exec_prepare_obj() for all &drm_gem_objects
>>>>>>>>>>>>> mapped between @addr
>>>>>>>>>>>>> + * and @addr + @range.
>>>>>>>>>>>>> + *
>>>>>>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>>>>>>> + */
>>>>>>>>>>>>> +int
>>>>>>>>>>>>> +drm_gpuvm_prepare_range(struct drm_gpuvm *gpuvm, struct
>>>>>>>>>>>>> drm_exec *exec,
>>>>>>>>>>>>> +                       u64 addr, u64 range, unsigned int num_fences)
>>>>>>>>>>>>> +{
>>>>>>>>>>>>> +       struct drm_gpuva *va;
>>>>>>>>>>>>> +       u64 end = addr + range;
>>>>>>>>>>>>> +       int ret;
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +       drm_gpuvm_for_each_va_range(va, gpuvm, addr, end) {
>>>>>>>>>>>>> +               struct drm_gem_object *obj = va->gem.obj;
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +               ret = drm_exec_prepare_obj(exec, obj, num_fences);
>>>>>>>>>>>>> +               if (ret)
>>>>>>>>>>>>> +                       return ret;
>>>>>>>>>>>>> +       }
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +       return 0;
>>>>>>>>>>>>> +}
>>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_prepare_range);
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +/**
>>>>>>>>>>>>> + * drm_gpuvm_exec_lock() - lock all dma-resv of all associated BOs
>>>>>>>>>>>>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>>>>>>>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>>>>>>>> + * @interruptible: sleep interruptible if waiting
>>>>>>>>>>>>> + *
>>>>>>>>>>>>> + * Acquires all dma-resv locks of all &drm_gem_objects the
>>>>>>>>>>>>> given
>>>>>>>>>>>>> + * &drm_gpuvm contains mappings of.
>>>>>>>>>>>>> + *
>>>>>>>>>>>>> + * Additionally, when calling this function with struct drm_gpuvm_exec::extra
>>>>>>>>>>>>> + * being set, the driver receives the given @fn callback to lock additional
>>>>>>>>>>>>> + * dma-resv in the context of the &drm_gpuvm_exec instance. Typically, drivers
>>>>>>>>>>>>> + * would call drm_exec_prepare_obj() from within this callback.
>>>>>>>>>>>>> + *
>>>>>>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>>>>>>> + */
>>>>>>>>>>>>> +int
>>>>>>>>>>>>> +drm_gpuvm_exec_lock(struct drm_gpuvm_exec *vm_exec,
>>>>>>>>>>>>> +                   unsigned int num_fences,
>>>>>>>>>>>>> +                   bool interruptible)
>>>>>>>>>>>>> +{
>>>>>>>>>>>>> +       struct drm_gpuvm *gpuvm = vm_exec->vm;
>>>>>>>>>>>>> +       struct drm_exec *exec = &vm_exec->exec;
>>>>>>>>>>>>> +       uint32_t flags;
>>>>>>>>>>>>> +       int ret;
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +       flags = (interruptible ? DRM_EXEC_INTERRUPTIBLE_WAIT : 0) |
>>>>>>>>>>>>> +               DRM_EXEC_IGNORE_DUPLICATES;
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +       drm_exec_init(exec, flags);
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +       drm_exec_until_all_locked(exec) {
>>>>>>>>>>>>> +               ret = drm_gpuvm_prepare_vm(gpuvm, exec, num_fences);
>>>>>>>>>>>>> +               drm_exec_retry_on_contention(exec);
>>>>>>>>>>>>> +               if (ret)
>>>>>>>>>>>>> +                       goto err;
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +               ret = drm_gpuvm_prepare_objects(gpuvm, exec, num_fences);
>>>>>>>>>>>>> +               drm_exec_retry_on_contention(exec);
>>>>>>>>>>>>> +               if (ret)
>>>>>>>>>>>>> +                       goto err;
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +               if (vm_exec->extra.fn) {
>>>>>>>>>>>>> +                       ret = vm_exec->extra.fn(vm_exec, num_fences);
>>>>>>>>>>>>> +                       drm_exec_retry_on_contention(exec);
>>>>>>>>>>>>> +                       if (ret)
>>>>>>>>>>>>> +                               goto err;
>>>>>>>>>>>>> +               }
>>>>>>>>>>>>> +       }
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +       return 0;
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +err:
>>>>>>>>>>>>> +       drm_exec_fini(exec);
>>>>>>>>>>>>> +       return ret;
>>>>>>>>>>>>> +}
>>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_exec_lock);
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +static int
>>>>>>>>>>>>> +fn_lock_array(struct drm_gpuvm_exec *vm_exec, unsigned int
>>>>>>>>>>>>> num_fences)
>>>>>>>>>>>>> +{
>>>>>>>>>>>>> +       struct {
>>>>>>>>>>>>> +               struct drm_gem_object **objs;
>>>>>>>>>>>>> +               unsigned int num_objs;
>>>>>>>>>>>>> +       } *args = vm_exec->extra.priv;
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +       return drm_exec_prepare_array(&vm_exec->exec, args->objs,
>>>>>>>>>>>>> +                                     args->num_objs, num_fences);
>>>>>>>>>>>>> +}
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +/**
>>>>>>>>>>>>> + * drm_gpuvm_exec_lock_array() - lock all dma-resv of all associated BOs
>>>>>>>>>>>>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>>>>>>>>>>>>> + * @objs: additional &drm_gem_objects to lock
>>>>>>>>>>>>> + * @num_objs: the number of additional &drm_gem_objects to
>>>>>>>>>>>>> lock
>>>>>>>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>>>>>>>> + * @interruptible: sleep interruptible if waiting
>>>>>>>>>>>>> + *
>>>>>>>>>>>>> + * Acquires all dma-resv locks of all &drm_gem_objects the
>>>>>>>>>>>>> given &drm_gpuvm
>>>>>>>>>>>>> + * contains mappings of, plus the ones given through @objs.
>>>>>>>>>>>>> + *
>>>>>>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>>>>>>> + */
>>>>>>>>>>>>> +int
>>>>>>>>>>>>> +drm_gpuvm_exec_lock_array(struct drm_gpuvm_exec *vm_exec,
>>>>>>>>>>>>> +                         struct drm_gem_object **objs,
>>>>>>>>>>>>> +                         unsigned int num_objs,
>>>>>>>>>>>>> +                         unsigned int num_fences,
>>>>>>>>>>>>> +                         bool interruptible)
>>>>>>>>>>>>> +{
>>>>>>>>>>>>> +       struct {
>>>>>>>>>>>>> +               struct drm_gem_object **objs;
>>>>>>>>>>>>> +               unsigned int num_objs;
>>>>>>>>>>>>> +       } args;
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +       args.objs = objs;
>>>>>>>>>>>>> +       args.num_objs = num_objs;
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +       vm_exec->extra.fn = fn_lock_array;
>>>>>>>>>>>>> +       vm_exec->extra.priv = &args;
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +       return drm_gpuvm_exec_lock(vm_exec, num_fences, interruptible);
>>>>>>>>>>>>> +}
>>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_exec_lock_array);
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +/**
>>>>>>>>>>>>> + * drm_gpuvm_exec_lock_range() - prepare all BOs mapped
>>>>>>>>>>>>> within a given range
>>>>>>>>>>>>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>>>>>>>>>>>>> + * @addr: the start address within the VA space
>>>>>>>>>>>>> + * @range: the range to iterate within the VA space
>>>>>>>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>>>>>>>> + * @interruptible: sleep interruptible if waiting
>>>>>>>>>>>>> + *
>>>>>>>>>>>>> + * Acquires all dma-resv locks of all &drm_gem_objects
>>>>>>>>>>>>> mapped between @addr and
>>>>>>>>>>>>> + * @addr + @range.
>>>>>>>>>>>>> + *
>>>>>>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>>>>>>> + */
>>>>>>>>>>>>> +int
>>>>>>>>>>>>> +drm_gpuvm_exec_lock_range(struct drm_gpuvm_exec *vm_exec,
>>>>>>>>>>>>> +                         u64 addr, u64 range,
>>>>>>>>>>>>> +                         unsigned int num_fences,
>>>>>>>>>>>>> +                         bool interruptible)
>>>>>>>>>>>>> +{
>>>>>>>>>>>>> +       struct drm_gpuvm *gpuvm = vm_exec->vm;
>>>>>>>>>>>>> +       struct drm_exec *exec = &vm_exec->exec;
>>>>>>>>>>>>> +       uint32_t flags;
>>>>>>>>>>>>> +       int ret;
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +       flags = (interruptible ? DRM_EXEC_INTERRUPTIBLE_WAIT : 0) |
>>>>>>>>>>>>> +               DRM_EXEC_IGNORE_DUPLICATES;
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +       drm_exec_init(exec, flags);
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +       drm_exec_until_all_locked(exec) {
>>>>>>>>>>>>> +               ret = drm_gpuvm_prepare_range(gpuvm, exec, addr, range,
>>>>>>>>>>>>> +                                             num_fences);
>>>>>>>>>>>>> +               drm_exec_retry_on_contention(exec);
>>>>>>>>>>>>> +               if (ret)
>>>>>>>>>>>>> +                       goto err;
>>>>>>>>>>>>> +       }
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +       return ret;
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +err:
>>>>>>>>>>>>> +       drm_exec_fini(exec);
>>>>>>>>>>>>> +       return ret;
>>>>>>>>>>>>> +}
>>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_exec_lock_range);
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +/**
>>>>>>>>>>>>> + * drm_gpuvm_validate() - validate all BOs marked as evicted
>>>>>>>>>>>>> + * @gpuvm: the &drm_gpuvm to validate evicted BOs
>>>>>>>>>>>>> + *
>>>>>>>>>>>>> + * Calls the &drm_gpuvm_ops.bo_validate callback for all
>>>>>>>>>>>>> evicted buffer
>>>>>>>>>>>>> + * objects being mapped in the given &drm_gpuvm.
>>>>>>>>>>>>> + *
>>>>>>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>>>>>>> + */
>>>>>>>>>>>>> +int
>>>>>>>>>>>>> +drm_gpuvm_validate(struct drm_gpuvm *gpuvm)
>>>>>>>>>>>>> +{
>>>>>>>>>>>>> +       const struct drm_gpuvm_ops *ops = gpuvm->ops;
>>>>>>>>>>>>> +       struct drm_gpuvm_bo *vm_bo;
>>>>>>>>>>>>> +       LIST_HEAD(evict);
>>>>>>>>>>>>> +       int ret = 0;
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +       if (unlikely(!ops || !ops->bo_validate))
>>>>>>>>>>>>> +               return -ENOTSUPP;
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +       for_each_vm_bo_in_list(gpuvm, evict, &evict, vm_bo) {
>>>>>>>>>>>>> +               dma_resv_assert_held(vm_bo->obj->resv);
>>>>>>>>>>>>> +               ret = ops->bo_validate(vm_bo->obj);
>>>>>>>>>>>>> +               if (ret)
>>>>>>>>>>>>> +                       break;
>>>>>>>>>>>>> +       }
>>>>>>>>>>>>> +       /* Drop ref in case we break out of the loop. */
>>>>>>>>>>>>> +       drm_gpuvm_bo_put(vm_bo);
>>>>>>>>>>>>> +       restore_vm_bo_list(gpuvm, evict, &evict);
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +       return ret;
>>>>>>>>>>>>> +}
>>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_validate);
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +/**
>>>>>>>>>>>>> + * drm_gpuvm_resv_add_fence() - add fence to private and all extobj dma-resv
>>>>>>>>>>>>> + * @gpuvm: the &drm_gpuvm to add a fence to
>>>>>>>>>>>>> + * @exec: the &drm_exec locking context
>>>>>>>>>>>>> + * @fence: fence to add
>>>>>>>>>>>>> + * @private_usage: private dma-resv usage
>>>>>>>>>>>>> + * @extobj_usage: extobj dma-resv usage
>>>>>>>>>>>>> + */
>>>>>>>>>>>>> +void
>>>>>>>>>>>>> +drm_gpuvm_resv_add_fence(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>> +                        struct drm_exec *exec,
>>>>>>>>>>>>> +                        struct dma_fence *fence,
>>>>>>>>>>>>> +                        enum dma_resv_usage private_usage,
>>>>>>>>>>>>> +                        enum dma_resv_usage extobj_usage)
>>>>>>>>>>>>> +{
>>>>>>>>>>>>> +       struct drm_gem_object *obj;
>>>>>>>>>>>>> +       unsigned long index;
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +       drm_exec_for_each_locked_object(exec, index, obj) {
>>>>>>>>>>>>> +               dma_resv_assert_held(obj->resv);
>>>>>>>>>>>>> +               dma_resv_add_fence(obj->resv, fence,
>>>>>>>>>>>>> +                                  drm_gpuvm_is_extobj(gpuvm, obj) ?
>>>>>>>>>>>>> +                                  extobj_usage : private_usage);
>>>>>>>>>>>>> +       }
>>>>>>>>>>>>> +}
>>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_resv_add_fence);
>>>>>>>>>>>>> +
>>>>>>>>>>>>>     /**
>>>>>>>>>>>>>      * drm_gpuvm_bo_create() - create a new instance of struct
>>>>>>>>>>>>> drm_gpuvm_bo
>>>>>>>>>>>>>      * @gpuvm: The &drm_gpuvm the @obj is mapped in.
>>>>>>>>>>>>> @@ -790,6 +1242,9 @@ drm_gpuvm_bo_create(struct drm_gpuvm
>>>>>>>>>>>>> *gpuvm,
>>>>>>>>>>>>>          INIT_LIST_HEAD(&vm_bo->list.gpuva);
>>>>>>>>>>>>>          INIT_LIST_HEAD(&vm_bo->list.entry.gem);
>>>>>>>>>>>>> +       INIT_LIST_HEAD(&vm_bo->list.entry.extobj);
>>>>>>>>>>>>> +       INIT_LIST_HEAD(&vm_bo->list.entry.evict);
>>>>>>>>>>>>> +
>>>>>>>>>>>>>          drm_gem_object_get(obj);
>>>>>>>>>>>>>          return vm_bo;
>>>>>>>>>>>>> @@ -807,6 +1262,14 @@ drm_gpuvm_bo_destroy(struct kref *kref)
>>>>>>>>>>>>>          drm_gem_gpuva_assert_lock_held(vm_bo->obj);
>>>>>>>>>>>>> +       spin_lock(&gpuvm->extobj.lock);
>>>>>>>>>>>>> +       list_del(&vm_bo->list.entry.extobj);
>>>>>>>>>>>>> +       spin_unlock(&gpuvm->extobj.lock);
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +       spin_lock(&gpuvm->evict.lock);
>>>>>>>>>>>>> +       list_del(&vm_bo->list.entry.evict);
>>>>>>>>>>>>> +       spin_unlock(&gpuvm->evict.lock);
>>>>>>>>>>>>> +
>>>>>>>>>>>>>          list_del(&vm_bo->list.entry.gem);
>>>>>>>>>>>>>          drm_gem_object_put(obj);
>>>>>>>>>>>>> @@ -822,6 +1285,11 @@ drm_gpuvm_bo_destroy(struct kref *kref)
>>>>>>>>>>>>>      * @vm_bo: the &drm_gpuvm_bo to release the reference of
>>>>>>>>>>>>>      *
>>>>>>>>>>>>>      * This releases a reference to @vm_bo.
>>>>>>>>>>>>> + *
>>>>>>>>>>>>> + * If the reference count drops to zero, the &gpuvm_bo is destroyed, which
>>>>>>>>>>>>> + * includes removing it from the GEM's gpuva list. Hence, if a call to this
>>>>>>>>>>>>> + * function can potentially let the reference count drop to zero, the caller
>>>>>>>>>>>>> + * must hold the dma-resv or driver specific GEM gpuva lock.
>>>>>>>>>>>>>      */
>>>>>>>>>>>>>     void
>>>>>>>>>>>>>     drm_gpuvm_bo_put(struct drm_gpuvm_bo *vm_bo)
>>>>>>>>>>>>> @@ -831,6 +1299,12 @@ drm_gpuvm_bo_put(struct drm_gpuvm_bo
>>>>>>>>>>>>> *vm_bo)
>>>>>>>>>>>>>     }
>>>>>>>>>>>>>     EXPORT_SYMBOL_GPL(drm_gpuvm_bo_put);
>>>>>>>>>>>>> +static int __must_check
>>>>>>>>>>>>> +drm_gpuvm_bo_get_unless_zero(struct drm_gpuvm_bo *vm_bo)
>>>>>>>>>>>>> +{
>>>>>>>>>>>>> +       return kref_get_unless_zero(&vm_bo->kref);
>>>>>>>>>>>>> +}
>>>>>>>>>>>>> +
>>>>>>>>>>>>>     static struct drm_gpuvm_bo *
>>>>>>>>>>>>>     __drm_gpuvm_bo_find(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>                      struct drm_gem_object *obj)
>>>>>>>>>>>>> @@ -938,6 +1412,48 @@ drm_gpuvm_bo_obtain_prealloc(struct
>>>>>>>>>>>>> drm_gpuvm_bo *__vm_bo)
>>>>>>>>>>>>>     }
>>>>>>>>>>>>> EXPORT_SYMBOL_GPL(drm_gpuvm_bo_obtain_prealloc);
>>>>>>>>>>>>> +/**
>>>>>>>>>>>>> + * drm_gpuvm_bo_extobj_add() - adds the &drm_gpuvm_bo to its &drm_gpuvm's
>>>>>>>>>>>>> + * extobj list
>>>>>>>>>>>>> + * @vm_bo: The &drm_gpuvm_bo to add to its &drm_gpuvm's extobj list.
>>>>>>>>>>>>> + *
>>>>>>>>>>>>> + * Adds the given @vm_bo to its &drm_gpuvm's extobj list if it is not on the
>>>>>>>>>>>>> + * list already and the corresponding &drm_gem_object actually is an external
>>>>>>>>>>>>> + * object.
>>>>>>>>>>>>> + */
>>>>>>>>>>>>> +void
>>>>>>>>>>>>> +drm_gpuvm_bo_extobj_add(struct drm_gpuvm_bo *vm_bo)
>>>>>>>>>>>>> +{
>>>>>>>>>>>>> +       struct drm_gpuvm *gpuvm = vm_bo->vm;
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +       if (drm_gpuvm_is_extobj(gpuvm, vm_bo->obj))
>>>>>>>>>>>>> +               drm_gpuvm_bo_list_add(vm_bo, extobj);
>>>>>>>>>>>>> +}
>>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_bo_extobj_add);
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +/**
>>>>>>>>>>>>> + * drm_gpuvm_bo_evict() - add / remove a &drm_gem_object to / from a
>>>>>>>>>>>>> + * &drm_gpuvm's evicted list
>>>>>>>>>>>>> + * @obj: the &drm_gem_object to add or remove
>>>>>>>>>>>>> + * @evict: indicates whether the object is evicted
>>>>>>>>>>>>> + *
>>>>>>>>>>>>> + * Adds a &drm_gem_object to or removes it from the evicted list of all
>>>>>>>>>>>>> + * &drm_gpuvms containing a mapping of this &drm_gem_object.
>>>>>>>>>>>>> + */
>>>>>>>>>>>>> +void
>>>>>>>>>>>>> +drm_gpuvm_bo_evict(struct drm_gem_object *obj, bool evict)
>>>>>>>>>>>>> +{
>>>>>>>>>>>>> +       struct drm_gpuvm_bo *vm_bo;
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +       drm_gem_for_each_gpuvm_bo(vm_bo, obj) {
>>>>>>>>>>>>> +               if (evict)
>>>>>>>>>>>>> +                       drm_gpuvm_bo_list_add(vm_bo, evict);
>>>>>>>>>>>>> +               else
>>>>>>>>>>>>> +                       drm_gpuvm_bo_list_del(vm_bo, evict);
>>>>>>>>>>>>> +       }
>>>>>>>>>>>>> +}
>>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_bo_evict);
>>>>>>>>>>>>> +
>>>>>>>>>>>>>     static int
>>>>>>>>>>>>>     __drm_gpuva_insert(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>                     struct drm_gpuva *va)
>>>>>>>>>>>>> diff --git a/include/drm/drm_gpuvm.h b/include/drm/drm_gpuvm.h
>>>>>>>>>>>>> index afa50b9059a2..834bb6d6617e 100644
>>>>>>>>>>>>> --- a/include/drm/drm_gpuvm.h
>>>>>>>>>>>>> +++ b/include/drm/drm_gpuvm.h
>>>>>>>>>>>>> @@ -26,10 +26,12 @@
>>>>>>>>>>>>>      */
>>>>>>>>>>>>>     #include <linux/list.h>
>>>>>>>>>>>>> +#include <linux/dma-resv.h>
>>>>>>>>>>>>>     #include <linux/rbtree.h>
>>>>>>>>>>>>>     #include <linux/types.h>
>>>>>>>>>>>>>     #include <drm/drm_gem.h>
>>>>>>>>>>>>> +#include <drm/drm_exec.h>
>>>>>>>>>>>>>     struct drm_gpuvm;
>>>>>>>>>>>>>     struct drm_gpuvm_bo;
>>>>>>>>>>>>> @@ -259,6 +261,38 @@ struct drm_gpuvm {
>>>>>>>>>>>>>           * space
>>>>>>>>>>>>>           */
>>>>>>>>>>>>>          struct dma_resv *resv;
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +       /**
>>>>>>>>>>>>> +        * @extobj: structure holding the extobj list
>>>>>>>>>>>>> +        */
>>>>>>>>>>>>> +       struct {
>>>>>>>>>>>>> +               /**
>>>>>>>>>>>>> +                * @list: &list_head storing &drm_gpuvm_bos
>>>>>>>>>>>>> serving as
>>>>>>>>>>>>> +                * external object
>>>>>>>>>>>>> +                */
>>>>>>>>>>>>> +               struct list_head list;
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +               /**
>>>>>>>>>>>>> +                * @lock: spinlock to protect the extobj list
>>>>>>>>>>>>> +                */
>>>>>>>>>>>>> +               spinlock_t lock;
>>>>>>>>>>>>> +       } extobj;
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +       /**
>>>>>>>>>>>>> +        * @evict: structure holding the evict list and evict
>>>>>>>>>>>>> list lock
>>>>>>>>>>>>> +        */
>>>>>>>>>>>>> +       struct {
>>>>>>>>>>>>> +               /**
>>>>>>>>>>>>> +                * @list: &list_head storing &drm_gpuvm_bos
>>>>>>>>>>>>> currently being
>>>>>>>>>>>>> +                * evicted
>>>>>>>>>>>>> +                */
>>>>>>>>>>>>> +               struct list_head list;
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +               /**
>>>>>>>>>>>>> +                * @lock: spinlock to protect the evict list
>>>>>>>>>>>>> +                */
>>>>>>>>>>>>> +               spinlock_t lock;
>>>>>>>>>>>>> +       } evict;
>>>>>>>>>>>>>     };
>>>>>>>>>>>>>     void drm_gpuvm_init(struct drm_gpuvm *gpuvm, struct
>>>>>>>>>>>>> drm_device *drm,
>>>>>>>>>>>>> @@ -268,6 +302,21 @@ void drm_gpuvm_init(struct drm_gpuvm
>>>>>>>>>>>>> *gpuvm, struct drm_device *drm,
>>>>>>>>>>>>>                      const struct drm_gpuvm_ops *ops);
>>>>>>>>>>>>>     void drm_gpuvm_destroy(struct drm_gpuvm *gpuvm);
>>>>>>>>>>>>> +/**
>>>>>>>>>>>>> + * drm_gpuvm_is_extobj() - indicates whether the given
>>>>>>>>>>>>> &drm_gem_object is an
>>>>>>>>>>>>> + * external object
>>>>>>>>>>>>> + * @gpuvm: the &drm_gpuvm to check
>>>>>>>>>>>>> + * @obj: the &drm_gem_object to check
>>>>>>>>>>>>> + *
>>>>>>>>>>>>> + * Returns: true if the &drm_gem_object's &dma_resv differs from the
>>>>>>>>>>>>> + * &drm_gpuvm's &dma_resv, false otherwise
>>>>>>>>>>>>> + */
>>>>>>>>>>>>> +static inline bool drm_gpuvm_is_extobj(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>> +                                       struct drm_gem_object *obj)
>>>>>>>>>>>>> +{
>>>>>>>>>>>>> +       return obj && obj->resv != gpuvm->resv;
>>>>>>>>>>>>> +}
>>>>>>>>>>>>> +
>>>>>>>>>>>>>     static inline struct drm_gpuva *
>>>>>>>>>>>>>     __drm_gpuva_next(struct drm_gpuva *va)
>>>>>>>>>>>>>     {
>>>>>>>>>>>>> @@ -346,6 +395,128 @@ __drm_gpuva_next(struct drm_gpuva *va)
>>>>>>>>>>>>>     #define drm_gpuvm_for_each_va_safe(va__, next__, gpuvm__) \
>>>>>>>>>>>>>          list_for_each_entry_safe(va__, next__, &(gpuvm__)->rb.list, rb.entry)
>>>>>>>>>>>>> +/**
>>>>>>>>>>>>> + * struct drm_gpuvm_exec - &drm_gpuvm abstraction of
>>>>>>>>>>>>> &drm_exec
>>>>>>>>>>>>> + *
>>>>>>>>>>>>> + * This structure should be created on the stack as
>>>>>>>>>>>>> &drm_exec should be.
>>>>>>>>>>>>> + *
>>>>>>>>>>>>> + * Optionally, @extra can be set in order to lock additional
>>>>>>>>>>>>> &drm_gem_objects.
>>>>>>>>>>>>> + */
>>>>>>>>>>>>> +struct drm_gpuvm_exec {
>>>>>>>>>>>>> +       /**
>>>>>>>>>>>>> +        * @exec: the &drm_exec structure
>>>>>>>>>>>>> +        */
>>>>>>>>>>>>> +       struct drm_exec exec;
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +       /**
>>>>>>>>>>>>> +        * @vm: the &drm_gpuvm to lock its DMA reservations
>>>>>>>>>>>>> +        */
>>>>>>>>>>>>> +       struct drm_gpuvm *vm;
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +       /**
>>>>>>>>>>>>> +        * @extra: Callback and corresponding private data
>>>>>>>>>>>>> for the driver to
>>>>>>>>>>>>> +        * lock arbitrary additional &drm_gem_objects.
>>>>>>>>>>>>> +        */
>>>>>>>>>>>>> +       struct {
>>>>>>>>>>>>> +               /**
>>>>>>>>>>>>> +                * @fn: The driver callback to lock
>>>>>>>>>>>>> additional &drm_gem_objects.
>>>>>>>>>>>>> +                */
>>>>>>>>>>>>> +               int (*fn)(struct drm_gpuvm_exec *vm_exec,
>>>>>>>>>>>>> +                         unsigned int num_fences);
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +               /**
>>>>>>>>>>>>> +                * @priv: driver private data for the @fn
>>>>>>>>>>>>> callback
>>>>>>>>>>>>> +                */
>>>>>>>>>>>>> +               void *priv;
>>>>>>>>>>>>> +       } extra;
>>>>>>>>>>>>> +};
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +/**
>>>>>>>>>>>>> + * drm_gpuvm_prepare_vm() - prepare the GPUVM's common dma-resv
>>>>>>>>>>>>> + * @gpuvm: the &drm_gpuvm
>>>>>>>>>>>>> + * @exec: the &drm_exec context
>>>>>>>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>>>>>>>> + *
>>>>>>>>>>>>> + * Calls drm_exec_prepare_obj() for the GPUVM's dummy &drm_gem_object.
>>>>>>>>>>>>> + *
>>>>>>>>>>>>> + * Using this function directly, it is the driver's responsibility to call
>>>>>>>>>>>>> + * drm_exec_init() and drm_exec_fini() accordingly.
>>>>>>>>>>>>> + *
>>>>>>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>>>>>>> + */
>>>>>>>>>>>>> +static inline int
>>>>>>>>>>>>> +drm_gpuvm_prepare_vm(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>> +                    struct drm_exec *exec,
>>>>>>>>>>>>> +                    unsigned int num_fences)
>>>>>>>>>>>>> +{
>>>>>>>>>>>>> +       return drm_exec_prepare_obj(exec, &gpuvm->d_obj, num_fences);
>>>>>>>>>>>>> +}
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +int drm_gpuvm_prepare_objects(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>> +                             struct drm_exec *exec,
>>>>>>>>>>>>> +                             unsigned int num_fences);
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +int drm_gpuvm_prepare_range(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>> +                           struct drm_exec *exec,
>>>>>>>>>>>>> +                           u64 addr, u64 range,
>>>>>>>>>>>>> +                           unsigned int num_fences);
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +int drm_gpuvm_exec_lock(struct drm_gpuvm_exec *vm_exec,
>>>>>>>>>>>>> +                       unsigned int num_fences,
>>>>>>>>>>>>> +                       bool interruptible);
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +int drm_gpuvm_exec_lock_array(struct drm_gpuvm_exec
>>>>>>>>>>>>> *vm_exec,
>>>>>>>>>>>>> +                             struct drm_gem_object **objs,
>>>>>>>>>>>>> +                             unsigned int num_objs,
>>>>>>>>>>>>> +                             unsigned int num_fences,
>>>>>>>>>>>>> +                             bool interruptible);
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +int drm_gpuvm_exec_lock_range(struct drm_gpuvm_exec
>>>>>>>>>>>>> *vm_exec,
>>>>>>>>>>>>> +                             u64 addr, u64 range,
>>>>>>>>>>>>> +                             unsigned int num_fences,
>>>>>>>>>>>>> +                             bool interruptible);
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +/**
>>>>>>>>>>>>> + * drm_gpuvm_exec_unlock() - unlock all dma-resv of all associated BOs
>>>>>>>>>>>>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>>>>>>>>>>>>> + *
>>>>>>>>>>>>> + * Releases all dma-resv locks of all &drm_gem_objects previously
>>>>>>>>>>>>> + * acquired through drm_gpuvm_exec_lock() or its variants.
>>>>>>>>>>>>> + */
>>>>>>>>>>>>> +static inline void
>>>>>>>>>>>>> +drm_gpuvm_exec_unlock(struct drm_gpuvm_exec *vm_exec)
>>>>>>>>>>>>> +{
>>>>>>>>>>>>> +       drm_exec_fini(&vm_exec->exec);
>>>>>>>>>>>>> +}
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +int drm_gpuvm_validate(struct drm_gpuvm *gpuvm);
>>>>>>>>>>>>> +void drm_gpuvm_resv_add_fence(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>> +                             struct drm_exec *exec,
>>>>>>>>>>>>> +                             struct dma_fence *fence,
>>>>>>>>>>>>> +                             enum dma_resv_usage private_usage,
>>>>>>>>>>>>> +                             enum dma_resv_usage extobj_usage);
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +/**
>>>>>>>>>>>>> + * drm_gpuvm_exec_resv_add_fence()
>>>>>>>>>>>>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>>>>>>>>>>>>> + * @fence: fence to add
>>>>>>>>>>>>> + * @private_usage: private dma-resv usage
>>>>>>>>>>>>> + * @extobj_usage: extobj dma-resv usage
>>>>>>>>>>>>> + *
>>>>>>>>>>>>> + * See drm_gpuvm_resv_add_fence().
>>>>>>>>>>>>> + */
>>>>>>>>>>>>> +static inline void
>>>>>>>>>>>>> +drm_gpuvm_exec_resv_add_fence(struct drm_gpuvm_exec *vm_exec,
>>>>>>>>>>>>> +                             struct dma_fence *fence,
>>>>>>>>>>>>> +                             enum dma_resv_usage private_usage,
>>>>>>>>>>>>> +                             enum dma_resv_usage extobj_usage)
>>>>>>>>>>>>> +{
>>>>>>>>>>>>> +       drm_gpuvm_resv_add_fence(vm_exec->vm, &vm_exec->exec, fence,
>>>>>>>>>>>>> +                                private_usage, extobj_usage);
>>>>>>>>>>>>> +}
>>>>>>>>>>>>> +
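
Putting these pieces together, a driver's exec path built on the
drm_gpuvm_exec abstraction could look roughly like the sketch below; "fence"
stands for whatever fence the driver's job submission returns, and the
dma-resv usage values are just one plausible choice:

        struct drm_gpuvm_exec vm_exec = { .vm = gpuvm };
        int ret;

        ret = drm_gpuvm_exec_lock(&vm_exec, 1, true);
        if (ret)
                return ret;

        /* Re-validate all evicted BOs while all dma-resv are locked. */
        ret = drm_gpuvm_validate(gpuvm);
        if (ret)
                goto out_unlock;

        /* ... submit the job and obtain its fence ... */

        drm_gpuvm_exec_resv_add_fence(&vm_exec, fence,
                                      DMA_RESV_USAGE_BOOKKEEP,
                                      DMA_RESV_USAGE_WRITE);
out_unlock:
        drm_gpuvm_exec_unlock(&vm_exec);
        return ret;
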
>>>>>>>>>>>>>     /**
>>>>>>>>>>>>>      * struct drm_gpuvm_bo - structure representing a
>>>>>>>>>>>>> &drm_gpuvm and
>>>>>>>>>>>>>      * &drm_gem_object combination
>>>>>>>>>>>>> @@ -398,6 +569,18 @@ struct drm_gpuvm_bo {
>>>>>>>>>>>>>                           * gpuva list.
>>>>>>>>>>>>>                           */
>>>>>>>>>>>>>                          struct list_head gem;
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +                       /**
>>>>>>>>>>>>> +                        * @extobj: List entry to attach to the
>>>>>>>>>>>>> +                        * &drm_gpuvms extobj list.
>>>>>>>>>>>>> +                        */
>>>>>>>>>>>>> +                       struct list_head extobj;
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +                       /**
>>>>>>>>>>>>> +                        * @evict: List entry to attach to the
>>>>>>>>>>>>> +                        * &drm_gpuvms evict list.
>>>>>>>>>>>>> +                        */
>>>>>>>>>>>>> +                       struct list_head evict;
>>>>>>>>>>>>>                  } entry;
>>>>>>>>>>>>>          } list;
>>>>>>>>>>>>>     };
>>>>>>>>>>>>> @@ -432,6 +615,9 @@ struct drm_gpuvm_bo *
>>>>>>>>>>>>>     drm_gpuvm_bo_find(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>                    struct drm_gem_object *obj);
>>>>>>>>>>>>> +void drm_gpuvm_bo_evict(struct drm_gem_object *obj, bool evict);
>>>>>>>>>>>>> +void drm_gpuvm_bo_extobj_add(struct drm_gpuvm_bo *vm_bo);
>>>>>>>>>>>>> +
>>>>>>>>>>>>>     /**
>>>>>>>>>>>>>      * drm_gpuvm_bo_for_each_va() - iterator to walk over a
>>>>>>>>>>>>> list of &drm_gpuva
>>>>>>>>>>>>>      * @va__: &drm_gpuva structure to assign to in each
>>>>>>>>>>>>> iteration step
>>>>>>>>>>>>> @@ -837,6 +1023,17 @@ struct drm_gpuvm_ops {
>>>>>>>>>>>>>           * used.
>>>>>>>>>>>>>           */
>>>>>>>>>>>>>          int (*sm_step_unmap)(struct drm_gpuva_op *op, void
>>>>>>>>>>>>> *priv);
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +       /**
>>>>>>>>>>>>> +        * @bo_validate: called from drm_gpuvm_validate()
>>>>>>>>>>>>> +        *
>>>>>>>>>>>>> +        * Drivers receive this callback for every evicted
>>>>>>>>>>>>> &drm_gem_object being
>>>>>>>>>>>>> +        * mapped in the corresponding &drm_gpuvm.
>>>>>>>>>>>>> +        *
>>>>>>>>>>>>> +        * Typically, drivers would call their driver
>>>>>>>>>>>>> specific variant of
>>>>>>>>>>>>> +        * ttm_bo_validate() from within this callback.
>>>>>>>>>>>>> +        */
>>>>>>>>>>>>> +       int (*bo_validate)(struct drm_gem_object *obj);
>>>>>>>>>>>>>     };
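
As a rough sketch of such a driver-specific variant for a TTM-based driver
(my_vram_placement is a made-up, driver-defined struct ttm_placement):

static int my_bo_validate(struct drm_gem_object *obj)
{
        struct ttm_buffer_object *bo =
                container_of(obj, struct ttm_buffer_object, base);
        struct ttm_operation_ctx ctx = {
                .interruptible = true,
                .no_wait_gpu = false,
        };

        /* Move the BO back to a GPU-accessible placement. */
        return ttm_bo_validate(bo, &my_vram_placement, &ctx);
}
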
>>>>>>>>>>>>>     int drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm, void *priv,
>>>>>>
>>>>>
>>>>
>>>
>>
>
Thomas Hellstrom Sept. 19, 2023, 3:23 p.m. UTC | #48
On 9/19/23 17:16, Danilo Krummrich wrote:
> On 9/19/23 14:21, Thomas Hellström wrote:
>> Hi Christian
>>
>> On 9/19/23 14:07, Christian König wrote:
>>> Am 13.09.23 um 17:46 schrieb Danilo Krummrich:
>>>> On 9/13/23 17:33, Christian König wrote:
>>>>> Am 13.09.23 um 17:15 schrieb Danilo Krummrich:
>>>>>> On 9/13/23 16:26, Christian König wrote:
>>>>>>> Am 13.09.23 um 14:16 schrieb Danilo Krummrich:
>>>>>>>> As mentioned in a different mail thread, the reply is based on 
>>>>>>>> the assumption
>>>>>>>> that we don't support anything else than GPUVM updates from the 
>>>>>>>> IOCTL.
>>>>>>>
>>>>>>> I think that this assumption is incorrect.
>>>>>>
>>>>>> Well, more precisely I should have said "don't support GPUVM 
>>>>>> updated from within
>>>>>> fence signaling critical sections". And looking at the code, that 
>>>>>> doesn't seem what
>>>>>> you're doing there.
>>>>>>
>>>>>>>
>>>>>>> Vulkan is just one specific use case, but this here should 
>>>>>>> probably be able to handle other use cases as well.
>>>>>>>
>>>>>>> Especially with HMM you get the requirement that you need to be 
>>>>>>> able to invalidate GPUVM mappings without grabbing a reservation 
>>>>>>> lock.
>>>>>>
>>>>>> What do you mean with "invalidate GPUVM mappings" in this 
>>>>>> context? drm_gpuvm_bo_evict()
>>>>>> should only be called from a ttm_device_funcs::move callback, we 
>>>>>> should hold the dma-resv
>>>>>> lock there.
>>>>>
>>>>> Well the question is which dma-resv lock do we hold?
>>>>>
>>>>> In the move callback we only hold the dma-resv lock of the BO 
>>>>> which is moved, but when that is a shared BO then that's not the 
>>>>> same as the one for the VM.
>>>>
>>>> Correct, Thomas' idea was to use the GEM's dma_resv lock to protect 
>>>> drm_gpuvm_bo::evicted
>>>> and then actually move the drm_gpuvm_bo to the VM's evicted list 
>>>> once we grabbed all
>>>> dma-resv locks when locking the VM's BOs using drm_exec. We can 
>>>> remove them from the evicted
>>>> list on validate(). This way we never touch the evicted list 
>>>> without holding at least the VM's
>>>> dma-resv lock.
>>>>
>>>> Do you have any concerns about that?
>>>
>>> Scratching my head a bit how that is supposed to work.
>>>
>>> This implies that you go over all the evicted BOs during validation 
>>> and not just the one mentioned in the CS.
>>>
>>> That might work for Vulkan, but is pretty much a no-go for OpenGL.
>>>
>>>>
>>>>>
>>>>>>
>>>>>>>
>>>>>>> See what the eviction lock in amdgpu is doing for example.
>>>>>>
>>>>>> The eviction_lock seems to protect a VM state "evicting" of 
>>>>>> whether any BO that
>>>>>> is associated with the VM is currently evicting. At the same time 
>>>>>> amdgpu protects
>>>>>>>>>> the evicted list of the VM with a different lock. So this seems 
>>>>>> to be entirely
>>>>>> unrelated. Tracking a "currently evicting" state is not part of 
>>>>>> the GPUVM
>>>>>> implementation currently and hence nothing would change for 
>>>>>> amdgpu there.
>>>>>
>>>>> Sorry for the confusion we use different terminology in amdgpu.
>>>>>
>>>>> The eviction lock and evicted state is for the VM page tables, 
>>>>> e.g. if the whole VM is currently not used and swapped out or even 
>>>>> de-allocated.
>>>>>
>>>>> This is necessary because we have cases where we need to access 
>>>>> the VM data without holding the dma-resv lock of this VM. 
>>>>> Especially figuring out which parts of an address space contain 
>>>>> mappings and which doesn't.
>>>>
>>>> I think this is fine, this has nothing to do with lists of evicted 
>>>> GEM objects or external GEM
>>>> objects, right? Marking mappings (drm_gpuva) as invalidated 
>>>> (DRM_GPUVA_INVALIDATED) or accessing
>>>> the VA space does not require any dma-resv locks.
>>>
>>> I hope so, but I'm not 100% sure.
>>>
>>>>
>>>>>
>>>>> This is a requirement which comes with HMM handling, you won't see 
>>>>> this with Vulkan (or OpenGL, VAAPI etc..).
>>>>>
>>>>>
>>>>> The invalidation lock on the other hand is what in this discussion 
>>>>> is called eviction lock. This one is needed because what I wrote 
>>>>> above, during the move callback only the dma-resv of the BO which 
>>>>> is moved is locked, but not necessarily the dma-resv of the VM.
>>>>
>>>> That's yet another thing, right? This is used to track whether 
>>>> *any* BO that belongs to the VM is
>>>> currently being evicted, correct? As mentioned, as by now this is 
>>>> not supported in GPUVM and hence
>>>> would be the same driver specific code with the same driver specific 
>>>> lock.
>>>
>>> That is most likely a show stopper using this for OpenGL based 
>>> workloads as far as I can see. For those you need to able to figure 
>>> out which non-VM BOs have been evicted and which parts of the VM 
>>> needs updates.
>>
>> We identify those with a bool in the gpuvm_bo, and that bool is 
>> protected by the bo_resv. In essence, the "evicted" list must be made 
>> up-to-date with all relevant locks held before traversing in the next 
>> exec.
>
> What I still miss with this idea is how do we find all the 
> drm_gpuvm_bo structures with the evicted bool set to true? When doing 
> the drm_exec dance we come across all external ones and can add them 
> to the list if needed, but what about the BOs having the VM's dma-resv?

Oh, they can be added to the evict list directly (no bool needed) in the 
eviction code, like in v3. Since for those we indeed hold the VM's 
dma_resv since it's aliased with the object's dma-resv.
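
A minimal sketch of what I have in mind, assuming a hypothetical
bo-resv-protected "evicted" flag in the gpuvm_bo (not part of the current
patch):

/* Eviction path; the BO's dma-resv is held. */
static void my_gpuvm_bo_evict(struct drm_gpuvm_bo *vm_bo)
{
        if (drm_gpuvm_is_extobj(vm_bo->vm, vm_bo->obj))
                vm_bo->evicted = true; /* hypothetical flag, bo resv protected */
        else
                /* bo resv == VM resv, so the evict list is already protected. */
                list_move_tail(&vm_bo->list.entry.evict,
                               &vm_bo->vm->evict.list);
}

/* Extobj locking loop, all dma-resv locked; the VM's resv protects the list. */
static void my_gpuvm_bo_collect_evicted(struct drm_gpuvm_bo *vm_bo)
{
        if (vm_bo->evicted)
                list_move_tail(&vm_bo->list.entry.evict,
                               &vm_bo->vm->evict.list);
}
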

/Thomas



>
>>
>> If you mean that we need to unbind all vmas of all vms of evicted bos 
>> before evicting, we don't do that, at least not in Xe, since when 
>> evicting we wait for VM idle, and it can't access anything through the 
>> stale vmas until they have been revalidated and rebound.
>>
>> /Thomas
>>
>>
>>
>>>>
>>>>>
>>>>> Regards,
>>>>> Christian.
>>>>>
>>>>>>
>>>>>>>
>>>>>>> Regards,
>>>>>>> Christian.
>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Sep 13, 2023 at 11:14:46AM +0200, Thomas Hellström wrote:
>>>>>>>>> Hi!
>>>>>>>>>
>>>>>>>>> On Wed, 2023-09-13 at 01:36 +0200, Danilo Krummrich wrote:
>>>>>>>>>> On Tue, Sep 12, 2023 at 09:23:08PM +0200, Thomas Hellström 
>>>>>>>>>> wrote:
>>>>>>>>>>> On 9/12/23 18:50, Danilo Krummrich wrote:
>>>>>>>>>>>> On Tue, Sep 12, 2023 at 06:20:32PM +0200, Thomas Hellström 
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>> Hi, Danilo,
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 9/9/23 17:31, Danilo Krummrich wrote:
>>>>>>>>>>>>>> So far the DRM GPUVA manager offers common infrastructure to
>>>>>>>>>>>>>> track GPU VA
>>>>>>>>>>>>>> allocations and mappings, generically connect GPU VA 
>>>>>>>>>>>>>> mappings
>>>>>>>>>>>>>> to their
>>>>>>>>>>>>>> backing buffers and perform more complex mapping operations
>>>>>>>>>>>>>> on the GPU VA
>>>>>>>>>>>>>> space.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> However, there are more design patterns commonly used by
>>>>>>>>>>>>>> drivers, which
>>>>>>>>>>>>>> can potentially be generalized in order to make the DRM 
>>>>>>>>>>>>>> GPUVA
>>>>>>>>>>>>>> manager
>>>>>>>>>>>>>> represent a basic GPU-VM implementation. In this context,
>>>>>>>>>>>>>> this patch aims
>>>>>>>>>>>>>> at generalizing the following elements.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 1) Provide a common dma-resv for GEM objects not being used
>>>>>>>>>>>>>> outside of
>>>>>>>>>>>>>>       this GPU-VM.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2) Provide tracking of external GEM objects (GEM objects
>>>>>>>>>>>>>> which are
>>>>>>>>>>>>>>       shared with other GPU-VMs).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 3) Provide functions to efficiently lock all GEM objects dma-resv
>>>>>>>>>>>>>>       the GPU-VM contains mappings of.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 4) Provide tracking of evicted GEM objects the GPU-VM
>>>>>>>>>>>>>> contains mappings
>>>>>>>>>>>>>>       of, such that validation of evicted GEM objects is
>>>>>>>>>>>>>> accelerated.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 5) Provide some convenience functions for common patterns.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Rather than being designed as a "framework", the target 
>>>>>>>>>>>>>> is to
>>>>>>>>>>>>>> make all
>>>>>>>>>>>>>> features appear as a collection of optional helper 
>>>>>>>>>>>>>> functions,
>>>>>>>>>>>>>> such that
>>>>>>>>>>>>>> drivers are free to make use of the DRM GPUVA managers basic
>>>>>>>>>>>>>> functionality and opt-in for other features without setting
>>>>>>>>>>>>>> any feature
>>>>>>>>>>>>>> flags, just by making use of the corresponding functions.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Big kudos to Boris Brezillon for his help to figure out
>>>>>>>>>>>>>> locking for drivers
>>>>>>>>>>>>>> updating the GPU VA space within the fence signalling path.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Suggested-by: Matthew Brost <matthew.brost@intel.com>
>>>>>>>>>>>>>> Signed-off-by: Danilo Krummrich <dakr@redhat.com>
>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>     drivers/gpu/drm/drm_gpuvm.c | 516
>>>>>>>>>>>>>> ++++++++++++++++++++++++++++++++++++
>>>>>>>>>>>>>>     include/drm/drm_gpuvm.h     | 197 ++++++++++++++
>>>>>>>>>>>>>>     2 files changed, 713 insertions(+)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> diff --git a/drivers/gpu/drm/drm_gpuvm.c
>>>>>>>>>>>>>> b/drivers/gpu/drm/drm_gpuvm.c
>>>>>>>>>>>>>> index f4411047dbb3..8e62a043f719 100644
>>>>>>>>>>>>>> --- a/drivers/gpu/drm/drm_gpuvm.c
>>>>>>>>>>>>>> +++ b/drivers/gpu/drm/drm_gpuvm.c
>>>>>>>>>>>>>> @@ -73,6 +73,21 @@
>>>>>>>>>>>>>>      * &drm_gem_object list of &drm_gpuvm_bos for an 
>>>>>>>>>>>>>> existing
>>>>>>>>>>>>>> instance of this
>>>>>>>>>>>>>>      * particular combination. If not existent a new 
>>>>>>>>>>>>>> instance
>>>>>>>>>>>>>> is created and linked
>>>>>>>>>>>>>>      * to the &drm_gem_object.
>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>> + * &drm_gpuvm_bo structures, since unique for a given
>>>>>>>>>>>>>> &drm_gpuvm, are also used
>>>>>>>>>>>>>> + * as entry for the &drm_gpuvm's lists of external and
>>>>>>>>>>>>>> evicted objects. Those
>>>>>>>>>>>>>> + * lists are maintained in order to accelerate locking of
>>>>>>>>>>>>>> dma-resv locks and
>>>>>>>>>>>>>> + * validation of evicted objects bound in a &drm_gpuvm. For
>>>>>>>>>>>>>> instance all
>>>>>>>>>>>>>> + * &drm_gem_object's &dma_resv of a given &drm_gpuvm can be
>>>>>>>>>>>>>> locked by calling
>>>>>>>>>>>>>> + * drm_gpuvm_exec_lock(). Once locked drivers can call
>>>>>>>>>>>>>> drm_gpuvm_validate() in
>>>>>>>>>>>>>> + * order to validate all evicted &drm_gem_objects. It is
>>>>>>>>>>>>>> also possible to lock
>>>>>>>>>>>>>> + * additional &drm_gem_objects by providing the
>>>>>>>>>>>>>> corresponding parameters to
>>>>>>>>>>>>>> + * drm_gpuvm_exec_lock() as well as open code the &drm_exec
>>>>>>>>>>>>>> loop while making
>>>>>>>>>>>>>> + * use of helper functions such as 
>>>>>>>>>>>>>> drm_gpuvm_prepare_range()
>>>>>>>>>>>>>> or
>>>>>>>>>>>>>> + * drm_gpuvm_prepare_objects().
>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>> + * Every bound &drm_gem_object is treated as external 
>>>>>>>>>>>>>> object
>>>>>>>>>>>>>> when its &dma_resv
>>>>>>>>>>>>>> + * structure is different than the &drm_gpuvm's common
>>>>>>>>>>>>>> &dma_resv structure.
>>>>>>>>>>>>>>      */
>>>>>>>>>>>>>>     /**
>>>>>>>>>>>>>> @@ -420,6 +435,20 @@
>>>>>>>>>>>>>>      * Subsequent calls to drm_gpuvm_bo_obtain() for the 
>>>>>>>>>>>>>> same
>>>>>>>>>>>>>> &drm_gpuvm and
>>>>>>>>>>>>>>      * &drm_gem_object must be able to observe previous
>>>>>>>>>>>>>> creations and destructions
>>>>>>>>>>>>>>      * of &drm_gpuvm_bos in order to keep instances unique.
>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>> + * The &drm_gpuvm's lists for keeping track of external and
>>>>>>>>>>>>>> evicted objects are
>>>>>>>>>>>>>> + * protected against concurrent insertion / removal and
>>>>>>>>>>>>>> iteration internally.
>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>> + * However, drivers still need to protect concurrent
>>>>>>>>>>>>>> calls to functions
>>>>>>>>>>>>>> + * iterating those lists, such as drm_gpuvm_validate() and
>>>>>>>>>>>>>> + * drm_gpuvm_prepare_objects(). Every such function 
>>>>>>>>>>>>>> contains
>>>>>>>>>>>>>> a particular
>>>>>>>>>>>>>> + * comment and lockdep checks if possible.
>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>> + * Functions adding or removing entries from those lists,
>>>>>>>>>>>>>> such as
>>>>>>>>>>>>>> + * drm_gpuvm_bo_evict() or drm_gpuvm_bo_extobj_add() may be
>>>>>>>>>>>>>> called with external
>>>>>>>>>>>>>> + * locks being held, e.g. in order to avoid the corresponding
>>>>>>>>>>>>>> + * list being modified while potentially being iterated by
>>>>>>>>>>>>>> + * other API functions.
>>>>>>>>>>>>>> + * However, this is entirely optional.
>>>>>>>>>>>>>>      */
>>>>>>>>>>>>>>     /**
>>>>>>>>>>>>>> @@ -632,6 +661,131 @@
>>>>>>>>>>>>>>      *   }
>>>>>>>>>>>>>>      */
>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>> + * get_next_vm_bo_from_list() - get the next vm_bo element
>>>>>>>>>>>>>> + * @__gpuvm: The GPU VM
>>>>>>>>>>>>>> + * @__list_name: The name of the list we're iterating on
>>>>>>>>>>>>>> + * @__local_list: A pointer to the local list used to store
>>>>>>>>>>>>>> already iterated items
>>>>>>>>>>>>>> + * @__prev_vm_bo: The previous element we got from
>>>>>>>>>>>>>> + * get_next_vm_bo_from_list()
>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>> + * This helper is here to provide lockless list iteration.
>>>>>>>>>>>>>> Lockless as in, the
>>>>>>>>>>>>>> + * iterator releases the lock immediately after picking the
>>>>>>>>>>>>>> first element from
>>>>>>>>>>>>>> + * the list, so list insertion and deletion can happen
>>>>>>>>>>>>>> concurrently.
>>>>>>>>>>>>> Are the list spinlocks needed for that async state update 
>>>>>>>>>>>>> from
>>>>>>>>>>>>> within the
>>>>>>>>>>>>> dma-fence critical section we've discussed previously?
>>>>>>>>>>>> Yes, but also for other reasons, see below.
>>>>>>>>>>>>
>>>>>>>>>>>>> Otherwise it should be sufficient to protect the lists 
>>>>>>>>>>>>> with the
>>>>>>>>>>>>> gpuvm's resv
>>>>>>>>>>>>> (or for the extobj list with an outer lock).
>>>>>>>>>>>>>
>>>>>>>>>>>>> If those spinlocks are still needed in some situations, 
>>>>>>>>>>>>> perhaps
>>>>>>>>>>>>> could we
>>>>>>>>>>>>> have an option to set them to NULL (Like IIRC the maple tree
>>>>>>>>>>>>> allows for)?
>>>>>>>>>>>> The evict spinlock is needed in any case, since in
>>>>>>>>>>>> drm_gpuvm_bo_evict() we're
>>>>>>>>>>>> holding only the dma-resv lock from the BO this function gets
>>>>>>>>>>>> called for. Hence,
>>>>>>>>>>>> the spinlock protects concurrent drm_gpuvm_bo_evict() calls 
>>>>>>>>>>>> with
>>>>>>>>>>>> different BOs.
>>>>>>>>>>> No. Only if you try to add external objects to the vm's 
>>>>>>>>>>> evict list
>>>>>>>>>>> from
>>>>>>>>>>> within the evict code. That's not necessary since you loop 
>>>>>>>>>>> through
>>>>>>>>>>> all
>>>>>>>>>>> external objects anyway when locking them so an "evicted" 
>>>>>>>>>>> bool in
>>>>>>>>>>> the vm_bo,
>>>>>>>>>>> protected by the bo resv would be sufficient. The extobj 
>>>>>>>>>>> locking
>>>>>>>>>>> loop can
>>>>>>>>>>> then add the bo to the evicted list.
>>>>>>>>>> And validate() can remove it while still holding all dma-resv 
>>>>>>>>>> locks,
>>>>>>>>>> neat!
>>>>>>>>>> However, what if two tasks are trying to lock the VA space
>>>>>>>>>> concurrently? What
>>>>>>>>>> do we do when the drm_gpuvm_bo's refcount drops to zero in
>>>>>>>>>> drm_gpuva_unlink()?
>>>>>>>>>> Are we guaranteed that at this point of time the drm_gpuvm_bo 
>>>>>>>>>> is not
>>>>>>>>>> on the
>>>>>>>>>> evicted list? Because otherwise we would call 
>>>>>>>>>> drm_gpuvm_bo_destroy()
>>>>>>>>>> with the
>>>>>>>>>> dma-resv lock held, which wouldn't be allowed, since
>>>>>>>>>> drm_gpuvm_bo_destroy()
>>>>>>>>>> might drop the last reference to the drm_gem_object and hence 
>>>>>>>>>> we'd
>>>>>>>>>> potentially
>>>>>>>>>> free the dma-resv lock while holding it, at least if it's an 
>>>>>>>>>> external
>>>>>>>>>> object.
>>>>>>>>> Easiest way in this scheme is to think of the lists as being 
>>>>>>>>> protected
>>>>>>>>> by the vm's resv lock. That means anybody calling unlink() 
>>>>>>>>> must also
>>>>>>>>> hold the vm's resv lock. (Which is OK from an UAF point of 
>>>>>>>>> view, but
>>>>>>>>> perhaps not from a locking inversion POW from an async list 
>>>>>>>>> update).
>>>>>>>> This would mean that on unlink() we'd need to hold the VM's 
>>>>>>>> resv lock and the
>>>>>>>> corresponding GEM's resv lock (in case they're not the same 
>>>>>>>> anyways) because the
>>>>>>>> VM's resv lock would protect the external / evicted object 
>>>>>>>> lists and the GEM
>>>>>>>> objects resv lock protects the GEM's list of drm_gpuvm_bos and the
>>>>>>>> drm_gpuvm_bo's list of drm_gpuvas.
>>>>>>>>
>>>>>>>>>>>> For extobjs an outer lock would be enough in case of Xe, but I
>>>>>>>>>>>> really would not
>>>>>>>>>>>> like to add even more complexity just to get the spinlock 
>>>>>>>>>>>> out of
>>>>>>>>>>>> the way in case
>>>>>>>>>>>> the driver already has an outer lock protecting this path.
>>>>>>>>>>> I must disagree here. These spinlocks and atomic operations are
>>>>>>>>>>> pretty
>>>>>>>>>>> costly and as discussed earlier this type of locking was the 
>>>>>>>>>>> reason
>>>>>>>>>>> (at
>>>>>>>>>>> least according to the commit message) that made Christian 
>>>>>>>>>>> drop the
>>>>>>>>>>> XArray
>>>>>>>>>>> use in drm_exec for the same set of objects: "The locking 
>>>>>>>>>>> overhead
>>>>>>>>>>> is
>>>>>>>>>>> unecessary and measurable". IMHO the spinlock is the added
>>>>>>>>>>> complexity and a
>>>>>>>>>>> single wide lock following the drm locking guidelines set 
>>>>>>>>>>> out by
>>>>>>>>>>> Daniel and
>>>>>>>>>>> David should really be the default choice with an opt-in for a
>>>>>>>>>>> spinlock if
>>>>>>>>>>> needed for async and pushing out to a wq is not an option.
>>>>>>>>>> For the external object list an outer lock would work as long 
>>>>>>>>>> as it's
>>>>>>>>>> not the
>>>>>>>>>> dma-resv lock of the corresponding GEM object, since here we 
>>>>>>>>>> actually
>>>>>>>>>> need to
>>>>>>>>>> remove the list entry from the external object list on
>>>>>>>>>> drm_gpuvm_bo_destroy().
>>>>>>>>>> It's just a bit weird design wise that drivers would need to 
>>>>>>>>>> take
>>>>>>>>>> this outer
>>>>>>>>>> lock on:
>>>>>>>>>>
>>>>>>>>>> - drm_gpuvm_bo_extobj_add()
>>>>>>>>>> - drm_gpuvm_bo_destroy()        (and hence also 
>>>>>>>>>> drm_gpuvm_bo_put())
>>>>>>>>>> - drm_gpuva_unlink()            (because it needs to call
>>>>>>>>>> drm_gpuvm_bo_put())
>>>>>>>>>> - drm_gpuvm_exec_lock()
>>>>>>>>>> - drm_gpuvm_exec_lock_array()
>>>>>>>>>> - drm_gpuvm_prepare_range()
>>>>>>>>>>
>>>>>>>>>> Given that it seems reasonable to do all the required locking
>>>>>>>>>> internally.
>>>>>>>>>  From a design POV, there has been a clear direction in XE to 
>>>>>>>>> make
>>>>>>>>> things similar to mmap() / munmap(), so this outer lock, which 
>>>>>>>>> in Xe is
>>>>>>>>> an rwsem, is used in a similar way as the mmap_lock. It's 
>>>>>>>>> protecting
>>>>>>>>> the page-table structures and vma rb tree, the userptr 
>>>>>>>>> structures and
>>>>>>>>> the extobj list. Basically it's taken early in the exec IOCTL, 
>>>>>>>>> the
>>>>>>>>> VM_BIND ioctl, the compute rebind worker and the pagefault 
>>>>>>>>> handler, so
>>>>>>>>> all of the above are just asserting that it is taken in the 
>>>>>>>>> correct
>>>>>>>>> mode.
>>>>>>>>>
>>>>>>>>> But strictly with this scheme one could also use the vm's 
>>>>>>>>> dma_resv for
>>>>>>>>> the extobj list since with drm_exec, it's locked before 
>>>>>>>>> traversing the
>>>>>>>>> list.
>>>>>>>>>
>>>>>>>>> The whole point of this scheme is to rely on locks that you 
>>>>>>>>> already are
>>>>>>>>> supposed to be holding for various reasons and is simple to 
>>>>>>>>> comprehend.
>>>>>>>> I don't agree that we're supposed to hold the VM's resv lock 
>>>>>>>> anyways for
>>>>>>>> functions like drm_gpuvm_bo_put() or drm_gpuva_unlink(), but 
>>>>>>>> I'm fine using it
>>>>>>>> for that purpose nevertheless.
>>>>>>>>
>>>>>>>>>> In order to at least place lockdep checks, the driver would 
>>>>>>>>>> need to
>>>>>>>>>> supply the
>>>>>>>>>> corresponding lock's lockdep_map, because the GPUVM otherwise 
>>>>>>>>>> doesn't
>>>>>>>>>> know about
>>>>>>>>>> the lock.
>>>>>>>>> Yes, that sounds reasonable. One lockdep map per list.
>>>>>>>> I'd really like to avoid that, especially now that everything 
>>>>>>>> got simpler. We
>>>>>>>> should define the actual locks to take instead.
>>>>>>>>
>>>>>>>>>> Out of curiosity, what is the overhead of a spin_lock() that 
>>>>>>>>>> doesn't
>>>>>>>>>> need to
>>>>>>>>>> spin?
>>>>>>>>> I guess it's hard to tell exactly, but it is much lower on 
>>>>>>>>> modern x86
>>>>>>>>> than what it used to be. Not sure about ARM, which is the other
>>>>>>>>> architecture important to us. I figure if there is little 
>>>>>>>>> cache-line
>>>>>>>>> bouncing the main overhead comes from the implied barriers.
>>>>>>>>>
>>>>>>>>>>> A pretty simple way that would not add much code would be
>>>>>>>>>>>
>>>>>>>>>>> static void gpuvm_cond_spin_lock(const struct drm_gpuvm *gpuvm,
>>>>>>>>>>>                                  spinlock_t *lock)
>>>>>>>>>>> {
>>>>>>>>>>>      if (!gpuvm->resv_protected_lists)
>>>>>>>>>>>          spin_lock(lock);
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>>>> For such drivers, that would require anybody calling 
>>>>>>>>>>>>> unlink to
>>>>>>>>>>>>> hold the vm's
>>>>>>>>>>>>> resv, though.
>>>>>>>>>>>> In V4 I want to go back to having a dedicated lock for the 
>>>>>>>>>>>> GEMs
>>>>>>>>>>>> gpuva list (or
>>>>>>>>>>>> VM_BO list to be more precise). We can't just use the dma-resv
>>>>>>>>>>>> lock for that
>>>>>>>>>>>> with VM_BO abstractions, because on destruction of a VM_BO we
>>>>>>>>>>>> otherwise wouldn't
>>>>>>>>>>>> be allowed to already hold the dma-resv lock. That's the fix I
>>>>>>>>>>>> was referring to
>>>>>>>>>>>> earlier.
>>>>>>>>>>> Yeah, I can see the need for a dedicated lock for the GEM's 
>>>>>>>>>>> gpuva
>>>>>>>>>>> list, but
>>>>>>>>>>> holding the vm's dma-resv lock across the unlink shouldn't be a
>>>>>>>>>>> problem. We
>>>>>>>>>>> may free the object and a pointer to the vm's resv during 
>>>>>>>>>>> unlink
>>>>>>>>>>> but we
>>>>>>>>>>> don't free the vm's resv.  It'd be a matter of ensuring that 
>>>>>>>>>>> any
>>>>>>>>>>> calls to
>>>>>>>>>>> unlink from *within* drm_gpuvm allows it to be held.
>>>>>>>>>> Drivers calling unlink() from the fence signaling path can't 
>>>>>>>>>> use the
>>>>>>>>>> VM's
>>>>>>>>>> dma-resv lock.
>>>>>>>>> Yes, that made me a bit curious because in the current version 
>>>>>>>>> the code
>>>>>>>>> required the object's dma_resv for unlink() which can't be 
>>>>>>>>> grabbed
>>>>>>>>> either from the fence signaling path. So are there any drivers 
>>>>>>>>> actually
>>>>>>>>> wanting to do that? If so, they will either need to resort to the
>>>>>>>>> current spinlock solution or they will need to call unlink from a
>>>>>>>>> workqueue item.
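
Such a deferred unlink could look roughly like this sketch; names are
hypothetical, and the caller must hold a GEM reference until the worker has
run:

struct unlink_work {
        struct work_struct work;
        struct drm_gpuva *va;
};

static void unlink_worker(struct work_struct *work)
{
        struct unlink_work *uw = container_of(work, struct unlink_work, work);
        struct drm_gem_object *obj = uw->va->gem.obj;

        dma_resv_lock(obj->resv, NULL);
        drm_gpuva_unlink(uw->va);
        dma_resv_unlock(obj->resv);

        drm_gem_object_put(obj); /* reference taken when the work was queued */
        kfree(uw);
}
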
>>>>>>>> As Boris already mentioned we have the dma-resv lock by default 
>>>>>>>> or a driver
>>>>>>>> specific GEM gpuva lock as opt-in. Now, we can get rid of the 
>>>>>>>> latter.
>>>>>>>>
>>>>>>>>>> Also, what if the object is an external object? We can't use 
>>>>>>>>>> the VM's
>>>>>>>>>> dma-resv
>>>>>>>>>> lock here.
>>>>>>>>> Why? Typically (sync) unlink is only ever called from an 
>>>>>>>>> unbind-like
>>>>>>>>> operation where it should be trivial to grab the vm's resv. 
>>>>>>>>> Or, for
>>>>>>>>> that matter any outer lock protecting the extobj list. Rule 
>>>>>>>>> would be
>>>>>>>>> the drm_gpuvm_bo::entry::extobj  and 
>>>>>>>>> drm_gpuvm_bo::entry::evict would
>>>>>>>>> be protected by either the vm's dma_resv (or possibly an outer 
>>>>>>>>> lock in
>>>>>>>>> the case of the extobj list).
>>>>>>>> Outer lock wouldn't have been working for updates in the async 
>>>>>>>> path, but
>>>>>>>> shouldn't be relevant anymore. We could use the VM's resv for 
>>>>>>>> that.
>>>>>>>>
>>>>>>>>>>   And we can't have the GEM objs dma-resv lock held when calling
>>>>>>>>>> unlink(), since unlink() calls drm_gpuvm_bo_put(), which if the
>>>>>>>>>> refcount drops
>>>>>>>>>> to zero calls drm_gpuvm_bo_destroy() and 
>>>>>>>>>> drm_gpuvm_bo_destroy() might
>>>>>>>>>> drop the
>>>>>>>>>> last reference of the GEM object.
>>>>>>>>> Yes, but this is a different problem as to what exactly protects
>>>>>>>>> drm_gpuvm_bo::entry::gem. Either as you suggest an internal 
>>>>>>>>> per bo list
>>>>>>>>> lock, or if we want to keep the bo's dma_resv we need to 
>>>>>>>>> ensure that
>>>>>>>>> the caller of dma_resv_unlock(obj->resv) actually refcounts 
>>>>>>>>> its obj
>>>>>>>>> pointer, and doesn't implicitly rely on the gpuvm_bo's 
>>>>>>>>> refcount (I know
>>>>>>>>> Boris didn't like that, but requiring an explicit refcount for a
>>>>>>>>> pointer you dereference unless you're under a lock that 
>>>>>>>>> ensures keeping
>>>>>>>>> the object alive is pretty much required?) But anyway for the
>>>>>>>>> drm_gpuvm_bo::entry::gem list protection (bo resv or internal 
>>>>>>>>> spinlock)
>>>>>>>>> I don't have a strong preference.
>>>>>>>> We can keep the GEM objects dma-resv lock, however as mentioned 
>>>>>>>> above
>>>>>>>> drm_gpuva_unlink() and drm_gpuvm_bo_put() then requires both 
>>>>>>>> the VM's resv lock
>>>>>>>> and the GEM's resv lock in case they differ.
>>>>>>>>
>>>>>>>>>>   All those problems go away with a dedicated
>>>>>>>>>> GEM gpuva list lock.
>>>>>>>>> I don't think these are real problems.
>>>>>>>>> With the exception of the eviction list "trick" where we 
>>>>>>>>> currently have
>>>>>>>>> slightly different approach to collect external bos needing 
>>>>>>>>> rebinding,
>>>>>>>>> we have this working fine.
>>>>>>>>>
>>>>>>>>> TBH I think pretty much the only situation where the spinlock 
>>>>>>>>> is needed
>>>>>>>>> is for async updates of these lists, unless a wq item can be 
>>>>>>>>> used for
>>>>>>>>> that, but it doesn't really seem like the current code allows 
>>>>>>>>> for such
>>>>>>>>> updates anyway? It complicates the code a lot, adds overhead 
>>>>>>>>> and also
>>>>>>>>> adds the requirement for refcounting during list traversal.
>>>>>>>>>
>>>>>>>>> /Thomas
>>>>>>>>>
>>>>>>>>>>> /Thomas
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>> It seems that with that also the refcount could be made non-
>>>>>>>>>>>>> atomic.
>>>>>>>>>>>>>
>>>>>>>>>>>>> All in the spirit of the drm locking guidelines "use big 
>>>>>>>>>>>>> locks
>>>>>>>>>>>>> when
>>>>>>>>>>>>> possible".
>>>>>>>>>>>>> Lower level locks only when necessary for performance or
>>>>>>>>>>>>> locking inversion?
>>>>>>>>>>>>>
>>>>>>>>>>>>> /Thomas
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>> + * Elements popped from the original list are kept in a
>>>>>>>>>>>>>> local list, so removal
>>>>>>>>>>>>>> + * and is_empty checks can still happen while we're
>>>>>>>>>>>>>> iterating the list.
>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>> +#define get_next_vm_bo_from_list(__gpuvm, __list_name, __local_list, __prev_vm_bo) \
>>>>>>>>>>>>>> +       ({                                                                      \
>>>>>>>>>>>>>> +               struct drm_gpuvm_bo *__vm_bo;                                   \
>>>>>>>>>>>>>> +                                                                               \
>>>>>>>>>>>>>> +               drm_gpuvm_bo_put(__prev_vm_bo);                                 \
>>>>>>>>>>>>>> +                                                                               \
>>>>>>>>>>>>>> +               spin_lock(&(__gpuvm)->__list_name.lock);                        \
>>>>>>>>>>>>>> +               while (!list_empty(&(__gpuvm)->__list_name.list)) {             \
>>>>>>>>>>>>>> +                       __vm_bo = list_first_entry(&(__gpuvm)->__list_name.list, \
>>>>>>>>>>>>>> +                                                  struct drm_gpuvm_bo,         \
>>>>>>>>>>>>>> +                                                  list.entry.__list_name);     \
>>>>>>>>>>>>>> +                       if (drm_gpuvm_bo_get_unless_zero(__vm_bo)) {            \
>>>>>>>>>>>>>> +                               list_move_tail(&(__vm_bo)->list.entry.__list_name, \
>>>>>>>>>>>>>> +                                              __local_list);                   \
>>>>>>>>>>>>>> +                               break;                                          \
>>>>>>>>>>>>>> +                       } else {                                                \
>>>>>>>>>>>>>> +                               list_del_init(&(__vm_bo)->list.entry.__list_name); \
>>>>>>>>>>>>>> +                               __vm_bo = NULL;                                 \
>>>>>>>>>>>>>> +                       }                                                       \
>>>>>>>>>>>>>> +               }                                                               \
>>>>>>>>>>>>>> +               spin_unlock(&(__gpuvm)->__list_name.lock);                      \
>>>>>>>>>>>>>> +                                                                               \
>>>>>>>>>>>>>> +               __vm_bo;                                                        \
>>>>>>>>>>>>>> +       })
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>> + * for_each_vm_bo_in_list() - internal vm_bo list iterator
>>>>>>>>>>>>>> + * @__gpuvm: The GPU VM
>>>>>>>>>>>>>> + * @__list_name: The name of the list we're iterating on
>>>>>>>>>>>>>> + * @__local_list: A pointer to the local list used to store already
>>>>>>>>>>>>>> + * iterated items
>>>>>>>>>>>>>> + * @__vm_bo: The current &drm_gpuvm_bo
>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>> + * This helper is here to provide lockless list iteration.
>>>>>>>>>>>>>> Lockless as in, the
>>>>>>>>>>>>>> + * iterator releases the lock immediately after picking the
>>>>>>>>>>>>>> first element from the
>>>>>>>>>>>>>> + * list, so list insertion and deletion can happen
>>>>>>>>>>>>>> concurrently.
>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>> + * Typical use:
>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>> + *     struct drm_gpuvm_bo *vm_bo;
>>>>>>>>>>>>>> + *     LIST_HEAD(my_local_list);
>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>> + *     ret = 0;
>>>>>>>>>>>>>> + *     for_each_vm_bo_in_list(gpuvm, <list_name>, &my_local_list, vm_bo) {
>>>>>>>>>>>>>> + *             ret = do_something_with_vm_bo(..., vm_bo);
>>>>>>>>>>>>>> + *             if (ret)
>>>>>>>>>>>>>> + *                     break;
>>>>>>>>>>>>>> + *     }
>>>>>>>>>>>>>> + *     drm_gpuvm_bo_put(vm_bo);
>>>>>>>>>>>>>> + *     restore_vm_bo_list(gpuvm, <list_name>, &my_local_list);
>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>> + * Only used for internal list iterations, not meant to be
>>>>>>>>>>>>>> exposed to the outside
>>>>>>>>>>>>>> + * world.
>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>> +#define for_each_vm_bo_in_list(__gpuvm, __list_name, __local_list, __vm_bo)    \
>>>>>>>>>>>>>> +       for (__vm_bo = get_next_vm_bo_from_list(__gpuvm, __list_name,           \
>>>>>>>>>>>>>> +                                               __local_list, NULL);            \
>>>>>>>>>>>>>> +            __vm_bo;                                                           \
>>>>>>>>>>>>>> +            __vm_bo = get_next_vm_bo_from_list(__gpuvm, __list_name,           \
>>>>>>>>>>>>>> +                                               __local_list, __vm_bo))
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>> + * restore_vm_bo_list() - move vm_bo elements back to their
>>>>>>>>>>>>>> original list
>>>>>>>>>>>>>> + * @__gpuvm: The GPU VM
>>>>>>>>>>>>>> + * @__list_name: The name of the list we're iterating on
>>>>>>>>>>>>>> + * @__local_list: A pointer to the local list used to store
>>>>>>>>>>>>>> already iterated items
>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>> + * When we're done iterating a vm_bo list, we should call
>>>>>>>>>>>>>> restore_vm_bo_list()
>>>>>>>>>>>>>> + * to restore the original state and let new iterations 
>>>>>>>>>>>>>> take
>>>>>>>>>>>>>> place.
>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>> +#define restore_vm_bo_list(__gpuvm, __list_name, __local_list)                 \
>>>>>>>>>>>>>> +       do {                                                                    \
>>>>>>>>>>>>>> +               /* Merge back the two lists, moving local list elements to the  \
>>>>>>>>>>>>>> +                * head to preserve previous ordering, in case it matters.      \
>>>>>>>>>>>>>> +                */                                                             \
>>>>>>>>>>>>>> +               spin_lock(&(__gpuvm)->__list_name.lock);                        \
>>>>>>>>>>>>>> +               list_splice(__local_list, &(__gpuvm)->__list_name.list);        \
>>>>>>>>>>>>>> +               spin_unlock(&(__gpuvm)->__list_name.lock);                      \
>>>>>>>>>>>>>> +       } while (0)
>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>> + * drm_gpuvm_bo_list_add() - insert a vm_bo into the given
>>>>>>>>>>>>>> list
>>>>>>>>>>>>>> + * @__vm_bo: the &drm_gpuvm_bo
>>>>>>>>>>>>>> + * @__list_name: the name of the list to insert into
>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>> + * Inserts the given @__vm_bo into the list specified by
>>>>>>>>>>>>>> @__list_name and
>>>>>>>>>>>>>> + * increases the vm_bo's reference count.
>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>> +#define drm_gpuvm_bo_list_add(__vm_bo, __list_name)                            \
>>>>>>>>>>>>>> +       do {                                                                    \
>>>>>>>>>>>>>> +               spin_lock(&(__vm_bo)->vm->__list_name.lock);                    \
>>>>>>>>>>>>>> +               if (list_empty(&(__vm_bo)->list.entry.__list_name))             \
>>>>>>>>>>>>>> +                       list_add_tail(&(__vm_bo)->list.entry.__list_name,       \
>>>>>>>>>>>>>> +                                     &(__vm_bo)->vm->__list_name.list);        \
>>>>>>>>>>>>>> +               spin_unlock(&(__vm_bo)->vm->__list_name.lock);                  \
>>>>>>>>>>>>>> +       } while (0)
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>> + * drm_gpuvm_bo_list_del() - remove a vm_bo from the given
>>>>>>>>>>>>>> list
>>>>>>>>>>>>>> + * @__vm_bo: the &drm_gpuvm_bo
>>>>>>>>>>>>>> + * @__list_name: the name of the list to insert into
>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>> + * Removes the given @__vm_bo from the list specified by
>>>>>>>>>>>>>> @__list_name and
>>>>>>>>>>>>>> + * decreases the vm_bo's reference count.
>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>> +#define drm_gpuvm_bo_list_del(__vm_bo, __list_name)                            \
>>>>>>>>>>>>>> +       do {                                                                    \
>>>>>>>>>>>>>> +               spin_lock(&(__vm_bo)->vm->__list_name.lock);                    \
>>>>>>>>>>>>>> +               if (!list_empty(&(__vm_bo)->list.entry.__list_name))            \
>>>>>>>>>>>>>> +                       list_del_init(&(__vm_bo)->list.entry.__list_name);      \
>>>>>>>>>>>>>> +               spin_unlock(&(__vm_bo)->vm->__list_name.lock);                  \
>>>>>>>>>>>>>> +       } while (0)
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +static int __must_check
>>>>>>>>>>>>>> +drm_gpuvm_bo_get_unless_zero(struct drm_gpuvm_bo *vm_bo);
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>     #define to_drm_gpuva(__node) container_of((__node), 
>>>>>>>>>>>>>> struct
>>>>>>>>>>>>>> drm_gpuva, rb.node)
>>>>>>>>>>>>>>     #define GPUVA_START(node) ((node)->va.addr)
>>>>>>>>>>>>>> @@ -713,6 +867,12 @@ drm_gpuvm_init(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>> struct drm_device *drm,
>>>>>>>>>>>>>>          gpuvm->rb.tree = RB_ROOT_CACHED;
>>>>>>>>>>>>>>          INIT_LIST_HEAD(&gpuvm->rb.list);
>>>>>>>>>>>>>> +       INIT_LIST_HEAD(&gpuvm->extobj.list);
>>>>>>>>>>>>>> +       spin_lock_init(&gpuvm->extobj.lock);
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +       INIT_LIST_HEAD(&gpuvm->evict.list);
>>>>>>>>>>>>>> +       spin_lock_init(&gpuvm->evict.lock);
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>         ��drm_gpuva_check_overflow(start_offset, range);
>>>>>>>>>>>>>>          gpuvm->mm_start = start_offset;
>>>>>>>>>>>>>>          gpuvm->mm_range = range;
>>>>>>>>>>>>>> @@ -754,10 +914,302 @@ drm_gpuvm_destroy(struct drm_gpuvm
>>>>>>>>>>>>>> *gpuvm)
>>>>>>>>>>>>>>          WARN(!RB_EMPTY_ROOT(&gpuvm->rb.tree.rb_root),
>>>>>>>>>>>>>>               "GPUVA tree is not empty, potentially leaking
>>>>>>>>>>>>>> memory.\n");
>>>>>>>>>>>>>> +       WARN(!list_empty(&gpuvm->extobj.list), "Extobj list should be empty.\n");
>>>>>>>>>>>>>> +       WARN(!list_empty(&gpuvm->evict.list), "Evict list should be empty.\n");
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>          drm_gem_private_object_fini(&gpuvm->d_obj);
>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>> EXPORT_SYMBOL_GPL(drm_gpuvm_destroy);
>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>> + * drm_gpuvm_prepare_objects() - prepare all associated BOs
>>>>>>>>>>>>>> + * @gpuvm: the &drm_gpuvm
>>>>>>>>>>>>>> + * @exec: the &drm_exec locking context
>>>>>>>>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>> + * Calls drm_exec_prepare_obj() for all &drm_gem_objects the given
>>>>>>>>>>>>>> + * &drm_gpuvm contains mappings of.
>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>> + * Using this function directly, it is the driver's responsibility to
>>>>>>>>>>>>>> + * call drm_exec_init() and drm_exec_fini() accordingly.
>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>> + * Note: This function is safe against concurrent insertion
>>>>>>>>>>>>>> and removal of
>>>>>>>>>>>>>> + * external objects, however it is not safe against
>>>>>>>>>>>>>> concurrent usage itself.
>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>> + * Drivers need to make sure to protect this case with
>>>>>>>>>>>>>> either an outer VM lock
>>>>>>>>>>>>>> + * or by calling drm_gpuvm_prepare_vm() before this 
>>>>>>>>>>>>>> function
>>>>>>>>>>>>>> within the
>>>>>>>>>>>>>> + * drm_exec_until_all_locked() loop, such that the GPUVM's
>>>>>>>>>>>>>> dma-resv lock ensures
>>>>>>>>>>>>>> + * mutual exclusion.
>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>> +int
>>>>>>>>>>>>>> +drm_gpuvm_prepare_objects(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>> +                         struct drm_exec *exec,
>>>>>>>>>>>>>> +                         unsigned int num_fences)
>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>> +       struct drm_gpuvm_bo *vm_bo;
>>>>>>>>>>>>>> +       LIST_HEAD(extobjs);
>>>>>>>>>>>>>> +       int ret = 0;
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +       for_each_vm_bo_in_list(gpuvm, extobj, &extobjs, vm_bo) {
>>>>>>>>>>>>>> +               ret = drm_exec_prepare_obj(exec, vm_bo->obj, num_fences);
>>>>>>>>>>>>>> +               if (ret)
>>>>>>>>>>>>>> +                       break;
>>>>>>>>>>>>>> +       }
>>>>>>>>>>>>>> +       /* Drop ref in case we break out of the loop. */
>>>>>>>>>>>>>> +       drm_gpuvm_bo_put(vm_bo);
>>>>>>>>>>>>>> +       restore_vm_bo_list(gpuvm, extobj, &extobjs);
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +       return ret;
>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_prepare_objects);
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>> + * drm_gpuvm_prepare_range() - prepare all BOs mapped 
>>>>>>>>>>>>>> within
>>>>>>>>>>>>>> a given range
>>>>>>>>>>>>>> + * @gpuvm: the &drm_gpuvm
>>>>>>>>>>>>>> + * @exec: the &drm_exec locking context
>>>>>>>>>>>>>> + * @addr: the start address within the VA space
>>>>>>>>>>>>>> + * @range: the range to iterate within the VA space
>>>>>>>>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>> + * Calls drm_exec_prepare_obj() for all &drm_gem_objects
>>>>>>>>>>>>>> mapped between @addr
>>>>>>>>>>>>>> + * and @addr + @range.
>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>> +int
>>>>>>>>>>>>>> +drm_gpuvm_prepare_range(struct drm_gpuvm *gpuvm, struct drm_exec *exec,
>>>>>>>>>>>>>> +                       u64 addr, u64 range, unsigned int num_fences)
>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>> +       struct drm_gpuva *va;
>>>>>>>>>>>>>> +       u64 end = addr + range;
>>>>>>>>>>>>>> +       int ret;
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +       drm_gpuvm_for_each_va_range(va, gpuvm, addr, end) {
>>>>>>>>>>>>>> +               struct drm_gem_object *obj = va->gem.obj;
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +               ret = drm_exec_prepare_obj(exec, obj, num_fences);
>>>>>>>>>>>>>> +               if (ret)
>>>>>>>>>>>>>> +                       return ret;
>>>>>>>>>>>>>> +       }
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +       return 0;
>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_prepare_range);
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>> + * drm_gpuvm_exec_lock() - lock all dma-resv of all associated BOs
>>>>>>>>>>>>>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>>>>>>>>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>>>>>>>>> + * @interruptible: sleep interruptible if waiting
>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>> + * Acquires all dma-resv locks of all &drm_gem_objects the
>>>>>>>>>>>>>> given
>>>>>>>>>>>>>> + * &drm_gpuvm contains mappings of.
>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>> + * Additionally, when calling this function with struct drm_gpuvm_exec::extra
>>>>>>>>>>>>>> + * being set, the driver receives the given @fn callback to
>>>>>>>>>>>>>> lock additional
>>>>>>>>>>>>>> + * dma-resv in the context of the &drm_gpuvm_exec instance.
>>>>>>>>>>>>>> Typically, drivers
>>>>>>>>>>>>>> + * would call drm_exec_prepare_obj() from within this
>>>>>>>>>>>>>> callback.
>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>> +int
>>>>>>>>>>>>>> +drm_gpuvm_exec_lock(struct drm_gpuvm_exec *vm_exec,
>>>>>>>>>>>>>> +                   unsigned int num_fences,
>>>>>>>>>>>>>> +                   bool interruptible)
>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>> +       struct drm_gpuvm *gpuvm = vm_exec->vm;
>>>>>>>>>>>>>> +       struct drm_exec *exec = &vm_exec->exec;
>>>>>>>>>>>>>> +       uint32_t flags;
>>>>>>>>>>>>>> +       int ret;
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +       flags = (interruptible ? DRM_EXEC_INTERRUPTIBLE_WAIT : 0) |
>>>>>>>>>>>>>> +               DRM_EXEC_IGNORE_DUPLICATES;
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +       drm_exec_init(exec, flags);
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +       drm_exec_until_all_locked(exec) {
>>>>>>>>>>>>>> +               ret = drm_gpuvm_prepare_vm(gpuvm, exec, num_fences);
>>>>>>>>>>>>>> +               drm_exec_retry_on_contention(exec);
>>>>>>>>>>>>>> +               if (ret)
>>>>>>>>>>>>>> +                       goto err;
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +               ret = drm_gpuvm_prepare_objects(gpuvm, exec, num_fences);
>>>>>>>>>>>>>> +               drm_exec_retry_on_contention(exec);
>>>>>>>>>>>>>> +               if (ret)
>>>>>>>>>>>>>> +                       goto err;
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +               if (vm_exec->extra.fn) {
>>>>>>>>>>>>>> +                       ret = vm_exec->extra.fn(vm_exec, num_fences);
>>>>>>>>>>>>>> +                       drm_exec_retry_on_contention(exec);
>>>>>>>>>>>>>> +                       if (ret)
>>>>>>>>>>>>>> +                               goto err;
>>>>>>>>>>>>>> +               }
>>>>>>>>>>>>>> +       }
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +       return 0;
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +err:
>>>>>>>>>>>>>> +       drm_exec_fini(exec);
>>>>>>>>>>>>>> +       return ret;
>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_exec_lock);
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +static int
>>>>>>>>>>>>>> +fn_lock_array(struct drm_gpuvm_exec *vm_exec, unsigned int num_fences)
>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>> +       struct {
>>>>>>>>>>>>>> +               struct drm_gem_object **objs;
>>>>>>>>>>>>>> +               unsigned int num_objs;
>>>>>>>>>>>>>> +       } *args = vm_exec->extra.priv;
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +       return drm_exec_prepare_array(&vm_exec->exec, args->objs,
>>>>>>>>>>>>>> +                                     args->num_objs, num_fences);
>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>> + * drm_gpuvm_exec_lock_array() - lock all dma-resv of all associated BOs
>>>>>>>>>>>>>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>>>>>>>>>>>>>> + * @objs: additional &drm_gem_objects to lock
>>>>>>>>>>>>>> + * @num_objs: the number of additional &drm_gem_objects to
>>>>>>>>>>>>>> lock
>>>>>>>>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>>>>>>>>> + * @interruptible: sleep interruptible if waiting
>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>> + * Acquires all dma-resv locks of all &drm_gem_objects the
>>>>>>>>>>>>>> given &drm_gpuvm
>>>>>>>>>>>>>> + * contains mappings of, plus the ones given through @objs.
>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>> +int
>>>>>>>>>>>>>> +drm_gpuvm_exec_lock_array(struct drm_gpuvm_exec *vm_exec,
>>>>>>>>>>>>>> +                         struct drm_gem_object **objs,
>>>>>>>>>>>>>> +                         unsigned int num_objs,
>>>>>>>>>>>>>> +                         unsigned int num_fences,
>>>>>>>>>>>>>> +                         bool interruptible)
>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>> +       struct {
>>>>>>>>>>>>>> +               struct drm_gem_object **objs;
>>>>>>>>>>>>>> +               unsigned int num_objs;
>>>>>>>>>>>>>> +       } args;
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +       args.objs = objs;
>>>>>>>>>>>>>> +       args.num_objs = num_objs;
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +       vm_exec->extra.fn = fn_lock_array;
>>>>>>>>>>>>>> +       vm_exec->extra.priv = &args;
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +       return drm_gpuvm_exec_lock(vm_exec, num_fences,
>>>>>>>>>>>>>> interruptible);
>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_exec_lock_array);
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>> + * drm_gpuvm_exec_lock_range() - prepare all BOs mapped
>>>>>>>>>>>>>> within a given range
>>>>>>>>>>>>>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>>>>>>>>>>>>>> + * @addr: the start address within the VA space
>>>>>>>>>>>>>> + * @range: the range to iterate within the VA space
>>>>>>>>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>>>>>>>>> + * @interruptible: sleep interruptible if waiting
>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>> + * Acquires all dma-resv locks of all &drm_gem_objects
>>>>>>>>>>>>>> mapped between @addr and
>>>>>>>>>>>>>> + * @addr + @range.
>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>> +int
>>>>>>>>>>>>>> +drm_gpuvm_exec_lock_range(struct drm_gpuvm_exec *vm_exec,
>>>>>>>>>>>>>> +                         u64 addr, u64 range,
>>>>>>>>>>>>>> +                         unsigned int num_fences,
>>>>>>>>>>>>>> +                         bool interruptible)
>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>> +       struct drm_gpuvm *gpuvm = vm_exec->vm;
>>>>>>>>>>>>>> +       struct drm_exec *exec = &vm_exec->exec;
>>>>>>>>>>>>>> +       uint32_t flags;
>>>>>>>>>>>>>> +       int ret;
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +       flags = (interruptible ? DRM_EXEC_INTERRUPTIBLE_WAIT : 0) |
>>>>>>>>>>>>>> +               DRM_EXEC_IGNORE_DUPLICATES;
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +       drm_exec_init(exec, flags);
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +       drm_exec_until_all_locked(exec) {
>>>>>>>>>>>>>> +               ret = drm_gpuvm_prepare_range(gpuvm, exec, addr, range,
>>>>>>>>>>>>>> +                                             num_fences);
>>>>>>>>>>>>>> +               drm_exec_retry_on_contention(exec);
>>>>>>>>>>>>>> +               if (ret)
>>>>>>>>>>>>>> +                       goto err;
>>>>>>>>>>>>>> +       }
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +       return ret;
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +err:
>>>>>>>>>>>>>> +       drm_exec_fini(exec);
>>>>>>>>>>>>>> +       return ret;
>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_exec_lock_range);
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>> + * drm_gpuvm_validate() - validate all BOs marked as 
>>>>>>>>>>>>>> evicted
>>>>>>>>>>>>>> + * @gpuvm: the &drm_gpuvm to validate evicted BOs
>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>> + * Calls the &drm_gpuvm_ops.bo_validate callback for all
>>>>>>>>>>>>>> evicted buffer
>>>>>>>>>>>>>> + * objects being mapped in the given &drm_gpuvm.
>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>> +int
>>>>>>>>>>>>>> +drm_gpuvm_validate(struct drm_gpuvm *gpuvm)
>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>> +       const struct drm_gpuvm_ops *ops = gpuvm->ops;
>>>>>>>>>>>>>> +       struct drm_gpuvm_bo *vm_bo;
>>>>>>>>>>>>>> +       LIST_HEAD(evict);
>>>>>>>>>>>>>> +       int ret = 0;
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +       if (unlikely(!ops || !ops->bo_validate))
>>>>>>>>>>>>>> +               return -ENOTSUPP;
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +       for_each_vm_bo_in_list(gpuvm, evict, &evict, vm_bo) {
>>>>>>>>>>>>>> +               dma_resv_assert_held(vm_bo->obj->resv);
>>>>>>>>>>>>>> +               ret = ops->bo_validate(vm_bo->obj);
>>>>>>>>>>>>>> +               if (ret)
>>>>>>>>>>>>>> +                       break;
>>>>>>>>>>>>>> +       }
>>>>>>>>>>>>>> +       /* Drop ref in case we break out of the loop. */
>>>>>>>>>>>>>> +       drm_gpuvm_bo_put(vm_bo);
>>>>>>>>>>>>>> +       restore_vm_bo_list(gpuvm, evict, &evict);
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +       return ret;
>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_validate);
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>> + * drm_gpuvm_resv_add_fence() - add fence to private and all extobj
>>>>>>>>>>>>>> + * dma-resv
>>>>>>>>>>>>>> + * @gpuvm: the &drm_gpuvm to add a fence to
>>>>>>>>>>>>>> + * @exec: the &drm_exec locking context
>>>>>>>>>>>>>> + * @fence: fence to add
>>>>>>>>>>>>>> + * @private_usage: private dma-resv usage
>>>>>>>>>>>>>> + * @extobj_usage: extobj dma-resv usage
>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>> +void
>>>>>>>>>>>>>> +drm_gpuvm_resv_add_fence(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>> +                        struct drm_exec *exec,
>>>>>>>>>>>>>> +                        struct dma_fence *fence,
>>>>>>>>>>>>>> +                        enum dma_resv_usage private_usage,
>>>>>>>>>>>>>> +                        enum dma_resv_usage extobj_usage)
>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>> +       struct drm_gem_object *obj;
>>>>>>>>>>>>>> +       unsigned long index;
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +       drm_exec_for_each_locked_object(exec, index, obj) {
>>>>>>>>>>>>>> +               dma_resv_assert_held(obj->resv);
>>>>>>>>>>>>>> +               dma_resv_add_fence(obj->resv, fence,
>>>>>>>>>>>>>> +                                  drm_gpuvm_is_extobj(gpuvm, obj) ?
>>>>>>>>>>>>>> +                                  extobj_usage : private_usage);
>>>>>>>>>>>>>> +       }
>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_resv_add_fence);
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>     /**
>>>>>>>>>>>>>>      * drm_gpuvm_bo_create() - create a new instance of 
>>>>>>>>>>>>>> struct
>>>>>>>>>>>>>> drm_gpuvm_bo
>>>>>>>>>>>>>>      * @gpuvm: The &drm_gpuvm the @obj is mapped in.
>>>>>>>>>>>>>> @@ -790,6 +1242,9 @@ drm_gpuvm_bo_create(struct drm_gpuvm
>>>>>>>>>>>>>> *gpuvm,
>>>>>>>>>>>>>>          INIT_LIST_HEAD(&vm_bo->list.gpuva);
>>>>>>>>>>>>>>          INIT_LIST_HEAD(&vm_bo->list.entry.gem);
>>>>>>>>>>>>>> +       INIT_LIST_HEAD(&vm_bo->list.entry.extobj);
>>>>>>>>>>>>>> +       INIT_LIST_HEAD(&vm_bo->list.entry.evict);
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>          drm_gem_object_get(obj);
>>>>>>>>>>>>>>          return vm_bo;
>>>>>>>>>>>>>> @@ -807,6 +1262,14 @@ drm_gpuvm_bo_destroy(struct kref 
>>>>>>>>>>>>>> *kref)
>>>>>>>>>>>>>>          drm_gem_gpuva_assert_lock_held(vm_bo->obj);
>>>>>>>>>>>>>> +       spin_lock(&gpuvm->extobj.lock);
>>>>>>>>>>>>>> +       list_del(&vm_bo->list.entry.extobj);
>>>>>>>>>>>>>> +       spin_unlock(&gpuvm->extobj.lock);
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +       spin_lock(&gpuvm->evict.lock);
>>>>>>>>>>>>>> +       list_del(&vm_bo->list.entry.evict);
>>>>>>>>>>>>>> +       spin_unlock(&gpuvm->evict.lock);
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>          list_del(&vm_bo->list.entry.gem);
>>>>>>>>>>>>>>          drm_gem_object_put(obj);
>>>>>>>>>>>>>> @@ -822,6 +1285,11 @@ drm_gpuvm_bo_destroy(struct kref *kref)
>>>>>>>>>>>>>>      * @vm_bo: the &drm_gpuvm_bo to release the reference of
>>>>>>>>>>>>>>      *
>>>>>>>>>>>>>>      * This releases a reference to @vm_bo.
>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>> + * If the reference count drops to zero, the &drm_gpuvm_bo is destroyed, which
>>>>>>>>>>>>>> + * includes removing it from the GEM's gpuva list. Hence, if a call to this
>>>>>>>>>>>>>> + * function can potentially let the reference count drop to zero, the caller
>>>>>>>>>>>>>> + * must hold the dma-resv or driver specific GEM gpuva lock.
>>>>>>>>>>>>>>      */
>>>>>>>>>>>>>>     void
>>>>>>>>>>>>>>     drm_gpuvm_bo_put(struct drm_gpuvm_bo *vm_bo)
>>>>>>>>>>>>>> @@ -831,6 +1299,12 @@ drm_gpuvm_bo_put(struct drm_gpuvm_bo
>>>>>>>>>>>>>> *vm_bo)
>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>     EXPORT_SYMBOL_GPL(drm_gpuvm_bo_put);
>>>>>>>>>>>>>> +static int __must_check
>>>>>>>>>>>>>> +drm_gpuvm_bo_get_unless_zero(struct drm_gpuvm_bo *vm_bo)
>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>> +       return kref_get_unless_zero(&vm_bo->kref);
>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>     static struct drm_gpuvm_bo *
>>>>>>>>>>>>>>     __drm_gpuvm_bo_find(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>>                      struct drm_gem_object *obj)
>>>>>>>>>>>>>> @@ -938,6 +1412,48 @@ drm_gpuvm_bo_obtain_prealloc(struct drm_gpuvm_bo *__vm_bo)
>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>> EXPORT_SYMBOL_GPL(drm_gpuvm_bo_obtain_prealloc);
>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>> + * drm_gpuvm_bo_extobj_add() - adds the &drm_gpuvm_bo to its &drm_gpuvm's
>>>>>>>>>>>>>> + * extobj list
>>>>>>>>>>>>>> + * @vm_bo: The &drm_gpuvm_bo to add to its &drm_gpuvm's extobj list.
>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>> + * Adds the given @vm_bo to its &drm_gpuvm's extobj list, but only if it is
>>>>>>>>>>>>>> + * not on the list already and the corresponding &drm_gem_object actually is
>>>>>>>>>>>>>> + * an external object.
>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>> +void
>>>>>>>>>>>>>> +drm_gpuvm_bo_extobj_add(struct drm_gpuvm_bo *vm_bo)
>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>> +       struct drm_gpuvm *gpuvm = vm_bo->vm;
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +       if (drm_gpuvm_is_extobj(gpuvm, vm_bo->obj))
>>>>>>>>>>>>>> +               drm_gpuvm_bo_list_add(vm_bo, extobj);
>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_bo_extobj_add);
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>> + * drm_gpuvm_bo_evict() - add / remove a &drm_gem_object to / from a
>>>>>>>>>>>>>> + * &drm_gpuvm's evicted list
>>>>>>>>>>>>>> + * @obj: the &drm_gem_object to add or remove
>>>>>>>>>>>>>> + * @evict: indicates whether the object is evicted
>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>> + * Adds a &drm_gem_object to or removes it from the evicted list of all
>>>>>>>>>>>>>> + * &drm_gpuvms containing a mapping of this &drm_gem_object.
>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>> +void
>>>>>>>>>>>>>> +drm_gpuvm_bo_evict(struct drm_gem_object *obj, bool evict)
>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>> +       struct drm_gpuvm_bo *vm_bo;
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +       drm_gem_for_each_gpuvm_bo(vm_bo, obj) {
>>>>>>>>>>>>>> +               if (evict)
>>>>>>>>>>>>>> +                       drm_gpuvm_bo_list_add(vm_bo, evict);
>>>>>>>>>>>>>> +               else
>>>>>>>>>>>>>> +                       drm_gpuvm_bo_list_del(vm_bo, evict);
>>>>>>>>>>>>>> +       }
>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_bo_evict);
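A TTM-based driver would presumably call this from its ttm_device_funcs::move callback, where the moved BO's dma-resv is held. A sketch, with mydrv_bo_move() being a hypothetical driver function:

    static int mydrv_bo_move(struct ttm_buffer_object *bo, bool evict,
                             struct ttm_operation_ctx *ctx,
                             struct ttm_resource *new_mem,
                             struct ttm_place *hop)
    {
            /* ... perform the actual move ... */

            /* Mark/unmark the BO as evicted on every VM it is mapped in. */
            drm_gpuvm_bo_evict(&bo->base, evict);
            return 0;
    }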
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>     static int
>>>>>>>>>>>>>>     __drm_gpuva_insert(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>>                     struct drm_gpuva *va)
>>>>>>>>>>>>>> diff --git a/include/drm/drm_gpuvm.h b/include/drm/drm_gpuvm.h
>>>>>>>>>>>>>> index afa50b9059a2..834bb6d6617e 100644
>>>>>>>>>>>>>> --- a/include/drm/drm_gpuvm.h
>>>>>>>>>>>>>> +++ b/include/drm/drm_gpuvm.h
>>>>>>>>>>>>>> @@ -26,10 +26,12 @@
>>>>>>>>>>>>>>      */
>>>>>>>>>>>>>>     #include <linux/list.h>
>>>>>>>>>>>>>> +#include <linux/dma-resv.h>
>>>>>>>>>>>>>>     #include <linux/rbtree.h>
>>>>>>>>>>>>>>     #include <linux/types.h>
>>>>>>>>>>>>>>     #include <drm/drm_gem.h>
>>>>>>>>>>>>>> +#include <drm/drm_exec.h>
>>>>>>>>>>>>>>     struct drm_gpuvm;
>>>>>>>>>>>>>>     struct drm_gpuvm_bo;
>>>>>>>>>>>>>> @@ -259,6 +261,38 @@ struct drm_gpuvm {
>>>>>>>>>>>>>>           * space
>>>>>>>>>>>>>>           */
>>>>>>>>>>>>>>          struct dma_resv *resv;
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +       /**
>>>>>>>>>>>>>> +        * @extobj: structure holding the extobj list
>>>>>>>>>>>>>> +        */
>>>>>>>>>>>>>> +       struct {
>>>>>>>>>>>>>> +               /**
>>>>>>>>>>>>>> +                * @list: &list_head storing &drm_gpuvm_bos
>>>>>>>>>>>>>> +                * serving as external object
>>>>>>>>>>>>>> +                */
>>>>>>>>>>>>>> +               struct list_head list;
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +               /**
>>>>>>>>>>>>>> +                * @lock: spinlock to protect the extobj 
>>>>>>>>>>>>>> list
>>>>>>>>>>>>>> +                */
>>>>>>>>>>>>>> +               spinlock_t lock;
>>>>>>>>>>>>>> +       } extobj;
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +       /**
>>>>>>>>>>>>>> +        * @evict: structure holding the evict list and evict list lock
>>>>>>>>>>>>>> +        */
>>>>>>>>>>>>>> +       struct {
>>>>>>>>>>>>>> +               /**
>>>>>>>>>>>>>> +                * @list: &list_head storing &drm_gpuvm_bos
>>>>>>>>>>>>>> +                * currently being evicted
>>>>>>>>>>>>>> +                */
>>>>>>>>>>>>>> +               struct list_head list;
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +               /**
>>>>>>>>>>>>>> +                * @lock: spinlock to protect the evict list
>>>>>>>>>>>>>> +                */
>>>>>>>>>>>>>> +               spinlock_t lock;
>>>>>>>>>>>>>> +       } evict;
>>>>>>>>>>>>>>     };
>>>>>>>>>>>>>>     void drm_gpuvm_init(struct drm_gpuvm *gpuvm, struct drm_device *drm,
>>>>>>>>>>>>>> @@ -268,6 +302,21 @@ void drm_gpuvm_init(struct drm_gpuvm *gpuvm, struct drm_device *drm,
>>>>>>>>>>>>>>                      const struct drm_gpuvm_ops *ops);
>>>>>>>>>>>>>>     void drm_gpuvm_destroy(struct drm_gpuvm *gpuvm);
>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>> + * drm_gpuvm_is_extobj() - indicates whether the given &drm_gem_object is an
>>>>>>>>>>>>>> + * external object
>>>>>>>>>>>>>> + * @gpuvm: the &drm_gpuvm to check
>>>>>>>>>>>>>> + * @obj: the &drm_gem_object to check
>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>> + * Returns: true if the &drm_gem_object's &dma_resv differs from the
>>>>>>>>>>>>>> + * &drm_gpuvm's &dma_resv, false otherwise
>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>> +static inline bool drm_gpuvm_is_extobj(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>> +                                       struct drm_gem_object *obj)
>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>> +       return obj && obj->resv != gpuvm->resv;
>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>     static inline struct drm_gpuva *
>>>>>>>>>>>>>>     __drm_gpuva_next(struct drm_gpuva *va)
>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>> @@ -346,6 +395,128 @@ __drm_gpuva_next(struct drm_gpuva *va)
>>>>>>>>>>>>>>     #define drm_gpuvm_for_each_va_safe(va__, next__, gpuvm__) \
>>>>>>>>>>>>>>          list_for_each_entry_safe(va__, next__, &(gpuvm__)->rb.list, rb.entry)
>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>> + * struct drm_gpuvm_exec - &drm_gpuvm abstraction of &drm_exec
>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>> + * This structure should be created on the stack as &drm_exec should be.
>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>> + * Optionally, @extra can be set in order to lock additional &drm_gem_objects.
>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>> +struct drm_gpuvm_exec {
>>>>>>>>>>>>>> +       /**
>>>>>>>>>>>>>> +        * @exec: the &drm_exec structure
>>>>>>>>>>>>>> +        */
>>>>>>>>>>>>>> +       struct drm_exec exec;
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +       /**
>>>>>>>>>>>>>> +        * @vm: the &drm_gpuvm to lock its DMA reservations
>>>>>>>>>>>>>> +        */
>>>>>>>>>>>>>> +       struct drm_gpuvm *vm;
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +       /**
>>>>>>>>>>>>>> +        * @extra: Callback and corresponding private data for the
>>>>>>>>>>>>>> +        * driver to lock arbitrary additional &drm_gem_objects.
>>>>>>>>>>>>>> +        */
>>>>>>>>>>>>>> +       struct {
>>>>>>>>>>>>>> +               /**
>>>>>>>>>>>>>> +                * @fn: The driver callback to lock additional &drm_gem_objects.
>>>>>>>>>>>>>> +                */
>>>>>>>>>>>>>> +               int (*fn)(struct drm_gpuvm_exec *vm_exec,
>>>>>>>>>>>>>> +                         unsigned int num_fences);
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +               /**
>>>>>>>>>>>>>> +                * @priv: driver private data for the @fn callback
>>>>>>>>>>>>>> +                */
>>>>>>>>>>>>>> +               void *priv;
>>>>>>>>>>>>>> +       } extra;
>>>>>>>>>>>>>> +};
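For illustration, the @extra callback could be used like this. A sketch, where mydrv_lock_extra() and struct mydrv_job are hypothetical driver code:

    static int mydrv_lock_extra(struct drm_gpuvm_exec *vm_exec,
                                unsigned int num_fences)
    {
            struct mydrv_job *job = vm_exec->extra.priv;

            /* Lock additional, driver-known BOs within the same drm_exec loop. */
            return drm_exec_prepare_array(&vm_exec->exec, job->objs,
                                          job->num_objs, num_fences);
    }

    struct drm_gpuvm_exec vm_exec = {
            .vm = gpuvm,
            .extra = {
                    .fn = mydrv_lock_extra,
                    .priv = job,
            },
    };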
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>> + * drm_gpuvm_prepare_vm() - prepare the GPUVM's common dma-resv
>>>>>>>>>>>>>> + * @gpuvm: the &drm_gpuvm
>>>>>>>>>>>>>> + * @exec: the &drm_exec context
>>>>>>>>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>> + * Calls drm_exec_prepare_obj() for the GPUVM's dummy &drm_gem_object.
>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>> + * Using this function directly, it is the driver's responsibility to call
>>>>>>>>>>>>>> + * drm_exec_init() and drm_exec_fini() accordingly.
>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>> +static inline int
>>>>>>>>>>>>>> +drm_gpuvm_prepare_vm(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>> +                    struct drm_exec *exec,
>>>>>>>>>>>>>> +                    unsigned int num_fences)
>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>> +       return drm_exec_prepare_obj(exec, &gpuvm->d_obj, num_fences);
>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +int drm_gpuvm_prepare_objects(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>> +                             struct drm_exec *exec,
>>>>>>>>>>>>>> +                             unsigned int num_fences);
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +int drm_gpuvm_prepare_range(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>> +                           struct drm_exec *exec,
>>>>>>>>>>>>>> +                           u64 addr, u64 range,
>>>>>>>>>>>>>> +                           unsigned int num_fences);
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +int drm_gpuvm_exec_lock(struct drm_gpuvm_exec *vm_exec,
>>>>>>>>>>>>>> +                       unsigned int num_fences,
>>>>>>>>>>>>>> +                       bool interruptible);
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +int drm_gpuvm_exec_lock_array(struct drm_gpuvm_exec *vm_exec,
>>>>>>>>>>>>>> +                             struct drm_gem_object **objs,
>>>>>>>>>>>>>> +                             unsigned int num_objs,
>>>>>>>>>>>>>> +                             unsigned int num_fences,
>>>>>>>>>>>>>> +                             bool interruptible);
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +int drm_gpuvm_exec_lock_range(struct drm_gpuvm_exec *vm_exec,
>>>>>>>>>>>>>> +                             u64 addr, u64 range,
>>>>>>>>>>>>>> +                             unsigned int num_fences,
>>>>>>>>>>>>>> +                             bool interruptible);
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>> + * drm_gpuvm_exec_unlock() - unlock all dma-resv of all associated BOs
>>>>>>>>>>>>>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>> + * Releases all dma-resv locks of all &drm_gem_objects previously acquired
>>>>>>>>>>>>>> + * through drm_gpuvm_exec_lock() or its variants.
>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>> +static inline void
>>>>>>>>>>>>>> +drm_gpuvm_exec_unlock(struct drm_gpuvm_exec *vm_exec)
>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>> +       drm_exec_fini(&vm_exec->exec);
>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +int drm_gpuvm_validate(struct drm_gpuvm *gpuvm);
>>>>>>>>>>>>>> +void drm_gpuvm_resv_add_fence(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>> +                             struct drm_exec *exec,
>>>>>>>>>>>>>> +                             struct dma_fence *fence,
>>>>>>>>>>>>>> +                             enum dma_resv_usage private_usage,
>>>>>>>>>>>>>> +                             enum dma_resv_usage extobj_usage);
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>> + * drm_gpuvm_exec_resv_add_fence() - add fence to private and all extobj dma-resv
>>>>>>>>>>>>>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>>>>>>>>>>>>>> + * @fence: fence to add
>>>>>>>>>>>>>> + * @private_usage: private dma-resv usage
>>>>>>>>>>>>>> + * @extobj_usage: extobj dma-resv usage
>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>> + * See drm_gpuvm_resv_add_fence().
>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>> +static inline void
>>>>>>>>>>>>>> +drm_gpuvm_exec_resv_add_fence(struct drm_gpuvm_exec *vm_exec,
>>>>>>>>>>>>>> +                             struct dma_fence *fence,
>>>>>>>>>>>>>> +                             enum dma_resv_usage private_usage,
>>>>>>>>>>>>>> +                             enum dma_resv_usage extobj_usage)
>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>> +       drm_gpuvm_resv_add_fence(vm_exec->vm, &vm_exec->exec, fence,
>>>>>>>>>>>>>> +                                private_usage, extobj_usage);
>>>>>>>>>>>>>> +}
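Taken together, the intended submission flow looks roughly like the following sketch; error handling is abbreviated, the fence usages are exemplary, and mydrv_submit_job() is a hypothetical driver call:

    struct drm_gpuvm_exec vm_exec = { .vm = gpuvm };
    struct dma_fence *fence;
    int ret;

    ret = drm_gpuvm_exec_lock(&vm_exec, 1, true);
    if (ret)
            return ret;

    /* Re-validate all evicted BOs via the bo_validate callback. */
    ret = drm_gpuvm_validate(gpuvm);
    if (ret)
            goto out_unlock;

    fence = mydrv_submit_job(job);
    drm_gpuvm_exec_resv_add_fence(&vm_exec, fence,
                                  DMA_RESV_USAGE_BOOKKEEP,
                                  DMA_RESV_USAGE_WRITE);
    out_unlock:
    drm_gpuvm_exec_unlock(&vm_exec);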
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>     /**
>>>>>>>>>>>>>>      * struct drm_gpuvm_bo - structure representing a &drm_gpuvm and
>>>>>>>>>>>>>>      * &drm_gem_object combination
>>>>>>>>>>>>>> @@ -398,6 +569,18 @@ struct drm_gpuvm_bo {
>>>>>>>>>>>>>>                           * gpuva list.
>>>>>>>>>>>>>>                           */
>>>>>>>>>>>>>>                          struct list_head gem;
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +                       /**
>>>>>>>>>>>>>> +                        * @extobj: List entry to attach to the
>>>>>>>>>>>>>> +                        * &drm_gpuvm's extobj list.
>>>>>>>>>>>>>> +                        */
>>>>>>>>>>>>>> +                       struct list_head extobj;
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +                       /**
>>>>>>>>>>>>>> +                        * @evict: List entry to attach to the
>>>>>>>>>>>>>> +                        * &drm_gpuvm's evict list.
>>>>>>>>>>>>>> +                        */
>>>>>>>>>>>>>> +                       struct list_head evict;
>>>>>>>>>>>>>>                  } entry;
>>>>>>>>>>>>>>          } list;
>>>>>>>>>>>>>>     };
>>>>>>>>>>>>>> @@ -432,6 +615,9 @@ struct drm_gpuvm_bo *
>>>>>>>>>>>>>>     drm_gpuvm_bo_find(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>>                    struct drm_gem_object *obj);
>>>>>>>>>>>>>> +void drm_gpuvm_bo_evict(struct drm_gem_object *obj, bool evict);
>>>>>>>>>>>>>> +void drm_gpuvm_bo_extobj_add(struct drm_gpuvm_bo *vm_bo);
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>     /**
>>>>>>>>>>>>>>      * drm_gpuvm_bo_for_each_va() - iterator to walk over a list of &drm_gpuva
>>>>>>>>>>>>>>      * @va__: &drm_gpuva structure to assign to in each iteration step
>>>>>>>>>>>>>> @@ -837,6 +1023,17 @@ struct drm_gpuvm_ops {
>>>>>>>>>>>>>>           * used.
>>>>>>>>>>>>>>           */
>>>>>>>>>>>>>>          int (*sm_step_unmap)(struct drm_gpuva_op *op, void *priv);
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +       /**
>>>>>>>>>>>>>> +        * @bo_validate: called from drm_gpuvm_validate()
>>>>>>>>>>>>>> +        *
>>>>>>>>>>>>>> +        * Drivers receive this callback for every evicted &drm_gem_object
>>>>>>>>>>>>>> +        * being mapped in the corresponding &drm_gpuvm.
>>>>>>>>>>>>>> +        *
>>>>>>>>>>>>>> +        * Typically, drivers would call their driver specific variant of
>>>>>>>>>>>>>> +        * ttm_bo_validate() from within this callback.
>>>>>>>>>>>>>> +        */
>>>>>>>>>>>>>> +       int (*bo_validate)(struct drm_gem_object *obj);
>>>>>>>>>>>>>>     };
>>>>>>>>>>>>>>     int drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm, void *priv,
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
Christian König Sept. 20, 2023, 5:37 a.m. UTC | #49
Am 19.09.23 um 17:23 schrieb Thomas Hellström:
>
> On 9/19/23 17:16, Danilo Krummrich wrote:
>> On 9/19/23 14:21, Thomas Hellström wrote:
>>> Hi Christian
>>>
>>> On 9/19/23 14:07, Christian König wrote:
>>>> Am 13.09.23 um 17:46 schrieb Danilo Krummrich:
>>>>> On 9/13/23 17:33, Christian König wrote:
>>>>>> Am 13.09.23 um 17:15 schrieb Danilo Krummrich:
>>>>>>> On 9/13/23 16:26, Christian König wrote:
>>>>>>>> Am 13.09.23 um 14:16 schrieb Danilo Krummrich:
>>>>>>>>> As mentioned in a different mail thread, the reply is based on 
>>>>>>>>> the assumption
>>>>>>>>> that we don't support anything other than GPUVM updates from 
>>>>>>>>> the IOCTL.
>>>>>>>>
>>>>>>>> I think that this assumption is incorrect.
>>>>>>>
>>>>>>> Well, more precisely I should have said "don't support GPUVM 
>>>>>>> updates from within
>>>>>>> fence signaling critical sections". And looking at the code, 
>>>>>>> that doesn't seem what
>>>>>>> you're doing there.
>>>>>>>
>>>>>>>>
>>>>>>>> Vulkan is just one specific use case, but this here should 
>>>>>>>> probably be able to handle other use cases as well.
>>>>>>>>
>>>>>>>> Especially with HMM you get the requirement that you need to be 
>>>>>>>> able to invalidate GPUVM mappings without grabbing a 
>>>>>>>> reservation lock.
>>>>>>>
>>>>>>> What do you mean with "invalidate GPUVM mappings" in this 
>>>>>>> context? drm_gpuvm_bo_evict()
>>>>>>> should only be called from a ttm_device_funcs::move callback, 
>>>>>>> we should hold the dma-resv
>>>>>>> lock there.
>>>>>>
>>>>>> Well the question is which dma-resv lock do we hold?
>>>>>>
>>>>>> In the move callback we only hold the dma-resv lock of the BO 
>>>>>> which is moved, but when that is a shared BO then that's not the 
>>>>>> same as the one for the VM.
>>>>>
>>>>> Correct, Thomas' idea was to use the GEM's dma_resv lock to 
>>>>> protect drm_gpuvm_bo::evicted
>>>>> and then actually move the drm_gpuvm_bo to the VM's evicted list 
>>>>> once we grabbed all
>>>>> dma-resv locks when locking the VM's BOs using drm_exec. We can 
>>>>> remove them from the evicted
>>>>> list on validate(). This way we never touch the evicted list 
>>>>> without holding at least the VM's
>>>>> dma-resv lock.
>>>>>
>>>>> Do you have any concerns about that?
>>>>
>>>> Scratching my head a bit how that is supposed to work.
>>>>
>>>> This implies that you go over all the evicted BOs during validation 
>>>> and not just the one mentioned in the CS.
>>>>
>>>> That might work for Vulkan, but is pretty much a no-go for OpenGL.
>>>>
>>>>>
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> See what the eviction lock in amdgpu is doing for example.
>>>>>>>
>>>>>>> The eviction_lock seems to protect a VM state "evicting" of 
>>>>>>> whether any BO that
>>>>>>> is associated with the VM is currently evicting. At the same 
>>>>>>> time amdgpu protects
>>>>>> the evicted list of the VM with a different lock. So this seems 
>>>>>>> to be entirely
>>>>>>> unrelated. Tracking a "currently evicting" state is not part of 
>>>>>>> the GPUVM
>>>>>>> implementation currently and hence nothing would change for 
>>>>>>> amdgpu there.
>>>>>>
>>>>>> Sorry for the confusion we use different terminology in amdgpu.
>>>>>>
>>>>>> The eviction lock and evicted state is for the VM page tables, 
>>>>>> e.g. if the whole VM is currently not used and swapped out or 
>>>>>> even de-allocated.
>>>>>>
>>>>>> This is necessary because we have cases where we need to access 
>>>>>> the VM data without holding the dma-resv lock of this VM. 
>>>>>> Especially figuring out which parts of an address space contain 
>>>>>> mappings and which doesn't.
>>>>>
>>>>> I think this is fine, this has nothing to do with lists of evicted 
>>>>> GEM objects or external GEM
>>>>> objects, right? Marking mappings (drm_gpuva) as invalidated 
>>>>> (DRM_GPUVA_INVALIDATED) or accessing
>>>>> the VA space does not require any dma-resv locks.
>>>>
>>>> I hope so, but I'm not 100% sure.
>>>>
>>>>>
>>>>>>
>>>>>> This is a requirement which comes with HMM handling, you won't 
>>>>>> see this with Vulkan (or OpenGL, VAAPI etc..).
>>>>>>
>>>>>>
>>>>>> The invalidation lock on the other hand is what in this 
>>>>>> discussion is called eviction lock. This one is needed because 
>>>>>> what I wrote above, during the move callback only the dma-resv of 
>>>>>> the BO which is moved is locked, but not necessarily the dma-resv 
>>>>>> of the VM.
>>>>>
>>>>> That's yet another thing, right? This is used to track whether 
>>>>> *any* BO that belongs to the VM is
>>>>> currently being evicted, correct? As mentioned, as by now this is 
>>>>> not supported in GPUVM and hence
>>>>> would be the same driver specific code with the same driver 
>>>>> specifc lock.
>>>>
>>>> That is most likely a show stopper using this for OpenGL based 
>>>> workloads as far as I can see. For those you need to able to figure 
>>>> out which non-VM BOs have been evicted and which parts of the VM 
>>>> needs updates.
>>>
>>> We identify those with a bool in the gpuvm_bo, and that bool is 
>>> protected by the bo_resv. In essence, the "evicted" list must be 
>>> made up-to-date with all relevant locks held before traversing in 
>>> the next exec.
>>
>> What I still miss with this idea is how do we find all the 
>> drm_gpuvm_bo structures with the evicted bool set to true? When doing 
>> the drm_exec dance we come across all external ones and can add them 
>> to the list if needed, but what about the BOs having the VM's dma-resv?
>
> Oh, they can be added to the evict list directly (no bool needed) in 
> the eviction code, like in v3. Since for those we indeed hold the VM's 
> dma_resv since it's aliased with the object's dma-resv.

Yeah, I wanted to note what Danilo seems to be thinking about as well. How do 
we figure out which non-VM BOs were evicted?

We can't walk over the list of all non-VM BOs on every submission, 
that's too much overhead for cases with lots of non-VM BOs.

And we can't rely on userspace sending all non-VM BOs as used list down 
to the kernel with each submission.

Regards,
Christian.

>
> /Thomas
>
>
>
>>
>>>
>>> If you mean that we need to unbind all vmas of all vms of evicted 
>>> bos before evicting, we don't do that, at least not in Xe; when 
>>> evicting we wait for VM idle, and it can't access anything through 
>>> the stale vmas until they have been revalidated and rebound.
>>>
>>> /Thomas
>>>
>>>
>>>
>>>>>
>>>>>>
>>>>>> Regards,
>>>>>> Christian.
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Christian.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Sep 13, 2023 at 11:14:46AM +0200, Thomas Hellström wrote:
>>>>>>>>>> Hi!
>>>>>>>>>>
>>>>>>>>>> On Wed, 2023-09-13 at 01:36 +0200, Danilo Krummrich wrote:
>>>>>>>>>>> On Tue, Sep 12, 2023 at 09:23:08PM +0200, Thomas Hellström 
>>>>>>>>>>> wrote:
>>>>>>>>>>>> On 9/12/23 18:50, Danilo Krummrich wrote:
>>>>>>>>>>>>> On Tue, Sep 12, 2023 at 06:20:32PM +0200, Thomas Hellström 
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>> Hi, Danilo,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 9/9/23 17:31, Danilo Krummrich wrote:
>>>>>>>>>>>>>>> So far the DRM GPUVA manager offers common 
>>>>>>>>>>>>>>> infrastructure to
>>>>>>>>>>>>>>> track GPU VA
>>>>>>>>>>>>>>> allocations and mappings, generically connect GPU VA 
>>>>>>>>>>>>>>> mappings
>>>>>>>>>>>>>>> to their
>>>>>>>>>>>>>>> backing buffers and perform more complex mapping operations
>>>>>>>>>>>>>>> on the GPU VA
>>>>>>>>>>>>>>> space.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> However, there are more design patterns commonly used by
>>>>>>>>>>>>>>> drivers, which
>>>>>>>>>>>>>>> can potentially be generalized in order to make the DRM 
>>>>>>>>>>>>>>> GPUVA
>>>>>>>>>>>>>>> manager
>>>>>>>>>>>>>>> represent a basic GPU-VM implementation. In this context,
>>>>>>>>>>>>>>> this patch aims
>>>>>>>>>>>>>>> at generalizing the following elements.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 1) Provide a common dma-resv for GEM objects not being used
>>>>>>>>>>>>>>> outside of
>>>>>>>>>>>>>>>       this GPU-VM.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2) Provide tracking of external GEM objects (GEM objects
>>>>>>>>>>>>>>> which are
>>>>>>>>>>>>>>>       shared with other GPU-VMs).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 3) Provide functions to efficiently lock all GEM objects 
>>>>>>>>>>>>>>> dma-
>>>>>>>>>>>>>>> resv the
>>>>>>>>>>>>>>>       GPU-VM contains mappings of.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 4) Provide tracking of evicted GEM objects the GPU-VM
>>>>>>>>>>>>>>> contains mappings
>>>>>>>>>>>>>>>       of, such that validation of evicted GEM objects is
>>>>>>>>>>>>>>> accelerated.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 5) Provide some convenience functions for common patterns.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Rather than being designed as a "framework", the target 
>>>>>>>>>>>>>>> is to
>>>>>>>>>>>>>>> make all
>>>>>>>>>>>>>>> features appear as a collection of optional helper 
>>>>>>>>>>>>>>> functions,
>>>>>>>>>>>>>>> such that
>>>>>>>>>>>>>>> drivers are free to make use of the DRM GPUVA managers 
>>>>>>>>>>>>>>> basic
>>>>>>>>>>>>>>> functionality and opt-in for other features without setting
>>>>>>>>>>>>>>> any feature
>>>>>>>>>>>>>>> flags, just by making use of the corresponding functions.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Big kudos to Boris Brezillon for his help to figure out
>>>>>>>>>>>>>>> locking for drivers
>>>>>>>>>>>>>>> updating the GPU VA space within the fence signalling path.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Suggested-by: Matthew Brost <matthew.brost@intel.com>
>>>>>>>>>>>>>>> Signed-off-by: Danilo Krummrich <dakr@redhat.com>
>>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>>     drivers/gpu/drm/drm_gpuvm.c | 516 ++++++++++++++++++++++++++++++++++++
>>>>>>>>>>>>>>>     include/drm/drm_gpuvm.h     | 197 ++++++++++++++
>>>>>>>>>>>>>>>     2 files changed, 713 insertions(+)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> diff --git a/drivers/gpu/drm/drm_gpuvm.c b/drivers/gpu/drm/drm_gpuvm.c
>>>>>>>>>>>>>>> index f4411047dbb3..8e62a043f719 100644
>>>>>>>>>>>>>>> --- a/drivers/gpu/drm/drm_gpuvm.c
>>>>>>>>>>>>>>> +++ b/drivers/gpu/drm/drm_gpuvm.c
>>>>>>>>>>>>>>> @@ -73,6 +73,21 @@
>>>>>>>>>>>>>>>      * &drm_gem_object list of &drm_gpuvm_bos for an existing instance of this
>>>>>>>>>>>>>>>      * particular combination. If not existent, a new instance is created and
>>>>>>>>>>>>>>>      * linked to the &drm_gem_object.
>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>> + * &drm_gpuvm_bo structures, since unique for a given &drm_gpuvm, are also used
>>>>>>>>>>>>>>> + * as entry for the &drm_gpuvm's lists of external and evicted objects. Those
>>>>>>>>>>>>>>> + * lists are maintained in order to accelerate locking of dma-resv locks and
>>>>>>>>>>>>>>> + * validation of evicted objects bound in a &drm_gpuvm. For instance, all
>>>>>>>>>>>>>>> + * &drm_gem_objects' &dma_resv of a given &drm_gpuvm can be locked by calling
>>>>>>>>>>>>>>> + * drm_gpuvm_exec_lock(). Once locked drivers can call drm_gpuvm_validate() in
>>>>>>>>>>>>>>> + * order to validate all evicted &drm_gem_objects. It is also possible to lock
>>>>>>>>>>>>>>> + * additional &drm_gem_objects by providing the corresponding parameters to
>>>>>>>>>>>>>>> + * drm_gpuvm_exec_lock() as well as open code the &drm_exec loop while making
>>>>>>>>>>>>>>> + * use of helper functions such as drm_gpuvm_prepare_range() or
>>>>>>>>>>>>>>> + * drm_gpuvm_prepare_objects().
>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>> + * Every bound &drm_gem_object is treated as an external object when its
>>>>>>>>>>>>>>> + * &dma_resv structure is different than the &drm_gpuvm's common &dma_resv
>>>>>>>>>>>>>>> + * structure.
>>>>>>>>>>>>>>>      */
>>>>>>>>>>>>>>>     /**
>>>>>>>>>>>>>>> + * Subsequent calls to drm_gpuvm_bo_obtain() for the same &drm_gpuvm and
>>>>>>>>>>>>>>> + * &drm_gem_object must be able to observe previous creations and destructions
>>>>>>>>>>>>>>> + * of &drm_gpuvm_bos in order to keep instances unique.
>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>> + * The &drm_gpuvm's lists for keeping track of external and evicted objects are
>>>>>>>>>>>>>>> + * protected against concurrent insertion / removal and iteration internally.
>>>>>>>>>>>>>>> evicted objects are
>>>>>>>>>>>>>>> + * protected against concurrent insertion / removal and
>>>>>>>>>>>>>>> iteration internally.
>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>> + * However, drivers still need to ensure to protect concurrent calls to
>>>>>>>>>>>>>>> + * functions iterating those lists, such as drm_gpuvm_validate() and
>>>>>>>>>>>>>>> + * drm_gpuvm_prepare_objects(). Every such function contains a particular
>>>>>>>>>>>>>>> + * comment and lockdep checks if possible.
>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>> + * Functions adding or removing entries from those lists, such as
>>>>>>>>>>>>>>> + * drm_gpuvm_bo_evict() or drm_gpuvm_bo_extobj_add() may be called with
>>>>>>>>>>>>>>> + * external locks being held, e.g. in order to avoid the corresponding list
>>>>>>>>>>>>>>> + * being (safely) modified while potentially being iterated by other API
>>>>>>>>>>>>>>> + * functions.
>>>>>>>>>>>>>>> + * However, this is entirely optional.
>>>>>>>>>>>>>>>      */
>>>>>>>>>>>>>>>     /**
>>>>>>>>>>>>>>> @@ -632,6 +661,131 @@
>>>>>>>>>>>>>>>      *   }
>>>>>>>>>>>>>>>      */
>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>> + * get_next_vm_bo_from_list() - get the next vm_bo element
>>>>>>>>>>>>>>> + * @__gpuvm: The GPU VM
>>>>>>>>>>>>>>> + * @__list_name: The name of the list we're iterating on
>>>>>>>>>>>>>>> + * @__local_list: A pointer to the local list used to store already iterated items
>>>>>>>>>>>>>>> + * @__prev_vm_bo: The previous element we got from drm_gpuvm_get_next_cached_vm_bo()
>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>> + * This helper is here to provide lockless list iteration. Lockless as in, the
>>>>>>>>>>>>>>> + * iterator releases the lock immediately after picking the first element from
>>>>>>>>>>>>>>> + * the list, so list insertion and deletion can happen concurrently.
>>>>>>>>>>>>>> Are the list spinlocks needed for that async state update 
>>>>>>>>>>>>>> from
>>>>>>>>>>>>>> within the
>>>>>>>>>>>>>> dma-fence critical section we've discussed previously?
>>>>>>>>>>>>> Yes, but also for other reasons, see below.
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Otherwise it should be sufficient to protect the lists 
>>>>>>>>>>>>>> with the
>>>>>>>>>>>>>> gpuvm's resv
>>>>>>>>>>>>>> (or for the extobj list with an outer lock).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> If those spinlocks are still needed in some situations, 
>>>>>>>>>>>>>> perhaps
>>>>>>>>>>>>>> could we
>>>>>>>>>>>>>> have an option to set them to NULL (Like IIRC the maple tree
>>>>>>>>>>>>>> allows for)?
>>>>>>>>>>>>> The evict spinlock is needed in any case, since in
>>>>>>>>>>>>> drm_gpuvm_bo_evict() we're
>>>>>>>>>>>>> holding only the dma-resv lock from the BO this function gets
>>>>>>>>>>>>> called for. Hence,
>>>>>>>>>>>>> the spinlock protects concurrent drm_gpuvm_bo_evict() 
>>>>>>>>>>>>> calls with
>>>>>>>>>>>>> different BOs.
>>>>>>>>>>>> No. Only if you try to add external objects to the vm's 
>>>>>>>>>>>> evict list
>>>>>>>>>>>> from
>>>>>>>>>>>> within the evict code. That's not necessary since you loop 
>>>>>>>>>>>> through
>>>>>>>>>>>> all
>>>>>>>>>>>> external objects anyway when locking them so an "evicted" 
>>>>>>>>>>>> bool in
>>>>>>>>>>>> the vm_bo,
>>>>>>>>>>>> protected by the bo resv would be sufficient. The extobj 
>>>>>>>>>>>> locking
>>>>>>>>>>>> loop can
>>>>>>>>>>>> then add the bo to the evicted list.
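(A rough sketch of the scheme described here, not part of the posted patch; the "evicted" flag and the extobj iterator are hypothetical:

    /* Eviction path, BO's dma-resv held: only flip a flag. */
    vm_bo->evicted = true;

    /* Exec path, while locking all external objects anyway: */
    drm_gpuvm_for_each_extobj(gpuvm, vm_bo) {
            ret = drm_exec_prepare_obj(exec, vm_bo->obj, num_fences);
            if (ret)
                    return ret;
            if (vm_bo->evicted)
                    list_move_tail(&vm_bo->list.entry.evict,
                                   &gpuvm->evict.list);
    }
)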
>>>>>>>>>>> And validate() can remove it while still holding all 
>>>>>>>>>>> dma-resv locks,
>>>>>>>>>>> neat!
>>>>>>>>>>> However, what if two tasks are trying to lock the VA space
>>>>>>>>>>> concurrently? What
>>>>>>>>>>> do we do when the drm_gpuvm_bo's refcount drops to zero in
>>>>>>>>>>> drm_gpuva_unlink()?
>>>>>>>>>>> Are we guaranteed that at this point of time the 
>>>>>>>>>>> drm_gpuvm_bo is not
>>>>>>>>>>> on the
>>>>>>>>>>> evicted list? Because otherwise we would call 
>>>>>>>>>>> drm_gpuvm_bo_destroy()
>>>>>>>>>>> with the
>>>>>>>>>>> dma-resv lock held, which wouldn't be allowed, since
>>>>>>>>>>> drm_gpuvm_bo_destroy()
>>>>>>>>>>> might drop the last reference to the drm_gem_object and 
>>>>>>>>>>> hence we'd
>>>>>>>>>>> potentially
>>>>>>>>>>> free the dma-resv lock while holding it, at least if it's an 
>>>>>>>>>>> external
>>>>>>>>>>> object.
>>>>>>>>>> Easiest way in this scheme is to think of the lists as being 
>>>>>>>>>> protected
>>>>>>>>>> by the vm's resv lock. That means anybody calling unlink() 
>>>>>>>>>> must also
>>>>>>>>>> hold the vm's resv lock. (Which is OK from a UAF point of 
>>>>>>>>>> view, but
>>>>>>>>>> perhaps not from a locking inversion POV from an async list 
>>>>>>>>>> update).
>>>>>>>>> This would mean that on unlink() we'd need to hold the VM's 
>>>>>>>>> resv lock and the
>>>>>>>>> corresponding GEM's resv lock (in case they're not the same 
>>>>>>>>> anyways) because the
>>>>>>>>> VM's resv lock would protect the external / evicted object 
>>>>>>>>> lists and the GEM
>>>>>>>>> objects resv lock protects the GEM's list of drm_gpuvm_bos and 
>>>>>>>>> the
>>>>>>>>> drm_gpuvm_bo's list of drm_gpuvas.
>>>>>>>>>
>>>>>>>>>>>>> For extobjs an outer lock would be enough in case of Xe, 
>>>>>>>>>>>>> but I
>>>>>>>>>>>>> really would not
>>>>>>>>>>>>> like to add even more complexity just to get the spinlock 
>>>>>>>>>>>>> out of
>>>>>>>>>>>>> the way in case
>>>>>>>>>>>>> the driver already has an outer lock protecting this path.
>>>>>>>>>>>> I must disagree here. These spinlocks and atomic operations 
>>>>>>>>>>>> are
>>>>>>>>>>>> pretty
>>>>>>>>>>>> costly and as discussed earlier this type of locking was 
>>>>>>>>>>>> the reason
>>>>>>>>>>>> (at
>>>>>>>>>>>> least according to the commit message) that made Christian 
>>>>>>>>>>>> drop the
>>>>>>>>>>>> XArray
>>>>>>>>>>>> use in drm_exec for the same set of objects: "The locking 
>>>>>>>>>>>> overhead
>>>>>>>>>>>> is
>>>>>>>>>>>> unecessary and measurable". IMHO the spinlock is the added
>>>>>>>>>>>> complexity and a
>>>>>>>>>>>> single wide lock following the drm locking guidelines set 
>>>>>>>>>>>> out by
>>>>>>>>>>>> Daniel and
>>>>>>>>>>>> David should really be the default choice with an opt-in for a
>>>>>>>>>>>> spinlock if
>>>>>>>>>>>> needed for async and pushing out to a wq is not an option.
>>>>>>>>>>> For the external object list an outer lock would work as 
>>>>>>>>>>> long as it's
>>>>>>>>>>> not the
>>>>>>>>>>> dma-resv lock of the corresponding GEM object, since here we 
>>>>>>>>>>> actually
>>>>>>>>>>> need to
>>>>>>>>>>> remove the list entry from the external object list on
>>>>>>>>>>> drm_gpuvm_bo_destroy().
>>>>>>>>>>> It's just a bit weird design-wise that drivers would need to 
>>>>>>>>>>> take
>>>>>>>>>>> this outer
>>>>>>>>>>> lock on:
>>>>>>>>>>>
>>>>>>>>>>> - drm_gpuvm_bo_extobj_add()
>>>>>>>>>>> - drm_gpuvm_bo_destroy()        (and hence also 
>>>>>>>>>>> drm_gpuvm_bo_put())
>>>>>>>>>>> - drm_gpuva_unlink()            (because it needs to call
>>>>>>>>>>> drm_gpuvm_bo_put())
>>>>>>>>>>> - drm_gpuvm_exec_lock()
>>>>>>>>>>> - drm_gpuvm_exec_lock_array()
>>>>>>>>>>> - drm_gpuvm_prepare_range()
>>>>>>>>>>>
>>>>>>>>>>> Given that it seems reasonable to do all the required locking
>>>>>>>>>>> internally.
>>>>>>>>>>  From a design POV, there has been a clear direction in Xe to 
>>>>>>>>>> make
>>>>>>>>>> things similar to mmap() / munmap(), so this outer lock, 
>>>>>>>>>> which in Xe is
>>>>>>>>>> an rwsem, is used in a similar way as the mmap_lock. It's 
>>>>>>>>>> protecting
>>>>>>>>>> the page-table structures and vma rb tree, the userptr 
>>>>>>>>>> structures and
>>>>>>>>>> the extobj list. Basically it's taken early in the exec 
>>>>>>>>>> IOCTL, the
>>>>>>>>>> VM_BIND ioctl, the compute rebind worker and the pagefault 
>>>>>>>>>> handler, so
>>>>>>>>>> all of the above are just asserting that it is taken in the 
>>>>>>>>>> correct
>>>>>>>>>> mode.
>>>>>>>>>>
>>>>>>>>>> But strictly with this scheme one could also use the vm's 
>>>>>>>>>> dma_resv for
>>>>>>>>>> the extobj list since with drm_exec, it's locked before 
>>>>>>>>>> traversing the
>>>>>>>>>> list.
>>>>>>>>>>
>>>>>>>>>> The whole point of this scheme is to rely on locks that you 
>>>>>>>>>> already are
>>>>>>>>>> supposed to be holding for various reasons and is simple to 
>>>>>>>>>> comprehend.
>>>>>>>>> I don't agree that we're supposed to hold the VM's resv lock 
>>>>>>>>> anyways for
>>>>>>>>> functions like drm_gpuvm_bo_put() or drm_gpuva_unlink(), but 
>>>>>>>>> I'm fine using it
>>>>>>>>> for that purpose nevertheless.
>>>>>>>>>
>>>>>>>>>>> In order to at least place lockdep checks, the driver would 
>>>>>>>>>>> need to
>>>>>>>>>>> supply the
>>>>>>>>>>> corresponding lock's lockdep_map, because the GPUVM 
>>>>>>>>>>> otherwise doesn't
>>>>>>>>>>> know about
>>>>>>>>>>> the lock.
>>>>>>>>>> Yes, that sounds reasonable. One lockdep map per list.
>>>>>>>>> I'd really like to avoid that, especially now that everything 
>>>>>>>>> got simpler. We
>>>>>>>>> should define the actual locks to take instead.
>>>>>>>>>
>>>>>>>>>>> Out of curiosity, what is the overhead of a spin_lock() that 
>>>>>>>>>>> doesn't
>>>>>>>>>>> need to
>>>>>>>>>>> spin?
>>>>>>>>>> I guess it's hard to tell exactly, but it is much lower on 
>>>>>>>>>> modern x86
>>>>>>>>>> than what it used to be. Not sure about ARM, which is the other
>>>>>>>>>> architecture important to us. I figure if there is little 
>>>>>>>>>> cache-line
>>>>>>>>>> bouncing the main overhead comes from the implied barriers.
>>>>>>>>>>
>>>>>>>>>>>> A pretty simple way that would not add much code would be
>>>>>>>>>>>>
>>>>>>>>>>>> static void gpuvm_cond_spin_lock(const struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>                                  spinlock_t *lock)
>>>>>>>>>>>> {
>>>>>>>>>>>>      if (!gpuvm->resv_protected_lists)
>>>>>>>>>>>>          spin_lock(lock);
>>>>>>>>>>>> }
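(Continuing that sketch, the internal list helpers could then take their locks conditionally; resv_protected_lists and the matching gpuvm_cond_spin_unlock() are hypothetical here:

    /* e.g. inside drm_gpuvm_bo_list_add(), instead of plain spin_lock(): */
    gpuvm_cond_spin_lock(gpuvm, &gpuvm->extobj.lock);
    if (list_empty(&vm_bo->list.entry.extobj))
            list_add_tail(&vm_bo->list.entry.extobj, &gpuvm->extobj.list);
    gpuvm_cond_spin_unlock(gpuvm, &gpuvm->extobj.lock);
)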
>>>>>>>>>>>>
>>>>>>>>>>>>>> For such drivers, that would require anybody calling 
>>>>>>>>>>>>>> unlink to
>>>>>>>>>>>>>> hold the vm's
>>>>>>>>>>>>>> resv, though.
>>>>>>>>>>>>> In V4 I want to go back to having a dedicated lock for the 
>>>>>>>>>>>>> GEMs
>>>>>>>>>>>>> gpuva list (or
>>>>>>>>>>>>> VM_BO list to be more precise). We can't just use the 
>>>>>>>>>>>>> dma-resv
>>>>>>>>>>>>> lock for that
>>>>>>>>>>>>> with VM_BO abstractions, because on destruction of a VM_BO we
>>>>>>>>>>>>> otherwise wouldn't
>>>>>>>>>>>>> be allowed to already hold the dma-resv lock. That's the 
>>>>>>>>>>>>> fix I
>>>>>>>>>>>>> was referring to
>>>>>>>>>>>>> earlier.
>>>>>>>>>>>> Yeah, I can see the need for a dedicated lock for the GEM's 
>>>>>>>>>>>> gpuva
>>>>>>>>>>>> list, but
>>>>>>>>>>>> holding the vm's dma-resv lock across the unlink shouldn't 
>>>>>>>>>>>> be a
>>>>>>>>>>>> problem. We
>>>>>>>>>>>> may free the object and a pointer to the vm's resv during 
>>>>>>>>>>>> unlink
>>>>>>>>>>>> but we
>>>>>>>>>>>> don't free the vm's resv.  It'd be a matter of ensuring 
>>>>>>>>>>>> that any
>>>>>>>>>>>> calls to
>>>>>>>>>>>> unlink from *within* drm_gpuvm allows it to be held.
>>>>>>>>>>> Drivers calling unlink() from the fence signaling path can't 
>>>>>>>>>>> use the
>>>>>>>>>>> VM's
>>>>>>>>>>> dma-resv lock.
>>>>>>>>>> Yes, that made me a bit curious because in the current 
>>>>>>>>>> version the code
>>>>>>>>>> required the object's dma_resv for unlink() which can't be 
>>>>>>>>>> grabbed
>>>>>>>>>> either from the fence signaling path. So are there any 
>>>>>>>>>> drivers actually
>>>>>>>>>> wanting to do that? If so, they will either need to resort to 
>>>>>>>>>> the
>>>>>>>>>> current spinlock solution or they will need to call unlink 
>>>>>>>>>> from a
>>>>>>>>>> workqueue item.
>>>>>>>>> As Boris already mentioned we have the dma-resv lock by 
>>>>>>>>> default or a driver
>>>>>>>>> specific GEM gpuva lock as opt-in. Now, we can get rid of the 
>>>>>>>>> latter.
>>>>>>>>>
>>>>>>>>>>> Also, what if the object is an external object? We can't use 
>>>>>>>>>>> the VM's
>>>>>>>>>>> dma-resv
>>>>>>>>>>> lock here.
>>>>>>>>>> Why? Typically (sync) unlink is only ever called from an 
>>>>>>>>>> unbind-like
>>>>>>>>>> operation where it should be trivial to grab the vm's resv. 
>>>>>>>>>> Or, for
>>>>>>>>>> that matter, any outer lock protecting the extobj list. The 
>>>>>>>>>> rule would be that drm_gpuvm_bo::entry::extobj and 
>>>>>>>>>> drm_gpuvm_bo::entry::evict are protected by either the vm's 
>>>>>>>>>> dma_resv (or possibly an outer lock in the case of the extobj 
>>>>>>>>>> list).
>>>>>>>>> Outer lock wouldn't have worked for updates in the async 
>>>>>>>>> path, but
>>>>>>>>> shouldn't be relevant anymore. We could use the VM's resv for 
>>>>>>>>> that.
>>>>>>>>>
>>>>>>>>>>>   And we can't have the GEM objs dma-resv lock held when 
>>>>>>>>>>> calling
>>>>>>>>>>> unlink(), since unlink() calls drm_gpuvm_bo_put(), which if the
>>>>>>>>>>> refcount drops
>>>>>>>>>>> to zero calls drm_gpuvm_bo_destroy() and 
>>>>>>>>>>> drm_gpuvm_bo_destroy() might
>>>>>>>>>>> drop the
>>>>>>>>>>> last reference of the GEM object.
>>>>>>>>>> Yes, but this is a different problem as to what exactly protects
>>>>>>>>>> drm_gpuvm_bo::entry::gem. Either as you suggest an internal 
>>>>>>>>>> per bo list
>>>>>>>>>> lock, or if we want to keep the bo's dma_resv we need to 
>>>>>>>>>> ensure that
>>>>>>>>>> the caller of dma_resv_unlock(obj->resv) actually refcounts 
>>>>>>>>>> its obj
>>>>>>>>>> pointer, and doesn't implicitly rely on the gpuvm_bo's 
>>>>>>>>>> refcount (I know
>>>>>>>>>> Boris didn't like that, but requiring an explicit refcount for a
>>>>>>>>>> pointer you dereference unless you're under a lock that 
>>>>>>>>>> ensures keeping
>>>>>>>>>> the object alive is pretty much required?) But anyway for the
>>>>>>>>>> drm_gpuvm_bo::entry::gem list protection (bo resv or internal 
>>>>>>>>>> spinlock)
>>>>>>>>>> I don't have a strong preference.
>>>>>>>>> We can keep the GEM objects dma-resv lock, however as 
>>>>>>>>> mentioned above
>>>>>>>>> drm_gpuva_unlink() and drm_gpuvm_bo_put() then requires both 
>>>>>>>>> the VM's resv lock
>>>>>>>>> and the GEM's resv lock in case they differ.
>>>>>>>>>
>>>>>>>>>>>   All those problems go away with a dedicated
>>>>>>>>>>> GEM gpuva list lock.
>>>>>>>>>> I don't think these are real problems.
>>>>>>>>>> With the exception of the eviction list "trick" where we 
>>>>>>>>>> currently have
>>>>>>>>>> slightly different approach to collect external bos needing 
>>>>>>>>>> rebinding,
>>>>>>>>>> we have this working fine.
>>>>>>>>>>
>>>>>>>>>> TBH I think pretty much the only situation where the spinlock 
>>>>>>>>>> is needed
>>>>>>>>>> is for async updates of these lists, unless a wq item can be 
>>>>>>>>>> used for
>>>>>>>>>> that, but it doesn't really seem like the current code allows 
>>>>>>>>>> for such
>>>>>>>>>> updates anyway? It complicates the code a lot, adds overhead 
>>>>>>>>>> and also
>>>>>>>>>> adds the requirement for refcounting during list traversal.
>>>>>>>>>>
>>>>>>>>>> /Thomas
>>>>>>>>>>
>>>>>>>>>>>> /Thomas
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>> It seems that with that also the refcount could be made non-
>>>>>>>>>>>>>> atomic.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> All in the spirit of the drm locking guidelines "use big 
>>>>>>>>>>>>>> locks
>>>>>>>>>>>>>> when
>>>>>>>>>>>>>> possible".
>>>>>>>>>>>>>> Lower level locks only when necessary for performance or
>>>>>>>>>>>>>> locking inversion?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> /Thomas
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>> + * Elements popped from the original list are kept in a
>>>>>>>>>>>>>>> + * Elements popped from the original list are kept in a local list, so
>>>>>>>>>>>>>>> + * removal and is_empty checks can still happen while we're iterating the
>>>>>>>>>>>>>>> + * list.
>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>> +#define get_next_vm_bo_from_list(__gpuvm, __list_name, __local_list, __prev_vm_bo) \
>>>>>>>>>>>>>>> +       ({                                                                      \
>>>>>>>>>>>>>>> +               struct drm_gpuvm_bo *__vm_bo;                                   \
>>>>>>>>>>>>>>> +                                                                               \
>>>>>>>>>>>>>>> +               drm_gpuvm_bo_put(__prev_vm_bo);                                 \
>>>>>>>>>>>>>>> +                                                                               \
>>>>>>>>>>>>>>> +               spin_lock(&(__gpuvm)->__list_name.lock);                        \
>>>>>>>>>>>>>>> +               while (!list_empty(&(__gpuvm)->__list_name.list)) {             \
>>>>>>>>>>>>>>> +                       __vm_bo = list_first_entry(&(__gpuvm)->__list_name.list, \
>>>>>>>>>>>>>>> +                                                  struct drm_gpuvm_bo,         \
>>>>>>>>>>>>>>> +                                                  list.entry.__list_name);     \
>>>>>>>>>>>>>>> +                       if (drm_gpuvm_bo_get_unless_zero(__vm_bo)) {            \
>>>>>>>>>>>>>>> +                               list_move_tail(&(__vm_bo)->list.entry.__list_name, \
>>>>>>>>>>>>>>> +                                              __local_list);                   \
>>>>>>>>>>>>>>> +                               break;                                          \
>>>>>>>>>>>>>>> +                       } else {                                                \
>>>>>>>>>>>>>>> +                               list_del_init(&(__vm_bo)->list.entry.__list_name); \
>>>>>>>>>>>>>>> +                               __vm_bo = NULL;                                 \
>>>>>>>>>>>>>>> +                       }                                                       \
>>>>>>>>>>>>>>> +               }                                                               \
>>>>>>>>>>>>>>> +               spin_unlock(&(__gpuvm)->__list_name.lock);                      \
>>>>>>>>>>>>>>> +                                                                               \
>>>>>>>>>>>>>>> +               __vm_bo;                                                        \
>>>>>>>>>>>>>>> +       })
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>> + * for_each_vm_bo_in_list() - internal vm_bo list iterator
>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>> + * This helper is here to provide lockless list iteration. Lockless as in, the
>>>>>>>>>>>>>>> + * iterator releases the lock immediately after picking the first element from
>>>>>>>>>>>>>>> + * the list, so list insertion and deletion can happen concurrently.
>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>> + * Typical use:
>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>> + *     struct drm_gpuvm_bo *vm_bo;
>>>>>>>>>>>>>>> + *     LIST_HEAD(my_local_list);
>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>> + *     ret = 0;
>>>>>>>>>>>>>>> + *     for_each_vm_bo_in_list(gpuvm, <list_name>, &my_local_list, vm_bo) {
>>>>>>>>>>>>>>> + *             ret = do_something_with_vm_bo(..., vm_bo);
>>>>>>>>>>>>>>> + *             if (ret)
>>>>>>>>>>>>>>> + *                     break;
>>>>>>>>>>>>>>> + *     }
>>>>>>>>>>>>>>> + *     drm_gpuvm_bo_put(vm_bo);
>>>>>>>>>>>>>>> + *     restore_vm_bo_list(gpuvm, <list_name>, &my_local_list);
>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>> + * Only used for internal list iterations, not meant to be exposed to the
>>>>>>>>>>>>>>> + * outside world.
>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>> +#define for_each_vm_bo_in_list(__gpuvm, __list_name, __local_list, __vm_bo)    \
>>>>>>>>>>>>>>> +       for (__vm_bo = get_next_vm_bo_from_list(__gpuvm, __list_name,           \
>>>>>>>>>>>>>>> +                                               __local_list, NULL);            \
>>>>>>>>>>>>>>> +            __vm_bo;                                                           \
>>>>>>>>>>>>>>> +            __vm_bo = get_next_vm_bo_from_list(__gpuvm, __list_name,           \
>>>>>>>>>>>>>>> +                                               __local_list, __vm_bo))
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>> + * restore_vm_bo_list() - move vm_bo elements back to 
>>>>>>>>>>>>>>> + * restore_vm_bo_list() - move vm_bo elements back to their original list
>>>>>>>>>>>>>>> + * @__gpuvm: The GPU VM
>>>>>>>>>>>>>>> + * @__list_name: The name of the list we're iterating on
>>>>>>>>>>>>>>> + * @__local_list: A pointer to the local list used to store already iterated items
>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>> + * When we're done iterating a vm_bo list, we should call restore_vm_bo_list()
>>>>>>>>>>>>>>> + * to restore the original state and let new iterations take place.
>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>> +#define restore_vm_bo_list(__gpuvm, __list_name, __local_list)                 \
>>>>>>>>>>>>>>> +       do {                                                                    \
>>>>>>>>>>>>>>> +               /* Merge back the two lists, moving local list elements to the  \
>>>>>>>>>>>>>>> +                * head to preserve previous ordering, in case it matters.      \
>>>>>>>>>>>>>>> +                */                                                             \
>>>>>>>>>>>>>>> +               spin_lock(&(__gpuvm)->__list_name.lock);                        \
>>>>>>>>>>>>>>> +               list_splice(__local_list, &(__gpuvm)->__list_name.list);        \
>>>>>>>>>>>>>>> +               spin_unlock(&(__gpuvm)->__list_name.lock);                      \
>>>>>>>>>>>>>>> +       } while (0)
>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>> + * drm_gpuvm_bo_list_add() - insert a vm_bo into the given list
>>>>>>>>>>>>>>> + * @__vm_bo: the &drm_gpuvm_bo
>>>>>>>>>>>>>>> + * @__list_name: the name of the list to insert into
>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>> + * Inserts the given @__vm_bo into the list specified by @__list_name and
>>>>>>>>>>>>>>> + * increases the vm_bo's reference count.
>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>> +#define drm_gpuvm_bo_list_add(__vm_bo, __list_name)                            \
>>>>>>>>>>>>>>> +       do {                                                                    \
>>>>>>>>>>>>>>> +               spin_lock(&(__vm_bo)->vm->__list_name.lock);                    \
>>>>>>>>>>>>>>> +               if (list_empty(&(__vm_bo)->list.entry.__list_name))             \
>>>>>>>>>>>>>>> +                       list_add_tail(&(__vm_bo)->list.entry.__list_name,       \
>>>>>>>>>>>>>>> +                                     &(__vm_bo)->vm->__list_name.list);        \
>>>>>>>>>>>>>>> +               spin_unlock(&(__vm_bo)->vm->__list_name.lock);                  \
>>>>>>>>>>>>>>> +       } while (0)
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>> + * drm_gpuvm_bo_list_del() - remove a vm_bo from the given list
>>>>>>>>>>>>>>> + * @__vm_bo: the &drm_gpuvm_bo
>>>>>>>>>>>>>>> + * @__list_name: the name of the list to remove from
>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>> + * Removes the given @__vm_bo from the list specified by @__list_name and
>>>>>>>>>>>>>>> + * decreases the vm_bo's reference count.
>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>> +#define drm_gpuvm_bo_list_del(__vm_bo, __list_name)                            \
>>>>>>>>>>>>>>> +       do {                                                                    \
>>>>>>>>>>>>>>> +               spin_lock(&(__vm_bo)->vm->__list_name.lock);                    \
>>>>>>>>>>>>>>> +               if (!list_empty(&(__vm_bo)->list.entry.__list_name))            \
>>>>>>>>>>>>>>> +                       list_del_init(&(__vm_bo)->list.entry.__list_name);      \
>>>>>>>>>>>>>>> +               spin_unlock(&(__vm_bo)->vm->__list_name.lock);                  \
>>>>>>>>>>>>>>> +       } while (0)
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>> +static int __must_check
>>>>>>>>>>>>>>> +drm_gpuvm_bo_get_unless_zero(struct drm_gpuvm_bo *vm_bo);
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>     #define to_drm_gpuva(__node) container_of((__node), struct drm_gpuva, rb.node)
>>>>>>>>>>>>>>>     #define GPUVA_START(node) ((node)->va.addr)
>>>>>>>>>>>>>>> @@ -713,6 +867,12 @@ drm_gpuvm_init(struct drm_gpuvm *gpuvm, struct drm_device *drm,
>>>>>>>>>>>>>>>          gpuvm->rb.tree = RB_ROOT_CACHED;
>>>>>>>>>>>>>>>          INIT_LIST_HEAD(&gpuvm->rb.list);
>>>>>>>>>>>>>>> +       INIT_LIST_HEAD(&gpuvm->extobj.list);
>>>>>>>>>>>>>>> +       spin_lock_init(&gpuvm->extobj.lock);
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>> +       INIT_LIST_HEAD(&gpuvm->evict.list);
>>>>>>>>>>>>>>> +       spin_lock_init(&gpuvm->evict.lock);
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>          drm_gpuva_check_overflow(start_offset, range);
>>>>>>>>>>>>>>>          gpuvm->mm_start = start_offset;
>>>>>>>>>>>>>>>          gpuvm->mm_range = range;
>>>>>>>>>>>>>>> @@ -754,10 +914,302 @@ drm_gpuvm_destroy(struct drm_gpuvm *gpuvm)
>>>>>>>>>>>>>>>          WARN(!RB_EMPTY_ROOT(&gpuvm->rb.tree.rb_root),
>>>>>>>>>>>>>>>               "GPUVA tree is not empty, potentially leaking
>>>>>>>>>>>>>>> memory.\n");
>>>>>>>>>>>>>>> +       WARN(!list_empty(&gpuvm->extobj.list), "Extobj list should be empty.\n");
>>>>>>>>>>>>>>> +       WARN(!list_empty(&gpuvm->evict.list), "Evict list should be empty.\n");
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>          drm_gem_private_object_fini(&gpuvm->d_obj);
>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>> EXPORT_SYMBOL_GPL(drm_gpuvm_destroy);
>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>> + * drm_gpuvm_prepare_objects() - prepare all associated BOs
>>>>>>>>>>>>>>> + * @gpuvm: the &drm_gpuvm
>>>>>>>>>>>>>>> + * @exec: the &drm_exec locking context
>>>>>>>>>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>> + * Calls drm_exec_prepare_obj() for all &drm_gem_objects the given
>>>>>>>>>>>>>>> + * &drm_gpuvm contains mappings of.
>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>> + * Using this function directly, it is the driver's responsibility to call
>>>>>>>>>>>>>>> + * drm_exec_init() and drm_exec_fini() accordingly.
>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>> + * Note: This function is safe against concurrent insertion and removal of
>>>>>>>>>>>>>>> + * external objects, however it is not safe against concurrent usage itself.
>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>> + * Drivers need to make sure to protect this case with either an outer VM lock
>>>>>>>>>>>>>>> + * or by calling drm_gpuvm_prepare_vm() before this function within the
>>>>>>>>>>>>>>> + * drm_exec_until_all_locked() loop, such that the GPUVM's dma-resv lock ensures
>>>>>>>>>>>>>>> + * mutual exclusion.
>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>> +int
>>>>>>>>>>>>>>> +drm_gpuvm_prepare_objects(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>>> +                         struct drm_exec *exec,
>>>>>>>>>>>>>>> +                         unsigned int num_fences)
>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>> +       struct drm_gpuvm_bo *vm_bo;
>>>>>>>>>>>>>>> +       LIST_HEAD(extobjs);
>>>>>>>>>>>>>>> +       int ret = 0;
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>> +       for_each_vm_bo_in_list(gpuvm, extobj, &extobjs, vm_bo) {
>>>>>>>>>>>>>>> +               ret = drm_exec_prepare_obj(exec, vm_bo->obj, num_fences);
>>>>>>>>>>>>>>> +               if (ret)
>>>>>>>>>>>>>>> +                       break;
>>>>>>>>>>>>>>> +       }
>>>>>>>>>>>>>>> +       /* Drop ref in case we break out of the loop. */
>>>>>>>>>>>>>>> +       drm_gpuvm_bo_put(vm_bo);
>>>>>>>>>>>>>>> +       restore_vm_bo_list(gpuvm, extobj, &extobjs);
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>> +       return ret;
>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_prepare_objects);
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>> + * drm_gpuvm_prepare_range() - prepare all BOs mapped within a given range
>>>>>>>>>>>>>>> + * @gpuvm: the &drm_gpuvm
>>>>>>>>>>>>>>> + * @exec: the &drm_exec locking context
>>>>>>>>>>>>>>> + * @addr: the start address within the VA space
>>>>>>>>>>>>>>> + * @range: the range to iterate within the VA space
>>>>>>>>>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>> + * Calls drm_exec_prepare_obj() for all &drm_gem_objects mapped between @addr
>>>>>>>>>>>>>>> + * and @addr + @range.
>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>> +int
>>>>>>>>>>>>>>> +drm_gpuvm_prepare_range(struct drm_gpuvm *gpuvm, struct drm_exec *exec,
>>>>>>>>>>>>>>> +                       u64 addr, u64 range, unsigned int num_fences)
>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>> +       struct drm_gpuva *va;
>>>>>>>>>>>>>>> +       u64 end = addr + range;
>>>>>>>>>>>>>>> +       int ret;
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>> +       drm_gpuvm_for_each_va_range(va, gpuvm, addr, end) {
>>>>>>>>>>>>>>> +               struct drm_gem_object *obj = va->gem.obj;
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>> +               ret = drm_exec_prepare_obj(exec, obj, num_fences);
>>>>>>>>>>>>>>> +               if (ret)
>>>>>>>>>>>>>>> +                       return ret;
>>>>>>>>>>>>>>> +       }
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>> +       return 0;
>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_prepare_range);
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>> + * drm_gpuvm_exec_lock() - lock all dma-resv of all associated BOs
>>>>>>>>>>>>>>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>>>>>>>>>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>>>>>>>>>> + * @interruptible: sleep interruptible if waiting
>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>> + * Acquires all dma-resv locks of all &drm_gem_objects the given
>>>>>>>>>>>>>>> + * &drm_gpuvm contains mappings of.
>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>> + * Additionally, when calling this function with struct drm_gpuvm_exec::extra
>>>>>>>>>>>>>>> + * being set the driver receives the given @fn callback to lock additional
>>>>>>>>>>>>>>> + * dma-resv in the context of the &drm_gpuvm_exec instance. Typically, drivers
>>>>>>>>>>>>>>> + * would call drm_exec_prepare_obj() from within this callback.
>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>> +int
>>>>>>>>>>>>>>> +drm_gpuvm_exec_lock(struct drm_gpuvm_exec *vm_exec,
>>>>>>>>>>>>>>> +                   unsigned int num_fences,
>>>>>>>>>>>>>>> +                   bool interruptible)
>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>> +       struct drm_gpuvm *gpuvm = vm_exec->vm;
>>>>>>>>>>>>>>> +       struct drm_exec *exec = &vm_exec->exec;
>>>>>>>>>>>>>>> +       uint32_t flags;
>>>>>>>>>>>>>>> +       int ret;
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>> +       flags = (interruptible ? DRM_EXEC_INTERRUPTIBLE_WAIT : 0) |
>>>>>>>>>>>>>>> +               DRM_EXEC_IGNORE_DUPLICATES;
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>> +       drm_exec_init(exec, flags);
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>> +       drm_exec_until_all_locked(exec) {
>>>>>>>>>>>>>>> +               ret = drm_gpuvm_prepare_vm(gpuvm, exec, num_fences);
>>>>>>>>>>>>>>> +               drm_exec_retry_on_contention(exec);
>>>>>>>>>>>>>>> +               if (ret)
>>>>>>>>>>>>>>> +                       goto err;
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>> +               ret = drm_gpuvm_prepare_objects(gpuvm, exec, num_fences);
>>>>>>>>>>>>>>> +               drm_exec_retry_on_contention(exec);
>>>>>>>>>>>>>>> +               if (ret)
>>>>>>>>>>>>>>> +                       goto err;
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>> +               if (vm_exec->extra.fn) {
>>>>>>>>>>>>>>> +                       ret = vm_exec->extra.fn(vm_exec, num_fences);
>>>>>>>>>>>>>>> +                       drm_exec_retry_on_contention(exec);
>>>>>>>>>>>>>>> +                       if (ret)
>>>>>>>>>>>>>>> +                               goto err;
>>>>>>>>>>>>>>> +               }
>>>>>>>>>>>>>>> +       }
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>> +       return 0;
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>> +err:
>>>>>>>>>>>>>>> +       drm_exec_fini(exec);
>>>>>>>>>>>>>>> +       return ret;
>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_exec_lock);
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>> +static int
>>>>>>>>>>>>>>> +fn_lock_array(struct drm_gpuvm_exec *vm_exec, unsigned int num_fences)
>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>> +       struct {
>>>>>>>>>>>>>>> +               struct drm_gem_object **objs;
>>>>>>>>>>>>>>> +               unsigned int num_objs;
>>>>>>>>>>>>>>> +       } *args = vm_exec->extra.priv;
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>> +       return drm_exec_prepare_array(&vm_exec->exec, args->objs,
>>>>>>>>>>>>>>> +                                     args->num_objs, num_fences);
>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>> + * drm_gpuvm_exec_lock_array() - lock all dma-resv of all associated BOs
>>>>>>>>>>>>>>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>>>>>>>>>>>>>>> + * @objs: additional &drm_gem_objects to lock
>>>>>>>>>>>>>>> + * @num_objs: the number of additional &drm_gem_objects to lock
>>>>>>>>>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>>>>>>>>>> + * @interruptible: sleep interruptible if waiting
>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>> + * Acquires all dma-resv locks of all &drm_gem_objects the given &drm_gpuvm
>>>>>>>>>>>>>>> + * contains mappings of, plus the ones given through @objs.
>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>> +int
>>>>>>>>>>>>>>> +drm_gpuvm_exec_lock_array(struct drm_gpuvm_exec *vm_exec,
>>>>>>>>>>>>>>> +                         struct drm_gem_object **objs,
>>>>>>>>>>>>>>> +                         unsigned int num_objs,
>>>>>>>>>>>>>>> +                         unsigned int num_fences,
>>>>>>>>>>>>>>> +                         bool interruptible)
>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>> +       struct {
>>>>>>>>>>>>>>> +               struct drm_gem_object **objs;
>>>>>>>>>>>>>>> +               unsigned int num_objs;
>>>>>>>>>>>>>>> +       } args;
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>> +       args.objs = objs;
>>>>>>>>>>>>>>> +       args.num_objs = num_objs;
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>> +       vm_exec->extra.fn = fn_lock_array;
>>>>>>>>>>>>>>> +       vm_exec->extra.priv = &args;
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>> +       return drm_gpuvm_exec_lock(vm_exec, num_fences,
>>>>>>>>>>>>>>> interruptible);
>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_exec_lock_array);
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>> + * drm_gpuvm_exec_lock_range() - prepare all BOs mapped within a given range
>>>>>>>>>>>>>>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>>>>>>>>>>>>>>> + * @addr: the start address within the VA space
>>>>>>>>>>>>>>> + * @range: the range to iterate within the VA space
>>>>>>>>>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>>>>>>>>>> + * @interruptible: sleep interruptible if waiting
>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>> + * Acquires all dma-resv locks of all &drm_gem_objects mapped between @addr and
>>>>>>>>>>>>>>> + * @addr + @range.
>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>> +int
>>>>>>>>>>>>>>> +drm_gpuvm_exec_lock_range(struct drm_gpuvm_exec *vm_exec,
>>>>>>>>>>>>>>> +                         u64 addr, u64 range,
>>>>>>>>>>>>>>> +                         unsigned int num_fences,
>>>>>>>>>>>>>>> +                         bool interruptible)
>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>> +       struct drm_gpuvm *gpuvm = vm_exec->vm;
>>>>>>>>>>>>>>> +       struct drm_exec *exec = &vm_exec->exec;
>>>>>>>>>>>>>>> +       uint32_t flags;
>>>>>>>>>>>>>>> +       int ret;
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>> +       flags = (interruptible ? DRM_EXEC_INTERRUPTIBLE_WAIT : 0) |
>>>>>>>>>>>>>>> +               DRM_EXEC_IGNORE_DUPLICATES;
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>> +       drm_exec_init(exec, flags);
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>> +       drm_exec_until_all_locked(exec) {
>>>>>>>>>>>>>>> +               ret = drm_gpuvm_prepare_range(gpuvm, exec, addr, range,
>>>>>>>>>>>>>>> +                                             num_fences);
>>>>>>>>>>>>>>> +               drm_exec_retry_on_contention(exec);
>>>>>>>>>>>>>>> +               if (ret)
>>>>>>>>>>>>>>> +                       goto err;
>>>>>>>>>>>>>>> +       }
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>> +       return ret;
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>> +err:
>>>>>>>>>>>>>>> +       drm_exec_fini(exec);
>>>>>>>>>>>>>>> +       return ret;
>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_exec_lock_range);
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>> + * drm_gpuvm_validate() - validate all BOs marked as evicted
>>>>>>>>>>>>>>> + * @gpuvm: the &drm_gpuvm to validate evicted BOs
>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>> + * Calls the &drm_gpuvm_ops.bo_validate callback for all evicted buffer
>>>>>>>>>>>>>>> + * objects being mapped in the given &drm_gpuvm.
>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>> +int
>>>>>>>>>>>>>>> +drm_gpuvm_validate(struct drm_gpuvm *gpuvm)
>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>> +       const struct drm_gpuvm_ops *ops = gpuvm->ops;
>>>>>>>>>>>>>>> +       struct drm_gpuvm_bo *vm_bo;
>>>>>>>>>>>>>>> +       LIST_HEAD(evict);
>>>>>>>>>>>>>>> +       int ret = 0;
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>> +       if (unlikely(!ops || !ops->bo_validate))
>>>>>>>>>>>>>>> +               return -ENOTSUPP;
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>> +       for_each_vm_bo_in_list(gpuvm, evict, &evict, vm_bo) {
>>>>>>>>>>>>>>> +               dma_resv_assert_held(vm_bo->obj->resv);
>>>>>>>>>>>>>>> +               ret = ops->bo_validate(vm_bo->obj);
>>>>>>>>>>>>>>> +               if (ret)
>>>>>>>>>>>>>>> +                       break;
>>>>>>>>>>>>>>> +       }
>>>>>>>>>>>>>>> +       /* Drop ref in case we break out of the loop. */
>>>>>>>>>>>>>>> +       drm_gpuvm_bo_put(vm_bo);
>>>>>>>>>>>>>>> +       restore_vm_bo_list(gpuvm, evict, &evict);
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>> +       return ret;
>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_validate);
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>> + * drm_gpuvm_resv_add_fence() - add fence to private and all extobj dma-resv
>>>>>>>>>>>>>>> + * @gpuvm: the &drm_gpuvm to add a fence to
>>>>>>>>>>>>>>> + * @exec: the &drm_exec locking context
>>>>>>>>>>>>>>> + * @fence: fence to add
>>>>>>>>>>>>>>> + * @private_usage: private dma-resv usage
>>>>>>>>>>>>>>> + * @extobj_usage: extobj dma-resv usage
>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>> +void
>>>>>>>>>>>>>>> +drm_gpuvm_resv_add_fence(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>>> +                        struct drm_exec *exec,
>>>>>>>>>>>>>>> +                        struct dma_fence *fence,
>>>>>>>>>>>>>>> +                        enum dma_resv_usage private_usage,
>>>>>>>>>>>>>>> +                        enum dma_resv_usage extobj_usage)
>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>> +       struct drm_gem_object *obj;
>>>>>>>>>>>>>>> +       unsigned long index;
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>> +       drm_exec_for_each_locked_object(exec, index, obj) {
>>>>>>>>>>>>>>> +               dma_resv_assert_held(obj->resv);
>>>>>>>>>>>>>>> +               dma_resv_add_fence(obj->resv, fence,
>>>>>>>>>>>>>>> +                                  drm_gpuvm_is_extobj(gpuvm, obj) ?
>>>>>>>>>>>>>>> +                                  extobj_usage : private_usage);
>>>>>>>>>>>>>>> +       }
>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_resv_add_fence);
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>     /**
>>>>>>>>>>>>>>>      * drm_gpuvm_bo_create() - create a new instance of struct drm_gpuvm_bo
>>>>>>>>>>>>>>>      * @gpuvm: The &drm_gpuvm the @obj is mapped in.
>>>>>>>>>>>>>>> @@ -790,6 +1242,9 @@ drm_gpuvm_bo_create(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>>>          INIT_LIST_HEAD(&vm_bo->list.gpuva);
>>>>>>>>>>>>>>>          INIT_LIST_HEAD(&vm_bo->list.entry.gem);
>>>>>>>>>>>>>>> +       INIT_LIST_HEAD(&vm_bo->list.entry.extobj);
>>>>>>>>>>>>>>> +       INIT_LIST_HEAD(&vm_bo->list.entry.evict);
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>          drm_gem_object_get(obj);
>>>>>>>>>>>>>>>          return vm_bo;
>>>>>>>>>>>>>>> @@ -807,6 +1262,14 @@ drm_gpuvm_bo_destroy(struct kref *kref)
>>>>>>>>>>>>>>>          drm_gem_gpuva_assert_lock_held(vm_bo->obj);
>>>>>>>>>>>>>>> +       spin_lock(&gpuvm->extobj.lock);
>>>>>>>>>>>>>>> +       list_del(&vm_bo->list.entry.extobj);
>>>>>>>>>>>>>>> +       spin_unlock(&gpuvm->extobj.lock);
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>> +       spin_lock(&gpuvm->evict.lock);
>>>>>>>>>>>>>>> +       list_del(&vm_bo->list.entry.evict);
>>>>>>>>>>>>>>> +       spin_unlock(&gpuvm->evict.lock);
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>          list_del(&vm_bo->list.entry.gem);
>>>>>>>>>>>>>>>          drm_gem_object_put(obj);
>>>>>>>>>>>>>>> @@ -822,6 +1285,11 @@ drm_gpuvm_bo_destroy(struct kref *kref)
>>>>>>>>>>>>>>>      * @vm_bo: the &drm_gpuvm_bo to release the 
>>>>>>>>>>>>>>> reference of
>>>>>>>>>>>>>>>      *
>>>>>>>>>>>>>>>      * This releases a reference to @vm_bo.
>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>> + * If the reference count drops to zero, the &gpuvm_bo is destroyed, which
>>>>>>>>>>>>>>> + * includes removing it from the GEM's gpuva list. Hence, if a call to this
>>>>>>>>>>>>>>> + * function can potentially drop the reference count to zero, the caller must
>>>>>>>>>>>>>>> + * hold the dma-resv or driver specific GEM gpuva lock.
>>>>>>>>>>>>>>>      */
>>>>>>>>>>>>>>>     void
>>>>>>>>>>>>>>>     drm_gpuvm_bo_put(struct drm_gpuvm_bo *vm_bo)
>>>>>>>>>>>>>>> @@ -831,6 +1299,12 @@ drm_gpuvm_bo_put(struct drm_gpuvm_bo *vm_bo)
>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>> EXPORT_SYMBOL_GPL(drm_gpuvm_bo_put);
>>>>>>>>>>>>>>> +static int __must_check
>>>>>>>>>>>>>>> +drm_gpuvm_bo_get_unless_zero(struct drm_gpuvm_bo *vm_bo)
>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>> +       return kref_get_unless_zero(&vm_bo->kref);
>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>     static struct drm_gpuvm_bo *
>>>>>>>>>>>>>>>     __drm_gpuvm_bo_find(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>>>                      struct drm_gem_object *obj)
>>>>>>>>>>>>>>> @@ -938,6 +1412,48 @@ drm_gpuvm_bo_obtain_prealloc(struct drm_gpuvm_bo *__vm_bo)
>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>> EXPORT_SYMBOL_GPL(drm_gpuvm_bo_obtain_prealloc);
>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>> + * drm_gpuvm_bo_extobj_add() - adds the &drm_gpuvm_bo to its &drm_gpuvm's
>>>>>>>>>>>>>>> + * extobj list
>>>>>>>>>>>>>>> + * @vm_bo: The &drm_gpuvm_bo to add to its &drm_gpuvm's extobj list.
>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>> + * Adds the given @vm_bo to its &drm_gpuvm's extobj list if not on the list
>>>>>>>>>>>>>>> + * already and if the corresponding &drm_gem_object actually is an external
>>>>>>>>>>>>>>> + * object.
>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>> +void
>>>>>>>>>>>>>>> +drm_gpuvm_bo_extobj_add(struct drm_gpuvm_bo *vm_bo)
>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>> +       struct drm_gpuvm *gpuvm = vm_bo->vm;
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>> +       if (drm_gpuvm_is_extobj(gpuvm, vm_bo->obj))
>>>>>>>>>>>>>>> +               drm_gpuvm_bo_list_add(vm_bo, extobj);
>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_bo_extobj_add);
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>> + * drm_gpuvm_bo_evict() - add / remove a &drm_gem_object to / from a
>>>>>>>>>>>>>>> + * &drm_gpuvm's evicted list
>>>>>>>>>>>>>>> + * @obj: the &drm_gem_object to add or remove
>>>>>>>>>>>>>>> + * @evict: indicates whether the object is evicted
>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>> + * Adds a &drm_gem_object to or removes it from the evicted list of all
>>>>>>>>>>>>>>> + * &drm_gpuvms containing a mapping of this &drm_gem_object.
>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>> +void
>>>>>>>>>>>>>>> +drm_gpuvm_bo_evict(struct drm_gem_object *obj, bool evict)
>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>> +       struct drm_gpuvm_bo *vm_bo;
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>> +       drm_gem_for_each_gpuvm_bo(vm_bo, obj) {
>>>>>>>>>>>>>>> +               if (evict)
>>>>>>>>>>>>>>> +                       drm_gpuvm_bo_list_add(vm_bo, evict);
>>>>>>>>>>>>>>> +               else
>>>>>>>>>>>>>>> +                       drm_gpuvm_bo_list_del(vm_bo, evict);
>>>>>>>>>>>>>>> +       }
>>>>>>>>>>>>>>> +       }
>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_bo_evict);
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>     static int
>>>>>>>>>>>>>>>     __drm_gpuva_insert(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>>>                     struct drm_gpuva *va)
>>>>>>>>>>>>>>> diff --git a/include/drm/drm_gpuvm.h b/include/drm/drm_gpuvm.h
>>>>>>>>>>>>>>> index afa50b9059a2..834bb6d6617e 100644
>>>>>>>>>>>>>>> --- a/include/drm/drm_gpuvm.h
>>>>>>>>>>>>>>> +++ b/include/drm/drm_gpuvm.h
>>>>>>>>>>>>>>> @@ -26,10 +26,12 @@
>>>>>>>>>>>>>>>      */
>>>>>>>>>>>>>>>     #include <linux/list.h>
>>>>>>>>>>>>>>> +#include <linux/dma-resv.h>
>>>>>>>>>>>>>>>     #include <linux/rbtree.h>
>>>>>>>>>>>>>>>     #include <linux/types.h>
>>>>>>>>>>>>>>>     #include <drm/drm_gem.h>
>>>>>>>>>>>>>>> +#include <drm/drm_exec.h>
>>>>>>>>>>>>>>>     struct drm_gpuvm;
>>>>>>>>>>>>>>>     struct drm_gpuvm_bo;
>>>>>>>>>>>>>>> @@ -259,6 +261,38 @@ struct drm_gpuvm {
>>>>>>>>>>>>>>>           * space
>>>>>>>>>>>>>>>           */
>>>>>>>>>>>>>>>          struct dma_resv *resv;
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>> +       /**
>>>>>>>>>>>>>>> +        * @extobj: structure holding the extobj list
>>>>>>>>>>>>>>> +        */
>>>>>>>>>>>>>>> +       struct {
>>>>>>>>>>>>>>> +               /**
>>>>>>>>>>>>>>> +                * @list: &list_head storing &drm_gpuvm_bos serving as
>>>>>>>>>>>>>>> +                * external object
>>>>>>>>>>>>>>> +                */
>>>>>>>>>>>>>>> +               struct list_head list;
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>> +               /**
>>>>>>>>>>>>>>> +                * @lock: spinlock to protect the extobj list
>>>>>>>>>>>>>>> +                */
>>>>>>>>>>>>>>> +               spinlock_t lock;
>>>>>>>>>>>>>>> +       } extobj;
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>> +       /**
>>>>>>>>>>>>>>> +        * @evict: structure holding the evict list and evict list lock
>>>>>>>>>>>>>>> +        */
>>>>>>>>>>>>>>> +       struct {
>>>>>>>>>>>>>>> +               /**
>>>>>>>>>>>>>>> +                * @list: &list_head storing &drm_gpuvm_bos currently being
>>>>>>>>>>>>>>> +                * evicted
>>>>>>>>>>>>>>> +                */
>>>>>>>>>>>>>>> +               struct list_head list;
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>> +               /**
>>>>>>>>>>>>>>> +                * @lock: spinlock to protect the evict list
>>>>>>>>>>>>>>> +                */
>>>>>>>>>>>>>>> +               spinlock_t lock;
>>>>>>>>>>>>>>> +       } evict;
>>>>>>>>>>>>>>>     };
>>>>>>>>>>>>>>>     void drm_gpuvm_init(struct drm_gpuvm *gpuvm, struct drm_device *drm,
>>>>>>>>>>>>>>> @@ -268,6 +302,21 @@ void drm_gpuvm_init(struct drm_gpuvm *gpuvm, struct drm_device *drm,
>>>>>>>>>>>>>>>                      const struct drm_gpuvm_ops *ops);
>>>>>>>>>>>>>>>     void drm_gpuvm_destroy(struct drm_gpuvm *gpuvm);
>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>> + * drm_gpuvm_is_extobj() - indicates whether the given &drm_gem_object is an
>>>>>>>>>>>>>>> + * external object
>>>>>>>>>>>>>>> + * @gpuvm: the &drm_gpuvm to check
>>>>>>>>>>>>>>> + * @obj: the &drm_gem_object to check
>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>> + * Returns: true if the &drm_gem_object's &dma_resv differs from the
>>>>>>>>>>>>>>> + * &drm_gpuvm's &dma_resv, false otherwise
>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>> +static inline bool drm_gpuvm_is_extobj(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>>> +                                      struct drm_gem_object *obj)
>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>> +       return obj && obj->resv != gpuvm->resv;
>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>     static inline struct drm_gpuva *
>>>>>>>>>>>>>>>     __drm_gpuva_next(struct drm_gpuva *va)
>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>> @@ -346,6 +395,128 @@ __drm_gpuva_next(struct drm_gpuva *va)
>>>>>>>>>>>>>>>     #define drm_gpuvm_for_each_va_safe(va__, next__, gpuvm__) \
>>>>>>>>>>>>>>>          list_for_each_entry_safe(va__, next__, &(gpuvm__)->rb.list, rb.entry)
>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>> + * struct drm_gpuvm_exec - &drm_gpuvm abstraction of &drm_exec
>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>> + * This structure should be created on the stack as &drm_exec should be.
>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>> + * Optionally, @extra can be set in order to lock additional &drm_gem_objects.
>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>> +struct drm_gpuvm_exec {
>>>>>>>>>>>>>>> +       /**
>>>>>>>>>>>>>>> +        * @exec: the &drm_exec structure
>>>>>>>>>>>>>>> +        */
>>>>>>>>>>>>>>> +       struct drm_exec exec;
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>> +       /**
>>>>>>>>>>>>>>> +        * @vm: the &drm_gpuvm to lock its DMA reservations
>>>>>>>>>>>>>>> +        */
>>>>>>>>>>>>>>> +       struct drm_gpuvm *vm;
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>> +       /**
>>>>>>>>>>>>>>> +        * @extra: Callback and corresponding private data for the driver to
>>>>>>>>>>>>>>> +        * lock arbitrary additional &drm_gem_objects.
>>>>>>>>>>>>>>> +        */
>>>>>>>>>>>>>>> +       struct {
>>>>>>>>>>>>>>> +               /**
>>>>>>>>>>>>>>> +                * @fn: The driver callback to lock additional &drm_gem_objects.
>>>>>>>>>>>>>>> +                */
>>>>>>>>>>>>>>> +               int (*fn)(struct drm_gpuvm_exec *vm_exec,
>>>>>>>>>>>>>>> +                         unsigned int num_fences);
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>> +               /**
>>>>>>>>>>>>>>> +                * @priv: driver private data for the @fn callback
>>>>>>>>>>>>>>> +                */
>>>>>>>>>>>>>>> +               void *priv;
>>>>>>>>>>>>>>> +       } extra;
>>>>>>>>>>>>>>> +};
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>> + * drm_gpuvm_prepare_vm() - prepare the GPUVM's common dma-resv
>>>>>>>>>>>>>>> + * @gpuvm: the &drm_gpuvm
>>>>>>>>>>>>>>> + * @exec: the &drm_exec context
>>>>>>>>>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>> + * Calls drm_exec_prepare_obj() for the GPUVM's dummy &drm_gem_object.
>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>> + * Using this function directly, it is the driver's responsibility to call
>>>>>>>>>>>>>>> + * drm_exec_init() and drm_exec_fini() accordingly.
>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>> +static inline int
>>>>>>>>>>>>>>> +drm_gpuvm_prepare_vm(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>>> +                    struct drm_exec *exec,
>>>>>>>>>>>>>>> +                    unsigned int num_fences)
>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>> +       return drm_exec_prepare_obj(exec, &gpuvm->d_obj, num_fences);
>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>> +int drm_gpuvm_prepare_objects(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>>> +                             struct drm_exec *exec,
>>>>>>>>>>>>>>> +                             unsigned int num_fences);
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>> +int drm_gpuvm_prepare_range(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>>> +                           struct drm_exec *exec,
>>>>>>>>>>>>>>> +                           u64 addr, u64 range,
>>>>>>>>>>>>>>> +                           unsigned int num_fences);
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>> +int drm_gpuvm_exec_lock(struct drm_gpuvm_exec *vm_exec,
>>>>>>>>>>>>>>> +                       unsigned int num_fences,
>>>>>>>>>>>>>>> +                       bool interruptible);
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>> +int drm_gpuvm_exec_lock_array(struct drm_gpuvm_exec *vm_exec,
>>>>>>>>>>>>>>> +                             struct drm_gem_object **objs,
>>>>>>>>>>>>>>> +                             unsigned int num_objs,
>>>>>>>>>>>>>>> +                             unsigned int num_fences,
>>>>>>>>>>>>>>> +                             bool interruptible);
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>> +int drm_gpuvm_exec_lock_range(struct drm_gpuvm_exec *vm_exec,
>>>>>>>>>>>>>>> +                             u64 addr, u64 range,
>>>>>>>>>>>>>>> +                             unsigned int num_fences,
>>>>>>>>>>>>>>> +                             bool interruptible);
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>> + * drm_gpuvm_exec_unlock() - unlock all dma-resv of all associated BOs
>>>>>>>>>>>>>>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>> + * Releases all dma-resv locks of all &drm_gem_objects previously acquired
>>>>>>>>>>>>>>> + * through drm_gpuvm_exec_lock() or its variants.
>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>> +static inline void
>>>>>>>>>>>>>>> +drm_gpuvm_exec_unlock(struct drm_gpuvm_exec *vm_exec)
>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>> +       drm_exec_fini(&vm_exec->exec);
>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>> +int drm_gpuvm_validate(struct drm_gpuvm *gpuvm);
>>>>>>>>>>>>>>> +void drm_gpuvm_resv_add_fence(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>>> +                             struct drm_exec *exec,
>>>>>>>>>>>>>>> +                             struct dma_fence *fence,
>>>>>>>>>>>>>>> +                             enum dma_resv_usage private_usage,
>>>>>>>>>>>>>>> +                             enum dma_resv_usage extobj_usage);
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>> + * drm_gpuvm_exec_resv_add_fence() - add fence to private and all extobj
>>>>>>>>>>>>>>> + * dma-resv
>>>>>>>>>>>>>>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>>>>>>>>>>>>>>> + * @fence: fence to add
>>>>>>>>>>>>>>> + * @private_usage: private dma-resv usage
>>>>>>>>>>>>>>> + * @extobj_usage: extobj dma-resv usage
>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>> + * See drm_gpuvm_resv_add_fence().
>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>> +static inline void
>>>>>>>>>>>>>>> +drm_gpuvm_exec_resv_add_fence(struct drm_gpuvm_exec *vm_exec,
>>>>>>>>>>>>>>> +                             struct dma_fence *fence,
>>>>>>>>>>>>>>> +                             enum dma_resv_usage private_usage,
>>>>>>>>>>>>>>> +                             enum dma_resv_usage extobj_usage)
>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>> +       drm_gpuvm_resv_add_fence(vm_exec->vm, &vm_exec->exec, fence,
>>>>>>>>>>>>>>> +                                private_usage, extobj_usage);
>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>     /**
>>>>>>>>>>>>>>>      * struct drm_gpuvm_bo - structure representing a &drm_gpuvm and
>>>>>>>>>>>>>>>      * &drm_gem_object combination
>>>>>>>>>>>>>>> @@ -398,6 +569,18 @@ struct drm_gpuvm_bo {
>>>>>>>>>>>>>>>                           * gpuva list.
>>>>>>>>>>>>>>>                           */
>>>>>>>>>>>>>>>                          struct list_head gem;
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>> +                       /**
>>>>>>>>>>>>>>> +                        * @extobj: List entry to attach to the &drm_gpuvm's
>>>>>>>>>>>>>>> +                        * extobj list.
>>>>>>>>>>>>>>> +                        */
>>>>>>>>>>>>>>> +                       struct list_head extobj;
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>> +                       /**
>>>>>>>>>>>>>>> +                        * @evict: List entry to attach to the &drm_gpuvm's
>>>>>>>>>>>>>>> +                        * evict list.
>>>>>>>>>>>>>>> +                        */
>>>>>>>>>>>>>>> +                       struct list_head evict;
>>>>>>>>>>>>>>>                  } entry;
>>>>>>>>>>>>>>>          } list;
>>>>>>>>>>>>>>>     };
>>>>>>>>>>>>>>> @@ -432,6 +615,9 @@ struct drm_gpuvm_bo *
>>>>>>>>>>>>>>>     drm_gpuvm_bo_find(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>>>                    struct drm_gem_object *obj);
>>>>>>>>>>>>>>> +void drm_gpuvm_bo_evict(struct drm_gem_object *obj, bool evict);
>>>>>>>>>>>>>>> +void drm_gpuvm_bo_extobj_add(struct drm_gpuvm_bo *vm_bo);
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>     /**
>>>>>>>>>>>>>>>      * drm_gpuvm_bo_for_each_va() - iterator to walk over a list of &drm_gpuva
>>>>>>>>>>>>>>>      * @va__: &drm_gpuva structure to assign to in each iteration step
>>>>>>>>>>>>>>> @@ -837,6 +1023,17 @@ struct drm_gpuvm_ops {
>>>>>>>>>>>>>>>           * used.
>>>>>>>>>>>>>>>           */
>>>>>>>>>>>>>>>          int (*sm_step_unmap)(struct drm_gpuva_op *op, void *priv);
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>> +       /**
>>>>>>>>>>>>>>> +        * @bo_validate: called from drm_gpuvm_validate()
>>>>>>>>>>>>>>> +        *
>>>>>>>>>>>>>>> +        * Drivers receive this callback for every evicted &drm_gem_object being
>>>>>>>>>>>>>>> +        * mapped in the corresponding &drm_gpuvm.
>>>>>>>>>>>>>>> +        *
>>>>>>>>>>>>>>> +        * Typically, drivers would call their driver specific variant of
>>>>>>>>>>>>>>> +        * ttm_bo_validate() from within this callback.
>>>>>>>>>>>>>>> +        */
>>>>>>>>>>>>>>> +       int (*bo_validate)(struct drm_gem_object *obj);
>>>>>>>>>>>>>>>     };
>>>>>>>>>>>>>>>     int drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm, void 
>>>>>>>>>>>>>>> *priv,
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
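
A quick driver-side sketch of how the helpers quoted above are meant to
compose. This is illustrative only: my_driver_submit_locked(), the job
fence and the chosen usage flags are hypothetical, not part of the patch;
only the drm_gpuvm_* calls and the bo_validate assumption come from it.

	/* Hypothetical submission path, assuming the driver wired up the
	 * &drm_gpuvm_ops.bo_validate callback.
	 */
	static int my_driver_submit_locked(struct drm_gpuvm *gpuvm,
					   struct dma_fence *fence)
	{
		struct drm_gpuvm_exec vm_exec = {
			.vm = gpuvm,
			/* .extra left unset: no additional objects to lock */
		};
		int ret;

		/* Lock the VM's dummy resv plus all extobj resvs,
		 * reserving one fence slot each.
		 */
		ret = drm_gpuvm_exec_lock(&vm_exec, 1, true);
		if (ret)
			return ret;

		/* Re-validate evicted BOs; invokes the driver's
		 * bo_validate callback per object on the evict list.
		 */
		ret = drm_gpuvm_validate(gpuvm);
		if (ret)
			goto out_unlock;

		/* ... push the job to the HW scheduler here ... */

		/* Attach the job fence to the private resv and all extobjs. */
		drm_gpuvm_exec_resv_add_fence(&vm_exec, fence,
					      DMA_RESV_USAGE_BOOKKEEP,
					      DMA_RESV_USAGE_BOOKKEEP);
	out_unlock:
		drm_gpuvm_exec_unlock(&vm_exec);
		return ret;
	}
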
Thomas Hellstrom Sept. 20, 2023, 7:44 a.m. UTC | #50
Hi,

On 9/20/23 07:37, Christian König wrote:
> Am 19.09.23 um 17:23 schrieb Thomas Hellström:
>>
>> On 9/19/23 17:16, Danilo Krummrich wrote:
>>> On 9/19/23 14:21, Thomas Hellström wrote:
>>>> Hi Christian
>>>>
>>>> On 9/19/23 14:07, Christian König wrote:
>>>>> Am 13.09.23 um 17:46 schrieb Danilo Krummrich:
>>>>>> On 9/13/23 17:33, Christian König wrote:
>>>>>>> Am 13.09.23 um 17:15 schrieb Danilo Krummrich:
>>>>>>>> On 9/13/23 16:26, Christian König wrote:
>>>>>>>>> Am 13.09.23 um 14:16 schrieb Danilo Krummrich:
>>>>>>>>>> As mentioned in a different mail thread, the reply is based 
>>>>>>>>>> on the assumption
>>>>>>>>>> that we don't support anything else than GPUVM updates from 
>>>>>>>>>> the IOCTL.
>>>>>>>>>
>>>>>>>>> I think that this assumption is incorrect.
>>>>>>>>
>>>>>>>> Well, more precisely I should have said "don't support GPUVM 
>>>>>>>> updated from within
>>>>>>>> fence signaling critical sections". And looking at the code, 
>>>>>>>> that doesn't seem what
>>>>>>>> you're doing there.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Vulkan is just once specific use case, but this here should 
>>>>>>>>> probably be able to handle other use cases as well.
>>>>>>>>>
>>>>>>>>> Especially with HMM you get the requirement that you need to 
>>>>>>>>> be able to invalidate GPUVM mappings without grabbing a 
>>>>>>>>> reservation lock.
>>>>>>>>
>>>>>>>> What do you mean with "invalidate GPUVM mappings" in this 
>>>>>>>> context? drm_gpuvm_bo_evict()
>>>>>>>> should only be called from a ttm_device_funcs::move callback, 
>>>>>>>> we should hold the dma-resv
>>>>>>>> lock there.
>>>>>>>
>>>>>>> Well the question is which dma-resv lock do we hold?
>>>>>>>
>>>>>>> In the move callback we only hold the dma-resv lock of the BO 
>>>>>>> which is moved, but when that is a shared BO then that's not the 
>>>>>>> same as the one for the VM.
>>>>>>
>>>>>> Correct, Thomas' idea was to use the GEM's dma_resv lock to 
>>>>>> protect drm_gpuvm_bo::evicted
>>>>>> and then actually move the drm_gpuvm_bo to the VM's evicted list 
>>>>>> once we grabbed all
>>>>>> dma-resv locks when locking the VM's BOs using drm_exec. We can 
>>>>>> remove them from the evicted
>>>>>> list on validate(). This way we never touch the evicted list 
>>>>>> without holding at least the VM's
>>>>>> dma-resv lock.
>>>>>>
>>>>>> Do you have any concerns about that?
>>>>>
>>>>> Scratching my head a bit how that is supposed to work.
>>>>>
>>>>> This implies that you go over all the evicted BOs during 
>>>>> validation and not just the one mentioned in the CS.
>>>>>
>>>>> That might work for Vulkan, but is pretty much a no-go for OpenGL.
>>>>>
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> See what the eviction lock in amdgpu is doing for example.
>>>>>>>>
>>>>>>>> The eviction_lock seems to protect a VM state "evicting" of 
>>>>>>>> whether any BO that
>>>>>>>> is associated with the VM is currently evicting. At the same 
>>>>>>>> time amdgpu protects
>>>>>>>> the eviceted list of the VM with a different lock. So this 
>>>>>>>> seems to be entirely
>>>>>>>> unrelated. Tracking a "currently evicting" state is not part of 
>>>>>>>> the GPUVM
>>>>>>>> implementation currently and hence nothing would change for 
>>>>>>>> amdgpu there.
>>>>>>>
>>>>>>> Sorry for the confusion we use different terminology in amdgpu.
>>>>>>>
>>>>>>> The eviction lock and evicted state is for the VM page tables, 
>>>>>>> e.g. if the whole VM is currently not used and swapped out or 
>>>>>>> even de-allocated.
>>>>>>>
>>>>>>> This is necessary because we have cases where we need to access 
>>>>>>> the VM data without holding the dma-resv lock of this VM. 
>>>>>>> Especially figuring out which parts of an address space contain 
>>>>>>> mappings and which doesn't.
>>>>>>
>>>>>> I think this is fine, this has nothing to do with lists of 
>>>>>> evicted GEM objects or external GEM
>>>>>> objects, right? Marking mappings (drm_gpuva) as invalidated 
>>>>>> (DRM_GPUVA_INVALIDATED) or accessing
>>>>>> the VA space does not require any dma-resv locks.
>>>>>
>>>>> I hope so, but I'm not 100% sure.
>>>>>
>>>>>>
>>>>>>>
>>>>>>> This is a requirement which comes with HMM handling, you won't 
>>>>>>> see this with Vulkan (or OpenGL, VAAPI etc..).
>>>>>>>
>>>>>>>
>>>>>>> The invalidation lock on the other hand is what in this 
>>>>>>> discussion is called eviction lock. This one is needed because 
>>>>>>> what I wrote above, during the move callback only the dma-resv 
>>>>>>> of the BO which is moved is locked, but not necessarily the 
>>>>>>> dma-resv of the VM.
>>>>>>
>>>>>> That's yet another thing, right? This is used to track whether 
>>>>>> *any* BO that belongs to the VM is
>>>>>> currently being evicted, correct? As mentioned, as by now this is 
>>>>>> not supported in GPUVM and hence
>>>>>> would be the same driver specific code with the same driver 
>>>>>> specifc lock.
>>>>>
>>>>> That is most likely a show stopper using this for OpenGL based 
>>>>> workloads as far as I can see. For those you need to able to 
>>>>> figure out which non-VM BOs have been evicted and which parts of 
>>>>> the VM needs updates.
>>>>
>>>> We identify those with a bool in the gpuvm_bo, and that bool is 
>>>> protected by the bo_resv. In essence, the "evicted" list must be 
>>>> made up-to-date with all relevant locks held before traversing in 
>>>> the next exec.
>>>
>>> What I still miss with this idea is how do we find all the 
>>> drm_gpuvm_bo structures with the evicted bool set to true? When 
>>> doing the drm_exec dance we come across all external ones and can 
>>> add them to the list if needed, but what about the BOs having the 
>>> VM's dma-resv?
>>
>> Oh, they can be added to the evict list directly (no bool needed) in 
>> the eviction code, like in v3. Since for those we indeed hold the 
>> VM's dma_resv since it's aliased with the object's dma-resv.
>
> Yeah, I wanted to note what Danilo seems to think about as well. How 
> do we figure out the non-VM BOs evicted?
>
> We can't walk over the list of all non-VM BOs on every submission, 
> that's to much overhead for cases with lots of non-VM BOs.
>
> And we can't rely on userspace sending all non-VM BOs as used list 
> down to the kernel with each submission.
>
> Regards,
> Christian.

No, that's not needed: Mechanism below.

1) We maintain an evicted list. Typically protected by the vm resv.
2) Each gpuvm_bo has a bool "evicted". Protected by the bo resv.

a) Evicting a vm bo: The vm resv is held by the eviction code. Just put 
it on the evicted list.
b) Evicting a shared/external bo: The bo resv is held by the eviction 
code. Set the "evicted" bool.
c) Validating the evicted list on exec: Loop through all 
*external/shared* bos. Lock them. After locking, check the "evicted" 
bool; if it's true, put the bo on the evicted list (we hold the VM resv 
at this point) and clear the "evicted" bool. Note that other vms will 
have their own gpuvm_bo which is marked evicted.

I have this coded up in a patch for Xe and it seems to be working properly.
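
To make (b) and (c) concrete, a minimal sketch; the "evicted" bool, the
extobj iterator and the function names are made up here, only the lock
and list roles follow the description above:

	/* b) Evicting a shared/external bo: only the bo resv is held. */
	static void evict_extobj(struct drm_gpuvm_bo *vm_bo)
	{
		dma_resv_assert_held(vm_bo->obj->resv);
		vm_bo->evicted = true;	/* hypothetical bool, bo-resv protected */
	}

	/* c) On exec, with all external bos (and thus the vm resv) locked:
	 * move marked bos onto the vm's evicted list and clear the flag.
	 */
	static void collect_evicted(struct drm_gpuvm *gpuvm)
	{
		struct drm_gpuvm_bo *vm_bo;

		for_each_extobj(vm_bo, gpuvm) {	/* hypothetical iterator */
			dma_resv_assert_held(vm_bo->obj->resv);
			if (vm_bo->evicted) {
				list_move_tail(&vm_bo->list.entry.evict,
					       &gpuvm->evict.list);
				vm_bo->evicted = false;
			}
		}
	}
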

/Thomas


>
>>
>> /Thomas
>>
>>
>>
>>>
>>>>
>>>> If you mean that we need to unbind all vmas of all vms of evicted 
>>>> bos before evicting, We don't do that, at least not in Xe, since 
>>>> evicting we wait for VM idle, and it cant access anything through 
>>>> the stale vmas until they have been revalidated and rebound.
>>>>
>>>> /Thomas
>>>>
>>>>
>>>>
>>>>>>
>>>>>>>
>>>>>>> Regards,
>>>>>>> Christian.
>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Christian.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Wed, Sep 13, 2023 at 11:14:46AM +0200, Thomas Hellström 
>>>>>>>>>> wrote:
>>>>>>>>>>> Hi!
>>>>>>>>>>>
>>>>>>>>>>> On Wed, 2023-09-13 at 01:36 +0200, Danilo Krummrich wrote:
>>>>>>>>>>>> On Tue, Sep 12, 2023 at 09:23:08PM +0200, Thomas Hellström 
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>> On 9/12/23 18:50, Danilo Krummrich wrote:
>>>>>>>>>>>>>> On Tue, Sep 12, 2023 at 06:20:32PM +0200, Thomas 
>>>>>>>>>>>>>> Hellström wrote:
>>>>>>>>>>>>>>> Hi, Danilo,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 9/9/23 17:31, Danilo Krummrich wrote:
>>>>>>>>>>>>>>>> So far the DRM GPUVA manager offers common 
>>>>>>>>>>>>>>>> infrastructure to
>>>>>>>>>>>>>>>> track GPU VA
>>>>>>>>>>>>>>>> allocations and mappings, generically connect GPU VA 
>>>>>>>>>>>>>>>> mappings
>>>>>>>>>>>>>>>> to their
>>>>>>>>>>>>>>>> backing buffers and perform more complex mapping 
>>>>>>>>>>>>>>>> operations
>>>>>>>>>>>>>>>> on the GPU VA
>>>>>>>>>>>>>>>> space.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> However, there are more design patterns commonly used by
>>>>>>>>>>>>>>>> drivers, which
>>>>>>>>>>>>>>>> can potentially be generalized in order to make the DRM 
>>>>>>>>>>>>>>>> GPUVA
>>>>>>>>>>>>>>>> manager
>>>>>>>>>>>>>>>> represent a basic GPU-VM implementation. In this context,
>>>>>>>>>>>>>>>> this patch aims
>>>>>>>>>>>>>>>> at generalizing the following elements.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 1) Provide a common dma-resv for GEM objects not being 
>>>>>>>>>>>>>>>> used
>>>>>>>>>>>>>>>> outside of
>>>>>>>>>>>>>>>>       this GPU-VM.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2) Provide tracking of external GEM objects (GEM objects
>>>>>>>>>>>>>>>> which are
>>>>>>>>>>>>>>>>       shared with other GPU-VMs).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 3) Provide functions to efficiently lock all GEM 
>>>>>>>>>>>>>>>> objects dma-
>>>>>>>>>>>>>>>> resv the
>>>>>>>>>>>>>>>>       GPU-VM contains mappings of.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 4) Provide tracking of evicted GEM objects the GPU-VM
>>>>>>>>>>>>>>>> contains mappings
>>>>>>>>>>>>>>>>       of, such that validation of evicted GEM objects is
>>>>>>>>>>>>>>>> accelerated.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 5) Provide some convinience functions for common patterns.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Rather than being designed as a "framework", the target 
>>>>>>>>>>>>>>>> is to
>>>>>>>>>>>>>>>> make all
>>>>>>>>>>>>>>>> features appear as a collection of optional helper 
>>>>>>>>>>>>>>>> functions,
>>>>>>>>>>>>>>>> such that
>>>>>>>>>>>>>>>> drivers are free to make use of the DRM GPUVA managers 
>>>>>>>>>>>>>>>> basic
>>>>>>>>>>>>>>>> functionality and opt-in for other features without 
>>>>>>>>>>>>>>>> setting
>>>>>>>>>>>>>>>> any feature
>>>>>>>>>>>>>>>> flags, just by making use of the corresponding functions.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Big kudos to Boris Brezillon for his help to figure out
>>>>>>>>>>>>>>>> locking for drivers
>>>>>>>>>>>>>>>> updating the GPU VA space within the fence signalling 
>>>>>>>>>>>>>>>> path.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Suggested-by: Matthew Brost <matthew.brost@intel.com>
>>>>>>>>>>>>>>>> Signed-off-by: Danilo Krummrich <dakr@redhat.com>
>>>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>>>     drivers/gpu/drm/drm_gpuvm.c | 516
>>>>>>>>>>>>>>>> ++++++++++++++++++++++++++++++++++++
>>>>>>>>>>>>>>>>     include/drm/drm_gpuvm.h     | 197 ++++++++++++++
>>>>>>>>>>>>>>>>     2 files changed, 713 insertions(+)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> diff --git a/drivers/gpu/drm/drm_gpuvm.c
>>>>>>>>>>>>>>>> b/drivers/gpu/drm/drm_gpuvm.c
>>>>>>>>>>>>>>>> index f4411047dbb3..8e62a043f719 100644
>>>>>>>>>>>>>>>> --- a/drivers/gpu/drm/drm_gpuvm.c
>>>>>>>>>>>>>>>> +++ b/drivers/gpu/drm/drm_gpuvm.c
>>>>>>>>>>>>>>>> @@ -73,6 +73,21 @@
>>>>>>>>>>>>>>>>      * &drm_gem_object list of &drm_gpuvm_bos for an 
>>>>>>>>>>>>>>>> existing
>>>>>>>>>>>>>>>> instance of this
>>>>>>>>>>>>>>>>      * particular combination. If not existent a new 
>>>>>>>>>>>>>>>> instance
>>>>>>>>>>>>>>>> is created and linked
>>>>>>>>>>>>>>>>      * to the &drm_gem_object.
>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>> + * &drm_gpuvm_bo structures, since unique for a given
>>>>>>>>>>>>>>>> &drm_gpuvm, are also used
>>>>>>>>>>>>>>>> + * as entry for the &drm_gpuvm's lists of external and
>>>>>>>>>>>>>>>> evicted objects. Those
>>>>>>>>>>>>>>>> + * list are maintained in order to accelerate locking of
>>>>>>>>>>>>>>>> dma-resv locks and
>>>>>>>>>>>>>>>> + * validation of evicted objects bound in a &drm_gpuvm. For instance, all
>>>>>>>>>>>>>>>> + * &drm_gem_objects' &dma_resv of a given &drm_gpuvm can be locked by calling
>>>>>>>>>>>>>>>> + * drm_gpuvm_exec_lock(). Once locked drivers can call
>>>>>>>>>>>>>>>> drm_gpuvm_validate() in
>>>>>>>>>>>>>>>> + * order to validate all evicted &drm_gem_objects. It is
>>>>>>>>>>>>>>>> also possible to lock
>>>>>>>>>>>>>>>> + * additional &drm_gem_objects by providing the
>>>>>>>>>>>>>>>> corresponding parameters to
>>>>>>>>>>>>>>>> + * drm_gpuvm_exec_lock() as well as open code the 
>>>>>>>>>>>>>>>> &drm_exec
>>>>>>>>>>>>>>>> loop while making
>>>>>>>>>>>>>>>> + * use of helper functions such as 
>>>>>>>>>>>>>>>> drm_gpuvm_prepare_range()
>>>>>>>>>>>>>>>> or
>>>>>>>>>>>>>>>> + * drm_gpuvm_prepare_objects().
>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>> + * Every bound &drm_gem_object is treated as external 
>>>>>>>>>>>>>>>> object
>>>>>>>>>>>>>>>> when its &dma_resv
>>>>>>>>>>>>>>>> + * structure is different than the &drm_gpuvm's common
>>>>>>>>>>>>>>>> &dma_resv structure.
>>>>>>>>>>>>>>>>      */
>>>>>>>>>>>>>>>>     /**
>>>>>>>>>>>>>>>> @@ -420,6 +435,20 @@
>>>>>>>>>>>>>>>>      * Subsequent calls to drm_gpuvm_bo_obtain() for 
>>>>>>>>>>>>>>>> the same
>>>>>>>>>>>>>>>> &drm_gpuvm and
>>>>>>>>>>>>>>>>      * &drm_gem_object must be able to observe previous
>>>>>>>>>>>>>>>> creations and destructions
>>>>>>>>>>>>>>>>      * of &drm_gpuvm_bos in order to keep instances 
>>>>>>>>>>>>>>>> unique.
>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>> + * The &drm_gpuvm's lists for keeping track of 
>>>>>>>>>>>>>>>> external and
>>>>>>>>>>>>>>>> evicted objects are
>>>>>>>>>>>>>>>> + * protected against concurrent insertion / removal and
>>>>>>>>>>>>>>>> iteration internally.
>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>> + * However, drivers still need to protect concurrent calls to functions
>>>>>>>>>>>>>>>> + * iterating those lists, such as drm_gpuvm_validate() and
>>>>>>>>>>>>>>>> + * drm_gpuvm_prepare_objects(). Every such function contains a particular
>>>>>>>>>>>>>>>> + * comment and lockdep checks if possible.
>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>> + * Functions adding or removing entries from those lists, such as
>>>>>>>>>>>>>>>> + * drm_gpuvm_bo_evict() or drm_gpuvm_bo_extobj_add(), may be called with
>>>>>>>>>>>>>>>> + * external locks being held, e.g. in order to avoid the corresponding list
>>>>>>>>>>>>>>>> + * being modified while potentially being iterated by other API functions.
>>>>>>>>>>>>>>>> + * However, this is entirely optional.
>>>>>>>>>>>>>>>>      */
>>>>>>>>>>>>>>>>     /**
>>>>>>>>>>>>>>>> @@ -632,6 +661,131 @@
>>>>>>>>>>>>>>>>      *   }
>>>>>>>>>>>>>>>>      */
>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>> + * get_next_vm_bo_from_list() - get the next vm_bo 
>>>>>>>>>>>>>>>> element
>>>>>>>>>>>>>>>> + * @__gpuvm: The GPU VM
>>>>>>>>>>>>>>>> + * @__list_name: The name of the list we're iterating on
>>>>>>>>>>>>>>>> + * @__local_list: A pointer to the local list used to 
>>>>>>>>>>>>>>>> store
>>>>>>>>>>>>>>>> already iterated items
>>>>>>>>>>>>>>>> + * @__prev_vm_bo: The previous element we got from
>>>>>>>>>>>>>>>> drm_gpuvm_get_next_cached_vm_bo()
>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>> + * This helper is here to provide lockless list 
>>>>>>>>>>>>>>>> iteration.
>>>>>>>>>>>>>>>> Lockless as in, the
>>>>>>>>>>>>>>>> + * iterator releases the lock immediately after 
>>>>>>>>>>>>>>>> picking the
>>>>>>>>>>>>>>>> first element from
>>>>>>>>>>>>>>>> + * the list, so list insertion deletion can happen
>>>>>>>>>>>>>>>> concurrently.
>>>>>>>>>>>>>>> Are the list spinlocks needed for that async state 
>>>>>>>>>>>>>>> update from
>>>>>>>>>>>>>>> within the
>>>>>>>>>>>>>>> dma-fence critical section we've discussed previously?
>>>>>>>>>>>>>> Yes, but also for other reasons, see below.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Otherwise it should be sufficient to protect the lists 
>>>>>>>>>>>>>>> with the
>>>>>>>>>>>>>>> gpuvm's resv
>>>>>>>>>>>>>>> (or for the extobj list with an outer lock).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> If those spinlocks are still needed in some situations, 
>>>>>>>>>>>>>>> perhaps
>>>>>>>>>>>>>>> could we
>>>>>>>>>>>>>>> have an option to set them to NULL (Like IIRC the maple 
>>>>>>>>>>>>>>> tree
>>>>>>>>>>>>>>> allows for)?
>>>>>>>>>>>>>> The evict spinlock is needed in any case, since in
>>>>>>>>>>>>>> drm_gpuvm_bo_evict() we're
>>>>>>>>>>>>>> holding only the dma-resv lock from the BO this function 
>>>>>>>>>>>>>> gets
>>>>>>>>>>>>>> called for. Hence,
>>>>>>>>>>>>>> the spinlock protects concurrent drm_gpuvm_bo_evict() 
>>>>>>>>>>>>>> calls with
>>>>>>>>>>>>>> different BOs.
>>>>>>>>>>>>> No. Only if you try to add external objects to the vm's 
>>>>>>>>>>>>> evict list
>>>>>>>>>>>>> from
>>>>>>>>>>>>> within the evict code. That's not necessary since you loop 
>>>>>>>>>>>>> through
>>>>>>>>>>>>> all
>>>>>>>>>>>>> external objects anyway when locking them so an "evicted" 
>>>>>>>>>>>>> bool in
>>>>>>>>>>>>> the vm_bo,
>>>>>>>>>>>>> protected by the bo resv would be sufficient. The extobj 
>>>>>>>>>>>>> locking
>>>>>>>>>>>>> loop can
>>>>>>>>>>>>> then add the bo to the evicted list.
>>>>>>>>>>>> And validate() can remove it while still holding all 
>>>>>>>>>>>> dma-resv locks,
>>>>>>>>>>>> neat!
>>>>>>>>>>>> However, what if two tasks are trying to lock the VA space
>>>>>>>>>>>> concurrently? What
>>>>>>>>>>>> do we do when the drm_gpuvm_bo's refcount drops to zero in
>>>>>>>>>>>> drm_gpuva_unlink()?
>>>>>>>>>>>> Are we guaranteed that at this point of time the 
>>>>>>>>>>>> drm_gpuvm_bo is not
>>>>>>>>>>>> on the
>>>>>>>>>>>> evicted list? Because otherwise we would call 
>>>>>>>>>>>> drm_gpuvm_bo_destroy()
>>>>>>>>>>>> with the
>>>>>>>>>>>> dma-resv lock held, which wouldn't be allowed, since
>>>>>>>>>>>> drm_gpuvm_bo_destroy()
>>>>>>>>>>>> might drop the last reference to the drm_gem_object and 
>>>>>>>>>>>> hence we'd
>>>>>>>>>>>> potentially
>>>>>>>>>>>> free the dma-resv lock while holding it, at least if it's 
>>>>>>>>>>>> an external
>>>>>>>>>>>> object.
>>>>>>>>>>> Easiest way in this scheme is to think of the lists as being protected
>>>>>>>>>>> by the vm's resv lock. That means anybody calling unlink() must also
>>>>>>>>>>> hold the vm's resv lock. (Which is OK from a UAF point of view, but
>>>>>>>>>>> perhaps not from a locking inversion POV for an async list update.)
>>>>>>>>>> This would mean that on unlink() we'd need to hold the VM's 
>>>>>>>>>> resv lock and the
>>>>>>>>>> corresponding GEM's resv lock (in case they're not the same 
>>>>>>>>>> anyways) because the
>>>>>>>>>> VM's resv lock would protect the external / evicted object 
>>>>>>>>>> lists and the GEM
>>>>>>>>>> objects resv lock protects the GEM's list of drm_gpuvm_bos 
>>>>>>>>>> and the
>>>>>>>>>> drm_gpuvm_bo's list of drm_gpuvas.
>>>>>>>>>>
>>>>>>>>>>>>>> For extobjs an outer lock would be enough in case of Xe, 
>>>>>>>>>>>>>> but I
>>>>>>>>>>>>>> really would not
>>>>>>>>>>>>>> like to add even more complexity just to get the spinlock 
>>>>>>>>>>>>>> out of
>>>>>>>>>>>>>> the way in case
>>>>>>>>>>>>>> the driver already has an outer lock protecting this path.
>>>>>>>>>>>>> I must disagree here. These spinlocks and atomic 
>>>>>>>>>>>>> operations are
>>>>>>>>>>>>> pretty
>>>>>>>>>>>>> costly and as discussed earlier this type of locking was 
>>>>>>>>>>>>> the reason
>>>>>>>>>>>>> (at
>>>>>>>>>>>>> least according to the commit message) that made Christian 
>>>>>>>>>>>>> drop the
>>>>>>>>>>>>> XArray
>>>>>>>>>>>>> use in drm_exec for the same set of objects: "The locking 
>>>>>>>>>>>>> overhead
>>>>>>>>>>>>> is
>>>>>>>>>>>>> unecessary and measurable". IMHO the spinlock is the added
>>>>>>>>>>>>> complexity and a
>>>>>>>>>>>>> single wide lock following the drm locking guidelines set 
>>>>>>>>>>>>> out by
>>>>>>>>>>>>> Daniel and
>>>>>>>>>>>>> David should really be the default choice with an opt-in 
>>>>>>>>>>>>> for a
>>>>>>>>>>>>> spinlock if
>>>>>>>>>>>>> needed for async and pushing out to a wq is not an option.
>>>>>>>>>>>> For the external object list an outer lock would work as 
>>>>>>>>>>>> long as it's
>>>>>>>>>>>> not the
>>>>>>>>>>>> dma-resv lock of the corresponding GEM object, since here 
>>>>>>>>>>>> we actually
>>>>>>>>>>>> need to
>>>>>>>>>>>> remove the list entry from the external object list on
>>>>>>>>>>>> drm_gpuvm_bo_destroy().
>>>>>>>>>>>> It's just a bit weird design wise that drivers would need 
>>>>>>>>>>>> to take
>>>>>>>>>>>> this outer
>>>>>>>>>>>> lock on:
>>>>>>>>>>>>
>>>>>>>>>>>> - drm_gpuvm_bo_extobj_add()
>>>>>>>>>>>> - drm_gpuvm_bo_destroy()        (and hence also 
>>>>>>>>>>>> drm_gpuvm_bo_put())
>>>>>>>>>>>> - drm_gpuva_unlink()            (because it needs to call
>>>>>>>>>>>> drm_gpuvm_bo_put())
>>>>>>>>>>>> - drm_gpuvm_exec_lock()
>>>>>>>>>>>> - drm_gpuvm_exec_lock_array()
>>>>>>>>>>>> - drm_gpuvm_prepare_range()
>>>>>>>>>>>>
>>>>>>>>>>>> Given that it seems reasonable to do all the required locking
>>>>>>>>>>>> internally.
>>>>>>>>>>>  From a design POV, there has been a clear direction in XE to make
>>>>>>>>>>> things similar to mmap() / munmap(), so this outer lock, which in Xe is
>>>>>>>>>>> an rwsem, is used in a similar way as the mmap_lock. It's 
>>>>>>>>>>> protecting
>>>>>>>>>>> the page-table structures and vma rb tree, the userptr 
>>>>>>>>>>> structures and
>>>>>>>>>>> the extobj list. Basically it's taken early in the exec 
>>>>>>>>>>> IOCTL, the
>>>>>>>>>>> VM_BIND ioctl, the compute rebind worker and the pagefault 
>>>>>>>>>>> handler, so
>>>>>>>>>>> all of the above are just asserting that it is taken in the 
>>>>>>>>>>> correct
>>>>>>>>>>> mode.
>>>>>>>>>>>
>>>>>>>>>>> But strictly with this scheme one could also use the vm's 
>>>>>>>>>>> dma_resv for
>>>>>>>>>>> the extobj list since with drm_exec, it's locked before 
>>>>>>>>>>> traversing the
>>>>>>>>>>> list.
>>>>>>>>>>>
>>>>>>>>>>> The whole point of this scheme is to rely on locks that you 
>>>>>>>>>>> already are
>>>>>>>>>>> supposed to be holding for various reasons and is simple to 
>>>>>>>>>>> comprehend.
>>>>>>>>>> I don't agree that we're supposed to hold the VM's resv lock 
>>>>>>>>>> anyways for
>>>>>>>>>> functions like drm_gpuvm_bo_put() or drm_gpuva_unlink(), but 
>>>>>>>>>> I'm fine using it
>>>>>>>>>> for that purpose nevertheless.
>>>>>>>>>>
>>>>>>>>>>>> In order to at least place lockdep checks, the driver would 
>>>>>>>>>>>> need to
>>>>>>>>>>>> supply the
>>>>>>>>>>>> corresponding lock's lockdep_map, because the GPUVM 
>>>>>>>>>>>> otherwise doesn't
>>>>>>>>>>>> know about
>>>>>>>>>>>> the lock.
>>>>>>>>>>> Yes, that sounds reasonable. One lockdep map per list.
>>>>>>>>>> I'd really like to avoid that, especially now that everything 
>>>>>>>>>> got simpler. We
>>>>>>>>>> should define the actual locks to take instead.
>>>>>>>>>>
>>>>>>>>>>>> Out of curiosity, what is the overhead of a spin_lock() 
>>>>>>>>>>>> that doesn't
>>>>>>>>>>>> need to
>>>>>>>>>>>> spin?
>>>>>>>>>>> I guess it's hard to tell exactly, but it is much lower on 
>>>>>>>>>>> modern x86
>>>>>>>>>>> than what it used to be. Not sure about ARM, which is the other
>>>>>>>>>>> architecture important to us. I figure if there is little 
>>>>>>>>>>> cache-line
>>>>>>>>>>> bouncing the main overhead comes from the implied barriers.
>>>>>>>>>>>
>>>>>>>>>>>>> A pretty simple way that would not add much code would be
>>>>>>>>>>>>>
>>>>>>>>>>>>> static void gpuvm_cond_spin_lock(const struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>                                  spinlock_t *lock)
>>>>>>>>>>>>> {
>>>>>>>>>>>>>      if (!gpuvm->resv_protected_lists)
>>>>>>>>>>>>>          spin_lock(lock);
>>>>>>>>>>>>> }
>>>>>>>>>>>>>
>>>>>>>>>>>>>>> For such drivers, that would require anybody calling 
>>>>>>>>>>>>>>> unlink to
>>>>>>>>>>>>>>> hold the vm's
>>>>>>>>>>>>>>> resv, though.
>>>>>>>>>>>>>> In V4 I want to go back to having a dedicated lock for 
>>>>>>>>>>>>>> the GEMs
>>>>>>>>>>>>>> gpuva list (or
>>>>>>>>>>>>>> VM_BO list to be more precise). We can't just use the 
>>>>>>>>>>>>>> dma-resv
>>>>>>>>>>>>>> lock for that
>>>>>>>>>>>>>> with VM_BO abstractions, because on destruction of a 
>>>>>>>>>>>>>> VM_BO we
>>>>>>>>>>>>>> otherwise wouldn't
>>>>>>>>>>>>>> be allowed to already hold the dma-resv lock. That's the 
>>>>>>>>>>>>>> fix I
>>>>>>>>>>>>>> was referring to
>>>>>>>>>>>>>> earlier.
>>>>>>>>>>>>> Yeah, I can see the need for a dedicated lock for the 
>>>>>>>>>>>>> GEM's gpuva
>>>>>>>>>>>>> list, but
>>>>>>>>>>>>> holding the vm's dma-resv lock across the unlink shouldn't 
>>>>>>>>>>>>> be a
>>>>>>>>>>>>> problem. We
>>>>>>>>>>>>> may free the object and a pointer to the vm's resv during 
>>>>>>>>>>>>> unlink
>>>>>>>>>>>>> but we
>>>>>>>>>>>>> don't free the vm's resv.  It'd be a matter of ensuring 
>>>>>>>>>>>>> that any
>>>>>>>>>>>>> calls to
>>>>>>>>>>>>> unlink from *within* drm_gpuvm allows it to be held.
>>>>>>>>>>>> Drivers calling unlink() from the fence signaling path 
>>>>>>>>>>>> can't use the
>>>>>>>>>>>> VM's
>>>>>>>>>>>> dma-resv lock.
>>>>>>>>>>> Yes, that made me a bit curious because in the current 
>>>>>>>>>>> version the code
>>>>>>>>>>> required the object's dma_resv for unlink() which can't be 
>>>>>>>>>>> grabbed
>>>>>>>>>>> either from the fence signaling path. So are there any 
>>>>>>>>>>> drivers actually
>>>>>>>>>>> wanting to do that? If so, they will either need to resort 
>>>>>>>>>>> to the
>>>>>>>>>>> current spinlock solution or they will need to call unlink 
>>>>>>>>>>> from a
>>>>>>>>>>> workqueue item.
>>>>>>>>>> As Boris already mentioned we have the dma-resv lock by 
>>>>>>>>>> default or a driver
>>>>>>>>>> specific GEM gpuva lock as opt-in. Now, we can get rid of the 
>>>>>>>>>> latter.
>>>>>>>>>>
>>>>>>>>>>>> Also, what if the object is an external object? We can't 
>>>>>>>>>>>> use the VM's
>>>>>>>>>>>> dma-resv
>>>>>>>>>>>> lock here.
>>>>>>>>>>> Why? Typically (sync) unlink is only ever called from an 
>>>>>>>>>>> unbind-like
>>>>>>>>>>> operation where it should be trivial to grab the vm's resv. 
>>>>>>>>>>> Or, for
>>>>>>>>>>> that matter, any outer lock protecting the extobj list. The rule
>>>>>>>>>>> would be that drm_gpuvm_bo::entry::extobj and
>>>>>>>>>>> drm_gpuvm_bo::entry::evict are protected by either the vm's
>>>>>>>>>>> dma_resv (or possibly an outer lock in the case of the extobj list).
>>>>>>>>>> An outer lock wouldn't have worked for updates in the async path,
>>>>>>>>>> but that shouldn't be relevant anymore. We could use the VM's resv
>>>>>>>>>> for that.
>>>>>>>>>>
>>>>>>>>>>>>   And we can't have the GEM objs dma-resv lock held when 
>>>>>>>>>>>> calling
>>>>>>>>>>>> unlink(), since unlink() calls drm_gpuvm_bo_put(), which if 
>>>>>>>>>>>> the
>>>>>>>>>>>> refcount drops
>>>>>>>>>>>> to zero calls drm_gpuvm_bo_destroy() and 
>>>>>>>>>>>> drm_gpuvm_bo_destroy() might
>>>>>>>>>>>> drop the
>>>>>>>>>>>> last reference of the GEM object.
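>>>>>>>>>>>>
>>>>>>>>>>>> For reference, the chain in question, as far as I read the code:
>>>>>>>>>>>>
>>>>>>>>>>>> drm_gpuva_unlink(va)
>>>>>>>>>>>>   -> drm_gpuvm_bo_put(vm_bo)        /* refcount may drop to zero */
>>>>>>>>>>>>     -> drm_gpuvm_bo_destroy(kref)   /* asserts the GEM's gpuva lock */
>>>>>>>>>>>>       -> drm_gem_object_put(obj)    /* may free obj and its resv */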
>>>>>>>>>>> Yes, but this is a different problem as to what exactly 
>>>>>>>>>>> protects
>>>>>>>>>>> drm_gpuvm_bo::entry::gem. Either as you suggest an internal 
>>>>>>>>>>> per bo list
>>>>>>>>>>> lock, or if we want to keep the bo's dma_resv we need to 
>>>>>>>>>>> ensure that
>>>>>>>>>>> the caller of dma_resv_unlock(obj->resv) actually refcounts 
>>>>>>>>>>> its obj
>>>>>>>>>>> pointer, and doesn't implicitly rely on the gpuvm_bo's 
>>>>>>>>>>> refcount (I know
>>>>>>>>>>> Boris didn't like that, but requiring an explicit refcount 
>>>>>>>>>>> for a
>>>>>>>>>>> pointer you dereference unless you're under a lock that 
>>>>>>>>>>> ensures keeping
>>>>>>>>>>> the object alive is pretty much required?) But anyway for the
>>>>>>>>>>> drm_gpuvm_bo::entry::gem list protection (bo resv or 
>>>>>>>>>>> internal spinlock)
>>>>>>>>>>> I don't have a strong preference.
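>>>>>>>>>>>
>>>>>>>>>>> To spell the refcount-during-traversal pattern out — a condensed
>>>>>>>>>>> sketch of what the iterator macro boils down to, not the literal
>>>>>>>>>>> code:
>>>>>>>>>>>
>>>>>>>>>>> spin_lock(&gpuvm->evict.lock);
>>>>>>>>>>> list_for_each_entry(vm_bo, &gpuvm->evict.list, list.entry.evict) {
>>>>>>>>>>>         if (!kref_get_unless_zero(&vm_bo->kref))
>>>>>>>>>>>                 continue; /* vm_bo is already being destroyed */
>>>>>>>>>>>         /* vm_bo may now safely be used after dropping the lock */
>>>>>>>>>>> }
>>>>>>>>>>> spin_unlock(&gpuvm->evict.lock);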
>>>>>>>>>> We can keep the GEM objects dma-resv lock, however as 
>>>>>>>>>> mentioned above
>>>>>>>>>> drm_gpuva_unlink() and drm_gpuvm_bo_put() then requires both 
>>>>>>>>>> the VM's resv lock
>>>>>>>>>> and the GEM's resv lock in case they differ.
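>>>>>>>>>>
>>>>>>>>>> Roughly, a driver would then do something like the following before
>>>>>>>>>> the unlink (sketch only; error handling trimmed, and
>>>>>>>>>> DRM_EXEC_IGNORE_DUPLICATES covers the case where the GEM shares the
>>>>>>>>>> VM's resv):
>>>>>>>>>>
>>>>>>>>>> struct drm_exec exec;
>>>>>>>>>>
>>>>>>>>>> drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT |
>>>>>>>>>>                      DRM_EXEC_IGNORE_DUPLICATES);
>>>>>>>>>> drm_exec_until_all_locked(&exec) {
>>>>>>>>>>         ret = drm_exec_prepare_obj(&exec, &gpuvm->d_obj, 0);
>>>>>>>>>>         drm_exec_retry_on_contention(&exec);
>>>>>>>>>>         if (!ret) {
>>>>>>>>>>                 ret = drm_exec_prepare_obj(&exec, obj, 0);
>>>>>>>>>>                 drm_exec_retry_on_contention(&exec);
>>>>>>>>>>         }
>>>>>>>>>> }
>>>>>>>>>> /* Both the VM's and the GEM's resv are locked here. */
>>>>>>>>>> drm_gpuva_unlink(va);
>>>>>>>>>> drm_exec_fini(&exec);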
>>>>>>>>>>
>>>>>>>>>>>>   All those problems go away with a dedicated
>>>>>>>>>>>> GEM gpuva list lock.
>>>>>>>>>>> I don't think these are real problems.
>>>>>>>>>>> With the exception of the eviction list "trick" where we 
>>>>>>>>>>> currently have
>>>>>>>>>>> slightly different approach to collect external bos needing 
>>>>>>>>>>> rebinding,
>>>>>>>>>>> we have this working fine.
>>>>>>>>>>>
>>>>>>>>>>> TBH I think pretty much the only situation where the 
>>>>>>>>>>> spinlock is needed
>>>>>>>>>>> is for async updates of these lists, unless a wq item can be 
>>>>>>>>>>> used for
>>>>>>>>>>> that, but it doesn't really seem like the current code 
>>>>>>>>>>> allows for such
>>>>>>>>>>> updates anyway? It complicates the code a lot, adds overhead 
>>>>>>>>>>> and also
>>>>>>>>>>> adds the requirement for refcounting during list traversal.
>>>>>>>>>>>
>>>>>>>>>>> /Thomas
>>>>>>>>>>>
>>>>>>>>>>>>> /Thomas
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>> It seems that with that also the refcount could be made
>>>>>>>>>>>>>>> non-atomic.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> All in the spirit of the drm locking guidelines "use big 
>>>>>>>>>>>>>>> locks
>>>>>>>>>>>>>>> when
>>>>>>>>>>>>>>> possible".
>>>>>>>>>>>>>>> Lower level locks only when necessary for performance or
>>>>>>>>>>>>>>> locking inversion?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> /Thomas
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>> + * Elements popped from the original list are kept in a local list, so removal
>>>>>>>>>>>>>>>> + * and is_empty checks can still happen while we're iterating the list.
>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>> +#define get_next_vm_bo_from_list(__gpuvm, __list_name, __local_list, __prev_vm_bo) \
>>>>>>>>>>>>>>>> +       ({ \
>>>>>>>>>>>>>>>> +               struct drm_gpuvm_bo *__vm_bo; \
>>>>>>>>>>>>>>>> + \
>>>>>>>>>>>>>>>> +               drm_gpuvm_bo_put(__prev_vm_bo); \
>>>>>>>>>>>>>>>> + \
>>>>>>>>>>>>>>>> +               spin_lock(&(__gpuvm)->__list_name.lock); \
>>>>>>>>>>>>>>>> +               while (!list_empty(&(__gpuvm)->__list_name.list)) { \
>>>>>>>>>>>>>>>> +                       __vm_bo = list_first_entry(&(__gpuvm)->__list_name.list, \
>>>>>>>>>>>>>>>> +                                                  struct drm_gpuvm_bo, \
>>>>>>>>>>>>>>>> +                                                  list.entry.__list_name); \
>>>>>>>>>>>>>>>> +                       if (drm_gpuvm_bo_get_unless_zero(__vm_bo)) { \
>>>>>>>>>>>>>>>> +                               list_move_tail(&(__vm_bo)->list.entry.__list_name, \
>>>>>>>>>>>>>>>> +                                              __local_list); \
>>>>>>>>>>>>>>>> +                               break; \
>>>>>>>>>>>>>>>> +                       } else { \
>>>>>>>>>>>>>>>> +                               list_del_init(&(__vm_bo)->list.entry.__list_name); \
>>>>>>>>>>>>>>>> +                               __vm_bo = NULL; \
>>>>>>>>>>>>>>>> +                       } \
>>>>>>>>>>>>>>>> +               } \
>>>>>>>>>>>>>>>> +               spin_unlock(&(__gpuvm)->__list_name.lock); \
>>>>>>>>>>>>>>>> + \
>>>>>>>>>>>>>>>> +               __vm_bo; \
>>>>>>>>>>>>>>>> +       })
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>> + * for_each_vm_bo_in_list() - internal vm_bo list iterator
>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>> + * This helper is here to provide lockless list iteration. Lockless as in, the
>>>>>>>>>>>>>>>> + * iterator releases the lock immediately after picking the first element from
>>>>>>>>>>>>>>>> + * the list, so list insertion and deletion can happen concurrently.
>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>> + * Typical use:
>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>> + *     struct drm_gpuvm_bo *vm_bo;
>>>>>>>>>>>>>>>> + *     LIST_HEAD(my_local_list);
>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>> + *     ret = 0;
>>>>>>>>>>>>>>>> + *     drm_gpuvm_for_each_vm_bo(gpuvm, <list_name>, &my_local_list, vm_bo) {
>>>>>>>>>>>>>>>> + *             ret = do_something_with_vm_bo(..., vm_bo);
>>>>>>>>>>>>>>>> + *             if (ret)
>>>>>>>>>>>>>>>> + *                     break;
>>>>>>>>>>>>>>>> + *     }
>>>>>>>>>>>>>>>> + *     drm_gpuvm_bo_put(vm_bo);
>>>>>>>>>>>>>>>> + *     drm_gpuvm_restore_vm_bo_list(gpuvm, <list_name>, &my_local_list);
>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>> + * Only used for internal list iterations, not meant to be exposed to the
>>>>>>>>>>>>>>>> + * outside world.
>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>> +#define for_each_vm_bo_in_list(__gpuvm, __list_name, __local_list, __vm_bo) \
>>>>>>>>>>>>>>>> +       for (__vm_bo = get_next_vm_bo_from_list(__gpuvm, __list_name, \
>>>>>>>>>>>>>>>> +                                               __local_list, NULL); \
>>>>>>>>>>>>>>>> +            __vm_bo; \
>>>>>>>>>>>>>>>> +            __vm_bo = get_next_vm_bo_from_list(__gpuvm, __list_name, \
>>>>>>>>>>>>>>>> +                                               __local_list, __vm_bo))
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>> + * restore_vm_bo_list() - move vm_bo elements back to their original list
>>>>>>>>>>>>>>>> + * @__gpuvm: The GPU VM
>>>>>>>>>>>>>>>> + * @__list_name: The name of the list we're iterating on
>>>>>>>>>>>>>>>> + * @__local_list: A pointer to the local list used to store already iterated items
>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>> + * When we're done iterating a vm_bo list, we should call restore_vm_bo_list()
>>>>>>>>>>>>>>>> + * to restore the original state and let new iterations take place.
>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>> +#define restore_vm_bo_list(__gpuvm, __list_name, __local_list) \
>>>>>>>>>>>>>>>> +       do { \
>>>>>>>>>>>>>>>> +               /* Merge back the two lists, moving local list elements to the \
>>>>>>>>>>>>>>>> +                * head to preserve previous ordering, in case it matters. \
>>>>>>>>>>>>>>>> +                */ \
>>>>>>>>>>>>>>>> +               spin_lock(&(__gpuvm)->__list_name.lock); \
>>>>>>>>>>>>>>>> +               list_splice(__local_list, &(__gpuvm)->__list_name.list); \
>>>>>>>>>>>>>>>> +               spin_unlock(&(__gpuvm)->__list_name.lock); \
>>>>>>>>>>>>>>>> +       } while (0)
>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>> + * drm_gpuvm_bo_list_add() - insert a vm_bo into the given list
>>>>>>>>>>>>>>>> + * @__vm_bo: the &drm_gpuvm_bo
>>>>>>>>>>>>>>>> + * @__list_name: the name of the list to insert into
>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>> + * Inserts the given @__vm_bo into the list specified by @__list_name and
>>>>>>>>>>>>>>>> + * increases the vm_bo's reference count.
>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>> +#define drm_gpuvm_bo_list_add(__vm_bo, __list_name) \
>>>>>>>>>>>>>>>> +       do { \
>>>>>>>>>>>>>>>> +               spin_lock(&(__vm_bo)->vm->__list_name.lock); \
>>>>>>>>>>>>>>>> +               if (list_empty(&(__vm_bo)->list.entry.__list_name)) \
>>>>>>>>>>>>>>>> +                       list_add_tail(&(__vm_bo)->list.entry.__list_name, \
>>>>>>>>>>>>>>>> +                                     &(__vm_bo)->vm->__list_name.list); \
>>>>>>>>>>>>>>>> +               spin_unlock(&(__vm_bo)->vm->__list_name.lock); \
>>>>>>>>>>>>>>>> +       } while (0)
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>> + * drm_gpuvm_bo_list_del() - remove a vm_bo from the given list
>>>>>>>>>>>>>>>> + * @__vm_bo: the &drm_gpuvm_bo
>>>>>>>>>>>>>>>> + * @__list_name: the name of the list to remove from
>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>> + * Removes the given @__vm_bo from the list specified by @__list_name and
>>>>>>>>>>>>>>>> + * decreases the vm_bo's reference count.
>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>> +#define drm_gpuvm_bo_list_del(__vm_bo, __list_name) \
>>>>>>>>>>>>>>>> +       do { \
>>>>>>>>>>>>>>>> +               spin_lock(&(__vm_bo)->vm->__list_name.lock); \
>>>>>>>>>>>>>>>> +               if (!list_empty(&(__vm_bo)->list.entry.__list_name)) \
>>>>>>>>>>>>>>>> +                       list_del_init(&(__vm_bo)->list.entry.__list_name); \
>>>>>>>>>>>>>>>> +               spin_unlock(&(__vm_bo)->vm->__list_name.lock); \
>>>>>>>>>>>>>>>> +       } while (0)
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>> +static int __must_check
>>>>>>>>>>>>>>>> +drm_gpuvm_bo_get_unless_zero(struct drm_gpuvm_bo *vm_bo);
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>     #define to_drm_gpuva(__node) container_of((__node), struct drm_gpuva, rb.node)
>>>>>>>>>>>>>>>>     #define GPUVA_START(node) ((node)->va.addr)
>>>>>>>>>>>>>>>> @@ -713,6 +867,12 @@ drm_gpuvm_init(struct drm_gpuvm *gpuvm, struct drm_device *drm,
>>>>>>>>>>>>>>>>          gpuvm->rb.tree = RB_ROOT_CACHED;
>>>>>>>>>>>>>>>>          INIT_LIST_HEAD(&gpuvm->rb.list);
>>>>>>>>>>>>>>>> +       INIT_LIST_HEAD(&gpuvm->extobj.list);
>>>>>>>>>>>>>>>> +       spin_lock_init(&gpuvm->extobj.lock);
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>> +       INIT_LIST_HEAD(&gpuvm->evict.list);
>>>>>>>>>>>>>>>> +       spin_lock_init(&gpuvm->evict.lock);
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>          drm_gpuva_check_overflow(start_offset, range);
>>>>>>>>>>>>>>>>          gpuvm->mm_start = start_offset;
>>>>>>>>>>>>>>>>          gpuvm->mm_range = range;
>>>>>>>>>>>>>>>> @@ -754,10 +914,302 @@ drm_gpuvm_destroy(struct drm_gpuvm *gpuvm)
>>>>>>>>>>>>>>>>          WARN(!RB_EMPTY_ROOT(&gpuvm->rb.tree.rb_root),
>>>>>>>>>>>>>>>>               "GPUVA tree is not empty, potentially leaking memory.\n");
>>>>>>>>>>>>>>>> +       WARN(!list_empty(&gpuvm->extobj.list), "Extobj list should be empty.\n");
>>>>>>>>>>>>>>>> +       WARN(!list_empty(&gpuvm->evict.list), "Evict list should be empty.\n");
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>          drm_gem_private_object_fini(&gpuvm->d_obj);
>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>> EXPORT_SYMBOL_GPL(drm_gpuvm_destroy);
>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>> + * drm_gpuvm_prepare_objects() - prepare all associated BOs
>>>>>>>>>>>>>>>> + * @gpuvm: the &drm_gpuvm
>>>>>>>>>>>>>>>> + * @exec: the &drm_exec locking context
>>>>>>>>>>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>> + * Calls drm_exec_prepare_obj() for all &drm_gem_objects the given
>>>>>>>>>>>>>>>> + * &drm_gpuvm contains mappings of.
>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>> + * Using this function directly, it is the driver's responsibility to call
>>>>>>>>>>>>>>>> + * drm_exec_init() and drm_exec_fini() accordingly.
>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>> + * Note: This function is safe against concurrent insertion and removal of
>>>>>>>>>>>>>>>> + * external objects, however it is not safe against concurrent usage itself.
>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>> + * Drivers need to make sure to protect this case with either an outer VM lock
>>>>>>>>>>>>>>>> + * or by calling drm_gpuvm_prepare_vm() before this function within the
>>>>>>>>>>>>>>>> + * drm_exec_until_all_locked() loop, such that the GPUVM's dma-resv lock
>>>>>>>>>>>>>>>> + * ensures mutual exclusion.
>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>> +int
>>>>>>>>>>>>>>>> +drm_gpuvm_prepare_objects(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>>>> +                         struct drm_exec *exec,
>>>>>>>>>>>>>>>> +                         unsigned int num_fences)
>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>> +       struct drm_gpuvm_bo *vm_bo;
>>>>>>>>>>>>>>>> +       LIST_HEAD(extobjs);
>>>>>>>>>>>>>>>> +       int ret = 0;
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>> +       for_each_vm_bo_in_list(gpuvm, extobj, &extobjs, vm_bo) {
>>>>>>>>>>>>>>>> +               ret = drm_exec_prepare_obj(exec, vm_bo->obj, num_fences);
>>>>>>>>>>>>>>>> +               if (ret)
>>>>>>>>>>>>>>>> +                       break;
>>>>>>>>>>>>>>>> +       }
>>>>>>>>>>>>>>>> +       /* Drop ref in case we break out of the loop. */
>>>>>>>>>>>>>>>> +       drm_gpuvm_bo_put(vm_bo);
>>>>>>>>>>>>>>>> +       restore_vm_bo_list(gpuvm, extobj, &extobjs);
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>> +       return ret;
>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_prepare_objects);
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>> + * drm_gpuvm_prepare_range() - prepare all BOs mapped within a given range
>>>>>>>>>>>>>>>> + * @gpuvm: the &drm_gpuvm
>>>>>>>>>>>>>>>> + * @exec: the &drm_exec locking context
>>>>>>>>>>>>>>>> + * @addr: the start address within the VA space
>>>>>>>>>>>>>>>> + * @range: the range to iterate within the VA space
>>>>>>>>>>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>> + * Calls drm_exec_prepare_obj() for all &drm_gem_objects mapped between @addr
>>>>>>>>>>>>>>>> + * and @addr + @range.
>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>> +int
>>>>>>>>>>>>>>>> +drm_gpuvm_prepare_range(struct drm_gpuvm *gpuvm, struct drm_exec *exec,
>>>>>>>>>>>>>>>> +                       u64 addr, u64 range, unsigned int num_fences)
>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>> +       struct drm_gpuva *va;
>>>>>>>>>>>>>>>> +       u64 end = addr + range;
>>>>>>>>>>>>>>>> +       int ret;
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>> +       drm_gpuvm_for_each_va_range(va, gpuvm, addr, end) {
>>>>>>>>>>>>>>>> +               struct drm_gem_object *obj = va->gem.obj;
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>> +               ret = drm_exec_prepare_obj(exec, obj, num_fences);
>>>>>>>>>>>>>>>> +               if (ret)
>>>>>>>>>>>>>>>> +                       return ret;
>>>>>>>>>>>>>>>> +       }
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>> +       return 0;
>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_prepare_range);
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>> + * drm_gpuvm_exec_lock() - lock all dma-resv of all associated BOs
>>>>>>>>>>>>>>>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>>>>>>>>>>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>>>>>>>>>>> + * @interruptible: sleep interruptible if waiting
>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>> + * Acquires all dma-resv locks of all &drm_gem_objects the given
>>>>>>>>>>>>>>>> + * &drm_gpuvm contains mappings of.
>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>> + * Additionally, when calling this function with struct drm_gpuvm_exec::extra
>>>>>>>>>>>>>>>> + * being set the driver receives the given @fn callback to lock additional
>>>>>>>>>>>>>>>> + * dma-resv in the context of the &drm_gpuvm_exec instance. Typically, drivers
>>>>>>>>>>>>>>>> + * would call drm_exec_prepare_obj() from within this callback.
>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>> +int
>>>>>>>>>>>>>>>> +drm_gpuvm_exec_lock(struct drm_gpuvm_exec *vm_exec,
>>>>>>>>>>>>>>>> +                   unsigned int num_fences,
>>>>>>>>>>>>>>>> +                   bool interruptible)
>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>> +       struct drm_gpuvm *gpuvm = vm_exec->vm;
>>>>>>>>>>>>>>>> +       struct drm_exec *exec = &vm_exec->exec;
>>>>>>>>>>>>>>>> +       uint32_t flags;
>>>>>>>>>>>>>>>> +       int ret;
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>> +       flags = (interruptible ? DRM_EXEC_INTERRUPTIBLE_WAIT : 0) |
>>>>>>>>>>>>>>>> +               DRM_EXEC_IGNORE_DUPLICATES;
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>> +       drm_exec_init(exec, flags);
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>> +       drm_exec_until_all_locked(exec) {
>>>>>>>>>>>>>>>> +               ret = drm_gpuvm_prepare_vm(gpuvm, exec, num_fences);
>>>>>>>>>>>>>>>> +               drm_exec_retry_on_contention(exec);
>>>>>>>>>>>>>>>> +               if (ret)
>>>>>>>>>>>>>>>> +                       goto err;
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>> +               ret = drm_gpuvm_prepare_objects(gpuvm, exec, num_fences);
>>>>>>>>>>>>>>>> +               drm_exec_retry_on_contention(exec);
>>>>>>>>>>>>>>>> +               if (ret)
>>>>>>>>>>>>>>>> +                       goto err;
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>> +               if (vm_exec->extra.fn) {
>>>>>>>>>>>>>>>> +                       ret = vm_exec->extra.fn(vm_exec, num_fences);
>>>>>>>>>>>>>>>> +                       drm_exec_retry_on_contention(exec);
>>>>>>>>>>>>>>>> +                       if (ret)
>>>>>>>>>>>>>>>> +                               goto err;
>>>>>>>>>>>>>>>> +               }
>>>>>>>>>>>>>>>> +       }
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>> +       return 0;
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>> +err:
>>>>>>>>>>>>>>>> +       drm_exec_fini(exec);
>>>>>>>>>>>>>>>> +       return ret;
>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_exec_lock);
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>> +static int
>>>>>>>>>>>>>>>> +fn_lock_array(struct drm_gpuvm_exec *vm_exec, unsigned int num_fences)
>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>> +       struct {
>>>>>>>>>>>>>>>> +               struct drm_gem_object **objs;
>>>>>>>>>>>>>>>> +               unsigned int num_objs;
>>>>>>>>>>>>>>>> +       } *args = vm_exec->extra.priv;
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>> +       return drm_exec_prepare_array(&vm_exec->exec, args->objs,
>>>>>>>>>>>>>>>> +                                     args->num_objs, num_fences);
>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>> + * drm_gpuvm_exec_lock_array() - lock all dma-resv of all associated BOs
>>>>>>>>>>>>>>>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>>>>>>>>>>>>>>>> + * @objs: additional &drm_gem_objects to lock
>>>>>>>>>>>>>>>> + * @num_objs: the number of additional &drm_gem_objects to lock
>>>>>>>>>>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>>>>>>>>>>> + * @interruptible: sleep interruptible if waiting
>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>> + * Acquires all dma-resv locks of all &drm_gem_objects the given &drm_gpuvm
>>>>>>>>>>>>>>>> + * contains mappings of, plus the ones given through @objs.
>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>> +int
>>>>>>>>>>>>>>>> +drm_gpuvm_exec_lock_array(struct drm_gpuvm_exec *vm_exec,
>>>>>>>>>>>>>>>> +                         struct drm_gem_object **objs,
>>>>>>>>>>>>>>>> +                         unsigned int num_objs,
>>>>>>>>>>>>>>>> +                         unsigned int num_fences,
>>>>>>>>>>>>>>>> +                         bool interruptible)
>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>> +       struct {
>>>>>>>>>>>>>>>> +               struct drm_gem_object **objs;
>>>>>>>>>>>>>>>> +               unsigned int num_objs;
>>>>>>>>>>>>>>>> +       } args;
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>> +       args.objs = objs;
>>>>>>>>>>>>>>>> +       args.num_objs = num_objs;
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>> +       vm_exec->extra.fn = fn_lock_array;
>>>>>>>>>>>>>>>> +       vm_exec->extra.priv = &args;
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>> +       return drm_gpuvm_exec_lock(vm_exec, num_fences, interruptible);
>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_exec_lock_array);
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>> + * drm_gpuvm_exec_lock_range() - prepare all BOs mapped within a given range
>>>>>>>>>>>>>>>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>>>>>>>>>>>>>>>> + * @addr: the start address within the VA space
>>>>>>>>>>>>>>>> + * @range: the range to iterate within the VA space
>>>>>>>>>>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>>>>>>>>>>> + * @interruptible: sleep interruptible if waiting
>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>> + * Acquires all dma-resv locks of all &drm_gem_objects mapped between @addr and
>>>>>>>>>>>>>>>> + * @addr + @range.
>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>> +int
>>>>>>>>>>>>>>>> +drm_gpuvm_exec_lock_range(struct drm_gpuvm_exec *vm_exec,
>>>>>>>>>>>>>>>> +                         u64 addr, u64 range,
>>>>>>>>>>>>>>>> +                         unsigned int num_fences,
>>>>>>>>>>>>>>>> +                         bool interruptible)
>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>> +       struct drm_gpuvm *gpuvm = vm_exec->vm;
>>>>>>>>>>>>>>>> +       struct drm_exec *exec = &vm_exec->exec;
>>>>>>>>>>>>>>>> +       uint32_t flags;
>>>>>>>>>>>>>>>> +       int ret;
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>> +       flags = (interruptible ? DRM_EXEC_INTERRUPTIBLE_WAIT : 0) |
>>>>>>>>>>>>>>>> +               DRM_EXEC_IGNORE_DUPLICATES;
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>> +       drm_exec_init(exec, flags);
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>> +       drm_exec_until_all_locked(exec) {
>>>>>>>>>>>>>>>> +               ret = drm_gpuvm_prepare_range(gpuvm, exec, addr, range,
>>>>>>>>>>>>>>>> +                                             num_fences);
>>>>>>>>>>>>>>>> +               drm_exec_retry_on_contention(exec);
>>>>>>>>>>>>>>>> +               if (ret)
>>>>>>>>>>>>>>>> +                       goto err;
>>>>>>>>>>>>>>>> +       }
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>> +       return ret;
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>> +err:
>>>>>>>>>>>>>>>> +       drm_exec_fini(exec);
>>>>>>>>>>>>>>>> +       return ret;
>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_exec_lock_range);
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>> + * drm_gpuvm_validate() - validate all BOs marked as evicted
>>>>>>>>>>>>>>>> + * @gpuvm: the &drm_gpuvm to validate evicted BOs
>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>> + * Calls the &drm_gpuvm_ops.bo_validate callback for all evicted buffer
>>>>>>>>>>>>>>>> + * objects being mapped in the given &drm_gpuvm.
>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>> +int
>>>>>>>>>>>>>>>> +drm_gpuvm_validate(struct drm_gpuvm *gpuvm)
>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>> +       const struct drm_gpuvm_ops *ops = gpuvm->ops;
>>>>>>>>>>>>>>>> +       struct drm_gpuvm_bo *vm_bo;
>>>>>>>>>>>>>>>> +       LIST_HEAD(evict);
>>>>>>>>>>>>>>>> +       int ret = 0;
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>> +       if (unlikely(!ops || !ops->bo_validate))
>>>>>>>>>>>>>>>> +               return -ENOTSUPP;
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>> +       for_each_vm_bo_in_list(gpuvm, evict, &evict, vm_bo) {
>>>>>>>>>>>>>>>> +               dma_resv_assert_held(vm_bo->obj->resv);
>>>>>>>>>>>>>>>> +               ret = ops->bo_validate(vm_bo->obj);
>>>>>>>>>>>>>>>> +               if (ret)
>>>>>>>>>>>>>>>> +                       break;
>>>>>>>>>>>>>>>> +       }
>>>>>>>>>>>>>>>> +       /* Drop ref in case we break out of the loop. */
>>>>>>>>>>>>>>>> +       drm_gpuvm_bo_put(vm_bo);
>>>>>>>>>>>>>>>> +       restore_vm_bo_list(gpuvm, evict, &evict);
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>> +       return ret;
>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_validate);
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>> + * drm_gpuvm_resv_add_fence - add fence to private and all extobj dma-resv
>>>>>>>>>>>>>>>> + * @gpuvm: the &drm_gpuvm to add a fence to
>>>>>>>>>>>>>>>> + * @exec: the &drm_exec locking context
>>>>>>>>>>>>>>>> + * @fence: fence to add
>>>>>>>>>>>>>>>> + * @private_usage: private dma-resv usage
>>>>>>>>>>>>>>>> + * @extobj_usage: extobj dma-resv usage
>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>> +void
>>>>>>>>>>>>>>>> +drm_gpuvm_resv_add_fence(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>>>> +                        struct drm_exec *exec,
>>>>>>>>>>>>>>>> +                        struct dma_fence *fence,
>>>>>>>>>>>>>>>> +                        enum dma_resv_usage private_usage,
>>>>>>>>>>>>>>>> +                        enum dma_resv_usage extobj_usage)
>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>> +       struct drm_gem_object *obj;
>>>>>>>>>>>>>>>> +       unsigned long index;
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>> +       drm_exec_for_each_locked_object(exec, index, obj) {
>>>>>>>>>>>>>>>> +               dma_resv_assert_held(obj->resv);
>>>>>>>>>>>>>>>> +               dma_resv_add_fence(obj->resv, fence,
>>>>>>>>>>>>>>>> +                                  drm_gpuvm_is_extobj(gpuvm, obj) ?
>>>>>>>>>>>>>>>> +                                  extobj_usage : private_usage);
>>>>>>>>>>>>>>>> +       }
>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_resv_add_fence);
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>     /**
>>>>>>>>>>>>>>>>      * drm_gpuvm_bo_create() - create a new instance of struct drm_gpuvm_bo
>>>>>>>>>>>>>>>>      * @gpuvm: The &drm_gpuvm the @obj is mapped in.
>>>>>>>>>>>>>>>> @@ -790,6 +1242,9 @@ drm_gpuvm_bo_create(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>>>>          INIT_LIST_HEAD(&vm_bo->list.gpuva);
>>>>>>>>>>>>>>>>          INIT_LIST_HEAD(&vm_bo->list.entry.gem);
>>>>>>>>>>>>>>>> +       INIT_LIST_HEAD(&vm_bo->list.entry.extobj);
>>>>>>>>>>>>>>>> +       INIT_LIST_HEAD(&vm_bo->list.entry.evict);
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>          drm_gem_object_get(obj);
>>>>>>>>>>>>>>>>          return vm_bo;
>>>>>>>>>>>>>>>> @@ -807,6 +1262,14 @@ drm_gpuvm_bo_destroy(struct kref *kref)
>>>>>>>>>>>>>>>>          drm_gem_gpuva_assert_lock_held(vm_bo->obj);
>>>>>>>>>>>>>>>> +       spin_lock(&gpuvm->extobj.lock);
>>>>>>>>>>>>>>>> +       list_del(&vm_bo->list.entry.extobj);
>>>>>>>>>>>>>>>> +       spin_unlock(&gpuvm->extobj.lock);
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>> +       spin_lock(&gpuvm->evict.lock);
>>>>>>>>>>>>>>>> +       list_del(&vm_bo->list.entry.evict);
>>>>>>>>>>>>>>>> +       spin_unlock(&gpuvm->evict.lock);
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>          list_del(&vm_bo->list.entry.gem);
>>>>>>>>>>>>>>>>          drm_gem_object_put(obj);
>>>>>>>>>>>>>>>> @@ -822,6 +1285,11 @@ drm_gpuvm_bo_destroy(struct kref *kref)
>>>>>>>>>>>>>>>>      * @vm_bo: the &drm_gpuvm_bo to release the reference of
>>>>>>>>>>>>>>>>      *
>>>>>>>>>>>>>>>>      * This releases a reference to @vm_bo.
>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>> + * If the reference count drops to zero, the &gpuvm_bo is destroyed, which
>>>>>>>>>>>>>>>> + * includes removing it from the GEMs gpuva list. Hence, if a call to this
>>>>>>>>>>>>>>>> + * function can potentially let the reference count drop to zero the caller
>>>>>>>>>>>>>>>> + * must hold the dma-resv or driver specific GEM gpuva lock.
>>>>>>>>>>>>>>>>      */
>>>>>>>>>>>>>>>>     void
>>>>>>>>>>>>>>>>     drm_gpuvm_bo_put(struct drm_gpuvm_bo *vm_bo)
>>>>>>>>>>>>>>>> @@ -831,6 +1299,12 @@ drm_gpuvm_bo_put(struct drm_gpuvm_bo *vm_bo)
>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>> EXPORT_SYMBOL_GPL(drm_gpuvm_bo_put);
>>>>>>>>>>>>>>>> +static int __must_check
>>>>>>>>>>>>>>>> +drm_gpuvm_bo_get_unless_zero(struct drm_gpuvm_bo *vm_bo)
>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>> +       return kref_get_unless_zero(&vm_bo->kref);
>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>     static struct drm_gpuvm_bo *
>>>>>>>>>>>>>>>>     __drm_gpuvm_bo_find(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>>>>                      struct drm_gem_object *obj)
>>>>>>>>>>>>>>>> @@ -938,6 +1412,48 @@ drm_gpuvm_bo_obtain_prealloc(struct drm_gpuvm_bo *__vm_bo)
>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>> EXPORT_SYMBOL_GPL(drm_gpuvm_bo_obtain_prealloc);
>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>> + * drm_gpuvm_bo_extobj_add() - adds the &drm_gpuvm_bo to its &drm_gpuvm's
>>>>>>>>>>>>>>>> + * extobj list
>>>>>>>>>>>>>>>> + * @vm_bo: The &drm_gpuvm_bo to add to its &drm_gpuvm's extobj list.
>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>> + * Adds the given @vm_bo to its &drm_gpuvm's extobj list if not on the list
>>>>>>>>>>>>>>>> + * already and if the corresponding &drm_gem_object actually is an external
>>>>>>>>>>>>>>>> + * object.
>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>> +void
>>>>>>>>>>>>>>>> +drm_gpuvm_bo_extobj_add(struct drm_gpuvm_bo *vm_bo)
>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>> +       struct drm_gpuvm *gpuvm = vm_bo->vm;
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>> +       if (drm_gpuvm_is_extobj(gpuvm, vm_bo->obj))
>>>>>>>>>>>>>>>> +               drm_gpuvm_bo_list_add(vm_bo, extobj);
>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_bo_extobj_add);
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>> + * drm_gpuvm_bo_evict() - add / remove a &drm_gem_object to / from a
>>>>>>>>>>>>>>>> + * &drm_gpuvms evicted list
>>>>>>>>>>>>>>>> + * @obj: the &drm_gem_object to add or remove
>>>>>>>>>>>>>>>> + * @evict: indicates whether the object is evicted
>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>> + * Adds a &drm_gem_object to or removes it from the evicted list of all
>>>>>>>>>>>>>>>> + * &drm_gpuvms containing a mapping of this &drm_gem_object.
>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>> +void
>>>>>>>>>>>>>>>> +drm_gpuvm_bo_evict(struct drm_gem_object *obj, bool evict)
>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>> +       struct drm_gpuvm_bo *vm_bo;
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>> +       drm_gem_for_each_gpuvm_bo(vm_bo, obj) {
>>>>>>>>>>>>>>>> +               if (evict)
>>>>>>>>>>>>>>>> +                       drm_gpuvm_bo_list_add(vm_bo, evict);
>>>>>>>>>>>>>>>> +               else
>>>>>>>>>>>>>>>> +                       drm_gpuvm_bo_list_del(vm_bo, evict);
>>>>>>>>>>>>>>>> +       }
>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_bo_evict);
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>     static int
>>>>>>>>>>>>>>>>     __drm_gpuva_insert(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>>>>                     struct drm_gpuva *va)
>>>>>>>>>>>>>>>> diff --git a/include/drm/drm_gpuvm.h b/include/drm/drm_gpuvm.h
>>>>>>>>>>>>>>>> index afa50b9059a2..834bb6d6617e 100644
>>>>>>>>>>>>>>>> --- a/include/drm/drm_gpuvm.h
>>>>>>>>>>>>>>>> +++ b/include/drm/drm_gpuvm.h
>>>>>>>>>>>>>>>> @@ -26,10 +26,12 @@
>>>>>>>>>>>>>>>>      */
>>>>>>>>>>>>>>>>     #include <linux/list.h>
>>>>>>>>>>>>>>>> +#include <linux/dma-resv.h>
>>>>>>>>>>>>>>>>     #include <linux/rbtree.h>
>>>>>>>>>>>>>>>>     #include <linux/types.h>
>>>>>>>>>>>>>>>>     #include <drm/drm_gem.h>
>>>>>>>>>>>>>>>> +#include <drm/drm_exec.h>
>>>>>>>>>>>>>>>>     struct drm_gpuvm;
>>>>>>>>>>>>>>>>     struct drm_gpuvm_bo;
>>>>>>>>>>>>>>>> @@ -259,6 +261,38 @@ struct drm_gpuvm {
>>>>>>>>>>>>>>>>           * space
>>>>>>>>>>>>>>>>           */
>>>>>>>>>>>>>>>>          struct dma_resv *resv;
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>> +       /**
>>>>>>>>>>>>>>>> +        * @extobj: structure holding the extobj list
>>>>>>>>>>>>>>>> +        */
>>>>>>>>>>>>>>>> +       struct {
>>>>>>>>>>>>>>>> +               /**
>>>>>>>>>>>>>>>> +                * @list: &list_head storing &drm_gpuvm_bos serving as
>>>>>>>>>>>>>>>> +                * external object
>>>>>>>>>>>>>>>> +                */
>>>>>>>>>>>>>>>> +               struct list_head list;
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>> +               /**
>>>>>>>>>>>>>>>> +                * @lock: spinlock to protect the extobj list
>>>>>>>>>>>>>>>> +                */
>>>>>>>>>>>>>>>> +               spinlock_t lock;
>>>>>>>>>>>>>>>> +       } extobj;
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>> +       /**
>>>>>>>>>>>>>>>> +        * @evict: structure holding the evict list and evict list lock
>>>>>>>>>>>>>>>> +        */
>>>>>>>>>>>>>>>> +       struct {
>>>>>>>>>>>>>>>> +               /**
>>>>>>>>>>>>>>>> +                * @list: &list_head storing &drm_gpuvm_bos currently being
>>>>>>>>>>>>>>>> +                * evicted
>>>>>>>>>>>>>>>> +                */
>>>>>>>>>>>>>>>> +               struct list_head list;
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>> +               /**
>>>>>>>>>>>>>>>> +                * @lock: spinlock to protect the evict list
>>>>>>>>>>>>>>>> +                */
>>>>>>>>>>>>>>>> +               spinlock_t lock;
>>>>>>>>>>>>>>>> +       } evict;
>>>>>>>>>>>>>>>>     };
>>>>>>>>>>>>>>>>     void drm_gpuvm_init(struct drm_gpuvm *gpuvm, struct drm_device *drm,
>>>>>>>>>>>>>>>> @@ -268,6 +302,21 @@ void drm_gpuvm_init(struct drm_gpuvm *gpuvm, struct drm_device *drm,
>>>>>>>>>>>>>>>>                      const struct drm_gpuvm_ops *ops);
>>>>>>>>>>>>>>>>     void drm_gpuvm_destroy(struct drm_gpuvm *gpuvm);
>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>> + * drm_gpuvm_is_extobj() - indicates whether the given &drm_gem_object is an
>>>>>>>>>>>>>>>> + * external object
>>>>>>>>>>>>>>>> + * @gpuvm: the &drm_gpuvm to check
>>>>>>>>>>>>>>>> + * @obj: the &drm_gem_object to check
>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>> + * Returns: true if the &drm_gem_object &dma_resv differs from the
>>>>>>>>>>>>>>>> + * &drm_gpuvms &dma_resv, false otherwise
>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>> +static inline bool drm_gpuvm_is_extobj(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>>>> +                                      struct drm_gem_object *obj)
>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>> +       return obj && obj->resv != gpuvm->resv;
>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>     static inline struct drm_gpuva *
>>>>>>>>>>>>>>>>     __drm_gpuva_next(struct drm_gpuva *va)
>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>> @@ -346,6 +395,128 @@ __drm_gpuva_next(struct drm_gpuva *va)
>>>>>>>>>>>>>>>>     #define drm_gpuvm_for_each_va_safe(va__, next__, gpuvm__) \
>>>>>>>>>>>>>>>>          list_for_each_entry_safe(va__, next__, &(gpuvm__)->rb.list, rb.entry)
>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>> + * struct drm_gpuvm_exec - &drm_gpuvm abstraction of &drm_exec
>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>> + * This structure should be created on the stack as &drm_exec should be.
>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>> + * Optionally, @extra can be set in order to lock additional &drm_gem_objects.
>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>> +struct drm_gpuvm_exec {
>>>>>>>>>>>>>>>> +       /**
>>>>>>>>>>>>>>>> +        * @exec: the &drm_exec structure
>>>>>>>>>>>>>>>> +        */
>>>>>>>>>>>>>>>> +       struct drm_exec exec;
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>> +       /**
>>>>>>>>>>>>>>>> +        * @vm: the &drm_gpuvm to lock its DMA reservations
>>>>>>>>>>>>>>>> +        */
>>>>>>>>>>>>>>>> +       struct drm_gpuvm *vm;
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>> +       /**
>>>>>>>>>>>>>>>> +        * @extra: Callback and corresponding private data for the driver to
>>>>>>>>>>>>>>>> +        * lock arbitrary additional &drm_gem_objects.
>>>>>>>>>>>>>>>> +        */
>>>>>>>>>>>>>>>> +       struct {
>>>>>>>>>>>>>>>> +               /**
>>>>>>>>>>>>>>>> +                * @fn: The driver callback to lock additional &drm_gem_objects.
>>>>>>>>>>>>>>>> +                */
>>>>>>>>>>>>>>>> +               int (*fn)(struct drm_gpuvm_exec *vm_exec,
>>>>>>>>>>>>>>>> +                         unsigned int num_fences);
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>> +               /**
>>>>>>>>>>>>>>>> +                * @priv: driver private data for the @fn callback
>>>>>>>>>>>>>>>> +                */
>>>>>>>>>>>>>>>> +               void *priv;
>>>>>>>>>>>>>>>> +       } extra;
>>>>>>>>>>>>>>>> +};
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>> + * drm_gpuvm_prepare_vm() - prepare the GPUVMs common dma-resv
>>>>>>>>>>>>>>>> + * @gpuvm: the &drm_gpuvm
>>>>>>>>>>>>>>>> + * @exec: the &drm_exec context
>>>>>>>>>>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>> + * Calls drm_exec_prepare_obj() for the GPUVMs dummy &drm_gem_object.
>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>> + * Using this function directly, it is the driver's responsibility to call
>>>>>>>>>>>>>>>> + * drm_exec_init() and drm_exec_fini() accordingly.
>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>> +static inline int
>>>>>>>>>>>>>>>> +drm_gpuvm_prepare_vm(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>>>> +                    struct drm_exec *exec,
>>>>>>>>>>>>>>>> +                    unsigned int num_fences)
>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>> +       return drm_exec_prepare_obj(exec, &gpuvm->d_obj, num_fences);
>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>> +int drm_gpuvm_prepare_objects(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>>>> +                             struct drm_exec *exec,
>>>>>>>>>>>>>>>> +                             unsigned int num_fences);
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>> +int drm_gpuvm_prepare_range(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>>>> +                           struct drm_exec *exec,
>>>>>>>>>>>>>>>> +                           u64 addr, u64 range,
>>>>>>>>>>>>>>>> +                           unsigned int num_fences);
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>> +int drm_gpuvm_exec_lock(struct drm_gpuvm_exec *vm_exec,
>>>>>>>>>>>>>>>> +                       unsigned int num_fences,
>>>>>>>>>>>>>>>> +                       bool interruptible);
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>> +int drm_gpuvm_exec_lock_array(struct drm_gpuvm_exec *vm_exec,
>>>>>>>>>>>>>>>> +                             struct drm_gem_object **objs,
>>>>>>>>>>>>>>>> +                             unsigned int num_objs,
>>>>>>>>>>>>>>>> +                             unsigned int num_fences,
>>>>>>>>>>>>>>>> +                             bool interruptible);
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>> +int drm_gpuvm_exec_lock_range(struct drm_gpuvm_exec *vm_exec,
>>>>>>>>>>>>>>>> +                             u64 addr, u64 range,
>>>>>>>>>>>>>>>> +                             unsigned int num_fences,
>>>>>>>>>>>>>>>> +                             bool interruptible);
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>> + * drm_gpuvm_exec_unlock() - unlock all dma-resv of all associated BOs
>>>>>>>>>>>>>>>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>> + * Releases all dma-resv locks of all &drm_gem_objects previously acquired
>>>>>>>>>>>>>>>> + * through drm_gpuvm_exec_lock() or its variants.
>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>> +static inline void
>>>>>>>>>>>>>>>> +drm_gpuvm_exec_unlock(struct drm_gpuvm_exec *vm_exec)
>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>> +       drm_exec_fini(&vm_exec->exec);
>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>> +int drm_gpuvm_validate(struct drm_gpuvm *gpuvm);
>>>>>>>>>>>>>>>> +void drm_gpuvm_resv_add_fence(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>>>> +                             struct drm_exec *exec,
>>>>>>>>>>>>>>>> +                             struct dma_fence *fence,
>>>>>>>>>>>>>>>> +                             enum dma_resv_usage private_usage,
>>>>>>>>>>>>>>>> +                             enum dma_resv_usage extobj_usage);
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>> + * drm_gpuvm_exec_resv_add_fence() - add fence to private and all extobj
>>>>>>>>>>>>>>>> + * dma-resv
>>>>>>>>>>>>>>>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>>>>>>>>>>>>>>>> + * @fence: fence to add
>>>>>>>>>>>>>>>> + * @private_usage: private dma-resv usage
>>>>>>>>>>>>>>>> + * @extobj_usage: extobj dma-resv usage
>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>> + * See drm_gpuvm_resv_add_fence().
>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>> +static inline void
>>>>>>>>>>>>>>>> +drm_gpuvm_exec_resv_add_fence(struct drm_gpuvm_exec *vm_exec,
>>>>>>>>>>>>>>>> +                             struct dma_fence *fence,
>>>>>>>>>>>>>>>> +                             enum dma_resv_usage private_usage,
>>>>>>>>>>>>>>>> +                             enum dma_resv_usage extobj_usage)
>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>> +       drm_gpuvm_resv_add_fence(vm_exec->vm, &vm_exec->exec, fence,
>>>>>>>>>>>>>>>> +                                private_usage, extobj_usage);
>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>     /**
>>>>>>>>>>>>>>>>      * struct drm_gpuvm_bo - structure representing a &drm_gpuvm and
>>>>>>>>>>>>>>>>      * &drm_gem_object combination
>>>>>>>>>>>>>>>> @@ -398,6 +569,18 @@ struct drm_gpuvm_bo {
>>>>>>>>>>>>>>>>                           * gpuva list.
>>>>>>>>>>>>>>>>                           */
>>>>>>>>>>>>>>>>                          struct list_head gem;
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>> +                       /**
>>>>>>>>>>>>>>>> +                        * @extobj: List entry to attach to the &drm_gpuvms
>>>>>>>>>>>>>>>> +                        * extobj list.
>>>>>>>>>>>>>>>> +                        */
>>>>>>>>>>>>>>>> +                       struct list_head extobj;
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>> +                       /**
>>>>>>>>>>>>>>>> +                        * @evict: List entry to attach to the &drm_gpuvms evict
>>>>>>>>>>>>>>>> +                        * list.
>>>>>>>>>>>>>>>> +                        */
>>>>>>>>>>>>>>>> +                       struct list_head evict;
>>>>>>>>>>>>>>>>                  } entry;
>>>>>>>>>>>>>>>>          } list;
>>>>>>>>>>>>>>>>     };
>>>>>>>>>>>>>>>> @@ -432,6 +615,9 @@ struct drm_gpuvm_bo *
>>>>>>>>>>>>>>>>     drm_gpuvm_bo_find(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>>>>                    struct drm_gem_object *obj);
>>>>>>>>>>>>>>>> +void drm_gpuvm_bo_evict(struct drm_gem_object *obj, bool evict);
>>>>>>>>>>>>>>>> +void drm_gpuvm_bo_extobj_add(struct drm_gpuvm_bo *vm_bo);
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>     /**
>>>>>>>>>>>>>>>>      * drm_gpuvm_bo_for_each_va() - iterator to walk over a list of &drm_gpuva
>>>>>>>>>>>>>>>>      * @va__: &drm_gpuva structure to assign to in each iteration step
>>>>>>>>>>>>>>>> @@ -837,6 +1023,17 @@ struct drm_gpuvm_ops {
>>>>>>>>>>>>>>>>           * used.
>>>>>>>>>>>>>>>>           */
>>>>>>>>>>>>>>>>          int (*sm_step_unmap)(struct drm_gpuva_op *op, void *priv);
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>> +       /**
>>>>>>>>>>>>>>>> +        * @bo_validate: called from drm_gpuvm_validate()
>>>>>>>>>>>>>>>> +        *
>>>>>>>>>>>>>>>> +        * Drivers receive this callback for every evicted &drm_gem_object
>>>>>>>>>>>>>>>> +        * being mapped in the corresponding &drm_gpuvm.
>>>>>>>>>>>>>>>> +        *
>>>>>>>>>>>>>>>> +        * Typically, drivers would call their driver specific variant of
>>>>>>>>>>>>>>>> +        * ttm_bo_validate() from within this callback.
>>>>>>>>>>>>>>>> +        */
>>>>>>>>>>>>>>>> +       int (*bo_validate)(struct drm_gem_object *obj);
>>>>>>>>>>>>>>>>     };
>>>>>>>>>>>>>>>>     int drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm, void *priv,
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>
Thomas Hellstrom Sept. 20, 2023, 8:29 a.m. UTC | #51
On 9/20/23 09:44, Thomas Hellström wrote:
> Hi,
>
> On 9/20/23 07:37, Christian König wrote:
>> Am 19.09.23 um 17:23 schrieb Thomas Hellström:
>>>
>>> On 9/19/23 17:16, Danilo Krummrich wrote:
>>>> On 9/19/23 14:21, Thomas Hellström wrote:
>>>>> Hi Christian
>>>>>
>>>>> On 9/19/23 14:07, Christian König wrote:
>>>>>> Am 13.09.23 um 17:46 schrieb Danilo Krummrich:
>>>>>>> On 9/13/23 17:33, Christian König wrote:
>>>>>>>> Am 13.09.23 um 17:15 schrieb Danilo Krummrich:
>>>>>>>>> On 9/13/23 16:26, Christian König wrote:
>>>>>>>>>> Am 13.09.23 um 14:16 schrieb Danilo Krummrich:
>>>>>>>>>>> As mentioned in a different mail thread, the reply is based 
>>>>>>>>>>> on the assumption
>>>>>>>>>>> that we don't support anything else than GPUVM updates from 
>>>>>>>>>>> the IOCTL.
>>>>>>>>>>
>>>>>>>>>> I think that this assumption is incorrect.
>>>>>>>>>
>>>>>>>>> Well, more precisely I should have said "don't support GPUVM 
>>>>>>>>> updates from within
>>>>>>>>> fence signaling critical sections". And looking at the code, 
>>>>>>>>> that doesn't seem what
>>>>>>>>> you're doing there.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Vulkan is just one specific use case, but this here should 
>>>>>>>>>> probably be able to handle other use cases as well.
>>>>>>>>>>
>>>>>>>>>> Especially with HMM you get the requirement that you need to 
>>>>>>>>>> be able to invalidate GPUVM mappings without grabbing a 
>>>>>>>>>> reservation lock.
>>>>>>>>>
>>>>>>>>> What do you mean with "invalidate GPUVM mappings" in this 
>>>>>>>>> context? drm_gpuvm_bo_evict()
>>>>>>>>> should only be called from a ttm_device_funcs::move callback, 
>>>>>>>>> we should hold the dma-resv
>>>>>>>>> lock there.
>>>>>>>>
>>>>>>>> Well the question is which dma-resv lock do we hold?
>>>>>>>>
>>>>>>>> In the move callback we only hold the dma-resv lock of the BO 
>>>>>>>> which is moved, but when that is a shared BO then that's not 
>>>>>>>> the same as the one for the VM.
>>>>>>>
>>>>>>> Correct, Thomas' idea was to use the GEM's dma_resv lock to 
>>>>>>> protect drm_gpuvm_bo::evicted
>>>>>>> and then actually move the drm_gpuvm_bo to the VM's evicted list 
>>>>>>> once we grabbed all
>>>>>>> dma-resv locks when locking the VM's BOs using drm_exec. We can 
>>>>>>> remove them from the evicted
>>>>>>> list on validate(). This way we never touch the evicted list 
>>>>>>> without holding at least the VM's
>>>>>>> dma-resv lock.
>>>>>>>
>>>>>>> Do you have any concerns about that?
>>>>>>
>>>>>> Scratching my head a bit how that is supposed to work.
>>>>>>
>>>>>> This implies that you go over all the evicted BOs during 
>>>>>> validation and not just the one mentioned in the CS.
>>>>>>
>>>>>> That might work for Vulkan, but is pretty much a no-go for OpenGL.
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> See what the eviction lock in amdgpu is doing for example.
>>>>>>>>>
>>>>>>>>> The eviction_lock seems to protect a VM state "evicting", i.e.
>>>>>>>>> whether any BO that is associated with the VM is currently
>>>>>>>>> evicting. At the same time amdgpu protects the evicted list of
>>>>>>>>> the VM with a different lock. So this seems to be entirely
>>>>>>>>> unrelated. Tracking a "currently evicting" state is not part of
>>>>>>>>> the GPUVM implementation currently and hence nothing would change
>>>>>>>>> for amdgpu there.
>>>>>>>>
>>>>>>>> Sorry for the confusion we use different terminology in amdgpu.
>>>>>>>>
>>>>>>>> The eviction lock and evicted state is for the VM page tables, 
>>>>>>>> e.g. if the whole VM is currently not used and swapped out or 
>>>>>>>> even de-allocated.
>>>>>>>>
>>>>>>>> This is necessary because we have cases where we need to access 
>>>>>>>> the VM data without holding the dma-resv lock of this VM. 
>>>>>>>> Especially figuring out which parts of an address space contain 
>>>>>>>> mappings and which doesn't.
>>>>>>>
>>>>>>> I think this is fine, this has nothing to do with lists of 
>>>>>>> evicted GEM objects or external GEM
>>>>>>> objects, right? Marking mappings (drm_gpuva) as invalidated 
>>>>>>> (DRM_GPUVA_INVALIDATED) or accessing
>>>>>>> the VA space does not require any dma-resv locks.
>>>>>>
>>>>>> I hope so, but I'm not 100% sure.
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> This is a requirement which comes with HMM handling, you won't 
>>>>>>>> see this with Vulkan (or OpenGL, VAAPI etc..).
>>>>>>>>
>>>>>>>>
>>>>>>>> The invalidation lock on the other hand is what in this 
>>>>>>>> discussion is called eviction lock. This one is needed because 
>>>>>>>> what I wrote above, during the move callback only the dma-resv 
>>>>>>>> of the BO which is moved is locked, but not necessarily the 
>>>>>>>> dma-resv of the VM.
>>>>>>>
>>>>>>> That's yet another thing, right? This is used to track whether 
>>>>>>> *any* BO that belongs to the VM is
>>>>>>> currently being evicted, correct? As mentioned, as by now this 
>>>>>>> is not supported in GPUVM and hence
>>>>>>> would be the same driver specific code with the same driver 
>>>>>>> specifc lock.
>>>>>>
>>>>>> That is most likely a show stopper using this for OpenGL based 
>>>>>> workloads as far as I can see. For those you need to able to 
>>>>>> figure out which non-VM BOs have been evicted and which parts of 
>>>>>> the VM needs updates.
>>>>>
>>>>> We identify those with a bool in the gpuvm_bo, and that bool is 
>>>>> protected by the bo_resv. In essence, the "evicted" list must be 
>>>>> made up-to-date with all relevant locks held before traversing in 
>>>>> the next exec.
>>>>
>>>> What I still miss with this idea is: how do we find all the
>>>> drm_gpuvm_bo structures with the evicted bool set to true? When
>>>> doing the drm_exec dance we come across all external ones and can
>>>> add them to the list if needed, but what about the BOs having the
>>>> VM's dma-resv?
>>>
>>> Oh, they can be added to the evict list directly (no bool needed) in 
>>> the eviction code, like in v3. Since for those we indeed hold the 
>>> VM's dma_resv since it's aliased with the object's dma-resv.
>>
>> Yeah, I wanted to note what Danilo seems to be thinking about as well: how
>> do we figure out which non-VM BOs have been evicted?
>>
>> We can't walk over the list of all non-VM BOs on every submission;
>> that's too much overhead for cases with lots of non-VM BOs.
>>
>> And we can't rely on userspace sending all non-VM BOs as used list 
>> down to the kernel with each submission.
>>
>> Regards,
>> Christian.
>
> No, that's not needed: Mechanism below.
>
> 1) We maintain an evicted list. Typically protected by the vm resv.
> 2) Each gpuvm_bo has a bool "evicted". Protected by the bo resv.
>
> a) Evicting a vm bo: The vm resv is held by the eviction code. Just
> put it on the evicted list.
> b) Evicting a shared/external bo: The bo resv is held by the eviction
> code. Set the "evicted" bool.
> c) Validating the evicted list on exec: Loop through all
> *external/shared* bos and lock them. After locking, check the "evicted"
> bool; if it's true, put the bo on the evicted list (we hold the VM
> resv at this point) and clear the "evicted" bool. Note that other vms
> will have their own gpuvm_bo which is marked evicted.
>
> I have this coded up in a patch for Xe and it seems to be working 
> properly.
>
> /Thomas
>
Something along the lines of the attached patch.
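
A minimal sketch of the mechanism above, for illustration only (the
vm_bo->evicted flag, the extobj/evict list names and the drm_gpuvm_resv()
helper are assumptions drawn from this discussion, not Thomas' actual
patch):

/* a) BO private to the VM: the VM's resv is held by the eviction
 *    code and is aliased with the BO's resv; move it right away. */
static void evict_vm_local_bo(struct drm_gpuvm_bo *vm_bo)
{
	dma_resv_assert_held(drm_gpuvm_resv(vm_bo->vm));
	list_move_tail(&vm_bo->list.entry.evict, &vm_bo->vm->evict.list);
}

/* b) shared/external BO: only the BO's resv is held; just flag it. */
static void evict_external_bo(struct drm_gpuvm_bo *vm_bo)
{
	dma_resv_assert_held(vm_bo->obj->resv);
	vm_bo->evicted = true;
}

/* c) on exec, with all external BOs locked and the VM's resv held:
 *    move flagged vm_bos to the evicted list and clear the flag. */
static void collect_evicted_external_bos(struct drm_gpuvm *gpuvm)
{
	struct drm_gpuvm_bo *vm_bo;

	list_for_each_entry(vm_bo, &gpuvm->extobj.list, list.entry.extobj) {
		if (vm_bo->evicted) {
			list_move_tail(&vm_bo->list.entry.evict,
				       &gpuvm->evict.list);
			vm_bo->evicted = false;
		}
	}
}
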
Christian König Sept. 20, 2023, 10:51 a.m. UTC | #52
Am 20.09.23 um 09:44 schrieb Thomas Hellström:
> Hi,
>
> On 9/20/23 07:37, Christian König wrote:
>> Am 19.09.23 um 17:23 schrieb Thomas Hellström:
>>>
>>> On 9/19/23 17:16, Danilo Krummrich wrote:
>>>> On 9/19/23 14:21, Thomas Hellström wrote:
>>>>> Hi Christian
>>>>>
>>>>> On 9/19/23 14:07, Christian König wrote:
>>>>>> Am 13.09.23 um 17:46 schrieb Danilo Krummrich:
>>>>>>> On 9/13/23 17:33, Christian König wrote:
>>>>>>>> Am 13.09.23 um 17:15 schrieb Danilo Krummrich:
>>>>>>>>> On 9/13/23 16:26, Christian König wrote:
>>>>>>>>>> Am 13.09.23 um 14:16 schrieb Danilo Krummrich:
>>>>>>>>>>> As mentioned in a different mail thread, the reply is based 
>>>>>>>>>>> on the assumption
>>>>>>>>>>> that we don't support anything other than GPUVM updates from 
>>>>>>>>>>> the IOCTL.
>>>>>>>>>>
>>>>>>>>>> I think that this assumption is incorrect.
>>>>>>>>>
>>>>>>>>> Well, more precisely I should have said "don't support GPUVM 
>>>>>>>>> updates from within
>>>>>>>>> fence signaling critical sections". And looking at the code, 
>>>>>>>>> that doesn't seem to be what
>>>>>>>>> you're doing there.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Vulkan is just one specific use case, but this here should 
>>>>>>>>>> probably be able to handle other use cases as well.
>>>>>>>>>>
>>>>>>>>>> Especially with HMM you get the requirement that you need to 
>>>>>>>>>> be able to invalidate GPUVM mappings without grabbing a 
>>>>>>>>>> reservation lock.
>>>>>>>>>
>>>>>>>>> What do you mean with "invalidate GPUVM mappings" in this 
>>>>>>>>> context? drm_gpuvm_bo_evict()
>>>>>>>>> should only be called from a ttm_device_funcs::move callback, 
>>>>>>>>> we should hold the dma-resv
>>>>>>>>> lock there.
>>>>>>>>
>>>>>>>> Well the question is which dma-resv lock do we hold?
>>>>>>>>
>>>>>>>> In the move callback we only hold the dma-resv lock of the BO 
>>>>>>>> which is moved, but when that is a shared BO then that's not 
>>>>>>>> the same as the one for the VM.
>>>>>>>
>>>>>>> Correct, Thomas' idea was to use the GEM's dma_resv lock to 
>>>>>>> protect drm_gpuvm_bo::evicted
>>>>>>> and then actually move the drm_gpuvm_bo to the VM's evicted list 
>>>>>>> once we grabbed all
>>>>>>> dma-resv locks when locking the VM's BOs using drm_exec. We can 
>>>>>>> remove them from the evicted
>>>>>>> list on validate(). This way we never touch the evicted list 
>>>>>>> without holding at least the VM's
>>>>>>> dma-resv lock.
>>>>>>>
>>>>>>> Do you have any concerns about that?
>>>>>>
>>>>>> Scratching my head a bit how that is supposed to work.
>>>>>>
>>>>>> This implies that you go over all the evicted BOs during 
>>>>>> validation and not just the one mentioned in the CS.
>>>>>>
>>>>>> That might work for Vulkan, but is pretty much a no-go for OpenGL.
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> See what the eviction lock in amdgpu is doing for example.
>>>>>>>>>
>>>>>>>>> The eviction_lock seems to protect a VM state "evicting", i.e. 
>>>>>>>>> whether any BO that
>>>>>>>>> is associated with the VM is currently being evicted. At the same 
>>>>>>>>> time amdgpu protects
>>>>>>>>> the evicted list of the VM with a different lock. So this 
>>>>>>>>> seems to be entirely
>>>>>>>>> unrelated. Tracking a "currently evicting" state is not part 
>>>>>>>>> of the GPUVM
>>>>>>>>> implementation currently and hence nothing would change for 
>>>>>>>>> amdgpu there.
>>>>>>>>
>>>>>>>> Sorry for the confusion we use different terminology in amdgpu.
>>>>>>>>
>>>>>>>> The eviction lock and evicted state are for the VM page tables, 
>>>>>>>> e.g. if the whole VM is currently not used and swapped out or 
>>>>>>>> even de-allocated.
>>>>>>>>
>>>>>>>> This is necessary because we have cases where we need to access 
>>>>>>>> the VM data without holding the dma-resv lock of this VM. 
>>>>>>>> Especially figuring out which parts of an address space contain 
>>>>>>>> mappings and which don't.
>>>>>>>
>>>>>>> I think this is fine, this has nothing to do with lists of 
>>>>>>> evicted GEM objects or external GEM
>>>>>>> objects, right? Marking mappings (drm_gpuva) as invalidated 
>>>>>>> (DRM_GPUVA_INVALIDATED) or accessing
>>>>>>> the VA space does not require any dma-resv locks.
>>>>>>
>>>>>> I hope so, but I'm not 100% sure.
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> This is a requirement which comes with HMM handling, you won't 
>>>>>>>> see this with Vulkan (or OpenGL, VAAPI etc..).
>>>>>>>>
>>>>>>>>
>>>>>>>> The invalidation lock on the other hand is what in this 
>>>>>>>> discussion is called eviction lock. This one is needed because 
>>>>>>>> what I wrote above, during the move callback only the dma-resv 
>>>>>>>> of the BO which is moved is locked, but not necessarily the 
>>>>>>>> dma-resv of the VM.
>>>>>>>
>>>>>>> That's yet another thing, right? This is used to track whether 
>>>>>>> *any* BO that belongs to the VM is
>>>>>>> currently being evicted, correct? As mentioned, as of now this 
>>>>>>> is not supported in GPUVM and hence
>>>>>>> would be the same driver specific code with the same driver 
>>>>>>> specific lock.
>>>>>>
>>>>>> That is most likely a show stopper using this for OpenGL based 
>>>>>> workloads as far as I can see. For those you need to be able to 
>>>>>> figure out which non-VM BOs have been evicted and which parts of 
>>>>>> the VM needs updates.
>>>>>
>>>>> We identify those with a bool in the gpuvm_bo, and that bool is 
>>>>> protected by the bo_resv. In essence, the "evicted" list must be 
>>>>> made up-to-date with all relevant locks held before traversing in 
>>>>> the next exec.
>>>>
>>>> What I still miss with this idea is how do we find all the 
>>>> drm_gpuvm_bo structures with the evicted bool set to true? When 
>>>> doing the drm_exec dance we come across all external ones and can 
>>>> add them to the list if needed, but what about the BOs having the 
>>>> VM's dma-resv?
>>>
>>> Oh, they can be added to the evict list directly (no bool needed) in 
>>> the eviction code, like in v3, since for those we indeed hold the 
>>> VM's dma_resv, which is aliased with the object's dma-resv.
>>
>> Yeah, I wanted to note what Danilo seems to think about as well. How 
>> do we figure out the non-VM BOs evicted?
>>
>> We can't walk over the list of all non-VM BOs on every submission, 
>> that's too much overhead for cases with lots of non-VM BOs.
>>
>> And we can't rely on userspace sending all non-VM BOs as used list 
>> down to the kernel with each submission.
>>
>> Regards,
>> Christian.
>
> No, that's not needed: Mechanism below.
>
> 1) We maintain an evicted list. Typically protected by the vm resv.
> 2) Each gpuvm_bo has a bool "evicted". Protected by the bo resv.
>
> a) Evicting a vm bo: The vm resv is held by the eviction code. Just 
> put it on the evicted list.
> b) Evicting a shared/external bo: The bo resv is held by the eviction 
> code. Set the "evicted" bool
> c) Validating the evicted list on exec:


> Loop through all *external/shared* bos.

And this is what you can't do. For Vulkan it probably doesn't matter, 
but for OpenGL and especially multimedia we have many more BOs on the 
shared list than what's allocated for the VM.

Regards,
Christian.

> Lock them. After locking, check the "evicted" bool; if it's true, put 
> the bo on the evicted list (we hold the VM resv at this point) and 
> clear the "evicted" bool. Note that other vms will have their own 
> gpuvm_bo which is marked evicted.
>
> I have this coded up in a patch for Xe and it seems to be working 
> properly.
>
> /Thomas
>
>
>>
>>>
>>> /Thomas
>>>
>>>
>>>
>>>>
>>>>>
>>>>> If you mean that we need to unbind all vmas of all vms of evicted 
>>>>> bos before evicting, we don't do that, at least not in Xe, since 
>>>>> when evicting we wait for VM idle, and it can't access anything through 
>>>>> the stale vmas until they have been revalidated and rebound.
>>>>>
>>>>> /Thomas
>>>>>
>>>>>
>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Christian.
>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Christian.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Sep 13, 2023 at 11:14:46AM +0200, Thomas Hellström 
>>>>>>>>>>> wrote:
>>>>>>>>>>>> Hi!
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, 2023-09-13 at 01:36 +0200, Danilo Krummrich wrote:
>>>>>>>>>>>>> On Tue, Sep 12, 2023 at 09:23:08PM +0200, Thomas Hellström 
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>> On 9/12/23 18:50, Danilo Krummrich wrote:
>>>>>>>>>>>>>>> On Tue, Sep 12, 2023 at 06:20:32PM +0200, Thomas 
>>>>>>>>>>>>>>> Hellström wrote:
>>>>>>>>>>>>>>>> Hi, Danilo,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 9/9/23 17:31, Danilo Krummrich wrote:
>>>>>>>>>>>>>>>>> So far the DRM GPUVA manager offers common 
>>>>>>>>>>>>>>>>> infrastructure to
>>>>>>>>>>>>>>>>> track GPU VA
>>>>>>>>>>>>>>>>> allocations and mappings, generically connect GPU VA 
>>>>>>>>>>>>>>>>> mappings
>>>>>>>>>>>>>>>>> to their
>>>>>>>>>>>>>>>>> backing buffers and perform more complex mapping 
>>>>>>>>>>>>>>>>> operations
>>>>>>>>>>>>>>>>> on the GPU VA
>>>>>>>>>>>>>>>>> space.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> However, there are more design patterns commonly used by
>>>>>>>>>>>>>>>>> drivers, which
>>>>>>>>>>>>>>>>> can potentially be generalized in order to make the 
>>>>>>>>>>>>>>>>> DRM GPUVA
>>>>>>>>>>>>>>>>> manager
>>>>>>>>>>>>>>>>> represent a basic GPU-VM implementation. In this context,
>>>>>>>>>>>>>>>>> this patch aims
>>>>>>>>>>>>>>>>> at generalizing the following elements.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 1) Provide a common dma-resv for GEM objects not being 
>>>>>>>>>>>>>>>>> used
>>>>>>>>>>>>>>>>> outside of
>>>>>>>>>>>>>>>>>       this GPU-VM.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 2) Provide tracking of external GEM objects (GEM objects
>>>>>>>>>>>>>>>>> which are
>>>>>>>>>>>>>>>>>       shared with other GPU-VMs).
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 3) Provide functions to efficiently lock all GEM 
>>>>>>>>>>>>>>>>> objects dma-
>>>>>>>>>>>>>>>>> resv the
>>>>>>>>>>>>>>>>>       GPU-VM contains mappings of.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 4) Provide tracking of evicted GEM objects the GPU-VM
>>>>>>>>>>>>>>>>> contains mappings
>>>>>>>>>>>>>>>>>       of, such that validation of evicted GEM objects is
>>>>>>>>>>>>>>>>> accelerated.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 5) Provide some convenience functions for common 
>>>>>>>>>>>>>>>>> patterns.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Rather than being designed as a "framework", the 
>>>>>>>>>>>>>>>>> target is to
>>>>>>>>>>>>>>>>> make all
>>>>>>>>>>>>>>>>> features appear as a collection of optional helper 
>>>>>>>>>>>>>>>>> functions,
>>>>>>>>>>>>>>>>> such that
>>>>>>>>>>>>>>>>> drivers are free to make use of the DRM GPUVA managers 
>>>>>>>>>>>>>>>>> basic
>>>>>>>>>>>>>>>>> functionality and opt-in for other features without 
>>>>>>>>>>>>>>>>> setting
>>>>>>>>>>>>>>>>> any feature
>>>>>>>>>>>>>>>>> flags, just by making use of the corresponding functions.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Big kudos to Boris Brezillon for his help to figure out
>>>>>>>>>>>>>>>>> locking for drivers
>>>>>>>>>>>>>>>>> updating the GPU VA space within the fence signalling 
>>>>>>>>>>>>>>>>> path.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Suggested-by: Matthew Brost <matthew.brost@intel.com>
>>>>>>>>>>>>>>>>> Signed-off-by: Danilo Krummrich <dakr@redhat.com>
>>>>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>>>>     drivers/gpu/drm/drm_gpuvm.c | 516
>>>>>>>>>>>>>>>>> ++++++++++++++++++++++++++++++++++++
>>>>>>>>>>>>>>>>>     include/drm/drm_gpuvm.h     | 197 ++++++++++++++
>>>>>>>>>>>>>>>>>     2 files changed, 713 insertions(+)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> diff --git a/drivers/gpu/drm/drm_gpuvm.c
>>>>>>>>>>>>>>>>> b/drivers/gpu/drm/drm_gpuvm.c
>>>>>>>>>>>>>>>>> index f4411047dbb3..8e62a043f719 100644
>>>>>>>>>>>>>>>>> --- a/drivers/gpu/drm/drm_gpuvm.c
>>>>>>>>>>>>>>>>> +++ b/drivers/gpu/drm/drm_gpuvm.c
>>>>>>>>>>>>>>>>> @@ -73,6 +73,21 @@
>>>>>>>>>>>>>>>>>      * &drm_gem_object list of &drm_gpuvm_bos for an 
>>>>>>>>>>>>>>>>> existing
>>>>>>>>>>>>>>>>> instance of this
>>>>>>>>>>>>>>>>>      * particular combination. If not existent a new 
>>>>>>>>>>>>>>>>> instance
>>>>>>>>>>>>>>>>> is created and linked
>>>>>>>>>>>>>>>>>      * to the &drm_gem_object.
>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>> + * &drm_gpuvm_bo structures, since unique for a given
>>>>>>>>>>>>>>>>> &drm_gpuvm, are also used
>>>>>>>>>>>>>>>>> + * as entry for the &drm_gpuvm's lists of external and
>>>>>>>>>>>>>>>>> evicted objects. Those
>>>>>>>>>>>>>>>>> + * lists are maintained in order to accelerate locking of
>>>>>>>>>>>>>>>>> dma-resv locks and
>>>>>>>>>>>>>>>>> + * validation of evicted objects bound in a 
>>>>>>>>>>>>>>>>> &drm_gpuvm. For
>>>>>>>>>>>>>>>>> instance the all
>>>>>>>>>>>>>>>>> + * &drm_gem_object's &dma_resv of a given &drm_gpuvm 
>>>>>>>>>>>>>>>>> can be
>>>>>>>>>>>>>>>>> locked by calling
>>>>>>>>>>>>>>>>> + * drm_gpuvm_exec_lock(). Once locked drivers can call
>>>>>>>>>>>>>>>>> drm_gpuvm_validate() in
>>>>>>>>>>>>>>>>> + * order to validate all evicted &drm_gem_objects. It is
>>>>>>>>>>>>>>>>> also possible to lock
>>>>>>>>>>>>>>>>> + * additional &drm_gem_objects by providing the
>>>>>>>>>>>>>>>>> corresponding parameters to
>>>>>>>>>>>>>>>>> + * drm_gpuvm_exec_lock() as well as open code the 
>>>>>>>>>>>>>>>>> &drm_exec
>>>>>>>>>>>>>>>>> loop while making
>>>>>>>>>>>>>>>>> + * use of helper functions such as 
>>>>>>>>>>>>>>>>> drm_gpuvm_prepare_range()
>>>>>>>>>>>>>>>>> or
>>>>>>>>>>>>>>>>> + * drm_gpuvm_prepare_objects().
>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>> + * Every bound &drm_gem_object is treated as external 
>>>>>>>>>>>>>>>>> object
>>>>>>>>>>>>>>>>> when its &dma_resv
>>>>>>>>>>>>>>>>> + * structure is different than the &drm_gpuvm's common
>>>>>>>>>>>>>>>>> &dma_resv structure.
>>>>>>>>>>>>>>>>>      */
>>>>>>>>>>>>>>>>>     /**
>>>>>>>>>>>>>>>>> @@ -420,6 +435,20 @@
>>>>>>>>>>>>>>>>>      * Subsequent calls to drm_gpuvm_bo_obtain() for 
>>>>>>>>>>>>>>>>> the same
>>>>>>>>>>>>>>>>> &drm_gpuvm and
>>>>>>>>>>>>>>>>>      * &drm_gem_object must be able to observe previous
>>>>>>>>>>>>>>>>> creations and destructions
>>>>>>>>>>>>>>>>>      * of &drm_gpuvm_bos in order to keep instances 
>>>>>>>>>>>>>>>>> unique.
>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>> + * The &drm_gpuvm's lists for keeping track of 
>>>>>>>>>>>>>>>>> external and
>>>>>>>>>>>>>>>>> evicted objects are
>>>>>>>>>>>>>>>>> + * protected against concurrent insertion / removal and
>>>>>>>>>>>>>>>>> iteration internally.
>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>> + * However, drivers still need to protect 
>>>>>>>>>>>>>>>>> concurrent
>>>>>>>>>>>>>>>>> calls to functions
>>>>>>>>>>>>>>>>> + * iterating those lists, such as 
>>>>>>>>>>>>>>>>> drm_gpuvm_validate() and
>>>>>>>>>>>>>>>>> + * drm_gpuvm_prepare_objects(). Every such function 
>>>>>>>>>>>>>>>>> contains
>>>>>>>>>>>>>>>>> a particular
>>>>>>>>>>>>>>>>> + * comment and lockdep checks if possible.
>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>> + * Functions adding or removing entries from those 
>>>>>>>>>>>>>>>>> lists,
>>>>>>>>>>>>>>>>> such as
>>>>>>>>>>>>>>>>> + * drm_gpuvm_bo_evict() or drm_gpuvm_bo_extobj_add() 
>>>>>>>>>>>>>>>>> may be
>>>>>>>>>>>>>>>>> called with external
>>>>>>>>>>>>>>>>> + * locks being held, e.g. in order to avoid the
>>>>>>>>>>>>>>>>> corresponding list being
>>>>>>>>>>>>>>>>> + * (safely) modified while potentially being 
>>>>>>>>>>>>>>>>> iterated by
>>>>>>>>>>>>>>>>> other API functions.
>>>>>>>>>>>>>>>>> + * However, this is entirely optional.
>>>>>>>>>>>>>>>>>      */
>>>>>>>>>>>>>>>>>     /**
>>>>>>>>>>>>>>>>> @@ -632,6 +661,131 @@
>>>>>>>>>>>>>>>>>      *   }
>>>>>>>>>>>>>>>>>      */
>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>> + * get_next_vm_bo_from_list() - get the next vm_bo 
>>>>>>>>>>>>>>>>> element
>>>>>>>>>>>>>>>>> + * @__gpuvm: The GPU VM
>>>>>>>>>>>>>>>>> + * @__list_name: The name of the list we're iterating on
>>>>>>>>>>>>>>>>> + * @__local_list: A pointer to the local list used to 
>>>>>>>>>>>>>>>>> store
>>>>>>>>>>>>>>>>> already iterated items
>>>>>>>>>>>>>>>>> + * @__prev_vm_bo: The previous element we got from
>>>>>>>>>>>>>>>>> drm_gpuvm_get_next_cached_vm_bo()
>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>> + * This helper is here to provide lockless list 
>>>>>>>>>>>>>>>>> iteration.
>>>>>>>>>>>>>>>>> Lockless as in, the
>>>>>>>>>>>>>>>>> + * iterator releases the lock immediately after 
>>>>>>>>>>>>>>>>> picking the
>>>>>>>>>>>>>>>>> first element from
>>>>>>>>>>>>>>>>> + * the list, so list insertion and deletion can happen
>>>>>>>>>>>>>>>>> concurrently.
>>>>>>>>>>>>>>>> Are the list spinlocks needed for that async state 
>>>>>>>>>>>>>>>> update from
>>>>>>>>>>>>>>>> within the
>>>>>>>>>>>>>>>> dma-fence critical section we've discussed previously?
>>>>>>>>>>>>>>> Yes, but also for other reasons, see below.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Otherwise it should be sufficient to protect the lists 
>>>>>>>>>>>>>>>> with the
>>>>>>>>>>>>>>>> gpuvm's resv
>>>>>>>>>>>>>>>> (or for the extobj list with an outer lock).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> If those spinlocks are still needed in some situations, 
>>>>>>>>>>>>>>>> perhaps
>>>>>>>>>>>>>>>> could we
>>>>>>>>>>>>>>>> have an option to set them to NULL (Like IIRC the maple 
>>>>>>>>>>>>>>>> tree
>>>>>>>>>>>>>>>> allows for)?
>>>>>>>>>>>>>>> The evict spinlock is needed in any case, since in
>>>>>>>>>>>>>>> drm_gpuvm_bo_evict() we're
>>>>>>>>>>>>>>> holding only the dma-resv lock from the BO this function 
>>>>>>>>>>>>>>> gets
>>>>>>>>>>>>>>> called for. Hence,
>>>>>>>>>>>>>>> the spinlock protects concurrent drm_gpuvm_bo_evict() 
>>>>>>>>>>>>>>> calls with
>>>>>>>>>>>>>>> different BOs.
>>>>>>>>>>>>>> No. Only if you try to add external objects to the vm's 
>>>>>>>>>>>>>> evict list
>>>>>>>>>>>>>> from
>>>>>>>>>>>>>> within the evict code. That's not necessary since you 
>>>>>>>>>>>>>> loop through
>>>>>>>>>>>>>> all
>>>>>>>>>>>>>> external objects anyway when locking them so an "evicted" 
>>>>>>>>>>>>>> bool in
>>>>>>>>>>>>>> the vm_bo,
>>>>>>>>>>>>>> protected by the bo resv would be sufficient. The extobj 
>>>>>>>>>>>>>> locking
>>>>>>>>>>>>>> loop can
>>>>>>>>>>>>>> then add the bo to the evicted list.
>>>>>>>>>>>>> And validate() can remove it while still holding all 
>>>>>>>>>>>>> dma-resv locks,
>>>>>>>>>>>>> neat!
>>>>>>>>>>>>> However, what if two tasks are trying to lock the VA space
>>>>>>>>>>>>> concurrently? What
>>>>>>>>>>>>> do we do when the drm_gpuvm_bo's refcount drops to zero in
>>>>>>>>>>>>> drm_gpuva_unlink()?
>>>>>>>>>>>>> Are we guaranteed that at this point of time the 
>>>>>>>>>>>>> drm_gpuvm_bo is not
>>>>>>>>>>>>> on the
>>>>>>>>>>>>> evicted list? Because otherwise we would call 
>>>>>>>>>>>>> drm_gpuvm_bo_destroy()
>>>>>>>>>>>>> with the
>>>>>>>>>>>>> dma-resv lock held, which wouldn't be allowed, since
>>>>>>>>>>>>> drm_gpuvm_bo_destroy()
>>>>>>>>>>>>> might drop the last reference to the drm_gem_object and 
>>>>>>>>>>>>> hence we'd
>>>>>>>>>>>>> potentially
>>>>>>>>>>>>> free the dma-resv lock while holding it, at least if it's 
>>>>>>>>>>>>> an external
>>>>>>>>>>>>> object.
>>>>>>>>>>>> Easiest way in this scheme is to think of the lists as 
>>>>>>>>>>>> being protected
>>>>>>>>>>>> by the vm's resv lock. That means anybody calling unlink() 
>>>>>>>>>>>> must also
>>>>>>>>>>>> hold the vm's resv lock. (Which is OK from a UAF point of 
>>>>>>>>>>>> view, but
>>>>>>>>>>>> perhaps not from a locking inversion POV from an async list 
>>>>>>>>>>>> update).
>>>>>>>>>>> This would mean that on unlink() we'd need to hold the VM's 
>>>>>>>>>>> resv lock and the
>>>>>>>>>>> corresponding GEM's resv lock (in case they're not the same 
>>>>>>>>>>> anyways) because the
>>>>>>>>>>> VM's resv lock would protect the external / evicted object 
>>>>>>>>>>> lists and the GEM
>>>>>>>>>>> objects resv lock protects the GEM's list of drm_gpuvm_bos 
>>>>>>>>>>> and the
>>>>>>>>>>> drm_gpuvm_bo's list of drm_gpuvas.
>>>>>>>>>>>
>>>>>>>>>>>>>>> For extobjs an outer lock would be enough in case of Xe, 
>>>>>>>>>>>>>>> but I
>>>>>>>>>>>>>>> really would not
>>>>>>>>>>>>>>> like to add even more complexity just to get the 
>>>>>>>>>>>>>>> spinlock out of
>>>>>>>>>>>>>>> the way in case
>>>>>>>>>>>>>>> the driver already has an outer lock protecting this path.
>>>>>>>>>>>>>> I must disagree here. These spinlocks and atomic 
>>>>>>>>>>>>>> operations are
>>>>>>>>>>>>>> pretty
>>>>>>>>>>>>>> costly and as discussed earlier this type of locking was 
>>>>>>>>>>>>>> the reason
>>>>>>>>>>>>>> (at
>>>>>>>>>>>>>> least according to the commit message) that made 
>>>>>>>>>>>>>> Christian drop the
>>>>>>>>>>>>>> XArray
>>>>>>>>>>>>>> use in drm_exec for the same set of objects: "The locking 
>>>>>>>>>>>>>> overhead
>>>>>>>>>>>>>> is
>>>>>>>>>>>>>> unnecessary and measurable". IMHO the spinlock is the added
>>>>>>>>>>>>>> complexity and a
>>>>>>>>>>>>>> single wide lock following the drm locking guidelines set 
>>>>>>>>>>>>>> out by
>>>>>>>>>>>>>> Daniel and
>>>>>>>>>>>>>> David should really be the default choice with an opt-in 
>>>>>>>>>>>>>> for a
>>>>>>>>>>>>>> spinlock if
>>>>>>>>>>>>>> needed for async and pushing out to a wq is not an option.
>>>>>>>>>>>>> For the external object list an outer lock would work as 
>>>>>>>>>>>>> long as it's
>>>>>>>>>>>>> not the
>>>>>>>>>>>>> dma-resv lock of the corresponding GEM object, since here 
>>>>>>>>>>>>> we actually
>>>>>>>>>>>>> need to
>>>>>>>>>>>>> remove the list entry from the external object list on
>>>>>>>>>>>>> drm_gpuvm_bo_destroy().
>>>>>>>>>>>>> It's just a bit weird design wise that drivers would need 
>>>>>>>>>>>>> to take
>>>>>>>>>>>>> this outer
>>>>>>>>>>>>> lock on:
>>>>>>>>>>>>>
>>>>>>>>>>>>> - drm_gpuvm_bo_extobj_add()
>>>>>>>>>>>>> - drm_gpuvm_bo_destroy()        (and hence also 
>>>>>>>>>>>>> drm_gpuvm_bo_put())
>>>>>>>>>>>>> - drm_gpuva_unlink()            (because it needs to call
>>>>>>>>>>>>> drm_gpuvm_bo_put())
>>>>>>>>>>>>> - drm_gpuvm_exec_lock()
>>>>>>>>>>>>> - drm_gpuvm_exec_lock_array()
>>>>>>>>>>>>> - drm_gpuvm_prepare_range()
>>>>>>>>>>>>>
>>>>>>>>>>>>> Given that it seems reasonable to do all the required locking
>>>>>>>>>>>>> internally.
>>>>>>>>>>>>  From a design POV, there has been a clear direction in XE 
>>>>>>>>>>>> to make
>>>>>>>>>>>> things similar to mmap() / munmap(), so this outer lock, 
>>>>>>>>>>>> which in Xe is
>>>>>>>>>>>> an rwsem, is used in a similar way as the mmap_lock. It's 
>>>>>>>>>>>> protecting
>>>>>>>>>>>> the page-table structures and vma rb tree, the userptr 
>>>>>>>>>>>> structures and
>>>>>>>>>>>> the extobj list. Basically it's taken early in the exec 
>>>>>>>>>>>> IOCTL, the
>>>>>>>>>>>> VM_BIND ioctl, the compute rebind worker and the pagefault 
>>>>>>>>>>>> handler, so
>>>>>>>>>>>> all of the above are just asserting that it is taken in the 
>>>>>>>>>>>> correct
>>>>>>>>>>>> mode.
>>>>>>>>>>>>
>>>>>>>>>>>> But strictly with this scheme one could also use the vm's 
>>>>>>>>>>>> dma_resv for
>>>>>>>>>>>> the extobj list since with drm_exec, it's locked before 
>>>>>>>>>>>> traversing the
>>>>>>>>>>>> list.
>>>>>>>>>>>>
>>>>>>>>>>>> The whole point of this scheme is to rely on locks that you 
>>>>>>>>>>>> already are
>>>>>>>>>>>> supposed to be holding for various reasons and is simple to 
>>>>>>>>>>>> comprehend.
>>>>>>>>>>> I don't agree that we're supposed to hold the VM's resv lock 
>>>>>>>>>>> anyways for
>>>>>>>>>>> functions like drm_gpuvm_bo_put() or drm_gpuva_unlink(), but 
>>>>>>>>>>> I'm fine using it
>>>>>>>>>>> for that purpose nevertheless.
>>>>>>>>>>>
>>>>>>>>>>>>> In order to at least place lockdep checks, the driver 
>>>>>>>>>>>>> would need to
>>>>>>>>>>>>> supply the
>>>>>>>>>>>>> corresponding lock's lockdep_map, because the GPUVM 
>>>>>>>>>>>>> otherwise doesn't
>>>>>>>>>>>>> know about
>>>>>>>>>>>>> the lock.
>>>>>>>>>>>> Yes, that sounds reasonable. One lockdep map per list.
>>>>>>>>>>> I'd really like to avoid that, especially now that 
>>>>>>>>>>> everything got simpler. We
>>>>>>>>>>> should define the actual locks to take instead.
>>>>>>>>>>>
>>>>>>>>>>>>> Out of curiosity, what is the overhead of a spin_lock() 
>>>>>>>>>>>>> that doesn't
>>>>>>>>>>>>> need to
>>>>>>>>>>>>> spin?
>>>>>>>>>>>> I guess it's hard to tell exactly, but it is much lower on 
>>>>>>>>>>>> modern x86
>>>>>>>>>>>> than what it used to be. Not sure about ARM, which is the 
>>>>>>>>>>>> other
>>>>>>>>>>>> architecture important to us. I figure if there is little 
>>>>>>>>>>>> cache-line
>>>>>>>>>>>> bouncing the main overhead comes from the implied barriers.
>>>>>>>>>>>>
>>>>>>>>>>>>>> A pretty simple way that would not add much code would be
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> static void gpuvm_cond_spin_lock(const struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>>                                  spinlock_t *lock)
>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>      if (!gpuvm->resv_protected_lists)
>>>>>>>>>>>>>>          spin_lock(lock);
>>>>>>>>>>>>>> }
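
(For illustration, the matching unlock and a possible call site; the
resv_protected_lists flag and a drm_gpuvm_resv() helper returning the
VM's common dma-resv are hypothetical here, sketched from the proposal
above:)

static void gpuvm_cond_spin_unlock(const struct drm_gpuvm *gpuvm,
                                   spinlock_t *lock)
{
     if (!gpuvm->resv_protected_lists)
         spin_unlock(lock);
}

/* The caller either holds the VM's resv (opt-in) or relies on the lock. */
static void driver_extobj_add(struct drm_gpuvm_bo *vm_bo)
{
     struct drm_gpuvm *gpuvm = vm_bo->vm;

     if (gpuvm->resv_protected_lists)
         dma_resv_assert_held(drm_gpuvm_resv(gpuvm));

     gpuvm_cond_spin_lock(gpuvm, &gpuvm->extobj.lock);
     if (list_empty(&vm_bo->list.entry.extobj))
         list_add_tail(&vm_bo->list.entry.extobj, &gpuvm->extobj.list);
     gpuvm_cond_spin_unlock(gpuvm, &gpuvm->extobj.lock);
}
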
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> For such drivers, that would require anybody calling 
>>>>>>>>>>>>>>>> unlink to
>>>>>>>>>>>>>>>> hold the vm's
>>>>>>>>>>>>>>>> resv, though.
>>>>>>>>>>>>>>> In V4 I want to go back to having a dedicated lock for 
>>>>>>>>>>>>>>> the GEM's
>>>>>>>>>>>>>>> gpuva list (or
>>>>>>>>>>>>>>> VM_BO list to be more precise). We can't just use the 
>>>>>>>>>>>>>>> dma-resv
>>>>>>>>>>>>>>> lock for that
>>>>>>>>>>>>>>> with VM_BO abstractions, because on destruction of a 
>>>>>>>>>>>>>>> VM_BO we
>>>>>>>>>>>>>>> otherwise wouldn't
>>>>>>>>>>>>>>> be allowed to already hold the dma-resv lock. That's the 
>>>>>>>>>>>>>>> fix I
>>>>>>>>>>>>>>> was referring to
>>>>>>>>>>>>>>> earlier.
>>>>>>>>>>>>>> Yeah, I can see the need for a dedicated lock for the 
>>>>>>>>>>>>>> GEM's gpuva
>>>>>>>>>>>>>> list, but
>>>>>>>>>>>>>> holding the vm's dma-resv lock across the unlink 
>>>>>>>>>>>>>> shouldn't be a
>>>>>>>>>>>>>> problem. We
>>>>>>>>>>>>>> may free the object and a pointer to the vm's resv during 
>>>>>>>>>>>>>> unlink
>>>>>>>>>>>>>> but we
>>>>>>>>>>>>>> don't free the vm's resv.  It'd be a matter of ensuring 
>>>>>>>>>>>>>> that any
>>>>>>>>>>>>>> calls to
>>>>>>>>>>>>>> unlink from *within* drm_gpuvm allows it to be held.
>>>>>>>>>>>>> Drivers calling unlink() from the fence signaling path 
>>>>>>>>>>>>> can't use the
>>>>>>>>>>>>> VM's
>>>>>>>>>>>>> dma-resv lock.
>>>>>>>>>>>> Yes, that made me a bit curious because in the current 
>>>>>>>>>>>> version the code
>>>>>>>>>>>> required the object's dma_resv for unlink() which can't be 
>>>>>>>>>>>> grabbed
>>>>>>>>>>>> either from the fence signaling path. So are there any 
>>>>>>>>>>>> drivers actually
>>>>>>>>>>>> wanting to do that? If so, they will either need to resort 
>>>>>>>>>>>> to the
>>>>>>>>>>>> current spinlock solution or they will need to call unlink 
>>>>>>>>>>>> from a
>>>>>>>>>>>> workqueue item.
>>>>>>>>>>> As Boris already mentioned we have the dma-resv lock by 
>>>>>>>>>>> default or a driver
>>>>>>>>>>> specific GEM gpuva lock as opt-in. Now, we can get rid of 
>>>>>>>>>>> the latter.
>>>>>>>>>>>
>>>>>>>>>>>>> Also, what if the object is an external object? We can't 
>>>>>>>>>>>>> use the VM's
>>>>>>>>>>>>> dma-resv
>>>>>>>>>>>>> lock here.
>>>>>>>>>>>> Why? Typically (sync) unlink is only ever called from an 
>>>>>>>>>>>> unbind-like
>>>>>>>>>>>> operation where it should be trivial to grab the vm's resv. 
>>>>>>>>>>>> Or, for
>>>>>>>>>>>> that matter, any outer lock protecting the extobj list. The rule 
>>>>>>>>>>>> would be that
>>>>>>>>>>>> drm_gpuvm_bo::entry::extobj and 
>>>>>>>>>>>> drm_gpuvm_bo::entry::evict would
>>>>>>>>>>>> be protected by either the vm's dma_resv (or possibly an 
>>>>>>>>>>>> outer lock in
>>>>>>>>>>>> the case of the extobj list).
>>>>>>>>>>> An outer lock wouldn't have worked for updates in the 
>>>>>>>>>>> async path, but
>>>>>>>>>>> that shouldn't be relevant anymore. We could use the VM's resv 
>>>>>>>>>>> for that.
>>>>>>>>>>>
>>>>>>>>>>>>>   And we can't have the GEM objs dma-resv lock held when 
>>>>>>>>>>>>> calling
>>>>>>>>>>>>> unlink(), since unlink() calls drm_gpuvm_bo_put(), which 
>>>>>>>>>>>>> if the
>>>>>>>>>>>>> refcount drops
>>>>>>>>>>>>> to zero calls drm_gpuvm_bo_destroy() and 
>>>>>>>>>>>>> drm_gpuvm_bo_destroy() might
>>>>>>>>>>>>> drop the
>>>>>>>>>>>>> last reference of the GEM object.
>>>>>>>>>>>> Yes, but this is a different problem as to what exactly 
>>>>>>>>>>>> protects
>>>>>>>>>>>> drm_gpuvm_bo::entry::gem. Either as you suggest an internal 
>>>>>>>>>>>> per bo list
>>>>>>>>>>>> lock, or if we want to keep the bo's dma_resv we need to 
>>>>>>>>>>>> ensure that
>>>>>>>>>>>> the caller of dma_resv_unlock(obj->resv) actually refcounts 
>>>>>>>>>>>> its obj
>>>>>>>>>>>> pointer, and doesn't implicitly rely on the gpuvm_bo's 
>>>>>>>>>>>> refcount (I know
>>>>>>>>>>>> Boris didn't like that, but requiring an explicit refcount 
>>>>>>>>>>>> for a
>>>>>>>>>>>> pointer you dereference unless you're under a lock that 
>>>>>>>>>>>> ensures keeping
>>>>>>>>>>>> the object alive is pretty much required?) But anyway for the
>>>>>>>>>>>> drm_gpuvm_bo::entry::gem list protection (bo resv or 
>>>>>>>>>>>> internal spinlock)
>>>>>>>>>>>> I don't have a strong preference.
>>>>>>>>>>> We can keep the GEM objects dma-resv lock, however as 
>>>>>>>>>>> mentioned above
>>>>>>>>>>> drm_gpuva_unlink() and drm_gpuvm_bo_put() then requires both 
>>>>>>>>>>> the VM's resv lock
>>>>>>>>>>> and the GEM's resv lock in case they differ.
>>>>>>>>>>>
>>>>>>>>>>>>>   All those problems go away with a dedicated
>>>>>>>>>>>>> GEM gpuva list lock.
>>>>>>>>>>>> I don't think these are real problems.
>>>>>>>>>>>> With the exception of the eviction list "trick" where we 
>>>>>>>>>>>> currently have
>>>>>>>>>>>> slightly different approach to collect external bos needing 
>>>>>>>>>>>> rebinding,
>>>>>>>>>>>> we have this working fine.
>>>>>>>>>>>>
>>>>>>>>>>>> TBH I think pretty much the only situation where the 
>>>>>>>>>>>> spinlock is needed
>>>>>>>>>>>> is for async updates of these lists, unless a wq item can 
>>>>>>>>>>>> be used for
>>>>>>>>>>>> that, but it doesn't really seem like the current code 
>>>>>>>>>>>> allows for such
>>>>>>>>>>>> updates anyway? It complicates the code a lot, adds 
>>>>>>>>>>>> overhead and also
>>>>>>>>>>>> adds the requirement for refcounting during list traversal.
>>>>>>>>>>>>
>>>>>>>>>>>> /Thomas
>>>>>>>>>>>>
>>>>>>>>>>>>>> /Thomas
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> It seems that with that also the refcount could be made 
>>>>>>>>>>>>>>>> non-
>>>>>>>>>>>>>>>> atomic.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> All in the spirit of the drm locking guidelines "use 
>>>>>>>>>>>>>>>> big locks
>>>>>>>>>>>>>>>> when
>>>>>>>>>>>>>>>> possible".
>>>>>>>>>>>>>>>> Lower level locks only when necessary for performance or
>>>>>>>>>>>>>>>> locking inversion?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> /Thomas
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>> + * Elements popped from the original list are kept in a
>>>>>>>>>>>>>>>>> local list, so removal
>>>>>>>>>>>>>>>>> + * and is_empty checks can still happen while we're
>>>>>>>>>>>>>>>>> iterating the list.
>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>> +#define get_next_vm_bo_from_list(__gpuvm, __list_name, __local_list, __prev_vm_bo) \
>>>>>>>>>>>>>>>>> +       ({                                                                          \
>>>>>>>>>>>>>>>>> +               struct drm_gpuvm_bo *__vm_bo;                                       \
>>>>>>>>>>>>>>>>> +                                                                                   \
>>>>>>>>>>>>>>>>> +               drm_gpuvm_bo_put(__prev_vm_bo);                                     \
>>>>>>>>>>>>>>>>> +                                                                                   \
>>>>>>>>>>>>>>>>> +               spin_lock(&(__gpuvm)->__list_name.lock);                            \
>>>>>>>>>>>>>>>>> +               while (!list_empty(&(__gpuvm)->__list_name.list)) {                 \
>>>>>>>>>>>>>>>>> +                       __vm_bo = list_first_entry(&(__gpuvm)->__list_name.list,    \
>>>>>>>>>>>>>>>>> +                                                  struct drm_gpuvm_bo,             \
>>>>>>>>>>>>>>>>> +                                                  list.entry.__list_name);         \
>>>>>>>>>>>>>>>>> +                       if (drm_gpuvm_bo_get_unless_zero(__vm_bo)) {                \
>>>>>>>>>>>>>>>>> +                               list_move_tail(&(__vm_bo)->list.entry.__list_name,  \
>>>>>>>>>>>>>>>>> +                                              __local_list);                       \
>>>>>>>>>>>>>>>>> +                               break;                                              \
>>>>>>>>>>>>>>>>> +                       } else {                                                    \
>>>>>>>>>>>>>>>>> +                               list_del_init(&(__vm_bo)->list.entry.__list_name);  \
>>>>>>>>>>>>>>>>> +                               __vm_bo = NULL;                                     \
>>>>>>>>>>>>>>>>> +                       }                                                           \
>>>>>>>>>>>>>>>>> +               }                                                                   \
>>>>>>>>>>>>>>>>> +               spin_unlock(&(__gpuvm)->__list_name.lock);                          \
>>>>>>>>>>>>>>>>> +                                                                                   \
>>>>>>>>>>>>>>>>> +               __vm_bo;                                                            \
>>>>>>>>>>>>>>>>> +       })
>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>> + * for_each_vm_bo_in_list() - internal vm_bo list 
>>>>>>>>>>>>>>>>> iterator
>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>> + * This helper is here to provide lockless list 
>>>>>>>>>>>>>>>>> iteration.
>>>>>>>>>>>>>>>>> Lockless as in, the
>>>>>>>>>>>>>>>>> + * iterator releases the lock immediately after 
>>>>>>>>>>>>>>>>> picking the
>>>>>>>>>>>>>>>>> first element from the
>>>>>>>>>>>>>>>>> + * list, so list insertion and deletion can happen
>>>>>>>>>>>>>>>>> concurrently.
>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>> + * Typical use:
>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>> + *     struct drm_gpuvm_bo *vm_bo;
>>>>>>>>>>>>>>>>> + *     LIST_HEAD(my_local_list);
>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>> + *     ret = 0;
>>>>>>>>>>>>>>>>> + *     drm_gpuvm_for_each_vm_bo(gpuvm, <list_name>,
>>>>>>>>>>>>>>>>> &my_local_list, vm_bo) {
>>>>>>>>>>>>>>>>> + *             ret = do_something_with_vm_bo(..., 
>>>>>>>>>>>>>>>>> vm_bo);
>>>>>>>>>>>>>>>>> + *             if (ret)
>>>>>>>>>>>>>>>>> + *                     break;
>>>>>>>>>>>>>>>>> + *     }
>>>>>>>>>>>>>>>>> + *     drm_gpuvm_bo_put(vm_bo);
>>>>>>>>>>>>>>>>> + *     drm_gpuvm_restore_vm_bo_list(gpuvm, <list_name>,
>>>>>>>>>>>>>>>>> &my_local_list);
>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>> + * Only used for internal list iterations, not meant 
>>>>>>>>>>>>>>>>> to be
>>>>>>>>>>>>>>>>> exposed to the outside
>>>>>>>>>>>>>>>>> + * world.
>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>> +#define for_each_vm_bo_in_list(__gpuvm, __list_name, __local_list, __vm_bo)        \
>>>>>>>>>>>>>>>>> +       for (__vm_bo = get_next_vm_bo_from_list(__gpuvm, __list_name,               \
>>>>>>>>>>>>>>>>> +                                               __local_list, NULL);                \
>>>>>>>>>>>>>>>>> +            __vm_bo;                                                               \
>>>>>>>>>>>>>>>>> +            __vm_bo = get_next_vm_bo_from_list(__gpuvm, __list_name,               \
>>>>>>>>>>>>>>>>> +                                               __local_list, __vm_bo))             \
>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>> + * restore_vm_bo_list() - move vm_bo elements back to 
>>>>>>>>>>>>>>>>> their
>>>>>>>>>>>>>>>>> original list
>>>>>>>>>>>>>>>>> + * @__gpuvm: The GPU VM
>>>>>>>>>>>>>>>>> + * @__list_name: The name of the list we're iterating on
>>>>>>>>>>>>>>>>> + * @__local_list: A pointer to the local list used to 
>>>>>>>>>>>>>>>>> store
>>>>>>>>>>>>>>>>> already iterated items
>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>> + * When we're done iterating a vm_bo list, we should 
>>>>>>>>>>>>>>>>> call
>>>>>>>>>>>>>>>>> restore_vm_bo_list()
>>>>>>>>>>>>>>>>> + * to restore the original state and let new 
>>>>>>>>>>>>>>>>> iterations take
>>>>>>>>>>>>>>>>> place.
>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>> +#define restore_vm_bo_list(__gpuvm, __list_name, __local_list)                     \
>>>>>>>>>>>>>>>>> +       do {                                                                        \
>>>>>>>>>>>>>>>>> +               /* Merge back the two lists, moving local list elements to the      \
>>>>>>>>>>>>>>>>> +                * head to preserve previous ordering, in case it matters.          \
>>>>>>>>>>>>>>>>> +                */                                                                 \
>>>>>>>>>>>>>>>>> +               spin_lock(&(__gpuvm)->__list_name.lock);                            \
>>>>>>>>>>>>>>>>> +               list_splice(__local_list, &(__gpuvm)->__list_name.list);            \
>>>>>>>>>>>>>>>>> +               spin_unlock(&(__gpuvm)->__list_name.lock);                          \
>>>>>>>>>>>>>>>>> +       } while (0)
>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>> + * drm_gpuvm_bo_list_add() - insert a vm_bo into the 
>>>>>>>>>>>>>>>>> given
>>>>>>>>>>>>>>>>> list
>>>>>>>>>>>>>>>>> + * @__vm_bo: the &drm_gpuvm_bo
>>>>>>>>>>>>>>>>> + * @__list_name: the name of the list to insert into
>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>> + * Inserts the given @__vm_bo into the list specified by
>>>>>>>>>>>>>>>>> @__list_name and
>>>>>>>>>>>>>>>>> + * increases the vm_bo's reference count.
>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>> +#define drm_gpuvm_bo_list_add(__vm_bo, __list_name)                                \
>>>>>>>>>>>>>>>>> +       do {                                                                        \
>>>>>>>>>>>>>>>>> +               spin_lock(&(__vm_bo)->vm->__list_name.lock);                        \
>>>>>>>>>>>>>>>>> +               if (list_empty(&(__vm_bo)->list.entry.__list_name))                 \
>>>>>>>>>>>>>>>>> +                       list_add_tail(&(__vm_bo)->list.entry.__list_name,           \
>>>>>>>>>>>>>>>>> +                                     &(__vm_bo)->vm->__list_name.list);            \
>>>>>>>>>>>>>>>>> +               spin_unlock(&(__vm_bo)->vm->__list_name.lock);                      \
>>>>>>>>>>>>>>>>> +       } while (0)
>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>> + * drm_gpuvm_bo_list_del() - remove a vm_bo from the 
>>>>>>>>>>>>>>>>> given
>>>>>>>>>>>>>>>>> list
>>>>>>>>>>>>>>>>> + * @__vm_bo: the &drm_gpuvm_bo
>>>>>>>>>>>>>>>>> + * @__list_name: the name of the list to insert into
>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>> + * Removes the given @__vm_bo from the list specified by
>>>>>>>>>>>>>>>>> @__list_name and
>>>>>>>>>>>>>>>>> + * decreases the vm_bo's reference count.
>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>> +#define drm_gpuvm_bo_list_del(__vm_bo, __list_name)                                \
>>>>>>>>>>>>>>>>> +       do {                                                                        \
>>>>>>>>>>>>>>>>> +               spin_lock(&(__vm_bo)->vm->__list_name.lock);                        \
>>>>>>>>>>>>>>>>> +               if (!list_empty(&(__vm_bo)->list.entry.__list_name))                \
>>>>>>>>>>>>>>>>> +                       list_del_init(&(__vm_bo)->list.entry.__list_name);          \
>>>>>>>>>>>>>>>>> +               spin_unlock(&(__vm_bo)->vm->__list_name.lock);                      \
>>>>>>>>>>>>>>>>> +       } while (0)
>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>> +static int __must_check
>>>>>>>>>>>>>>>>> +drm_gpuvm_bo_get_unless_zero(struct drm_gpuvm_bo 
>>>>>>>>>>>>>>>>> *vm_bo);
>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>     #define to_drm_gpuva(__node) container_of((__node), struct drm_gpuva, rb.node)
>>>>>>>>>>>>>>>>>     #define GPUVA_START(node) ((node)->va.addr)
>>>>>>>>>>>>>>>>> @@ -713,6 +867,12 @@ drm_gpuvm_init(struct drm_gpuvm *gpuvm, struct drm_device *drm,
>>>>>>>>>>>>>>>>>          gpuvm->rb.tree = RB_ROOT_CACHED;
>>>>>>>>>>>>>>>>>          INIT_LIST_HEAD(&gpuvm->rb.list);
>>>>>>>>>>>>>>>>> +       INIT_LIST_HEAD(&gpuvm->extobj.list);
>>>>>>>>>>>>>>>>> +       spin_lock_init(&gpuvm->extobj.lock);
>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>> +       INIT_LIST_HEAD(&gpuvm->evict.list);
>>>>>>>>>>>>>>>>> +       spin_lock_init(&gpuvm->evict.lock);
>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>          drm_gpuva_check_overflow(start_offset, range);
>>>>>>>>>>>>>>>>>          gpuvm->mm_start = start_offset;
>>>>>>>>>>>>>>>>>          gpuvm->mm_range = range;
>>>>>>>>>>>>>>>>> @@ -754,10 +914,302 @@ drm_gpuvm_destroy(struct drm_gpuvm *gpuvm)
>>>>>>>>>>>>>>>>>          WARN(!RB_EMPTY_ROOT(&gpuvm->rb.tree.rb_root),
>>>>>>>>>>>>>>>>>               "GPUVA tree is not empty, potentially 
>>>>>>>>>>>>>>>>> leaking
>>>>>>>>>>>>>>>>> memory.\n");
>>>>>>>>>>>>>>>>> +       WARN(!list_empty(&gpuvm->extobj.list), "Extobj list should be empty.\n");
>>>>>>>>>>>>>>>>> +       WARN(!list_empty(&gpuvm->evict.list), "Evict list should be empty.\n");
>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>          drm_gem_private_object_fini(&gpuvm->d_obj);
>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>> EXPORT_SYMBOL_GPL(drm_gpuvm_destroy);
>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>> + * drm_gpuvm_prepare_objects() - prepare all 
>>>>>>>>>>>>>>>>> associated BOs
>>>>>>>>>>>>>>>>> + * @gpuvm: the &drm_gpuvm
>>>>>>>>>>>>>>>>> + * @exec: the &drm_exec locking context
>>>>>>>>>>>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>> + * Calls drm_exec_prepare_obj() for all 
>>>>>>>>>>>>>>>>> &drm_gem_objects the
>>>>>>>>>>>>>>>>> given
>>>>>>>>>>>>>>>>> + * &drm_gpuvm contains mappings of.
>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>> + * Using this function directly, it is the driver's
>>>>>>>>>>>>>>>>> responsibility to call
>>>>>>>>>>>>>>>>> + * drm_exec_init() and drm_exec_fini() accordingly.
>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>> + * Note: This function is safe against concurrent 
>>>>>>>>>>>>>>>>> insertion
>>>>>>>>>>>>>>>>> and removal of
>>>>>>>>>>>>>>>>> + * external objects, however it is not safe against
>>>>>>>>>>>>>>>>> concurrent usage itself.
>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>> + * Drivers need to make sure to protect this case with
>>>>>>>>>>>>>>>>> either an outer VM lock
>>>>>>>>>>>>>>>>> + * or by calling drm_gpuvm_prepare_vm() before this 
>>>>>>>>>>>>>>>>> function
>>>>>>>>>>>>>>>>> within the
>>>>>>>>>>>>>>>>> + * drm_exec_until_all_locked() loop, such that the 
>>>>>>>>>>>>>>>>> GPUVM's
>>>>>>>>>>>>>>>>> dma-resv lock ensures
>>>>>>>>>>>>>>>>> + * mutual exclusion.
>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>> + * Returns: 0 on success, negative error code on 
>>>>>>>>>>>>>>>>> failure.
>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>> +int
>>>>>>>>>>>>>>>>> +drm_gpuvm_prepare_objects(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>>>>> +                         struct drm_exec *exec,
>>>>>>>>>>>>>>>>> +                         unsigned int num_fences)
>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>> +       struct drm_gpuvm_bo *vm_bo;
>>>>>>>>>>>>>>>>> +       LIST_HEAD(extobjs);
>>>>>>>>>>>>>>>>> +       int ret = 0;
>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>> +       for_each_vm_bo_in_list(gpuvm, extobj, &extobjs,
>>>>>>>>>>>>>>>>> vm_bo) {
>>>>>>>>>>>>>>>>> +               ret = drm_exec_prepare_obj(exec, 
>>>>>>>>>>>>>>>>> vm_bo->obj,
>>>>>>>>>>>>>>>>> num_fences);
>>>>>>>>>>>>>>>>> +               if (ret)
>>>>>>>>>>>>>>>>> +                       break;
>>>>>>>>>>>>>>>>> +       }
>>>>>>>>>>>>>>>>> +       /* Drop ref in case we break out of the loop. */
>>>>>>>>>>>>>>>>> +       drm_gpuvm_bo_put(vm_bo);
>>>>>>>>>>>>>>>>> +       restore_vm_bo_list(gpuvm, extobj, &extobjs);
>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>> +       return ret;
>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_prepare_objects);
>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>> + * drm_gpuvm_prepare_range() - prepare all BOs mapped 
>>>>>>>>>>>>>>>>> within
>>>>>>>>>>>>>>>>> a given range
>>>>>>>>>>>>>>>>> + * @gpuvm: the &drm_gpuvm
>>>>>>>>>>>>>>>>> + * @exec: the &drm_exec locking context
>>>>>>>>>>>>>>>>> + * @addr: the start address within the VA space
>>>>>>>>>>>>>>>>> + * @range: the range to iterate within the VA space
>>>>>>>>>>>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>> + * Calls drm_exec_prepare_obj() for all &drm_gem_objects
>>>>>>>>>>>>>>>>> mapped between @addr
>>>>>>>>>>>>>>>>> + * and @addr + @range.
>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>> + * Returns: 0 on success, negative error code on 
>>>>>>>>>>>>>>>>> failure.
>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>> +int
>>>>>>>>>>>>>>>>> +drm_gpuvm_prepare_range(struct drm_gpuvm *gpuvm, struct
>>>>>>>>>>>>>>>>> drm_exec *exec,
>>>>>>>>>>>>>>>>> +                       u64 addr, u64 range, unsigned int
>>>>>>>>>>>>>>>>> num_fences)
>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>> +       struct drm_gpuva *va;
>>>>>>>>>>>>>>>>> +       u64 end = addr + range;
>>>>>>>>>>>>>>>>> +       int ret;
>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>> +       drm_gpuvm_for_each_va_range(va, gpuvm, addr, 
>>>>>>>>>>>>>>>>> end) {
>>>>>>>>>>>>>>>>> +               struct drm_gem_object *obj = va->gem.obj;
>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>> +               ret = drm_exec_prepare_obj(exec, obj,
>>>>>>>>>>>>>>>>> num_fences);
>>>>>>>>>>>>>>>>> +               if (ret)
>>>>>>>>>>>>>>>>> +                       return ret;
>>>>>>>>>>>>>>>>> +       }
>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>> +       return 0;
>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_prepare_range);
>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>> + * drm_gpuvm_exec_lock() - lock all dma-resv of all
>>>>>>>>>>>>>>>>> associated BOs
>>>>>>>>>>>>>>>>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>>>>>>>>>>>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>>>>>>>>>>>> + * @interruptible: sleep interruptible if waiting
>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>> + * Acquires all dma-resv locks of all 
>>>>>>>>>>>>>>>>> &drm_gem_objects the
>>>>>>>>>>>>>>>>> given
>>>>>>>>>>>>>>>>> + * &drm_gpuvm contains mappings of.
>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>> + * Additionally, when calling this function with struct
>>>>>>>>>>>>>>>>> drm_gpuvm_exec::extra
>>>>>>>>>>>>>>>>> + * being set the driver receives the given @fn 
>>>>>>>>>>>>>>>>> callback to
>>>>>>>>>>>>>>>>> lock additional
>>>>>>>>>>>>>>>>> + * dma-resv in the context of the &drm_gpuvm_exec 
>>>>>>>>>>>>>>>>> instance.
>>>>>>>>>>>>>>>>> Typically, drivers
>>>>>>>>>>>>>>>>> + * would call drm_exec_prepare_obj() from within this
>>>>>>>>>>>>>>>>> callback.
>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>> + * Returns: 0 on success, negative error code on 
>>>>>>>>>>>>>>>>> failure.
>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>> +int
>>>>>>>>>>>>>>>>> +drm_gpuvm_exec_lock(struct drm_gpuvm_exec *vm_exec,
>>>>>>>>>>>>>>>>> +                   unsigned int num_fences,
>>>>>>>>>>>>>>>>> +                   bool interruptible)
>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>> +       struct drm_gpuvm *gpuvm = vm_exec->vm;
>>>>>>>>>>>>>>>>> +       struct drm_exec *exec = &vm_exec->exec;
>>>>>>>>>>>>>>>>> +       uint32_t flags;
>>>>>>>>>>>>>>>>> +       int ret;
>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>> +       flags = (interruptible ? DRM_EXEC_INTERRUPTIBLE_WAIT : 0) |
>>>>>>>>>>>>>>>>> +               DRM_EXEC_IGNORE_DUPLICATES;
>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>> +       drm_exec_init(exec, flags);
>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>> +       drm_exec_until_all_locked(exec) {
>>>>>>>>>>>>>>>>> +               ret = drm_gpuvm_prepare_vm(gpuvm, exec,
>>>>>>>>>>>>>>>>> num_fences);
>>>>>>>>>>>>>>>>> +               drm_exec_retry_on_contention(exec);
>>>>>>>>>>>>>>>>> +               if (ret)
>>>>>>>>>>>>>>>>> +                       goto err;
>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>> +               ret = drm_gpuvm_prepare_objects(gpuvm, 
>>>>>>>>>>>>>>>>> exec,
>>>>>>>>>>>>>>>>> num_fences);
>>>>>>>>>>>>>>>>> +               drm_exec_retry_on_contention(exec);
>>>>>>>>>>>>>>>>> +               if (ret)
>>>>>>>>>>>>>>>>> +                       goto err;
>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>> +               if (vm_exec->extra.fn) {
>>>>>>>>>>>>>>>>> +                       ret = vm_exec->extra.fn(vm_exec,
>>>>>>>>>>>>>>>>> num_fences);
>>>>>>>>>>>>>>>>> +                       drm_exec_retry_on_contention(exec); 
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> +                       if (ret)
>>>>>>>>>>>>>>>>> +                               goto err;
>>>>>>>>>>>>>>>>> +               }
>>>>>>>>>>>>>>>>> +       }
>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>> +       return 0;
>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>> +err:
>>>>>>>>>>>>>>>>> +       drm_exec_fini(exec);
>>>>>>>>>>>>>>>>> +       return ret;
>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_exec_lock);
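
(A hedged usage sketch of how a driver might combine the helpers above;
driver_lock_and_validate() is a hypothetical name, not part of the
posted API:)

static int driver_lock_and_validate(struct drm_gpuvm *gpuvm)
{
        struct drm_gpuvm_exec vm_exec = {
                .vm = gpuvm,
                /* .extra.fn may be set to lock additional, job-local BOs. */
        };
        int ret;

        ret = drm_gpuvm_exec_lock(&vm_exec, 1, true);
        if (ret)
                return ret;

        /* All dma-resv locks are held from here until drm_exec_fini(). */
        ret = drm_gpuvm_validate(gpuvm);

        /* On success a driver would submit the job and publish its fence,
         * e.g. via drm_gpuvm_resv_add_fence(), before unlocking. */
        drm_exec_fini(&vm_exec.exec);
        return ret;
}
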
>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>> +static int
>>>>>>>>>>>>>>>>> +fn_lock_array(struct drm_gpuvm_exec *vm_exec, 
>>>>>>>>>>>>>>>>> unsigned int
>>>>>>>>>>>>>>>>> num_fences)
>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>> +       struct {
>>>>>>>>>>>>>>>>> +               struct drm_gem_object **objs;
>>>>>>>>>>>>>>>>> +               unsigned int num_objs;
>>>>>>>>>>>>>>>>> +       } *args = vm_exec->extra.priv;
>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>> +       return drm_exec_prepare_array(&vm_exec->exec, args->objs,
>>>>>>>>>>>>>>>>> +                                     args->num_objs, num_fences);
>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>> + * drm_gpuvm_exec_lock_array() - lock all dma-resv of 
>>>>>>>>>>>>>>>>> all
>>>>>>>>>>>>>>>>> associated BOs
>>>>>>>>>>>>>>>>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>>>>>>>>>>>>>>>>> + * @objs: additional &drm_gem_objects to lock
>>>>>>>>>>>>>>>>> + * @num_objs: the number of additional 
>>>>>>>>>>>>>>>>> &drm_gem_objects to
>>>>>>>>>>>>>>>>> lock
>>>>>>>>>>>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>>>>>>>>>>>> + * @interruptible: sleep interruptible if waiting
>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>> + * Acquires all dma-resv locks of all 
>>>>>>>>>>>>>>>>> &drm_gem_objects the
>>>>>>>>>>>>>>>>> given &drm_gpuvm
>>>>>>>>>>>>>>>>> + * contains mappings of, plus the ones given through 
>>>>>>>>>>>>>>>>> @objs.
>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>> + * Returns: 0 on success, negative error code on 
>>>>>>>>>>>>>>>>> failure.
>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>> +int
>>>>>>>>>>>>>>>>> +drm_gpuvm_exec_lock_array(struct drm_gpuvm_exec 
>>>>>>>>>>>>>>>>> *vm_exec,
>>>>>>>>>>>>>>>>> +                         struct drm_gem_object **objs,
>>>>>>>>>>>>>>>>> +                         unsigned int num_objs,
>>>>>>>>>>>>>>>>> +                         unsigned int num_fences,
>>>>>>>>>>>>>>>>> +                         bool interruptible)
>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>> +       struct {
>>>>>>>>>>>>>>>>> +               struct drm_gem_object **objs;
>>>>>>>>>>>>>>>>> +               unsigned int num_objs;
>>>>>>>>>>>>>>>>> +       } args;
>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>> +       args.objs = objs;
>>>>>>>>>>>>>>>>> +       args.num_objs = num_objs;
>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>> +       vm_exec->extra.fn = fn_lock_array;
>>>>>>>>>>>>>>>>> +       vm_exec->extra.priv = &args;
>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>> +       return drm_gpuvm_exec_lock(vm_exec, num_fences,
>>>>>>>>>>>>>>>>> interruptible);
>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_exec_lock_array);
>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>> + * drm_gpuvm_exec_lock_range() - prepare all BOs mapped
>>>>>>>>>>>>>>>>> within a given range
>>>>>>>>>>>>>>>>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>>>>>>>>>>>>>>>>> + * @addr: the start address within the VA space
>>>>>>>>>>>>>>>>> + * @range: the range to iterate within the VA space
>>>>>>>>>>>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>>>>>>>>>>>> + * @interruptible: sleep interruptible if waiting
>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>> + * Acquires all dma-resv locks of all &drm_gem_objects
>>>>>>>>>>>>>>>>> mapped between @addr and
>>>>>>>>>>>>>>>>> + * @addr + @range.
>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>> + * Returns: 0 on success, negative error code on 
>>>>>>>>>>>>>>>>> failure.
>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>> +int
>>>>>>>>>>>>>>>>> +drm_gpuvm_exec_lock_range(struct drm_gpuvm_exec 
>>>>>>>>>>>>>>>>> *vm_exec,
>>>>>>>>>>>>>>>>> +                         u64 addr, u64 range,
>>>>>>>>>>>>>>>>> +                         unsigned int num_fences,
>>>>>>>>>>>>>>>>> +                         bool interruptible)
>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>> +       struct drm_gpuvm *gpuvm = vm_exec->vm;
>>>>>>>>>>>>>>>>> +       struct drm_exec *exec = &vm_exec->exec;
>>>>>>>>>>>>>>>>> +       uint32_t flags;
>>>>>>>>>>>>>>>>> +       int ret;
>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>> +       flags = (interruptible ? DRM_EXEC_INTERRUPTIBLE_WAIT : 0) |
>>>>>>>>>>>>>>>>> +               DRM_EXEC_IGNORE_DUPLICATES;
>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>> +       drm_exec_init(exec, flags);
>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>> +       drm_exec_until_all_locked(exec) {
>>>>>>>>>>>>>>>>> +               ret = drm_gpuvm_prepare_range(gpuvm, exec, addr, range,
>>>>>>>>>>>>>>>>> +                                             num_fences);
>>>>>>>>>>>>>>>>> +               drm_exec_retry_on_contention(exec);
>>>>>>>>>>>>>>>>> +               if (ret)
>>>>>>>>>>>>>>>>> +                       goto err;
>>>>>>>>>>>>>>>>> +       }
>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>> +       return ret;
>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>> +err:
>>>>>>>>>>>>>>>>> +       drm_exec_fini(exec);
>>>>>>>>>>>>>>>>> +       return ret;
>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_exec_lock_range);
>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>> + * drm_gpuvm_validate() - validate all BOs marked as 
>>>>>>>>>>>>>>>>> evicted
>>>>>>>>>>>>>>>>> + * @gpuvm: the &drm_gpuvm to validate evicted BOs
>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>> + * Calls the &drm_gpuvm_ops.bo_validate callback for all
>>>>>>>>>>>>>>>>> evicted buffer
>>>>>>>>>>>>>>>>> + * objects being mapped in the given &drm_gpuvm.
>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>> + * Returns: 0 on success, negative error code on 
>>>>>>>>>>>>>>>>> failure.
>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>> +int
>>>>>>>>>>>>>>>>> +drm_gpuvm_validate(struct drm_gpuvm *gpuvm)
>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>> +       const struct drm_gpuvm_ops *ops = gpuvm->ops;
>>>>>>>>>>>>>>>>> +       struct drm_gpuvm_bo *vm_bo;
>>>>>>>>>>>>>>>>> +       LIST_HEAD(evict);
>>>>>>>>>>>>>>>>> +       int ret = 0;
>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>> +       if (unlikely(!ops || !ops->bo_validate))
>>>>>>>>>>>>>>>>> +               return -ENOTSUPP;
>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>> +       for_each_vm_bo_in_list(gpuvm, evict, &evict, 
>>>>>>>>>>>>>>>>> vm_bo) {
>>>>>>>>>>>>>>>>> +               dma_resv_assert_held(vm_bo->obj->resv);
>>>>>>>>>>>>>>>>> +               ret = ops->bo_validate(vm_bo->obj);
>>>>>>>>>>>>>>>>> +               if (ret)
>>>>>>>>>>>>>>>>> +                       break;
>>>>>>>>>>>>>>>>> +       }
>>>>>>>>>>>>>>>>> +       /* Drop ref in case we break out of the loop. */
>>>>>>>>>>>>>>>>> +       drm_gpuvm_bo_put(vm_bo);
>>>>>>>>>>>>>>>>> +       restore_vm_bo_list(gpuvm, evict, &evict);
>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>> +       return ret;
>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_validate);
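
(A hedged sketch of a &drm_gpuvm_ops.bo_validate implementation for a
TTM-based driver; driver_placement is a hypothetical placement and the
callback name is illustrative:)

static int driver_bo_validate(struct drm_gem_object *obj)
{
        struct ttm_buffer_object *bo =
                container_of(obj, struct ttm_buffer_object, base);
        struct ttm_operation_ctx ctx = {
                .interruptible = true,
                .no_wait_gpu = false,
        };

        /* Move the evicted BO back to a GPU-accessible placement. */
        return ttm_bo_validate(bo, &driver_placement, &ctx);
}
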
>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>> + * drm_gpuvm_resv_add_fence - add fence to private and all extobj
>>>>>>>>>>>>>>>>> + * dma-resv
>>>>>>>>>>>>>>>>> + * @gpuvm: the &drm_gpuvm to add a fence to
>>>>>>>>>>>>>>>>> + * @exec: the &drm_exec locking context
>>>>>>>>>>>>>>>>> + * @fence: fence to add
>>>>>>>>>>>>>>>>> + * @private_usage: private dma-resv usage
>>>>>>>>>>>>>>>>> + * @extobj_usage: extobj dma-resv usage
>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>> +void
>>>>>>>>>>>>>>>>> +drm_gpuvm_resv_add_fence(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>>>>> +                        struct drm_exec *exec,
>>>>>>>>>>>>>>>>> +                        struct dma_fence *fence,
>>>>>>>>>>>>>>>>> +                        enum dma_resv_usage private_usage,
>>>>>>>>>>>>>>>>> +                        enum dma_resv_usage extobj_usage)
>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>> +       struct drm_gem_object *obj;
>>>>>>>>>>>>>>>>> +       unsigned long index;
>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>> +       drm_exec_for_each_locked_object(exec, index, obj) {
>>>>>>>>>>>>>>>>> +               dma_resv_assert_held(obj->resv);
>>>>>>>>>>>>>>>>> +               dma_resv_add_fence(obj->resv, fence,
>>>>>>>>>>>>>>>>> +                                  drm_gpuvm_is_extobj(gpuvm, obj) ?
>>>>>>>>>>>>>>>>> +                                  extobj_usage : private_usage);
>>>>>>>>>>>>>>>>> +       }
>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_resv_add_fence);
>>>>>>>>>>>>>>>>> +
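Putting it together, the tail of a submission could then be sketched as below;
the concrete usage values are only a plausible choice, not mandated by the
patch, and my_exec_commit is hypothetical driver code:

  static void my_exec_commit(struct drm_gpuvm_exec *vm_exec,
                             struct dma_fence *job_fence)
  {
          /* VM-private BOs only need bookkeeping, shared BOs get an
           * implicit-sync write fence.
           */
          drm_gpuvm_exec_resv_add_fence(vm_exec, job_fence,
                                        DMA_RESV_USAGE_BOOKKEEP,
                                        DMA_RESV_USAGE_WRITE);
          drm_gpuvm_exec_unlock(vm_exec);
  }
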
>>>>>>>>>>>>>>>>>     /**
>>>>>>>>>>>>>>>>>      * drm_gpuvm_bo_create() - create a new instance of struct drm_gpuvm_bo
>>>>>>>>>>>>>>>>>      * @gpuvm: The &drm_gpuvm the @obj is mapped in.
>>>>>>>>>>>>>>>>> @@ -790,6 +1242,9 @@ drm_gpuvm_bo_create(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>>>>>          INIT_LIST_HEAD(&vm_bo->list.gpuva);
>>>>>>>>>>>>>>>>>          INIT_LIST_HEAD(&vm_bo->list.entry.gem);
>>>>>>>>>>>>>>>>> +       INIT_LIST_HEAD(&vm_bo->list.entry.extobj);
>>>>>>>>>>>>>>>>> +       INIT_LIST_HEAD(&vm_bo->list.entry.evict);
>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>          drm_gem_object_get(obj);
>>>>>>>>>>>>>>>>>          return vm_bo;
>>>>>>>>>>>>>>>>> @@ -807,6 +1262,14 @@ drm_gpuvm_bo_destroy(struct kref *kref)
>>>>>>>>>>>>>>>>>          drm_gem_gpuva_assert_lock_held(vm_bo->obj);
>>>>>>>>>>>>>>>>> +       spin_lock(&gpuvm->extobj.lock);
>>>>>>>>>>>>>>>>> +       list_del(&vm_bo->list.entry.extobj);
>>>>>>>>>>>>>>>>> +       spin_unlock(&gpuvm->extobj.lock);
>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>> +       spin_lock(&gpuvm->evict.lock);
>>>>>>>>>>>>>>>>> +       list_del(&vm_bo->list.entry.evict);
>>>>>>>>>>>>>>>>> +       spin_unlock(&gpuvm->evict.lock);
>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>          list_del(&vm_bo->list.entry.gem);
>>>>>>>>>>>>>>>>>          drm_gem_object_put(obj);
>>>>>>>>>>>>>>>>> @@ -822,6 +1285,11 @@ drm_gpuvm_bo_destroy(struct kref *kref)
>>>>>>>>>>>>>>>>>      * @vm_bo: the &drm_gpuvm_bo to release the reference of
>>>>>>>>>>>>>>>>>      *
>>>>>>>>>>>>>>>>>      * This releases a reference to @vm_bo.
>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>> + * If the reference count drops to zero, the &gpuvm_bo is destroyed, which
>>>>>>>>>>>>>>>>> + * includes removing it from the GEM's gpuva list. Hence, if a call to this
>>>>>>>>>>>>>>>>> + * function can potentially let the reference count drop to zero, the caller
>>>>>>>>>>>>>>>>> + * must hold the dma-resv or driver specific GEM gpuva lock.
>>>>>>>>>>>>>>>>>      */
>>>>>>>>>>>>>>>>>     void
>>>>>>>>>>>>>>>>>     drm_gpuvm_bo_put(struct drm_gpuvm_bo *vm_bo)
>>>>>>>>>>>>>>>>> @@ -831,6 +1299,12 @@ drm_gpuvm_bo_put(struct drm_gpuvm_bo *vm_bo)
>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>> EXPORT_SYMBOL_GPL(drm_gpuvm_bo_put);
>>>>>>>>>>>>>>>>> +static int __must_check
>>>>>>>>>>>>>>>>> +drm_gpuvm_bo_get_unless_zero(struct drm_gpuvm_bo *vm_bo)
>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>> +       return kref_get_unless_zero(&vm_bo->kref);
>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>     static struct drm_gpuvm_bo *
>>>>>>>>>>>>>>>>>     __drm_gpuvm_bo_find(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>>>>>                      struct drm_gem_object *obj)
>>>>>>>>>>>>>>>>> @@ -938,6 +1412,48 @@ drm_gpuvm_bo_obtain_prealloc(struct drm_gpuvm_bo *__vm_bo)
>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>> EXPORT_SYMBOL_GPL(drm_gpuvm_bo_obtain_prealloc);
>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>> + * drm_gpuvm_bo_extobj_add() - adds the &drm_gpuvm_bo to its &drm_gpuvm's
>>>>>>>>>>>>>>>>> + * extobj list
>>>>>>>>>>>>>>>>> + * @vm_bo: The &drm_gpuvm_bo to add to its &drm_gpuvm's extobj list.
>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>> + * Adds the given @vm_bo to its &drm_gpuvm's extobj list if not on the list
>>>>>>>>>>>>>>>>> + * already and if the corresponding &drm_gem_object is actually an external
>>>>>>>>>>>>>>>>> + * object.
>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>> +void
>>>>>>>>>>>>>>>>> +drm_gpuvm_bo_extobj_add(struct drm_gpuvm_bo *vm_bo)
>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>> +       struct drm_gpuvm *gpuvm = vm_bo->vm;
>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>> +       if (drm_gpuvm_is_extobj(gpuvm, vm_bo->obj))
>>>>>>>>>>>>>>>>> +               drm_gpuvm_bo_list_add(vm_bo, extobj);
>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_bo_extobj_add);
>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>> + * drm_gpuvm_bo_evict() - add / remove a &drm_gem_object to / from a
>>>>>>>>>>>>>>>>> + * &drm_gpuvm's evicted list
>>>>>>>>>>>>>>>>> + * @obj: the &drm_gem_object to add or remove
>>>>>>>>>>>>>>>>> + * @evict: indicates whether the object is evicted
>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>> + * Adds a &drm_gem_object to or removes it from the evicted list of all
>>>>>>>>>>>>>>>>> + * &drm_gpuvms containing a mapping of this &drm_gem_object.
>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>> +void
>>>>>>>>>>>>>>>>> +drm_gpuvm_bo_evict(struct drm_gem_object *obj, bool evict)
>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>> +       struct drm_gpuvm_bo *vm_bo;
>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>> +       drm_gem_for_each_gpuvm_bo(vm_bo, obj) {
>>>>>>>>>>>>>>>>> +               if (evict)
>>>>>>>>>>>>>>>>> +                       drm_gpuvm_bo_list_add(vm_bo, evict);
>>>>>>>>>>>>>>>>> +               else
>>>>>>>>>>>>>>>>> +                       drm_gpuvm_bo_list_del(vm_bo, evict);
>>>>>>>>>>>>>>>>> +       }
>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_bo_evict);
>>>>>>>>>>>>>>>>> +
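A sketch of how a TTM-based driver might call this from its move callback,
where the BO's dma-resv is already held; my_bo_move is hypothetical driver
code, the callback signature is TTM's ttm_device_funcs.move:

  static int my_bo_move(struct ttm_buffer_object *bo, bool evict,
                        struct ttm_operation_ctx *ctx,
                        struct ttm_resource *new_mem,
                        struct ttm_place *hop)
  {
          /* Put every VM mapping this BO on (or take it off) the VMs'
           * evicted lists before the actual move.
           */
          drm_gpuvm_bo_evict(&bo->base, evict);

          return ttm_bo_move_memcpy(bo, ctx, new_mem);
  }
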
>>>>>>>>>>>>>>>>>     static int
>>>>>>>>>>>>>>>>>     __drm_gpuva_insert(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>>>>>                     struct drm_gpuva *va)
>>>>>>>>>>>>>>>>> diff --git a/include/drm/drm_gpuvm.h b/include/drm/drm_gpuvm.h
>>>>>>>>>>>>>>>>> index afa50b9059a2..834bb6d6617e 100644
>>>>>>>>>>>>>>>>> --- a/include/drm/drm_gpuvm.h
>>>>>>>>>>>>>>>>> +++ b/include/drm/drm_gpuvm.h
>>>>>>>>>>>>>>>>> @@ -26,10 +26,12 @@
>>>>>>>>>>>>>>>>>      */
>>>>>>>>>>>>>>>>>     #include <linux/list.h>
>>>>>>>>>>>>>>>>> +#include <linux/dma-resv.h>
>>>>>>>>>>>>>>>>>     #include <linux/rbtree.h>
>>>>>>>>>>>>>>>>>     #include <linux/types.h>
>>>>>>>>>>>>>>>>>     #include <drm/drm_gem.h>
>>>>>>>>>>>>>>>>> +#include <drm/drm_exec.h>
>>>>>>>>>>>>>>>>>     struct drm_gpuvm;
>>>>>>>>>>>>>>>>>     struct drm_gpuvm_bo;
>>>>>>>>>>>>>>>>> @@ -259,6 +261,38 @@ struct drm_gpuvm {
>>>>>>>>>>>>>>>>>           * space
>>>>>>>>>>>>>>>>>           */
>>>>>>>>>>>>>>>>>          struct dma_resv *resv;
>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>> +       /**
>>>>>>>>>>>>>>>>> +        * @extobj: structure holding the extobj list
>>>>>>>>>>>>>>>>> +        */
>>>>>>>>>>>>>>>>> +       struct {
>>>>>>>>>>>>>>>>> +               /**
>>>>>>>>>>>>>>>>> +                * @list: &list_head storing &drm_gpuvm_bos serving as
>>>>>>>>>>>>>>>>> +                * external objects
>>>>>>>>>>>>>>>>> +                */
>>>>>>>>>>>>>>>>> +               struct list_head list;
>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>> +               /**
>>>>>>>>>>>>>>>>> +                * @lock: spinlock to protect the extobj list
>>>>>>>>>>>>>>>>> +                */
>>>>>>>>>>>>>>>>> +               spinlock_t lock;
>>>>>>>>>>>>>>>>> +       } extobj;
>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>> +       /**
>>>>>>>>>>>>>>>>> +        * @evict: structure holding the evict list and evict list lock
>>>>>>>>>>>>>>>>> +        */
>>>>>>>>>>>>>>>>> +       struct {
>>>>>>>>>>>>>>>>> +               /**
>>>>>>>>>>>>>>>>> +                * @list: &list_head storing &drm_gpuvm_bos currently being
>>>>>>>>>>>>>>>>> +                * evicted
>>>>>>>>>>>>>>>>> +                */
>>>>>>>>>>>>>>>>> +               struct list_head list;
>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>> +               /**
>>>>>>>>>>>>>>>>> +                * @lock: spinlock to protect the evict list
>>>>>>>>>>>>>>>>> +                */
>>>>>>>>>>>>>>>>> +               spinlock_t lock;
>>>>>>>>>>>>>>>>> +       } evict;
>>>>>>>>>>>>>>>>>     };
>>>>>>>>>>>>>>>>>     void drm_gpuvm_init(struct drm_gpuvm *gpuvm, struct drm_device *drm,
>>>>>>>>>>>>>>>>> @@ -268,6 +302,21 @@ void drm_gpuvm_init(struct drm_gpuvm *gpuvm, struct drm_device *drm,
>>>>>>>>>>>>>>>>>                      const struct drm_gpuvm_ops *ops);
>>>>>>>>>>>>>>>>>     void drm_gpuvm_destroy(struct drm_gpuvm *gpuvm);
>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>> + * drm_gpuvm_is_extobj() - indicates whether the given &drm_gem_object is an
>>>>>>>>>>>>>>>>> + * external object
>>>>>>>>>>>>>>>>> + * @gpuvm: the &drm_gpuvm to check
>>>>>>>>>>>>>>>>> + * @obj: the &drm_gem_object to check
>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>> + * Returns: true if the &drm_gem_object's &dma_resv differs from the
>>>>>>>>>>>>>>>>> + * &drm_gpuvm's &dma_resv, false otherwise
>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>> +static inline bool drm_gpuvm_is_extobj(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>>>>> +                                      struct drm_gem_object *obj)
>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>> +       return obj && obj->resv != gpuvm->resv;
>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>     static inline struct drm_gpuva *
>>>>>>>>>>>>>>>>>     __drm_gpuva_next(struct drm_gpuva *va)
>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>> @@ -346,6 +395,128 @@ __drm_gpuva_next(struct drm_gpuva *va)
>>>>>>>>>>>>>>>>>     #define drm_gpuvm_for_each_va_safe(va__, next__, gpuvm__) \
>>>>>>>>>>>>>>>>>          list_for_each_entry_safe(va__, next__, &(gpuvm__)->rb.list, rb.entry)
>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>> + * struct drm_gpuvm_exec - &drm_gpuvm abstraction of &drm_exec
>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>> + * This structure should be created on the stack as &drm_exec should be.
>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>> + * Optionally, @extra can be set in order to lock additional &drm_gem_objects.
>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>> +struct drm_gpuvm_exec {
>>>>>>>>>>>>>>>>> +       /**
>>>>>>>>>>>>>>>>> +        * @exec: the &drm_exec structure
>>>>>>>>>>>>>>>>> +        */
>>>>>>>>>>>>>>>>> +       struct drm_exec exec;
>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>> +       /**
>>>>>>>>>>>>>>>>> +        * @vm: the &drm_gpuvm to lock its DMA reservations
>>>>>>>>>>>>>>>>> +        */
>>>>>>>>>>>>>>>>> +       struct drm_gpuvm *vm;
>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>> +       /**
>>>>>>>>>>>>>>>>> +        * @extra: Callback and corresponding private data for the driver to
>>>>>>>>>>>>>>>>> +        * lock arbitrary additional &drm_gem_objects.
>>>>>>>>>>>>>>>>> +        */
>>>>>>>>>>>>>>>>> +       struct {
>>>>>>>>>>>>>>>>> +               /**
>>>>>>>>>>>>>>>>> +                * @fn: The driver callback to lock additional &drm_gem_objects.
>>>>>>>>>>>>>>>>> +                */
>>>>>>>>>>>>>>>>> +               int (*fn)(struct drm_gpuvm_exec *vm_exec,
>>>>>>>>>>>>>>>>> +                         unsigned int num_fences);
>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>> +               /**
>>>>>>>>>>>>>>>>> +                * @priv: driver private data for the @fn callback
>>>>>>>>>>>>>>>>> +                */
>>>>>>>>>>>>>>>>> +               void *priv;
>>>>>>>>>>>>>>>>> +       } extra;
>>>>>>>>>>>>>>>>> +};
>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>> + * drm_gpuvm_prepare_vm() - prepare the GPUVM's common dma-resv
>>>>>>>>>>>>>>>>> + * @gpuvm: the &drm_gpuvm
>>>>>>>>>>>>>>>>> + * @exec: the &drm_exec context
>>>>>>>>>>>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>> + * Calls drm_exec_prepare_obj() for the GPUVM's dummy &drm_gem_object.
>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>> + * Using this function directly, it is the driver's responsibility to call
>>>>>>>>>>>>>>>>> + * drm_exec_init() and drm_exec_fini() accordingly.
>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>> +static inline int
>>>>>>>>>>>>>>>>> +drm_gpuvm_prepare_vm(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>>>>> +                    struct drm_exec *exec,
>>>>>>>>>>>>>>>>> +                    unsigned int num_fences)
>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>> +       return drm_exec_prepare_obj(exec, &gpuvm->d_obj, num_fences);
>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>> +int drm_gpuvm_prepare_objects(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>>>>> +                             struct drm_exec *exec,
>>>>>>>>>>>>>>>>> +                             unsigned int num_fences);
>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>> +int drm_gpuvm_prepare_range(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>>>>> +                           struct drm_exec *exec,
>>>>>>>>>>>>>>>>> +                           u64 addr, u64 range,
>>>>>>>>>>>>>>>>> +                           unsigned int num_fences);
>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>> +int drm_gpuvm_exec_lock(struct drm_gpuvm_exec *vm_exec,
>>>>>>>>>>>>>>>>> +                       unsigned int num_fences,
>>>>>>>>>>>>>>>>> +                       bool interruptible);
>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>> +int drm_gpuvm_exec_lock_array(struct drm_gpuvm_exec *vm_exec,
>>>>>>>>>>>>>>>>> +                             struct drm_gem_object **objs,
>>>>>>>>>>>>>>>>> +                             unsigned int num_objs,
>>>>>>>>>>>>>>>>> +                             unsigned int num_fences,
>>>>>>>>>>>>>>>>> +                             bool interruptible);
>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>> +int drm_gpuvm_exec_lock_range(struct drm_gpuvm_exec *vm_exec,
>>>>>>>>>>>>>>>>> +                             u64 addr, u64 range,
>>>>>>>>>>>>>>>>> +                             unsigned int num_fences,
>>>>>>>>>>>>>>>>> +                             bool interruptible);
>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>> + * drm_gpuvm_exec_unlock() - unlock all dma-resv of all associated BOs
>>>>>>>>>>>>>>>>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>> + * Releases all dma-resv locks of all &drm_gem_objects previously acquired
>>>>>>>>>>>>>>>>> + * through drm_gpuvm_exec_lock() or its variants.
>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>> +static inline void
>>>>>>>>>>>>>>>>> +drm_gpuvm_exec_unlock(struct drm_gpuvm_exec *vm_exec)
>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>> +       drm_exec_fini(&vm_exec->exec);
>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>> +int drm_gpuvm_validate(struct drm_gpuvm *gpuvm);
>>>>>>>>>>>>>>>>> +void drm_gpuvm_resv_add_fence(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>>>>> +                             struct drm_exec *exec,
>>>>>>>>>>>>>>>>> +                             struct dma_fence *fence,
>>>>>>>>>>>>>>>>> +                             enum dma_resv_usage private_usage,
>>>>>>>>>>>>>>>>> +                             enum dma_resv_usage extobj_usage);
>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>> + * drm_gpuvm_exec_resv_add_fence() - add fence to private and all extobj
>>>>>>>>>>>>>>>>> + * dma-resv
>>>>>>>>>>>>>>>>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>>>>>>>>>>>>>>>>> + * @fence: fence to add
>>>>>>>>>>>>>>>>> + * @private_usage: private dma-resv usage
>>>>>>>>>>>>>>>>> + * @extobj_usage: extobj dma-resv usage
>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>> + * See drm_gpuvm_resv_add_fence().
>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>> +static inline void
>>>>>>>>>>>>>>>>> +drm_gpuvm_exec_resv_add_fence(struct drm_gpuvm_exec *vm_exec,
>>>>>>>>>>>>>>>>> +                             struct dma_fence *fence,
>>>>>>>>>>>>>>>>> +                             enum dma_resv_usage private_usage,
>>>>>>>>>>>>>>>>> +                             enum dma_resv_usage extobj_usage)
>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>> +       drm_gpuvm_resv_add_fence(vm_exec->vm, &vm_exec->exec, fence,
>>>>>>>>>>>>>>>>> +                                private_usage, extobj_usage);
>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>     /**
>>>>>>>>>>>>>>>>>      * struct drm_gpuvm_bo - structure representing a &drm_gpuvm and
>>>>>>>>>>>>>>>>>      * &drm_gem_object combination
>>>>>>>>>>>>>>>>> @@ -398,6 +569,18 @@ struct drm_gpuvm_bo {
>>>>>>>>>>>>>>>>>                           * gpuva list.
>>>>>>>>>>>>>>>>>                           */
>>>>>>>>>>>>>>>>>                          struct list_head gem;
>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>> +                       /**
>>>>>>>>>>>>>>>>> +                        * @extobj: List entry to attach to the &drm_gpuvm's
>>>>>>>>>>>>>>>>> +                        * extobj list.
>>>>>>>>>>>>>>>>> +                        */
>>>>>>>>>>>>>>>>> +                       struct list_head extobj;
>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>> +                       /**
>>>>>>>>>>>>>>>>> +                        * @evict: List entry to attach to the &drm_gpuvm's
>>>>>>>>>>>>>>>>> +                        * evict list.
>>>>>>>>>>>>>>>>> +                        */
>>>>>>>>>>>>>>>>> +                       struct list_head evict;
>>>>>>>>>>>>>>>>>                  } entry;
>>>>>>>>>>>>>>>>>          } list;
>>>>>>>>>>>>>>>>>     };
>>>>>>>>>>>>>>>>> @@ -432,6 +615,9 @@ struct drm_gpuvm_bo *
>>>>>>>>>>>>>>>>>     drm_gpuvm_bo_find(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>>>>>                    struct drm_gem_object *obj);
>>>>>>>>>>>>>>>>> +void drm_gpuvm_bo_evict(struct drm_gem_object *obj, bool evict);
>>>>>>>>>>>>>>>>> +void drm_gpuvm_bo_extobj_add(struct drm_gpuvm_bo *vm_bo);
>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>     /**
>>>>>>>>>>>>>>>>>      * drm_gpuvm_bo_for_each_va() - iterator to walk over a list of &drm_gpuva
>>>>>>>>>>>>>>>>>      * @va__: &drm_gpuva structure to assign to in each iteration step
>>>>>>>>>>>>>>>>> @@ -837,6 +1023,17 @@ struct drm_gpuvm_ops {
>>>>>>>>>>>>>>>>>           * used.
>>>>>>>>>>>>>>>>>           */
>>>>>>>>>>>>>>>>>          int (*sm_step_unmap)(struct drm_gpuva_op *op, void *priv);
>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>> +       /**
>>>>>>>>>>>>>>>>> +        * @bo_validate: called from drm_gpuvm_validate()
>>>>>>>>>>>>>>>>> +        *
>>>>>>>>>>>>>>>>> +        * Drivers receive this callback for every evicted &drm_gem_object being
>>>>>>>>>>>>>>>>> +        * mapped in the corresponding &drm_gpuvm.
>>>>>>>>>>>>>>>>> +        *
>>>>>>>>>>>>>>>>> +        * Typically, drivers would call their driver specific variant of
>>>>>>>>>>>>>>>>> +        * ttm_bo_validate() from within this callback.
>>>>>>>>>>>>>>>>> +        */
>>>>>>>>>>>>>>>>> +       int (*bo_validate)(struct drm_gem_object *obj);
>>>>>>>>>>>>>>>>>     int drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm, void *priv,
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>
Thomas Hellstrom Sept. 20, 2023, 12:06 p.m. UTC | #53
On 9/20/23 12:51, Christian König wrote:
> Am 20.09.23 um 09:44 schrieb Thomas Hellström:
>> Hi,
>>
>> On 9/20/23 07:37, Christian König wrote:
>>> Am 19.09.23 um 17:23 schrieb Thomas Hellström:
>>>>
>>>> On 9/19/23 17:16, Danilo Krummrich wrote:
>>>>> On 9/19/23 14:21, Thomas Hellström wrote:
>>>>>> Hi Christian
>>>>>>
>>>>>> On 9/19/23 14:07, Christian König wrote:
>>>>>>> Am 13.09.23 um 17:46 schrieb Danilo Krummrich:
>>>>>>>> On 9/13/23 17:33, Christian König wrote:
>>>>>>>>> Am 13.09.23 um 17:15 schrieb Danilo Krummrich:
>>>>>>>>>> On 9/13/23 16:26, Christian König wrote:
>>>>>>>>>>> Am 13.09.23 um 14:16 schrieb Danilo Krummrich:
>>>>>>>>>>>> As mentioned in a different mail thread, the reply is based 
>>>>>>>>>>>> on the assumption
>>>>>>>>>>>> that we don't support anything else than GPUVM updates from 
>>>>>>>>>>>> the IOCTL.
>>>>>>>>>>>
>>>>>>>>>>> I think that this assumption is incorrect.
>>>>>>>>>>
>>>>>>>>>> Well, more precisely I should have said "don't support GPUVM
>>>>>>>>>> updates from within
>>>>>>>>>> fence signaling critical sections". And looking at the code, 
>>>>>>>>>> that doesn't seem what
>>>>>>>>>> you're doing there.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Vulkan is just once specific use case, but this here should 
>>>>>>>>>>> probably be able to handle other use cases as well.
>>>>>>>>>>>
>>>>>>>>>>> Especially with HMM you get the requirement that you need to 
>>>>>>>>>>> be able to invalidate GPUVM mappings without grabbing a 
>>>>>>>>>>> reservation lock.
>>>>>>>>>>
>>>>>>>>>> What do you mean with "invalidate GPUVM mappings" in this 
>>>>>>>>>> context? drm_gpuvm_bo_evict()
>>>>>>>>>> should only be called from a ttm_device_funcs::move 
>>>>>>>>>> callback, we should hold the dma-resv
>>>>>>>>>> lock there.
>>>>>>>>>
>>>>>>>>> Well the question is which dma-resv lock do we hold?
>>>>>>>>>
>>>>>>>>> In the move callback we only hold the dma-resv lock of the BO 
>>>>>>>>> which is moved, but when that is a shared BO then that's not 
>>>>>>>>> the same as the one for the VM.
>>>>>>>>
>>>>>>>> Correct, Thomas' idea was to use the GEM's dma_resv lock to 
>>>>>>>> protect drm_gpuvm_bo::evicted
>>>>>>>> and then actually move the drm_gpuvm_bo to the VM's evicted 
>>>>>>>> list once we grabbed all
>>>>>>>> dma-resv locks when locking the VM's BOs using drm_exec. We can 
>>>>>>>> remove them from the evicted
>>>>>>>> list on validate(). This way we never touch the evicted list 
>>>>>>>> without holding at least the VM's
>>>>>>>> dma-resv lock.
>>>>>>>>
>>>>>>>> Do you have any concerns about that?
>>>>>>>
>>>>>>> Scratching my head a bit how that is supposed to work.
>>>>>>>
>>>>>>> This implies that you go over all the evicted BOs during 
>>>>>>> validation and not just the one mentioned in the CS.
>>>>>>>
>>>>>>> That might work for Vulkan, but is pretty much a no-go for OpenGL.
>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> See what the eviction lock in amdgpu is doing for example.
>>>>>>>>>>
>>>>>>>>>> The eviction_lock seems to protect a VM state "evicting" of 
>>>>>>>>>> whether any BO that
>>>>>>>>>> is associated with the VM is currently evicting. At the same 
>>>>>>>>>> time amdgpu protects
>>>>>>>>>> the evicted list of the VM with a different lock. So this
>>>>>>>>>> seems to be entirely
>>>>>>>>>> unrelated. Tracking a "currently evicting" state is not part 
>>>>>>>>>> of the GPUVM
>>>>>>>>>> implementation currently and hence nothing would change for 
>>>>>>>>>> amdgpu there.
>>>>>>>>>
>>>>>>>>> Sorry for the confusion we use different terminology in amdgpu.
>>>>>>>>>
>>>>>>>>> The eviction lock and evicted state is for the VM page tables, 
>>>>>>>>> e.g. if the whole VM is currently not used and swapped out or 
>>>>>>>>> even de-allocated.
>>>>>>>>>
>>>>>>>>> This is necessary because we have cases where we need to 
>>>>>>>>> access the VM data without holding the dma-resv lock of this 
>>>>>>>>> VM. Especially figuring out which parts of an address space 
>>>>>>>>> contain mappings and which doesn't.
>>>>>>>>
>>>>>>>> I think this is fine, this has nothing to do with lists of 
>>>>>>>> evicted GEM objects or external GEM
>>>>>>>> objects, right? Marking mappings (drm_gpuva) as invalidated 
>>>>>>>> (DRM_GPUVA_INVALIDATED) or accessing
>>>>>>>> the VA space does not require any dma-resv locks.
>>>>>>>
>>>>>>> I hope so, but I'm not 100% sure.
>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> This is a requirement which comes with HMM handling, you won't 
>>>>>>>>> see this with Vulkan (or OpenGL, VAAPI etc..).
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> The invalidation lock on the other hand is what in this 
>>>>>>>>> discussion is called eviction lock. This one is needed because 
>>>>>>>>> what I wrote above, during the move callback only the dma-resv 
>>>>>>>>> of the BO which is moved is locked, but not necessarily the 
>>>>>>>>> dma-resv of the VM.
>>>>>>>>
>>>>>>>> That's yet another thing, right? This is used to track whether 
>>>>>>>> *any* BO that belongs to the VM is
>>>>>>>> currently being evicted, correct? As mentioned, as by now this 
>>>>>>>> is not supported in GPUVM and hence
>>>>>>>> would be the same driver specific code with the same driver 
>>>>>>>> specifc lock.
>>>>>>>
>>>>>>> That is most likely a show stopper using this for OpenGL based 
>>>>>>> workloads as far as I can see. For those you need to able to 
>>>>>>> figure out which non-VM BOs have been evicted and which parts of 
>>>>>>> the VM needs updates.
>>>>>>
>>>>>> We identify those with a bool in the gpuvm_bo, and that bool is 
>>>>>> protected by the bo_resv. In essence, the "evicted" list must be 
>>>>>> made up-to-date with all relevant locks held before traversing in 
>>>>>> the next exec.
>>>>>
>>>>> What I still miss with this idea is how do we find all the 
>>>>> drm_gpuvm_bo structures with the evicted bool set to true? When 
>>>>> doing the drm_exec dance we come across all external ones and can 
>>>>> add them to the list if needed, but what about the BOs having the 
>>>>> VM's dma-resv?
>>>>
>>>> Oh, they can be added to the evict list directly (no bool needed) 
>>>> in the eviction code, like in v3. Since for those we indeed hold 
>>>> the VM's dma_resv since it's aliased with the object's dma-resv.
>>>
>>> Yeah, I wanted to note what Danilo seems to think about as well. How 
>>> do we figure out the non-VM BOs evicted?
>>>
>>> We can't walk over the list of all non-VM BOs on every submission, 
>>> that's to much overhead for cases with lots of non-VM BOs.
>>>
>>> And we can't rely on userspace sending all non-VM BOs as used list 
>>> down to the kernel with each submission.
>>>
>>> Regards,
>>> Christian.
>>
>> No, that's not needed: Mechanism below.
>>
>> 1) We maintain an evicted list. Typically protected by the vm resv.
>> 2) Each gpuvm_bo has a bool "evicted". Protected by the bo resv.
>>
>> a) Evicting a vm bo: The vm resv is held by the eviction code. Just 
>> put it on the evicted list.
>> b) Evicting a shared/external bo: The bo resv is held by the eviction 
>> code. Set the "evicted" bool
>> c) Validating the evicted list on exec:
>
>
>> Loop through all *external/shared* bos.
>
> And this is what you can't do. For Vulkan it probably doesn't matter, 
> but for OpenGL and especially multimedia we have much more BOs on the 
> shared list than what's allocated for the VM.

But you need to lock and fence all those, so you need to loop through
them anyway, so we're still O(n_shared)? Or is there some clever 
optimization in amdgpu?

I think with some UMDs, xe might end up with similar large lists...

/Thomas

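To make the scheme sketched above concrete, here is a minimal sketch of steps
(b) and (c) in C; the "evicted" bool is the hypothetical, bo-resv protected
flag from the discussion, while the list fields are the ones introduced by the
patch:

  /* (b) Evicting a shared/external BO: the BO's resv is held by the
   * eviction code, so flipping the flag is safe.
   */
  static void my_extobj_evict(struct drm_gpuvm_bo *vm_bo)
  {
          dma_resv_assert_held(vm_bo->obj->resv);
          vm_bo->evicted = true;  /* hypothetical flag */
  }

  /* (c) On exec, after drm_exec locked the VM's resv and all extobj
   * resvs: move flagged extobjs to the evicted list, which the VM's
   * resv protects at this point.
   */
  static void my_collect_evicted(struct drm_gpuvm *gpuvm)
  {
          struct drm_gpuvm_bo *vm_bo;

          list_for_each_entry(vm_bo, &gpuvm->extobj.list, list.entry.extobj) {
                  dma_resv_assert_held(vm_bo->obj->resv);
                  if (vm_bo->evicted) {
                          list_move_tail(&vm_bo->list.entry.evict,
                                         &gpuvm->evict.list);
                          vm_bo->evicted = false;
                  }
          }
  }
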

>
> Regards,
> Christian.
>
>> Lock them. After locking, check the "evicted" bool, if it's true. put 
>> the bo on the evicted list (we hold the VM resv at this point) and 
>> clear the "evicted" bool. Note that other vms will have their own 
>> gpuvm_bo which is marked evicted.
>>
>> I have this coded up in a patch for Xe and it seems to be working 
>> properly.
>>
>> /Thomas
>>
>>
>>>
>>>>
>>>> /Thomas
>>>>
>>>>
>>>>
>>>>>
>>>>>>
>>>>>> If you mean that we need to unbind all vmas of all vms of evicted 
>>>>>> bos before evicting, We don't do that, at least not in Xe, since 
>>>>>> evicting we wait for VM idle, and it cant access anything through 
>>>>>> the stale vmas until they have been revalidated and rebound.
>>>>>>
>>>>>> /Thomas
>>>>>>
>>>>>>
>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Christian.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>> Christian.
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Sep 13, 2023 at 11:14:46AM +0200, Thomas Hellström 
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>> Hi!
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, 2023-09-13 at 01:36 +0200, Danilo Krummrich wrote:
>>>>>>>>>>>>>> On Tue, Sep 12, 2023 at 09:23:08PM +0200, Thomas 
>>>>>>>>>>>>>> Hellström wrote:
>>>>>>>>>>>>>>> On 9/12/23 18:50, Danilo Krummrich wrote:
>>>>>>>>>>>>>>>> On Tue, Sep 12, 2023 at 06:20:32PM +0200, Thomas 
>>>>>>>>>>>>>>>> Hellström wrote:
>>>>>>>>>>>>>>>>> Hi, Danilo,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> So far the DRM GPUVA manager offers common infrastructure to
>>>>>>>>>>>>>>>>>> track GPU VA allocations and mappings, generically connect GPU VA
>>>>>>>>>>>>>>>>>> mappings to their backing buffers and perform more complex mapping
>>>>>>>>>>>>>>>>>> operations on the GPU VA space.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> However, there are more design patterns commonly used by drivers, which
>>>>>>>>>>>>>>>>>> can potentially be generalized in order to make the DRM GPUVA manager
>>>>>>>>>>>>>>>>>> represent a basic GPU-VM implementation. In this context, this patch aims
>>>>>>>>>>>>>>>>>> at generalizing the following elements.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 1) Provide a common dma-resv for GEM objects not being used outside of
>>>>>>>>>>>>>>>>>>    this GPU-VM.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 2) Provide tracking of external GEM objects (GEM objects which are
>>>>>>>>>>>>>>>>>>    shared with other GPU-VMs).
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 3) Provide functions to efficiently lock all GEM objects dma-resv the
>>>>>>>>>>>>>>>>>>    GPU-VM contains mappings of.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 4) Provide tracking of evicted GEM objects the GPU-VM contains mappings
>>>>>>>>>>>>>>>>>>    of, such that validation of evicted GEM objects is accelerated.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 5) Provide some convenience functions for common patterns.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Rather than being designed as a "framework", the target is to make all
>>>>>>>>>>>>>>>>>> features appear as a collection of optional helper functions, such that
>>>>>>>>>>>>>>>>>> drivers are free to make use of the DRM GPUVA managers basic
>>>>>>>>>>>>>>>>>> functionality and opt-in for other features without setting any feature
>>>>>>>>>>>>>>>>>> flags, just by making use of the corresponding functions.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Big kudos to Boris Brezillon for his help to figure out locking for
>>>>>>>>>>>>>>>>>> drivers updating the GPU VA space within the fence signalling path.
>>>>>>>>>>>>>>>>>> Suggested-by: Matthew Brost <matthew.brost@intel.com>
>>>>>>>>>>>>>>>>>> Signed-off-by: Danilo Krummrich <dakr@redhat.com>
>>>>>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>>>>>     drivers/gpu/drm/drm_gpuvm.c | 516 ++++++++++++++++++++++++++++++++++++
>>>>>>>>>>>>>>>>>>     include/drm/drm_gpuvm.h | 197 ++++++++++++++
>>>>>>>>>>>>>>>>>>     2 files changed, 713 insertions(+)
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> diff --git a/drivers/gpu/drm/drm_gpuvm.c b/drivers/gpu/drm/drm_gpuvm.c
>>>>>>>>>>>>>>>>>> index f4411047dbb3..8e62a043f719 100644
>>>>>>>>>>>>>>>>>> --- a/drivers/gpu/drm/drm_gpuvm.c
>>>>>>>>>>>>>>>>>> +++ b/drivers/gpu/drm/drm_gpuvm.c
>>>>>>>>>>>>>>>>>> @@ -73,6 +73,21 @@
>>>>>>>>>>>>>>>>>>      * &drm_gem_object list of &drm_gpuvm_bos for an existing instance of this
>>>>>>>>>>>>>>>>>>      * particular combination. If not existent a new instance is created and linked
>>>>>>>>>>>>>>>>>>      * to the &drm_gem_object.
>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>> + * &drm_gpuvm_bo structures, since unique for a given &drm_gpuvm, are also used
>>>>>>>>>>>>>>>>>> + * as entry for the &drm_gpuvm's lists of external and evicted objects. Those
>>>>>>>>>>>>>>>>>> + * lists are maintained in order to accelerate locking of dma-resv locks and
>>>>>>>>>>>>>>>>>> + * validation of evicted objects bound in a &drm_gpuvm. For instance, all
>>>>>>>>>>>>>>>>>> + * &drm_gem_objects' &dma_resv of a given &drm_gpuvm can be locked by calling
>>>>>>>>>>>>>>>>>> + * drm_gpuvm_exec_lock(). Once locked drivers can call drm_gpuvm_validate() in
>>>>>>>>>>>>>>>>>> + * order to validate all evicted &drm_gem_objects. It is also possible to lock
>>>>>>>>>>>>>>>>>> + * additional &drm_gem_objects by providing the corresponding parameters to
>>>>>>>>>>>>>>>>>> + * drm_gpuvm_exec_lock() as well as open code the &drm_exec loop while making
>>>>>>>>>>>>>>>>>> + * use of helper functions such as drm_gpuvm_prepare_range() or
>>>>>>>>>>>>>>>>>> + * drm_gpuvm_prepare_objects().
>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>> + * Every bound &drm_gem_object is treated as external object when its &dma_resv
>>>>>>>>>>>>>>>>>> + * structure is different than the &drm_gpuvm's common &dma_resv structure.
>>>>>>>>>>>>>>>>>>      */
>>>>>>>>>>>>>>>>>>     /**
>>>>>>>>>>>>>>>>>> @@ -420,6 +435,20 @@
>>>>>>>>>>>>>>>>>>      * Subsequent calls to drm_gpuvm_bo_obtain() for the same &drm_gpuvm and
>>>>>>>>>>>>>>>>>>      * &drm_gem_object must be able to observe previous creations and destructions
>>>>>>>>>>>>>>>>>>      * of &drm_gpuvm_bos in order to keep instances unique.
>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>> + * The &drm_gpuvm's lists for keeping track of external and evicted objects are
>>>>>>>>>>>>>>>>>> + * protected against concurrent insertion / removal and iteration internally.
>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>> + * However, drivers still need to protect concurrent calls to functions
>>>>>>>>>>>>>>>>>> + * iterating those lists, such as drm_gpuvm_validate() and
>>>>>>>>>>>>>>>>>> + * drm_gpuvm_prepare_objects(). Every such function contains a particular
>>>>>>>>>>>>>>>>>> + * comment and lockdep checks if possible.
>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>> + * Functions adding or removing entries from those lists, such as
>>>>>>>>>>>>>>>>>> + * drm_gpuvm_bo_evict() or drm_gpuvm_bo_extobj_add(), may be called with
>>>>>>>>>>>>>>>>>> + * external locks being held, e.g. in order to avoid the corresponding list
>>>>>>>>>>>>>>>>>> + * being modified while potentially being iterated by other API functions.
>>>>>>>>>>>>>>>>>> + * However, this is entirely optional.
>>>>>>>>>>>>>>>>>>      */
>>>>>>>>>>>>>>>>>>     /**
>>>>>>>>>>>>>>>>>> @@ -632,6 +661,131 @@
>>>>>>>>>>>>>>>>>>      *   }
>>>>>>>>>>>>>>>>>>      */
>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>> + * get_next_vm_bo_from_list() - get the next vm_bo element
>>>>>>>>>>>>>>>>>> + * @__gpuvm: The GPU VM
>>>>>>>>>>>>>>>>>> + * @__list_name: The name of the list we're iterating on
>>>>>>>>>>>>>>>>>> + * @__local_list: A pointer to the local list used to store already iterated items
>>>>>>>>>>>>>>>>>> + * @__prev_vm_bo: The previous element we got from drm_gpuvm_get_next_cached_vm_bo()
>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>> + * This helper is here to provide lockless list iteration. Lockless as in, the
>>>>>>>>>>>>>>>>>> + * iterator releases the lock immediately after picking the first element from
>>>>>>>>>>>>>>>>>> + * the list, so list insertion and deletion can happen concurrently.
>>>>>>>>>>>>>>>>> Are the list spinlocks needed for that async state 
>>>>>>>>>>>>>>>>> update from
>>>>>>>>>>>>>>>>> within the
>>>>>>>>>>>>>>>>> dma-fence critical section we've discussed previously?
>>>>>>>>>>>>>>>> Yes, but also for other reasons, see below.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Otherwise it should be sufficient to protect the lists 
>>>>>>>>>>>>>>>>> with the
>>>>>>>>>>>>>>>>> gpuvm's resv
>>>>>>>>>>>>>>>>> (or for the extobj list with an outer lock).
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> If those spinlocks are still needed in some 
>>>>>>>>>>>>>>>>> situations, perhaps
>>>>>>>>>>>>>>>>> could we
>>>>>>>>>>>>>>>>> have an option to set them to NULL (Like IIRC the 
>>>>>>>>>>>>>>>>> maple tree
>>>>>>>>>>>>>>>>> allows for)?
>>>>>>>>>>>>>>>> The evict spinlock is needed in any case, since in
>>>>>>>>>>>>>>>> drm_gpuvm_bo_evict() we're
>>>>>>>>>>>>>>>> holding only the dma-resv lock from the BO this 
>>>>>>>>>>>>>>>> function gets
>>>>>>>>>>>>>>>> called for. Hence,
>>>>>>>>>>>>>>>> the spinlock protects concurrent drm_gpuvm_bo_evict() 
>>>>>>>>>>>>>>>> calls with
>>>>>>>>>>>>>>>> different BOs.
>>>>>>>>>>>>>>> No. Only if you try to add external objects to the vm's 
>>>>>>>>>>>>>>> evict list
>>>>>>>>>>>>>>> from
>>>>>>>>>>>>>>> within the evict code. That's not necessary since you 
>>>>>>>>>>>>>>> loop through
>>>>>>>>>>>>>>> all
>>>>>>>>>>>>>>> external objects anyway when locking them so an 
>>>>>>>>>>>>>>> "evicted" bool in
>>>>>>>>>>>>>>> the vm_bo,
>>>>>>>>>>>>>>> protected by the bo resv would be sufficient. The extobj 
>>>>>>>>>>>>>>> locking
>>>>>>>>>>>>>>> loop can
>>>>>>>>>>>>>>> then add the bo to the evicted list.
>>>>>>>>>>>>>> And validate() can remove it while still holding all 
>>>>>>>>>>>>>> dma-resv locks,
>>>>>>>>>>>>>> neat!
>>>>>>>>>>>>>> However, what if two tasks are trying to lock the VA space
>>>>>>>>>>>>>> concurrently? What
>>>>>>>>>>>>>> do we do when the drm_gpuvm_bo's refcount drops to zero in
>>>>>>>>>>>>>> drm_gpuva_unlink()?
>>>>>>>>>>>>>> Are we guaranteed that at this point of time the 
>>>>>>>>>>>>>> drm_gpuvm_bo is not
>>>>>>>>>>>>>> on the
>>>>>>>>>>>>>> evicted list? Because otherwise we would call 
>>>>>>>>>>>>>> drm_gpuvm_bo_destroy()
>>>>>>>>>>>>>> with the
>>>>>>>>>>>>>> dma-resv lock held, which wouldn't be allowed, since
>>>>>>>>>>>>>> drm_gpuvm_bo_destroy()
>>>>>>>>>>>>>> might drop the last reference to the drm_gem_object and 
>>>>>>>>>>>>>> hence we'd
>>>>>>>>>>>>>> potentially
>>>>>>>>>>>>>> free the dma-resv lock while holding it, at least if it's 
>>>>>>>>>>>>>> an external
>>>>>>>>>>>>>> object.
>>>>>>>>>>>>> Easiest way in this scheme is to think of the lists as 
>>>>>>>>>>>>> being protected
>>>>>>>>>>>>> by the vm's resv lock. That means anybody calling unlink() 
>>>>>>>>>>>>> must also
>>>>>>>>>>>>> hold the vm's resv lock. (Which is OK from an UAF point of 
>>>>>>>>>>>>> view, but
>>>>>>>>>>>>> perhaps not from a locking inversion POW from an async 
>>>>>>>>>>>>> perhaps not from a locking inversion POV from an async
>>>>>>>>>>>> This would mean that on unlink() we'd need to hold the VM's 
>>>>>>>>>>>> resv lock and the
>>>>>>>>>>>> corresponding GEM's resv lock (in case they're not the same 
>>>>>>>>>>>> anyways) because the
>>>>>>>>>>>> VM's resv lock would protect the external / evicted object 
>>>>>>>>>>>> lists and the GEM
>>>>>>>>>>>> objects resv lock protects the GEM's list of drm_gpuvm_bos 
>>>>>>>>>>>> and the
>>>>>>>>>>>> drm_gpuvm_bo's list of drm_gpuvas.
>>>>>>>>>>>>
>>>>>>>>>>>>>>>> For extobjs an outer lock would be enough in case of 
>>>>>>>>>>>>>>>> Xe, but I
>>>>>>>>>>>>>>>> really would not
>>>>>>>>>>>>>>>> like to add even more complexity just to get the 
>>>>>>>>>>>>>>>> spinlock out of
>>>>>>>>>>>>>>>> the way in case
>>>>>>>>>>>>>>>> the driver already has an outer lock protecting this path.
>>>>>>>>>>>>>>> I must disagree here. These spinlocks and atomic 
>>>>>>>>>>>>>>> operations are
>>>>>>>>>>>>>>> pretty
>>>>>>>>>>>>>>> costly and as discussed earlier this type of locking was 
>>>>>>>>>>>>>>> the reason
>>>>>>>>>>>>>>> (at
>>>>>>>>>>>>>>> least according to the commit message) that made 
>>>>>>>>>>>>>>> Christian drop the
>>>>>>>>>>>>>>> XArray
>>>>>>>>>>>>>>> use in drm_exec for the same set of objects: "The 
>>>>>>>>>>>>>>> locking overhead
>>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>> unecessary and measurable". IMHO the spinlock is the added
>>>>>>>>>>>>>>> complexity and a
>>>>>>>>>>>>>>> single wide lock following the drm locking guidelines 
>>>>>>>>>>>>>>> set out by
>>>>>>>>>>>>>>> Daniel and
>>>>>>>>>>>>>>> David should really be the default choice with an opt-in 
>>>>>>>>>>>>>>> for a
>>>>>>>>>>>>>>> spinlock if
>>>>>>>>>>>>>>> needed for async and pushing out to a wq is not an option.
>>>>>>>>>>>>>> For the external object list an outer lock would work as 
>>>>>>>>>>>>>> long as it's
>>>>>>>>>>>>>> not the
>>>>>>>>>>>>>> dma-resv lock of the corresponding GEM object, since here 
>>>>>>>>>>>>>> we actually
>>>>>>>>>>>>>> need to
>>>>>>>>>>>>>> remove the list entry from the external object list on
>>>>>>>>>>>>>> drm_gpuvm_bo_destroy().
>>>>>>>>>>>>>> It's just a bit weird design wise that drivers would need 
>>>>>>>>>>>>>> to take
>>>>>>>>>>>>>> this outer
>>>>>>>>>>>>>> lock on:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> - drm_gpuvm_bo_extobj_add()
>>>>>>>>>>>>>> - drm_gpuvm_bo_destroy()        (and hence also 
>>>>>>>>>>>>>> drm_gpuvm_bo_put())
>>>>>>>>>>>>>> - drm_gpuva_unlink()            (because it needs to call
>>>>>>>>>>>>>> drm_gpuvm_bo_put())
>>>>>>>>>>>>>> - drm_gpuvm_exec_lock()
>>>>>>>>>>>>>> - drm_gpuvm_exec_lock_array()
>>>>>>>>>>>>>> - drm_gpuvm_prepare_range()
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Given that it seems reasonable to do all the required 
>>>>>>>>>>>>>> locking
>>>>>>>>>>>>>> internally.
>>>>>>>>>>>>>  From a design POV, there has been a clear direction in XE to make
>>>>>>>>>>>>> to make
>>>>>>>>>>>>> things similar to mmap() / munmap(), so this outer lock, 
>>>>>>>>>>>>> which in Xe is
>>>>>>>>>>>>> an rwsem, is used in a similar way as the mmap_lock. It's 
>>>>>>>>>>>>> protecting
>>>>>>>>>>>>> the page-table structures and vma rb tree, the userptr 
>>>>>>>>>>>>> structures and
>>>>>>>>>>>>> the extobj list. Basically it's taken early in the exec 
>>>>>>>>>>>>> IOCTL, the
>>>>>>>>>>>>> VM_BIND ioctl, the compute rebind worker and the pagefault 
>>>>>>>>>>>>> handler, so
>>>>>>>>>>>>> all of the above are just asserting that it is taken in 
>>>>>>>>>>>>> the correct
>>>>>>>>>>>>> mode.
>>>>>>>>>>>>>
>>>>>>>>>>>>> But strictly with this scheme one could also use the vm's 
>>>>>>>>>>>>> dma_resv for
>>>>>>>>>>>>> the extobj list since with drm_exec, it's locked before 
>>>>>>>>>>>>> traversing the
>>>>>>>>>>>>> list.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The whole point of this scheme is to rely on locks that 
>>>>>>>>>>>>> you already are
>>>>>>>>>>>>> supposed to be holding for various reasons and is simple 
>>>>>>>>>>>>> to comprehend.
>>>>>>>>>>>> I don't agree that we're supposed to hold the VM's resv 
>>>>>>>>>>>> lock anyways for
>>>>>>>>>>>> functions like drm_gpuvm_bo_put() or drm_gpuva_unlink(), 
>>>>>>>>>>>> but I'm fine using it
>>>>>>>>>>>> for that purpose nevertheless.
>>>>>>>>>>>>
>>>>>>>>>>>>>> In order to at least place lockdep checks, the driver 
>>>>>>>>>>>>>> would need to
>>>>>>>>>>>>>> supply the
>>>>>>>>>>>>>> corresponding lock's lockdep_map, because the GPUVM 
>>>>>>>>>>>>>> otherwise doesn't
>>>>>>>>>>>>>> know about
>>>>>>>>>>>>>> the lock.
>>>>>>>>>>>>> Yes, that sounds reasonable. One lockdep map per list.
>>>>>>>>>>>> I'd really like to avoid that, especially now that 
>>>>>>>>>>>> everything got simpler. We
>>>>>>>>>>>> should define the actual locks to take instead.
>>>>>>>>>>>>
>>>>>>>>>>>>>> Out of curiosity, what is the overhead of a spin_lock() 
>>>>>>>>>>>>>> that doesn't
>>>>>>>>>>>>>> need to
>>>>>>>>>>>>>> spin?
>>>>>>>>>>>>> I guess it's hard to tell exactly, but it is much lower on 
>>>>>>>>>>>>> modern x86
>>>>>>>>>>>>> than what it used to be. Not sure about ARM, which is the 
>>>>>>>>>>>>> other
>>>>>>>>>>>>> architecture important to us. I figure if there is little 
>>>>>>>>>>>>> cache-line
>>>>>>>>>>>>> bouncing the main overhead comes from the implied barriers.
>>>>>>>>>>>>>
>>>>>>>>>>>>>>> A pretty simple way that would not add much code would be
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> static void gpuvm_cond_spin_lock(const struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>>>                                  spinlock_t *lock)
>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>         if (!gpuvm->resv_protected_lists)
>>>>>>>>>>>>>>>                 spin_lock(lock);
>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>
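The list manipulation helpers would then take the lock conditionally, e.g. as
sketched below; gpuvm_cond_spin_unlock would be the obvious, here hypothetical,
counterpart to the helper above:

  static void my_list_add(struct drm_gpuvm *gpuvm, struct list_head *entry,
                          struct list_head *list, spinlock_t *lock)
  {
          gpuvm_cond_spin_lock(gpuvm, lock);
          list_add_tail(entry, list);
          gpuvm_cond_spin_unlock(gpuvm, lock);  /* hypothetical counterpart */
  }
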
>>>>>>>>>>>>>>>>> For such drivers, that would require anybody calling 
>>>>>>>>>>>>>>>>> unlink to
>>>>>>>>>>>>>>>>> hold the vm's
>>>>>>>>>>>>>>>>> resv, though.
>>>>>>>>>>>>>>>> In V4 I want to go back to having a dedicated lock for 
>>>>>>>>>>>>>>>> the GEMs
>>>>>>>>>>>>>>>> gpuva list (or
>>>>>>>>>>>>>>>> VM_BO list to be more precise). We can't just use the 
>>>>>>>>>>>>>>>> dma-resv
>>>>>>>>>>>>>>>> lock for that
>>>>>>>>>>>>>>>> with VM_BO abstractions, because on destruction of a 
>>>>>>>>>>>>>>>> VM_BO we
>>>>>>>>>>>>>>>> otherwise wouldn't
>>>>>>>>>>>>>>>> be allowed to already hold the dma-resv lock. That's 
>>>>>>>>>>>>>>>> the fix I
>>>>>>>>>>>>>>>> was referring to
>>>>>>>>>>>>>>>> earlier.
>>>>>>>>>>>>>>> Yeah, I can see the need for a dedicated lock for the 
>>>>>>>>>>>>>>> GEM's gpuva
>>>>>>>>>>>>>>> list, but
>>>>>>>>>>>>>>> holding the vm's dma-resv lock across the unlink 
>>>>>>>>>>>>>>> shouldn't be a
>>>>>>>>>>>>>>> problem. We
>>>>>>>>>>>>>>> may free the object and a pointer to the vm's resv 
>>>>>>>>>>>>>>> during unlink
>>>>>>>>>>>>>>> but we
>>>>>>>>>>>>>>> don't free the vm's resv.  It'd be a matter of ensuring 
>>>>>>>>>>>>>>> that any
>>>>>>>>>>>>>>> calls to
>>>>>>>>>>>>>>> unlink from *within* drm_gpuvm allows it to be held.
>>>>>>>>>>>>>> Drivers calling unlink() from the fence signaling path 
>>>>>>>>>>>>>> can't use the
>>>>>>>>>>>>>> VM's
>>>>>>>>>>>>>> dma-resv lock.
>>>>>>>>>>>>> Yes, that made me a bit curious because in the current 
>>>>>>>>>>>>> version the code
>>>>>>>>>>>>> required the object's dma_resv for unlink() which can't be 
>>>>>>>>>>>>> grabbed
>>>>>>>>>>>>> either from the fence signaling path. So are there any 
>>>>>>>>>>>>> drivers actually
>>>>>>>>>>>>> wanting to do that? If so, they will either need to resort 
>>>>>>>>>>>>> to the
>>>>>>>>>>>>> current spinlock solution or they will need to call unlink 
>>>>>>>>>>>>> from a
>>>>>>>>>>>>> workqueue item.
>>>>>>>>>>>> As Boris already mentioned we have the dma-resv lock by 
>>>>>>>>>>>> default or a driver
>>>>>>>>>>>> specific GEM gpuva lock as opt-in. Now, we can get rid of 
>>>>>>>>>>>> the latter.
>>>>>>>>>>>>
>>>>>>>>>>>>>> Also, what if the object is an external object? We can't 
>>>>>>>>>>>>>> use the VM's
>>>>>>>>>>>>>> dma-resv
>>>>>>>>>>>>>> lock here.
>>>>>>>>>>>>> Why? Typically (sync) unlink is only ever called from an 
>>>>>>>>>>>>> unbind-like
>>>>>>>>>>>>> operation where it should be trivial to grab the vm's 
>>>>>>>>>>>>> resv. Or, for
>>>>>>>>>>>>> that matter any outer lock protecting the extobj list. 
>>>>>>>>>>>>> Rule would be
>>>>>>>>>>>>> the drm_gpuvm_bo::entry::extobj  and 
>>>>>>>>>>>>> drm_gpuvm_bo::entry::evict would
>>>>>>>>>>>>> be protected by either the vm's dma_resv (or possibly an 
>>>>>>>>>>>>> outer lock in
>>>>>>>>>>>>> the case of the extobj list).
>>>>>>>>>>>> Outer lock wouldn't have been working for updates in the 
>>>>>>>>>>>> async path, but
>>>>>>>>>>>> shouldn't be relevant anymore. We could use the VM's resv 
>>>>>>>>>>>> for that.
>>>>>>>>>>>>
>>>>>>>>>>>>>>   And we can't have the GEM objs dma-resv lock held when 
>>>>>>>>>>>>>> calling
>>>>>>>>>>>>>> unlink(), since unlink() calls drm_gpuvm_bo_put(), which 
>>>>>>>>>>>>>> if the
>>>>>>>>>>>>>> refcount drops
>>>>>>>>>>>>>> to zero calls drm_gpuvm_bo_destroy() and 
>>>>>>>>>>>>>> drm_gpuvm_bo_destroy() might
>>>>>>>>>>>>>> drop the
>>>>>>>>>>>>>> last reference of the GEM object.
>>>>>>>>>>>>> Yes, but this is a different problem as to what exactly 
>>>>>>>>>>>>> protects
>>>>>>>>>>>>> drm_gpuvm_bo::entry::gem. Either as you suggest an 
>>>>>>>>>>>>> internal per bo list
>>>>>>>>>>>>> lock, or if we want to keep the bo's dma_resv we need to 
>>>>>>>>>>>>> ensure that
>>>>>>>>>>>>> the caller of dma_resv_unlock(obj->resv) actually 
>>>>>>>>>>>>> refcounts its obj
>>>>>>>>>>>>> pointer, and doesn't implicitly rely on the gpuvm_bo's 
>>>>>>>>>>>>> refcount (I know
>>>>>>>>>>>>> Boris didn't like that, but requiring an explicit refcount 
>>>>>>>>>>>>> for a
>>>>>>>>>>>>> pointer you dereference unless you're under a lock that 
>>>>>>>>>>>>> ensures keeping
>>>>>>>>>>>>> the object alive is pretty much required?) But anyway for the
>>>>>>>>>>>>> drm_gpuvm_bo::entry::gem list protection (bo resv or 
>>>>>>>>>>>>> internal spinlock)
>>>>>>>>>>>>> I don't have a strong preference.
>>>>>>>>>>>> We can keep the GEM objects dma-resv lock, however as 
>>>>>>>>>>>> mentioned above
>>>>>>>>>>>> drm_gpuva_unlink() and drm_gpuvm_bo_put() then requires 
>>>>>>>>>>>> both the VM's resv lock
>>>>>>>>>>>> and the GEM's resv lock in case they differ.
>>>>>>>>>>>>
>>>>>>>>>>>>>>   All those problems go away with a dedicated
>>>>>>>>>>>>>> GEM gpuva list lock.
>>>>>>>>>>>>> I don't think these are real problems.
>>>>>>>>>>>>> With the excepton of the eviction list "trick" where we 
>>>>>>>>>>>>> currently have
>>>>>>>>>>>>> slightly different approach to collect external bos 
>>>>>>>>>>>>> needing rebinding,
>>>>>>>>>>>>> we have this working fine.
>>>>>>>>>>>>>
>>>>>>>>>>>>> TBH I think pretty much the only situation where the 
>>>>>>>>>>>>> spinlock is needed
>>>>>>>>>>>>> is for async updates of these lists, unless a wq item can 
>>>>>>>>>>>>> be used for
>>>>>>>>>>>>> that, but it doesn't really seem like the current code 
>>>>>>>>>>>>> allows for such
>>>>>>>>>>>>> updates anyway? It complicates the code a lot, adds 
>>>>>>>>>>>>> overhead and also
>>>>>>>>>>>>> adds the requirement for refcounting during list traversal.
>>>>>>>>>>>>>
>>>>>>>>>>>>> /Thomas
>>>>>>>>>>>>>
>>>>>>>>>>>>>>> /Thomas
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> It seems that with that also the refcount could be made
>>>>>>>>>>>>>>>>>>> non-atomic.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> All in the spirit of the drm locking guidelines "use 
>>>>>>>>>>>>>>>>> big locks
>>>>>>>>>>>>>>>>> when
>>>>>>>>>>>>>>>>> possible".
>>>>>>>>>>>>>>>>> Lower level locks only when necessary for performance or
>>>>>>>>>>>>>>>>> locking inversion?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> /Thomas
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>> + * Elements popped from the original list are kept in a local list, so removal
>>>>>>>>>>>>>>>>>> + * and is_empty checks can still happen while we're iterating the list.
>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>> +#define get_next_vm_bo_from_list(__gpuvm, __list_name, __local_list, __prev_vm_bo) \
>>>>>>>>>>>>>>>>>> +       ({                                                                     \
>>>>>>>>>>>>>>>>>> +               struct drm_gpuvm_bo *__vm_bo;                                  \
>>>>>>>>>>>>>>>>>> +                                                                              \
>>>>>>>>>>>>>>>>>> +               drm_gpuvm_bo_put(__prev_vm_bo);                                \
>>>>>>>>>>>>>>>>>> +                                                                              \
>>>>>>>>>>>>>>>>>> +               spin_lock(&(__gpuvm)->__list_name.lock);                       \
>>>>>>>>>>>>>>>>>> +               while (!list_empty(&(__gpuvm)->__list_name.list)) {            \
>>>>>>>>>>>>>>>>>> +                       __vm_bo = list_first_entry(&(__gpuvm)->__list_name.list, \
>>>>>>>>>>>>>>>>>> +                                                  struct drm_gpuvm_bo,        \
>>>>>>>>>>>>>>>>>> +                                                  list.entry.__list_name);    \
>>>>>>>>>>>>>>>>>> +                       if (drm_gpuvm_bo_get_unless_zero(__vm_bo)) {           \
>>>>>>>>>>>>>>>>>> +                               list_move_tail(&(__vm_bo)->list.entry.__list_name, \
>>>>>>>>>>>>>>>>>> +                                              __local_list);                  \
>>>>>>>>>>>>>>>>>> +                               break;                                         \
>>>>>>>>>>>>>>>>>> +                       } else {                                               \
>>>>>>>>>>>>>>>>>> +                               list_del_init(&(__vm_bo)->list.entry.__list_name); \
>>>>>>>>>>>>>>>>>> +                               __vm_bo = NULL;                                \
>>>>>>>>>>>>>>>>>> +                       }                                                      \
>>>>>>>>>>>>>>>>>> +               }                                                              \
>>>>>>>>>>>>>>>>>> +               spin_unlock(&(__gpuvm)->__list_name.lock);                     \
>>>>>>>>>>>>>>>>>> +                                                                              \
>>>>>>>>>>>>>>>>>> +               __vm_bo;                                                       \
>>>>>>>>>>>>>>>>>> +       })
>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>> + * for_each_vm_bo_in_list() - internal vm_bo list iterator
>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>> + * This helper is here to provide lockless list iteration. Lockless as in, the
>>>>>>>>>>>>>>>>>> + * iterator releases the lock immediately after picking the first element from the
>>>>>>>>>>>>>>>>>> + * list, so list insertion and deletion can happen concurrently.
>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>> + * Typical use:
>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>> + *     struct drm_gpuvm_bo *vm_bo;
>>>>>>>>>>>>>>>>>> + *     LIST_HEAD(my_local_list);
>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>> + *     ret = 0;
>>>>>>>>>>>>>>>>>> + *     drm_gpuvm_for_each_vm_bo(gpuvm, <list_name>, &my_local_list, vm_bo) {
>>>>>>>>>>>>>>>>>> + *             ret = do_something_with_vm_bo(..., vm_bo);
>>>>>>>>>>>>>>>>>> + *             if (ret)
>>>>>>>>>>>>>>>>>> + *                     break;
>>>>>>>>>>>>>>>>>> + *     }
>>>>>>>>>>>>>>>>>> + *     drm_gpuvm_bo_put(vm_bo);
>>>>>>>>>>>>>>>>>> + *     drm_gpuvm_restore_vm_bo_list(gpuvm, <list_name>, &my_local_list);
>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>> + * Only used for internal list iterations, not meant to be exposed to the outside
>>>>>>>>>>>>>>>>>> + * world.
>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>> +#define for_each_vm_bo_in_list(__gpuvm, __list_name,
>>>>>>>>>>>>>>>>>> __local_list, __vm_bo)    \
>>>>>>>>>>>>>>>>>> +       for (__vm_bo = get_next_vm_bo_from_list(__gpuvm,
>>>>>>>>>>>>>>>>>> __list_name,           \
>>>>>>>>>>>>>>>>>> +                                               __local_list, 
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> NULL);            \
>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>> __vm_bo;
>>>>>>>>>>>>>>>>>>        \
>>>>>>>>>>>>>>>>>> +            __vm_bo = get_next_vm_bo_from_list(__gpuvm,
>>>>>>>>>>>>>>>>>> __list_name,           \
>>>>>>>>>>>>>>>>>> +                                               __local_list, 
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> __vm_bo))         \
>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>> + * restore_vm_bo_list() - move vm_bo elements back 
>>>>>>>>>>>>>>>>>> to their
>>>>>>>>>>>>>>>>>> original list
>>>>>>>>>>>>>>>>>> + * @__gpuvm: The GPU VM
>>>>>>>>>>>>>>>>>> + * @__list_name: The name of the list we're 
>>>>>>>>>>>>>>>>>> iterating on
>>>>>>>>>>>>>>>>>> + * @__local_list: A pointer to the local list used 
>>>>>>>>>>>>>>>>>> to store
>>>>>>>>>>>>>>>>>> already iterated items
>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>> + * When we're done iterating a vm_bo list, we should 
>>>>>>>>>>>>>>>>>> call
>>>>>>>>>>>>>>>>>> restore_vm_bo_list()
>>>>>>>>>>>>>>>>>> + * to restore the original state and let new 
>>>>>>>>>>>>>>>>>> iterations take
>>>>>>>>>>>>>>>>>> place.
>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>> +#define restore_vm_bo_list(__gpuvm, __list_name,
>>>>>>>>>>>>>>>>>> __local_list)                         \
>>>>>>>>>>>>>>>>>> +       do
>>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>>                  \
>>>>>>>>>>>>>>>>>> +               /* Merge back the two lists, moving 
>>>>>>>>>>>>>>>>>> local
>>>>>>>>>>>>>>>>>> list elements to the          \
>>>>>>>>>>>>>>>>>> +                * head to preserve previous 
>>>>>>>>>>>>>>>>>> ordering, in
>>>>>>>>>>>>>>>>>> case it matters.              \
>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>> */
>>>>>>>>>>>>>>>>>>            \
>>>>>>>>>>>>>>>>>> +               spin_lock(&(__gpuvm)-
>>>>>>>>>>>>>>>>>>> __list_name.lock);                                \
>>>>>>>>>>>>>>>>>> +               list_splice(__local_list, &(__gpuvm)-
>>>>>>>>>>>>>>>>>>> __list_name.list);                \
>>>>>>>>>>>>>>>>>> +               spin_unlock(&(__gpuvm)-
>>>>>>>>>>>>>>>>>>> __list_name.lock);                              \
>>>>>>>>>>>>>>>>>> +       } while (0)
>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>> + * drm_gpuvm_bo_list_add() - insert a vm_bo into the 
>>>>>>>>>>>>>>>>>> given
>>>>>>>>>>>>>>>>>> list
>>>>>>>>>>>>>>>>>> + * @__vm_bo: the &drm_gpuvm_bo
>>>>>>>>>>>>>>>>>> + * @__list_name: the name of the list to insert into
>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>> + * Inserts the given @__vm_bo into the list 
>>>>>>>>>>>>>>>>>> specified by
>>>>>>>>>>>>>>>>>> @__list_name and
>>>>>>>>>>>>>>>>>> + * increases the vm_bo's reference count.
>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>> +#define drm_gpuvm_bo_list_add(__vm_bo,
>>>>>>>>>>>>>>>>>> __list_name)                            \
>>>>>>>>>>>>>>>>>> +       do
>>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>>          \
>>>>>>>>>>>>>>>>>> +               spin_lock(&(__vm_bo)->vm-
>>>>>>>>>>>>>>>>>>> __list_name.lock);                    \
>>>>>>>>>>>>>>>>>> +               if (list_empty(&(__vm_bo)-
>>>>>>>>>>>>>>>>>>> list.entry.__list_name))             \
>>>>>>>>>>>>>>>>>> +                       list_add_tail(&(__vm_bo)-
>>>>>>>>>>>>>>>>>>> list.entry.__list_name,       \
>>>>>>>>>>>>>>>>>> + &(__vm_bo)->vm-
>>>>>>>>>>>>>>>>>>> __list_name.list);        \
>>>>>>>>>>>>>>>>>> +               spin_unlock(&(__vm_bo)->vm-
>>>>>>>>>>>>>>>>>>> __list_name.lock);                  \
>>>>>>>>>>>>>>>>>> +       } while (0)
>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>> + * drm_gpuvm_bo_list_del() - remove a vm_bo from the 
>>>>>>>>>>>>>>>>>> given
>>>>>>>>>>>>>>>>>> list
>>>>>>>>>>>>>>>>>> + * @__vm_bo: the &drm_gpuvm_bo
>>>>>>>>>>>>>>>>>> + * @__list_name: the name of the list to insert into
>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>> + * Removes the given @__vm_bo from the list 
>>>>>>>>>>>>>>>>>> specified by
>>>>>>>>>>>>>>>>>> @__list_name and
>>>>>>>>>>>>>>>>>> + * decreases the vm_bo's reference count.
>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>> +#define drm_gpuvm_bo_list_del(__vm_bo,
>>>>>>>>>>>>>>>>>> __list_name)                            \
>>>>>>>>>>>>>>>>>> +       do
>>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>>          \
>>>>>>>>>>>>>>>>>> +               spin_lock(&(__vm_bo)->vm-
>>>>>>>>>>>>>>>>>>> __list_name.lock);                    \
>>>>>>>>>>>>>>>>>> +               if (!list_empty(&(__vm_bo)-
>>>>>>>>>>>>>>>>>>> list.entry.__list_name))            \
>>>>>>>>>>>>>>>>>> +                       list_del_init(&(__vm_bo)-
>>>>>>>>>>>>>>>>>>> list.entry.__list_name);      \
>>>>>>>>>>>>>>>>>> +               spin_unlock(&(__vm_bo)->vm-
>>>>>>>>>>>>>>>>>>> __list_name.lock);                  \
>>>>>>>>>>>>>>>>>> +       } while (0)
>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>> +static int __must_check
>>>>>>>>>>>>>>>>>> +drm_gpuvm_bo_get_unless_zero(struct drm_gpuvm_bo 
>>>>>>>>>>>>>>>>>> *vm_bo);
>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>     #define 
>>>>>>>>>>>>>>>>>> to_drm_gpuva(__node) container_of((__node), struct
>>>>>>>>>>>>>>>>>> drm_gpuva, rb.node)
>>>>>>>>>>>>>>>>>>     #define GPUVA_START(node) ((node)->va.addr)
>>>>>>>>>>>>>>>>>> @@ -713,6 +867,12 @@ drm_gpuvm_init(struct drm_gpuvm 
>>>>>>>>>>>>>>>>>> *gpuvm,
>>>>>>>>>>>>>>>>>> struct drm_device *drm,
>>>>>>>>>>>>>>>>>>          gpuvm->rb.tree = RB_ROOT_CACHED;
>>>>>>>>>>>>>>>>>>          INIT_LIST_HEAD(&gpuvm->rb.list);
>>>>>>>>>>>>>>>>>> +       INIT_LIST_HEAD(&gpuvm->extobj.list);
>>>>>>>>>>>>>>>>>> +       spin_lock_init(&gpuvm->extobj.lock);
>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>> +       INIT_LIST_HEAD(&gpuvm->evict.list);
>>>>>>>>>>>>>>>>>> +       spin_lock_init(&gpuvm->evict.lock);
>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>         ��drm_gpuva_check_overflow(start_offset, range);
>>>>>>>>>>>>>>>>>>          gpuvm->mm_start = start_offset;
>>>>>>>>>>>>>>>>>>          gpuvm->mm_range = range;
>>>>>>>>>>>>>>>>>> @@ -754,10 +914,302 @@ drm_gpuvm_destroy(struct 
>>>>>>>>>>>>>>>>>> drm_gpuvm
>>>>>>>>>>>>>>>>>> *gpuvm)
>>>>>>>>>>>>>>>>>>          WARN(!RB_EMPTY_ROOT(&gpuvm->rb.tree.rb_root),
>>>>>>>>>>>>>>>>>>               "GPUVA tree is not empty, potentially 
>>>>>>>>>>>>>>>>>> leaking
>>>>>>>>>>>>>>>>>> memory.\n");
>>>>>>>>>>>>>>>>>> +       WARN(!list_empty(&gpuvm->extobj.list), 
>>>>>>>>>>>>>>>>>> "Extobj list
>>>>>>>>>>>>>>>>>> should be empty.\n");
>>>>>>>>>>>>>>>>>> +       WARN(!list_empty(&gpuvm->evict.list), "Evict 
>>>>>>>>>>>>>>>>>> list
>>>>>>>>>>>>>>>>>> should be empty.\n");
>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>          drm_gem_private_object_fini(&gpuvm->d_obj);
>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>> EXPORT_SYMBOL_GPL(drm_gpuvm_destroy);
>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>> + * drm_gpuvm_prepare_objects() - prepare all 
>>>>>>>>>>>>>>>>>> assoiciated BOs
>>>>>>>>>>>>>>>>>> + * @gpuvm: the &drm_gpuvm
>>>>>>>>>>>>>>>>>> + * @exec: the &drm_exec locking context
>>>>>>>>>>>>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>> + * Calls drm_exec_prepare_obj() for all 
>>>>>>>>>>>>>>>>>> &drm_gem_objects the
>>>>>>>>>>>>>>>>>> given
>>>>>>>>>>>>>>>>>> + * &drm_gpuvm contains mappings of.
>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>> + * Using this function directly, it is the drivers
>>>>>>>>>>>>>>>>>> responsibility to call
>>>>>>>>>>>>>>>>>> + * drm_exec_init() and drm_exec_fini() accordingly.
>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>> + * Note: This function is safe against concurrent 
>>>>>>>>>>>>>>>>>> insertion
>>>>>>>>>>>>>>>>>> and removal of
>>>>>>>>>>>>>>>>>> + * external objects, however it is not safe against
>>>>>>>>>>>>>>>>>> concurrent usage itself.
>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>> + * Drivers need to make sure to protect this case with
>>>>>>>>>>>>>>>>>> either an outer VM lock
>>>>>>>>>>>>>>>>>> + * or by calling drm_gpuvm_prepare_vm() before this 
>>>>>>>>>>>>>>>>>> function
>>>>>>>>>>>>>>>>>> within the
>>>>>>>>>>>>>>>>>> + * drm_exec_until_all_locked() loop, such that the 
>>>>>>>>>>>>>>>>>> GPUVM's
>>>>>>>>>>>>>>>>>> dma-resv lock ensures
>>>>>>>>>>>>>>>>>> + * mutual exclusion.
>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>> + * Returns: 0 on success, negative error code on 
>>>>>>>>>>>>>>>>>> failure.
>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>> +int
>>>>>>>>>>>>>>>>>> +drm_gpuvm_prepare_objects(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>>>>>> +                         struct drm_exec *exec,
>>>>>>>>>>>>>>>>>> + unsigned int num_fences)
>>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>>> +       struct drm_gpuvm_bo *vm_bo;
>>>>>>>>>>>>>>>>>> +       LIST_HEAD(extobjs);
>>>>>>>>>>>>>>>>>> +       int ret = 0;
>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>> +       for_each_vm_bo_in_list(gpuvm, extobj, &extobjs,
>>>>>>>>>>>>>>>>>> vm_bo) {
>>>>>>>>>>>>>>>>>> +               ret = drm_exec_prepare_obj(exec, 
>>>>>>>>>>>>>>>>>> vm_bo->obj,
>>>>>>>>>>>>>>>>>> num_fences);
>>>>>>>>>>>>>>>>>> +               if (ret)
>>>>>>>>>>>>>>>>>> +                       break;
>>>>>>>>>>>>>>>>>> +       }
>>>>>>>>>>>>>>>>>> +       /* Drop ref in case we break out of the loop. */
>>>>>>>>>>>>>>>>>> +       drm_gpuvm_bo_put(vm_bo);
>>>>>>>>>>>>>>>>>> +       restore_vm_bo_list(gpuvm, extobj, &extobjs);
>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>> +       return ret;
>>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_prepare_objects);
>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>> + * drm_gpuvm_prepare_range() - prepare all BOs 
>>>>>>>>>>>>>>>>>> mapped within
>>>>>>>>>>>>>>>>>> a given range
>>>>>>>>>>>>>>>>>> + * @gpuvm: the &drm_gpuvm
>>>>>>>>>>>>>>>>>> + * @exec: the &drm_exec locking context
>>>>>>>>>>>>>>>>>> + * @addr: the start address within the VA space
>>>>>>>>>>>>>>>>>> + * @range: the range to iterate within the VA space
>>>>>>>>>>>>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>> + * Calls drm_exec_prepare_obj() for all 
>>>>>>>>>>>>>>>>>> &drm_gem_objects
>>>>>>>>>>>>>>>>>> mapped between @addr
>>>>>>>>>>>>>>>>>> + * and @addr + @range.
>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>> + * Returns: 0 on success, negative error code on 
>>>>>>>>>>>>>>>>>> failure.
>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>> +int
>>>>>>>>>>>>>>>>>> +drm_gpuvm_prepare_range(struct drm_gpuvm *gpuvm, struct
>>>>>>>>>>>>>>>>>> drm_exec *exec,
>>>>>>>>>>>>>>>>>> +                       u64 addr, u64 range, unsigned 
>>>>>>>>>>>>>>>>>> int
>>>>>>>>>>>>>>>>>> num_fences)
>>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>>> +       struct drm_gpuva *va;
>>>>>>>>>>>>>>>>>> +       u64 end = addr + range;
>>>>>>>>>>>>>>>>>> +       int ret;
>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>> +       drm_gpuvm_for_each_va_range(va, gpuvm, addr, 
>>>>>>>>>>>>>>>>>> end) {
>>>>>>>>>>>>>>>>>> +               struct drm_gem_object *obj = 
>>>>>>>>>>>>>>>>>> va->gem.obj;
>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>> +               ret = drm_exec_prepare_obj(exec, obj,
>>>>>>>>>>>>>>>>>> num_fences);
>>>>>>>>>>>>>>>>>> +               if (ret)
>>>>>>>>>>>>>>>>>> +                       return ret;
>>>>>>>>>>>>>>>>>> +       }
>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>> +       return 0;
>>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_prepare_range);
>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>> + * drm_gpuvm_exec_lock() - lock all dma-resv of all
>>>>>>>>>>>>>>>>>> assoiciated BOs
>>>>>>>>>>>>>>>>>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>>>>>>>>>>>>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>>>>>>>>>>>>> + * @interruptible: sleep interruptible if waiting
>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>> + * Acquires all dma-resv locks of all 
>>>>>>>>>>>>>>>>>> &drm_gem_objects the
>>>>>>>>>>>>>>>>>> given
>>>>>>>>>>>>>>>>>> + * &drm_gpuvm contains mappings of.
>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>> + * Addionally, when calling this function with struct
>>>>>>>>>>>>>>>>>> drm_gpuvm_exec::extra
>>>>>>>>>>>>>>>>>> + * being set the driver receives the given @fn 
>>>>>>>>>>>>>>>>>> callback to
>>>>>>>>>>>>>>>>>> lock additional
>>>>>>>>>>>>>>>>>> + * dma-resv in the context of the &drm_gpuvm_exec 
>>>>>>>>>>>>>>>>>> instance.
>>>>>>>>>>>>>>>>>> Typically, drivers
>>>>>>>>>>>>>>>>>> + * would call drm_exec_prepare_obj() from within this
>>>>>>>>>>>>>>>>>> callback.
>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>> + * Returns: 0 on success, negative error code on 
>>>>>>>>>>>>>>>>>> failure.
>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>> +int
>>>>>>>>>>>>>>>>>> +drm_gpuvm_exec_lock(struct drm_gpuvm_exec *vm_exec,
>>>>>>>>>>>>>>>>>> +                   unsigned int num_fences,
>>>>>>>>>>>>>>>>>> +                   bool interruptible)
>>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>>> +       struct drm_gpuvm *gpuvm = vm_exec->vm;
>>>>>>>>>>>>>>>>>> +       struct drm_exec *exec = &vm_exec->exec;
>>>>>>>>>>>>>>>>>> +       uint32_t flags;
>>>>>>>>>>>>>>>>>> +       int ret;
>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>> +       flags = interruptible ? 
>>>>>>>>>>>>>>>>>> DRM_EXEC_INTERRUPTIBLE_WAIT :
>>>>>>>>>>>>>>>>>> 0 |
>>>>>>>>>>>>>>>>>> +               DRM_EXEC_IGNORE_DUPLICATES;
>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>> +       drm_exec_init(exec, flags);
>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>> +       drm_exec_until_all_locked(exec) {
>>>>>>>>>>>>>>>>>> +               ret = drm_gpuvm_prepare_vm(gpuvm, exec,
>>>>>>>>>>>>>>>>>> num_fences);
>>>>>>>>>>>>>>>>>> +               drm_exec_retry_on_contention(exec);
>>>>>>>>>>>>>>>>>> +               if (ret)
>>>>>>>>>>>>>>>>>> +                       goto err;
>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>> +               ret = 
>>>>>>>>>>>>>>>>>> drm_gpuvm_prepare_objects(gpuvm, exec,
>>>>>>>>>>>>>>>>>> num_fences);
>>>>>>>>>>>>>>>>>> +               drm_exec_retry_on_contention(exec);
>>>>>>>>>>>>>>>>>> +               if (ret)
>>>>>>>>>>>>>>>>>> +                       goto err;
>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>> +               if (vm_exec->extra.fn) {
>>>>>>>>>>>>>>>>>> +                       ret = vm_exec->extra.fn(vm_exec,
>>>>>>>>>>>>>>>>>> num_fences);
>>>>>>>>>>>>>>>>>> +                       drm_exec_retry_on_contention(exec); 
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> +                       if (ret)
>>>>>>>>>>>>>>>>>> +                               goto err;
>>>>>>>>>>>>>>>>>> +               }
>>>>>>>>>>>>>>>>>> +       }
>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>> +       return 0;
>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>> +err:
>>>>>>>>>>>>>>>>>> +       drm_exec_fini(exec);
>>>>>>>>>>>>>>>>>> +       return ret;
>>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_exec_lock);
>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>> +static int
>>>>>>>>>>>>>>>>>> +fn_lock_array(struct drm_gpuvm_exec *vm_exec, 
>>>>>>>>>>>>>>>>>> unsigned int
>>>>>>>>>>>>>>>>>> num_fences)
>>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>>> +       struct {
>>>>>>>>>>>>>>>>>> +               struct drm_gem_object **objs;
>>>>>>>>>>>>>>>>>> +               unsigned int num_objs;
>>>>>>>>>>>>>>>>>> +       } *args = vm_exec->extra.priv;
>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>> +       return drm_exec_prepare_array(&vm_exec->exec, 
>>>>>>>>>>>>>>>>>> args-
>>>>>>>>>>>>>>>>>>> objs,
>>>>>>>>>>>>>>>>>> + args->num_objs,
>>>>>>>>>>>>>>>>>> num_fences);
>>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>> + * drm_gpuvm_exec_lock_array() - lock all dma-resv 
>>>>>>>>>>>>>>>>>> of all
>>>>>>>>>>>>>>>>>> assoiciated BOs
>>>>>>>>>>>>>>>>>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>>>>>>>>>>>>>>>>>> + * @objs: additional &drm_gem_objects to lock
>>>>>>>>>>>>>>>>>> + * @num_objs: the number of additional 
>>>>>>>>>>>>>>>>>> &drm_gem_objects to
>>>>>>>>>>>>>>>>>> lock
>>>>>>>>>>>>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>>>>>>>>>>>>> + * @interruptible: sleep interruptible if waiting
>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>> + * Acquires all dma-resv locks of all 
>>>>>>>>>>>>>>>>>> &drm_gem_objects the
>>>>>>>>>>>>>>>>>> given &drm_gpuvm
>>>>>>>>>>>>>>>>>> + * contains mappings of, plus the ones given through 
>>>>>>>>>>>>>>>>>> @objs.
>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>> + * Returns: 0 on success, negative error code on 
>>>>>>>>>>>>>>>>>> failure.
>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>> +int
>>>>>>>>>>>>>>>>>> +drm_gpuvm_exec_lock_array(struct drm_gpuvm_exec 
>>>>>>>>>>>>>>>>>> *vm_exec,
>>>>>>>>>>>>>>>>>> +                         struct drm_gem_object **objs,
>>>>>>>>>>>>>>>>>> + unsigned int num_objs,
>>>>>>>>>>>>>>>>>> + unsigned int num_fences,
>>>>>>>>>>>>>>>>>> +                         bool interruptible)
>>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>>> +       struct {
>>>>>>>>>>>>>>>>>> +               struct drm_gem_object **objs;
>>>>>>>>>>>>>>>>>> +               unsigned int num_objs;
>>>>>>>>>>>>>>>>>> +       } args;
>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>> +       args.objs = objs;
>>>>>>>>>>>>>>>>>> +       args.num_objs = num_objs;
>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>> +       vm_exec->extra.fn = fn_lock_array;
>>>>>>>>>>>>>>>>>> +       vm_exec->extra.priv = &args;
>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>> +       return drm_gpuvm_exec_lock(vm_exec, num_fences,
>>>>>>>>>>>>>>>>>> interruptible);
>>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_exec_lock_array);
>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>> + * drm_gpuvm_exec_lock_range() - prepare all BOs mapped
>>>>>>>>>>>>>>>>>> within a given range
>>>>>>>>>>>>>>>>>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>>>>>>>>>>>>>>>>>> + * @addr: the start address within the VA space
>>>>>>>>>>>>>>>>>> + * @range: the range to iterate within the VA space
>>>>>>>>>>>>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>>>>>>>>>>>>> + * @interruptible: sleep interruptible if waiting
>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>> + * Acquires all dma-resv locks of all &drm_gem_objects
>>>>>>>>>>>>>>>>>> mapped between @addr and
>>>>>>>>>>>>>>>>>> + * @addr + @range.
>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>> + * Returns: 0 on success, negative error code on 
>>>>>>>>>>>>>>>>>> failure.
>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>> +int
>>>>>>>>>>>>>>>>>> +drm_gpuvm_exec_lock_range(struct drm_gpuvm_exec 
>>>>>>>>>>>>>>>>>> *vm_exec,
>>>>>>>>>>>>>>>>>> +                         u64 addr, u64 range,
>>>>>>>>>>>>>>>>>> + unsigned int num_fences,
>>>>>>>>>>>>>>>>>> +                         bool interruptible)
>>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>>> +       struct drm_gpuvm *gpuvm = vm_exec->vm;
>>>>>>>>>>>>>>>>>> +       struct drm_exec *exec = &vm_exec->exec;
>>>>>>>>>>>>>>>>>> +       uint32_t flags;
>>>>>>>>>>>>>>>>>> +       int ret;
>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>> +       flags = interruptible ? 
>>>>>>>>>>>>>>>>>> DRM_EXEC_INTERRUPTIBLE_WAIT :
>>>>>>>>>>>>>>>>>> 0 |
>>>>>>>>>>>>>>>>>> +               DRM_EXEC_IGNORE_DUPLICATES;
>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>> +       drm_exec_init(exec, flags);
>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>> +       drm_exec_until_all_locked(exec) {
>>>>>>>>>>>>>>>>>> +               ret = drm_gpuvm_prepare_range(gpuvm, 
>>>>>>>>>>>>>>>>>> exec,
>>>>>>>>>>>>>>>>>> addr, range,
>>>>>>>>>>>>>>>>>> + num_fences);
>>>>>>>>>>>>>>>>>> +               drm_exec_retry_on_contention(exec);
>>>>>>>>>>>>>>>>>> +               if (ret)
>>>>>>>>>>>>>>>>>> +                       goto err;
>>>>>>>>>>>>>>>>>> +       }
>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>> +       return ret;
>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>> +err:
>>>>>>>>>>>>>>>>>> +       drm_exec_fini(exec);
>>>>>>>>>>>>>>>>>> +       return ret;
>>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_exec_lock_range);
>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>> + * drm_gpuvm_validate() - validate all BOs marked as 
>>>>>>>>>>>>>>>>>> evicted
>>>>>>>>>>>>>>>>>> + * @gpuvm: the &drm_gpuvm to validate evicted BOs
>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>> + * Calls the &drm_gpuvm_ops.bo_validate callback for 
>>>>>>>>>>>>>>>>>> all
>>>>>>>>>>>>>>>>>> evicted buffer
>>>>>>>>>>>>>>>>>> + * objects being mapped in the given &drm_gpuvm.
>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>> + * Returns: 0 on success, negative error code on 
>>>>>>>>>>>>>>>>>> failure.
>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>> +int
>>>>>>>>>>>>>>>>>> +drm_gpuvm_validate(struct drm_gpuvm *gpuvm)
>>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>>> +       const struct drm_gpuvm_ops *ops = gpuvm->ops;
>>>>>>>>>>>>>>>>>> +       struct drm_gpuvm_bo *vm_bo;
>>>>>>>>>>>>>>>>>> +       LIST_HEAD(evict);
>>>>>>>>>>>>>>>>>> +       int ret = 0;
>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>> +       if (unlikely(!ops || !ops->bo_validate))
>>>>>>>>>>>>>>>>>> +               return -ENOTSUPP;
>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>> +       for_each_vm_bo_in_list(gpuvm, evict, &evict, 
>>>>>>>>>>>>>>>>>> vm_bo) {
>>>>>>>>>>>>>>>>>> +               dma_resv_assert_held(vm_bo->obj->resv);
>>>>>>>>>>>>>>>>>> +               ret = ops->bo_validate(vm_bo->obj);
>>>>>>>>>>>>>>>>>> +               if (ret)
>>>>>>>>>>>>>>>>>> +                       break;
>>>>>>>>>>>>>>>>>> +       }
>>>>>>>>>>>>>>>>>> +       /* Drop ref in case we break out of the loop. */
>>>>>>>>>>>>>>>>>> +       drm_gpuvm_bo_put(vm_bo);
>>>>>>>>>>>>>>>>>> +       restore_vm_bo_list(gpuvm, evict, &evict);
>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>> +       return ret;
>>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_validate);
>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>> + * drm_gpuvm_resv_add_fence - add fence to private 
>>>>>>>>>>>>>>>>>> and all
>>>>>>>>>>>>>>>>>> extobj
>>>>>>>>>>>>>>>>>> + * dma-resv
>>>>>>>>>>>>>>>>>> + * @gpuvm: the &drm_gpuvm to add a fence to
>>>>>>>>>>>>>>>>>> + * @exec: the &drm_exec locking context
>>>>>>>>>>>>>>>>>> + * @fence: fence to add
>>>>>>>>>>>>>>>>>> + * @private_usage: private dma-resv usage
>>>>>>>>>>>>>>>>>> + * @extobj_usage: extobj dma-resv usage
>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>> +void
>>>>>>>>>>>>>>>>>> +drm_gpuvm_resv_add_fence(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>>>>>> +                        struct drm_exec *exec,
>>>>>>>>>>>>>>>>>> +                        struct dma_fence *fence,
>>>>>>>>>>>>>>>>>> +                        enum dma_resv_usage 
>>>>>>>>>>>>>>>>>> private_usage,
>>>>>>>>>>>>>>>>>> +                        enum dma_resv_usage 
>>>>>>>>>>>>>>>>>> extobj_usage)
>>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>>> +       struct drm_gem_object *obj;
>>>>>>>>>>>>>>>>>> +       unsigned long index;
>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>> +       drm_exec_for_each_locked_object(exec, index, 
>>>>>>>>>>>>>>>>>> obj) {
>>>>>>>>>>>>>>>>>> +               dma_resv_assert_held(obj->resv);
>>>>>>>>>>>>>>>>>> +               dma_resv_add_fence(obj->resv, fence,
>>>>>>>>>>>>>>>>>> + drm_gpuvm_is_extobj(gpuvm,
>>>>>>>>>>>>>>>>>> obj) ?
>>>>>>>>>>>>>>>>>> + private_usage :
>>>>>>>>>>>>>>>>>> extobj_usage);
>>>>>>>>>>>>>>>>>> +       }
>>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_resv_add_fence);
>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>     /**
>>>>>>>>>>>>>>>>>>      * drm_gpuvm_bo_create() - create a new instance 
>>>>>>>>>>>>>>>>>> of struct
>>>>>>>>>>>>>>>>>> drm_gpuvm_bo
>>>>>>>>>>>>>>>>>>      * @gpuvm: The &drm_gpuvm the @obj is mapped in.
>>>>>>>>>>>>>>>>>> @@ -790,6 +1242,9 @@ drm_gpuvm_bo_create(struct 
>>>>>>>>>>>>>>>>>> drm_gpuvm
>>>>>>>>>>>>>>>>>> *gpuvm,
>>>>>>>>>>>>>>>>>>          INIT_LIST_HEAD(&vm_bo->list.gpuva);
>>>>>>>>>>>>>>>>>>          INIT_LIST_HEAD(&vm_bo->list.entry.gem);
>>>>>>>>>>>>>>>>>> +       INIT_LIST_HEAD(&vm_bo->list.entry.extobj);
>>>>>>>>>>>>>>>>>> +       INIT_LIST_HEAD(&vm_bo->list.entry.evict);
>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>          drm_gem_object_get(obj);
>>>>>>>>>>>>>>>>>>          return vm_bo;
>>>>>>>>>>>>>>>>>> @@ -807,6 +1262,14 @@ drm_gpuvm_bo_destroy(struct 
>>>>>>>>>>>>>>>>>> kref *kref)
>>>>>>>>>>>>>>>>>>          drm_gem_gpuva_assert_lock_held(vm_bo->obj);
>>>>>>>>>>>>>>>>>> +       spin_lock(&gpuvm->extobj.lock);
>>>>>>>>>>>>>>>>>> +       list_del(&vm_bo->list.entry.extobj);
>>>>>>>>>>>>>>>>>> +       spin_unlock(&gpuvm->extobj.lock);
>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>> +       spin_lock(&gpuvm->evict.lock);
>>>>>>>>>>>>>>>>>> +       list_del(&vm_bo->list.entry.evict);
>>>>>>>>>>>>>>>>>> +       spin_unlock(&gpuvm->evict.lock);
>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>          list_del(&vm_bo->list.entry.gem);
>>>>>>>>>>>>>>>>>>          drm_gem_object_put(obj);
>>>>>>>>>>>>>>>>>> @@ -822,6 +1285,11 @@ drm_gpuvm_bo_destroy(struct 
>>>>>>>>>>>>>>>>>> kref *kref)
>>>>>>>>>>>>>>>>>>      * @vm_bo: the &drm_gpuvm_bo to release the 
>>>>>>>>>>>>>>>>>> reference of
>>>>>>>>>>>>>>>>>>      *
>>>>>>>>>>>>>>>>>>      * This releases a reference to @vm_bo.
>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>> + * If the reference count drops to zero, the 
>>>>>>>>>>>>>>>>>> &gpuvm_bo is
>>>>>>>>>>>>>>>>>> destroyed, which
>>>>>>>>>>>>>>>>>> + * includes removing it from the GEMs gpuva list. 
>>>>>>>>>>>>>>>>>> Hence, if
>>>>>>>>>>>>>>>>>> a call to this
>>>>>>>>>>>>>>>>>> + * function can potentially let the reference count 
>>>>>>>>>>>>>>>>>> to zero
>>>>>>>>>>>>>>>>>> the caller must
>>>>>>>>>>>>>>>>>> + * hold the dma-resv or driver specific GEM gpuva lock.
>>>>>>>>>>>>>>>>>>      */
>>>>>>>>>>>>>>>>>>     void
>>>>>>>>>>>>>>>>>>     drm_gpuvm_bo_put(struct drm_gpuvm_bo *vm_bo)
>>>>>>>>>>>>>>>>>> @@ -831,6 +1299,12 @@ drm_gpuvm_bo_put(struct 
>>>>>>>>>>>>>>>>>> drm_gpuvm_bo
>>>>>>>>>>>>>>>>>> *vm_bo)
>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>> EXPORT_SYMBOL_GPL(drm_gpuvm_bo_put);
>>>>>>>>>>>>>>>>>> +static int __must_check
>>>>>>>>>>>>>>>>>> +drm_gpuvm_bo_get_unless_zero(struct drm_gpuvm_bo 
>>>>>>>>>>>>>>>>>> *vm_bo)
>>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>>> +       return kref_get_unless_zero(&vm_bo->kref);
>>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>     static struct drm_gpuvm_bo *
>>>>>>>>>>>>>>>>>>     __drm_gpuvm_bo_find(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>>>>>>                      struct drm_gem_object *obj)
>>>>>>>>>>>>>>>>>> @@ -938,6 +1412,48 @@ 
>>>>>>>>>>>>>>>>>> drm_gpuvm_bo_obtain_prealloc(struct
>>>>>>>>>>>>>>>>>> drm_gpuvm_bo *__vm_bo)
>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>> EXPORT_SYMBOL_GPL(drm_gpuvm_bo_obtain_prealloc);
>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>> + * drm_gpuvm_bo_extobj_add() - adds the 
>>>>>>>>>>>>>>>>>> &drm_gpuvm_bo to its
>>>>>>>>>>>>>>>>>> &drm_gpuvm's
>>>>>>>>>>>>>>>>>> + * extobj list
>>>>>>>>>>>>>>>>>> + * @vm_bo: The &drm_gpuvm_bo to add to its 
>>>>>>>>>>>>>>>>>> &drm_gpuvm's the
>>>>>>>>>>>>>>>>>> extobj list.
>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>> + * Adds the given @vm_bo to its &drm_gpuvm's extobj 
>>>>>>>>>>>>>>>>>> list if
>>>>>>>>>>>>>>>>>> not on the list
>>>>>>>>>>>>>>>>>> + * already and if the corresponding &drm_gem_object 
>>>>>>>>>>>>>>>>>> is an
>>>>>>>>>>>>>>>>>> external object,
>>>>>>>>>>>>>>>>>> + * actually.
>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>> +void
>>>>>>>>>>>>>>>>>> +drm_gpuvm_bo_extobj_add(struct drm_gpuvm_bo *vm_bo)
>>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>>> +       struct drm_gpuvm *gpuvm = vm_bo->vm;
>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>> +       if (drm_gpuvm_is_extobj(gpuvm, vm_bo->obj))
>>>>>>>>>>>>>>>>>> +               drm_gpuvm_bo_list_add(vm_bo, extobj);
>>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_bo_extobj_add);
>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>> + * drm_gpuvm_bo_evict() - add / remove a 
>>>>>>>>>>>>>>>>>> &drm_gem_object to
>>>>>>>>>>>>>>>>>> / from a
>>>>>>>>>>>>>>>>>> + * &drm_gpuvms evicted list
>>>>>>>>>>>>>>>>>> + * @obj: the &drm_gem_object to add or remove
>>>>>>>>>>>>>>>>>> + * @evict: indicates whether the object is evicted
>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>> + * Adds a &drm_gem_object to or removes it from all
>>>>>>>>>>>>>>>>>> &drm_gpuvms evicted
>>>>>>>>>>>>>>>>>> + * list containing a mapping of this &drm_gem_object.
>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>> +void
>>>>>>>>>>>>>>>>>> +drm_gpuvm_bo_evict(struct drm_gem_object *obj, bool 
>>>>>>>>>>>>>>>>>> evict)
>>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>>> +       struct drm_gpuvm_bo *vm_bo;
>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>> +       drm_gem_for_each_gpuvm_bo(vm_bo, obj) {
>>>>>>>>>>>>>>>>>> +               if (evict)
>>>>>>>>>>>>>>>>>> +                       drm_gpuvm_bo_list_add(vm_bo, 
>>>>>>>>>>>>>>>>>> evict);
>>>>>>>>>>>>>>>>>> +               else
>>>>>>>>>>>>>>>>>> +                       drm_gpuvm_bo_list_del(vm_bo, 
>>>>>>>>>>>>>>>>>> evict);
>>>>>>>>>>>>>>>>>> +       }
>>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_bo_evict);
>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>     static int
>>>>>>>>>>>>>>>>>>     __drm_gpuva_insert(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>>>>>>                     struct drm_gpuva *va)
>>>>>>>>>>>>>>>>>> diff --git a/include/drm/drm_gpuvm.h
>>>>>>>>>>>>>>>>>> b/include/drm/drm_gpuvm.h
>>>>>>>>>>>>>>>>>> index afa50b9059a2..834bb6d6617e 100644
>>>>>>>>>>>>>>>>>> --- a/include/drm/drm_gpuvm.h
>>>>>>>>>>>>>>>>>> +++ b/include/drm/drm_gpuvm.h
>>>>>>>>>>>>>>>>>> @@ -26,10 +26,12 @@
>>>>>>>>>>>>>>>>>>      */
>>>>>>>>>>>>>>>>>>     #include <linux/list.h>
>>>>>>>>>>>>>>>>>> +#include <linux/dma-resv.h>
>>>>>>>>>>>>>>>>>>     #include <linux/rbtree.h>
>>>>>>>>>>>>>>>>>>     #include <linux/types.h>
>>>>>>>>>>>>>>>>>>     #include <drm/drm_gem.h>
>>>>>>>>>>>>>>>>>> +#include <drm/drm_exec.h>
>>>>>>>>>>>>>>>>>>     struct drm_gpuvm;
>>>>>>>>>>>>>>>>>>     struct drm_gpuvm_bo;
>>>>>>>>>>>>>>>>>> @@ -259,6 +261,38 @@ struct drm_gpuvm {
>>>>>>>>>>>>>>>>>>           * space
>>>>>>>>>>>>>>>>>>           */
>>>>>>>>>>>>>>>>>>          struct dma_resv *resv;
>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>> +       /**
>>>>>>>>>>>>>>>>>> +        * @extobj: structure holding the extobj list
>>>>>>>>>>>>>>>>>> +        */
>>>>>>>>>>>>>>>>>> +       struct {
>>>>>>>>>>>>>>>>>> +               /**
>>>>>>>>>>>>>>>>>> +                * @list: &list_head storing 
>>>>>>>>>>>>>>>>>> &drm_gpuvm_bos
>>>>>>>>>>>>>>>>>> serving as
>>>>>>>>>>>>>>>>>> +                * external object
>>>>>>>>>>>>>>>>>> +                */
>>>>>>>>>>>>>>>>>> +               struct list_head list;
>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>> +               /**
>>>>>>>>>>>>>>>>>> +                * @lock: spinlock to protect the 
>>>>>>>>>>>>>>>>>> extobj list
>>>>>>>>>>>>>>>>>> +                */
>>>>>>>>>>>>>>>>>> +               spinlock_t lock;
>>>>>>>>>>>>>>>>>> +       } extobj;
>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>> +       /**
>>>>>>>>>>>>>>>>>> +        * @evict: structure holding the evict list 
>>>>>>>>>>>>>>>>>> and evict
>>>>>>>>>>>>>>>>>> list lock
>>>>>>>>>>>>>>>>>> +        */
>>>>>>>>>>>>>>>>>> +       struct {
>>>>>>>>>>>>>>>>>> +               /**
>>>>>>>>>>>>>>>>>> +                * @list: &list_head storing 
>>>>>>>>>>>>>>>>>> &drm_gpuvm_bos
>>>>>>>>>>>>>>>>>> currently being
>>>>>>>>>>>>>>>>>> +                * evicted
>>>>>>>>>>>>>>>>>> +                */
>>>>>>>>>>>>>>>>>> +               struct list_head list;
>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>> +               /**
>>>>>>>>>>>>>>>>>> +                * @lock: spinlock to protect the 
>>>>>>>>>>>>>>>>>> evict list
>>>>>>>>>>>>>>>>>> +                */
>>>>>>>>>>>>>>>>>> +               spinlock_t lock;
>>>>>>>>>>>>>>>>>> +       } evict;
>>>>>>>>>>>>>>>>>>     };
>>>>>>>>>>>>>>>>>>     void drm_gpuvm_init(struct drm_gpuvm *gpuvm, struct
>>>>>>>>>>>>>>>>>> drm_device *drm,
>>>>>>>>>>>>>>>>>> @@ -268,6 +302,21 @@ void drm_gpuvm_init(struct 
>>>>>>>>>>>>>>>>>> drm_gpuvm
>>>>>>>>>>>>>>>>>> *gpuvm, struct drm_device *drm,
>>>>>>>>>>>>>>>>>>                      const struct drm_gpuvm_ops *ops);
>>>>>>>>>>>>>>>>>>     void drm_gpuvm_destroy(struct drm_gpuvm *gpuvm);
>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>> + * drm_gpuvm_is_extobj() - indicates whether the given
>>>>>>>>>>>>>>>>>> &drm_gem_object is an
>>>>>>>>>>>>>>>>>> + * external object
>>>>>>>>>>>>>>>>>> + * @gpuvm: the &drm_gpuvm to check
>>>>>>>>>>>>>>>>>> + * @obj: the &drm_gem_object to check
>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>> + * Returns: true if the &drm_gem_object &dma_resv 
>>>>>>>>>>>>>>>>>> differs
>>>>>>>>>>>>>>>>>> from the
>>>>>>>>>>>>>>>>>> + * &drm_gpuvms &dma_resv, false otherwise
>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>> +static inline bool drm_gpuvm_is_extobj(struct drm_gpuvm
>>>>>>>>>>>>>>>>>> *gpuvm,
>>>>>>>>>>>>>>>>>> + struct drm_gem_object
>>>>>>>>>>>>>>>>>> *obj)
>>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>>> +       return obj && obj->resv != gpuvm->resv;
>>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>     static inline struct drm_gpuva *
>>>>>>>>>>>>>>>>>>     __drm_gpuva_next(struct drm_gpuva *va)
>>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>> @@ -346,6 +395,128 @@ __drm_gpuva_next(struct 
>>>>>>>>>>>>>>>>>> drm_gpuva *va)
>>>>>>>>>>>>>>>>>>     #define drm_gpuvm_for_each_va_safe(va__, next__, 
>>>>>>>>>>>>>>>>>> gpuvm__)
>>>>>>>>>>>>>>>>>> \
>>>>>>>>>>>>>>>>>>          list_for_each_entry_safe(va__, next__, 
>>>>>>>>>>>>>>>>>> &(gpuvm__)-
>>>>>>>>>>>>>>>>>>> rb.list, rb.entry)
>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>> + * struct drm_gpuvm_exec - &drm_gpuvm abstraction of
>>>>>>>>>>>>>>>>>> &drm_exec
>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>> + * This structure should be created on the stack as
>>>>>>>>>>>>>>>>>> &drm_exec should be.
>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>> + * Optionally, @extra can be set in order to lock 
>>>>>>>>>>>>>>>>>> additional
>>>>>>>>>>>>>>>>>> &drm_gem_objects.
>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>> +struct drm_gpuvm_exec {
>>>>>>>>>>>>>>>>>> +       /**
>>>>>>>>>>>>>>>>>> +        * @exec: the &drm_exec structure
>>>>>>>>>>>>>>>>>> +        */
>>>>>>>>>>>>>>>>>> +       struct drm_exec exec;
>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>> +       /**
>>>>>>>>>>>>>>>>>> +        * @vm: the &drm_gpuvm to lock its DMA 
>>>>>>>>>>>>>>>>>> reservations
>>>>>>>>>>>>>>>>>> +        */
>>>>>>>>>>>>>>>>>> +       struct drm_gpuvm *vm;
>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>> +       /**
>>>>>>>>>>>>>>>>>> +        * @extra: Callback and corresponding private 
>>>>>>>>>>>>>>>>>> data
>>>>>>>>>>>>>>>>>> for the driver to
>>>>>>>>>>>>>>>>>> +        * lock arbitrary additional &drm_gem_objects.
>>>>>>>>>>>>>>>>>> +        */
>>>>>>>>>>>>>>>>>> +       struct {
>>>>>>>>>>>>>>>>>> +               /**
>>>>>>>>>>>>>>>>>> +                * @fn: The driver callback to lock
>>>>>>>>>>>>>>>>>> additional &drm_gem_objects.
>>>>>>>>>>>>>>>>>> +                */
>>>>>>>>>>>>>>>>>> +               int (*fn)(struct drm_gpuvm_exec 
>>>>>>>>>>>>>>>>>> *vm_exec,
>>>>>>>>>>>>>>>>>> + unsigned int num_fences);
>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>> +               /**
>>>>>>>>>>>>>>>>>> +                * @priv: driver private data for the 
>>>>>>>>>>>>>>>>>> @fn
>>>>>>>>>>>>>>>>>> callback
>>>>>>>>>>>>>>>>>> +                */
>>>>>>>>>>>>>>>>>> +               void *priv;
>>>>>>>>>>>>>>>>>> +       } extra;
>>>>>>>>>>>>>>>>>> +};
>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>> + * drm_gpuvm_prepare_vm() - prepare the GPUVMs 
>>>>>>>>>>>>>>>>>> common dma-
>>>>>>>>>>>>>>>>>> resv
>>>>>>>>>>>>>>>>>> + * @gpuvm: the &drm_gpuvm
>>>>>>>>>>>>>>>>>> + * @exec: the &drm_exec context
>>>>>>>>>>>>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>> + * Calls drm_exec_prepare_obj() for the GPUVMs dummy
>>>>>>>>>>>>>>>>>> &drm_gem_object.
>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>> + * Using this function directly, it is the drivers
>>>>>>>>>>>>>>>>>> responsibility to call
>>>>>>>>>>>>>>>>>> + * drm_exec_init() and drm_exec_fini() accordingly.
>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>> + * Returns: 0 on success, negative error code on 
>>>>>>>>>>>>>>>>>> failure.
>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>> +static inline int
>>>>>>>>>>>>>>>>>> +drm_gpuvm_prepare_vm(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>>>>>> +                    struct drm_exec *exec,
>>>>>>>>>>>>>>>>>> +                    unsigned int num_fences)
>>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>>> +       return drm_exec_prepare_obj(exec, &gpuvm->d_obj,
>>>>>>>>>>>>>>>>>> num_fences);
>>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>> +int drm_gpuvm_prepare_objects(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>>>>>> + struct drm_exec *exec,
>>>>>>>>>>>>>>>>>> + unsigned int num_fences);
>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>> +int drm_gpuvm_prepare_range(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>>>>>> + struct drm_exec *exec,
>>>>>>>>>>>>>>>>>> +                           u64 addr, u64 range,
>>>>>>>>>>>>>>>>>> + unsigned int num_fences);
>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>> +int drm_gpuvm_exec_lock(struct drm_gpuvm_exec *vm_exec,
>>>>>>>>>>>>>>>>>> +                       unsigned int num_fences,
>>>>>>>>>>>>>>>>>> +                       bool interruptible);
>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>> +int drm_gpuvm_exec_lock_array(struct drm_gpuvm_exec
>>>>>>>>>>>>>>>>>> *vm_exec,
>>>>>>>>>>>>>>>>>> + struct drm_gem_object **objs,
>>>>>>>>>>>>>>>>>> + unsigned int num_objs,
>>>>>>>>>>>>>>>>>> + unsigned int num_fences,
>>>>>>>>>>>>>>>>>> + bool interruptible);
>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>> +int drm_gpuvm_exec_lock_range(struct drm_gpuvm_exec
>>>>>>>>>>>>>>>>>> *vm_exec,
>>>>>>>>>>>>>>>>>> + u64 addr, u64 range,
>>>>>>>>>>>>>>>>>> + unsigned int num_fences,
>>>>>>>>>>>>>>>>>> + bool interruptible);
>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>> + * drm_gpuvm_lock() - lock all dma-resv of all 
>>>>>>>>>>>>>>>>>> assoiciated
>>>>>>>>>>>>>>>>>> BOs
>>>>>>>>>>>>>>>>>> + * @gpuvm: the &drm_gpuvm
>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>> + * Releases all dma-resv locks of all &drm_gem_objects
>>>>>>>>>>>>>>>>>> previously acquired
>>>>>>>>>>>>>>>>>> + * through drm_gpuvm_lock() or its variants.
>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>> + * Returns: 0 on success, negative error code on 
>>>>>>>>>>>>>>>>>> failure.
>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>> +static inline void
>>>>>>>>>>>>>>>>>> +drm_gpuvm_exec_unlock(struct drm_gpuvm_exec *vm_exec)
>>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>>> +       drm_exec_fini(&vm_exec->exec);
>>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>> +int drm_gpuvm_validate(struct drm_gpuvm *gpuvm);
>>>>>>>>>>>>>>>>>> +void drm_gpuvm_resv_add_fence(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>>>>>> + struct drm_exec *exec,
>>>>>>>>>>>>>>>>>> + struct dma_fence *fence,
>>>>>>>>>>>>>>>>>> + enum dma_resv_usage
>>>>>>>>>>>>>>>>>> private_usage,
>>>>>>>>>>>>>>>>>> + enum dma_resv_usage
>>>>>>>>>>>>>>>>>> extobj_usage);
>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>> + * drm_gpuvm_exec_resv_add_fence()
>>>>>>>>>>>>>>>>>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>>>>>>>>>>>>>>>>>> + * @fence: fence to add
>>>>>>>>>>>>>>>>>> + * @private_usage: private dma-resv usage
>>>>>>>>>>>>>>>>>> + * @extobj_usage: extobj dma-resv usage
>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>> + * See drm_gpuvm_resv_add_fence().
>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>> +static inline void
>>>>>>>>>>>>>>>>>> +drm_gpuvm_exec_resv_add_fence(struct drm_gpuvm_exec
>>>>>>>>>>>>>>>>>> *vm_exec,
>>>>>>>>>>>>>>>>>> + struct dma_fence *fence,
>>>>>>>>>>>>>>>>>> + enum dma_resv_usage
>>>>>>>>>>>>>>>>>> private_usage,
>>>>>>>>>>>>>>>>>> + enum dma_resv_usage
>>>>>>>>>>>>>>>>>> extobj_usage)
>>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>>> +       drm_gpuvm_resv_add_fence(vm_exec->vm, 
>>>>>>>>>>>>>>>>>> &vm_exec->exec,
>>>>>>>>>>>>>>>>>> fence,
>>>>>>>>>>>>>>>>>> + private_usage,
>>>>>>>>>>>>>>>>>> extobj_usage);
>>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>     /**
>>>>>>>>>>>>>>>>>>      * struct drm_gpuvm_bo - structure representing a
>>>>>>>>>>>>>>>>>> &drm_gpuvm and
>>>>>>>>>>>>>>>>>>      * &drm_gem_object combination
>>>>>>>>>>>>>>>>>> @@ -398,6 +569,18 @@ struct drm_gpuvm_bo {
>>>>>>>>>>>>>>>>>>                           * gpuva list.
>>>>>>>>>>>>>>>>>>                           */
>>>>>>>>>>>>>>>>>>                          struct list_head gem;
>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>> +                       /**
>>>>>>>>>>>>>>>>>> +                        * @evict: List entry to 
>>>>>>>>>>>>>>>>>> attach to
>>>>>>>>>>>>>>>>>> the &drm_gpuvms
>>>>>>>>>>>>>>>>>> +                        * extobj list.
>>>>>>>>>>>>>>>>>> +                        */
>>>>>>>>>>>>>>>>>> +                       struct list_head extobj;
>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>> +                       /**
>>>>>>>>>>>>>>>>>> +                        * @evict: List entry to 
>>>>>>>>>>>>>>>>>> attach to
>>>>>>>>>>>>>>>>>> the &drm_gpuvms evict
>>>>>>>>>>>>>>>>>> +                        * list.
>>>>>>>>>>>>>>>>>> +                        */
>>>>>>>>>>>>>>>>>> +                       struct list_head evict;
>>>>>>>>>>>>>>>>>>                  } entry;
>>>>>>>>>>>>>>>>>>          } list;
>>>>>>>>>>>>>>>>>>     };
>>>>>>>>>>>>>>>>>> @@ -432,6 +615,9 @@ struct drm_gpuvm_bo *
>>>>>>>>>>>>>>>>>>     drm_gpuvm_bo_find(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>>>>>>                    struct drm_gem_object *obj);
>>>>>>>>>>>>>>>>>> +void drm_gpuvm_bo_evict(struct drm_gem_object *obj, 
>>>>>>>>>>>>>>>>>> bool
>>>>>>>>>>>>>>>>>> evict);
>>>>>>>>>>>>>>>>>> +void drm_gpuvm_bo_extobj_add(struct drm_gpuvm_bo 
>>>>>>>>>>>>>>>>>> *vm_bo);
>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>     /**
>>>>>>>>>>>>>>>>>>      * drm_gpuvm_bo_for_each_va() - iterator to walk 
>>>>>>>>>>>>>>>>>> over a
>>>>>>>>>>>>>>>>>> list of &drm_gpuva
>>>>>>>>>>>>>>>>>>      * @va__: &drm_gpuva structure to assign to in each
>>>>>>>>>>>>>>>>>> iteration step
>>>>>>>>>>>>>>>>>> @@ -837,6 +1023,17 @@ struct drm_gpuvm_ops {
>>>>>>>>>>>>>>>>>>           * used.
>>>>>>>>>>>>>>>>>>           */
>>>>>>>>>>>>>>>>>>          int (*sm_step_unmap)(struct drm_gpuva_op 
>>>>>>>>>>>>>>>>>> *op, void
>>>>>>>>>>>>>>>>>> *priv);
>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>> +       /**
>>>>>>>>>>>>>>>>>> +        * @bo_validate: called from 
>>>>>>>>>>>>>>>>>> drm_gpuvm_validate()
>>>>>>>>>>>>>>>>>> +        *
>>>>>>>>>>>>>>>>>> +        * Drivers receive this callback for every 
>>>>>>>>>>>>>>>>>> evicted
>>>>>>>>>>>>>>>>>> &drm_gem_object being
>>>>>>>>>>>>>>>>>> +        * mapped in the corresponding &drm_gpuvm.
>>>>>>>>>>>>>>>>>> +        *
>>>>>>>>>>>>>>>>>> +        * Typically, drivers would call their driver
>>>>>>>>>>>>>>>>>> specific variant of
>>>>>>>>>>>>>>>>>> +        * ttm_bo_validate() from within this callback.
>>>>>>>>>>>>>>>>>> +        */
>>>>>>>>>>>>>>>>>> +       int (*bo_validate)(struct drm_gem_object *obj);
>>>>>>>>>>>>>>>>>>     };
>>>>>>>>>>>>>>>>>>     int drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm, 
>>>>>>>>>>>>>>>>>> void *priv,
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>
>
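For reference, here is a minimal sketch of how a driver's exec path might use
the helpers quoted above. The my_driver_exec() name is hypothetical and this
is not part of the patch, just an illustration of the intended call flow:

	/* Hypothetical driver-side usage of the proposed helpers. */
	static int my_driver_exec(struct drm_gpuvm *gpuvm, struct dma_fence *fence)
	{
		struct drm_gpuvm_exec vm_exec = {
			.vm = gpuvm,
		};
		int ret;

		/* Locks the VM's common dma-resv plus all extobj dma-resvs. */
		ret = drm_gpuvm_exec_lock(&vm_exec, 1, true);
		if (ret)
			return ret;	/* the lock helper already called drm_exec_fini() */

		/* Re-validate all BOs currently on the VM's evict list. */
		ret = drm_gpuvm_validate(gpuvm);
		if (ret)
			goto out_unlock;

		/* ... push the job to the HW here, then attach its fence ... */
		drm_gpuvm_exec_resv_add_fence(&vm_exec, fence,
					      DMA_RESV_USAGE_BOOKKEEP,
					      DMA_RESV_USAGE_BOOKKEEP);

	out_unlock:
		drm_gpuvm_exec_unlock(&vm_exec);
		return ret;
	}
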
Christian König Sept. 20, 2023, 1:06 p.m. UTC | #54
On 20.09.23 at 14:06, Thomas Hellström wrote:
>
> On 9/20/23 12:51, Christian König wrote:
>> On 20.09.23 at 09:44, Thomas Hellström wrote:
>>> Hi,
>>>
>>>> On 19.09.23 at 17:23, Thomas Hellström wrote:
>>>> Am 19.09.23 um 17:23 schrieb Thomas Hellström:
>>>>>
>>>>> On 9/19/23 17:16, Danilo Krummrich wrote:
>>>>>> On 9/19/23 14:21, Thomas Hellström wrote:
>>>>>>> Hi Christian
>>>>>>>
>>>>>>> On 9/19/23 14:07, Christian König wrote:
>>>>>>>> On 13.09.23 at 17:46, Danilo Krummrich wrote:
>>>>>>>>> On 13.09.23 at 17:33, Christian König wrote:
>>>>>>>>>> On 13.09.23 at 17:15, Danilo Krummrich wrote:
>>>>>>>>>>> On 13.09.23 at 16:26, Christian König wrote:
>>>>>>>>>>>> On 13.09.23 at 14:16, Danilo Krummrich wrote:
>>>>>>>>>>>>> As mentioned in a different mail thread, the reply is 
>>>>>>>>>>>>> based on the assumption
>>>>>>>>>>>>> that we don't support anything else than GPUVM updates 
>>>>>>>>>>>>> from the IOCTL.
>>>>>>>>>>>>
>>>>>>>>>>>> I think that this assumption is incorrect.
>>>>>>>>>>>
>>>>>>>>>>> Well, more precisely I should have said "don't support GPUVM 
>>>>>>>>>>> updates from within
>>>>>>>>>>> fence signaling critical sections". And looking at the code, 
>>>>>>>>>>> that doesn't seem to be what
>>>>>>>>>>> you're doing there.
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Vulkan is just one specific use case, but this here should 
>>>>>>>>>>>> probably be able to handle other use cases as well.
>>>>>>>>>>>>
>>>>>>>>>>>> Especially with HMM you get the requirement that you need 
>>>>>>>>>>>> to be able to invalidate GPUVM mappings without grabbing a 
>>>>>>>>>>>> reservation lock.
>>>>>>>>>>>
>>>>>>>>>>> What do you mean by "invalidate GPUVM mappings" in this 
>>>>>>>>>>> context? drm_gpuvm_bo_evict()
>>>>>>>>>>> should only be called from a ttm_device_funcs::move 
>>>>>>>>>>> callback, we should hold the dma-resv
>>>>>>>>>>> lock there.
>>>>>>>>>>
>>>>>>>>>> Well the question is which dma-resv lock do we hold?
>>>>>>>>>>
>>>>>>>>>> In the move callback we only hold the dma-resv lock of the BO 
>>>>>>>>>> which is moved, but when that is a shared BO then that's not 
>>>>>>>>>> the same as the one for the VM.
>>>>>>>>>
>>>>>>>>> Correct, Thomas' idea was to use the GEM's dma_resv lock to 
>>>>>>>>> protect drm_gpuvm_bo::evicted
>>>>>>>>> and then actually move the drm_gpuvm_bo to the VM's evicted 
>>>>>>>>> list once we've grabbed all
>>>>>>>>> dma-resv locks when locking the VM's BOs using drm_exec. We 
>>>>>>>>> can remove them from the evicted
>>>>>>>>> list on validate(). This way we never touch the evicted list 
>>>>>>>>> without holding at least the VM's
>>>>>>>>> dma-resv lock.
>>>>>>>>>
>>>>>>>>> Do you have any concerns about that?
>>>>>>>>
>>>>>>>> Scratching my head a bit how that is supposed to work.
>>>>>>>>
>>>>>>>> This implies that you go over all the evicted BOs during 
>>>>>>>> validation and not just the one mentioned in the CS.
>>>>>>>>
>>>>>>>> That might work for Vulkan, but is pretty much a no-go for OpenGL.
>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> See what the eviction lock in amdgpu is doing for example.
>>>>>>>>>>>
>>>>>>>>>>> The eviction_lock seems to protect a VM state "evicting" of 
>>>>>>>>>>> whether any BO that
>>>>>>>>>>> is associated with the VM is currently evicting. At the same 
>>>>>>>>>>> time amdgpu protects
>>>>>>>>>>> the evicted list of the VM with a different lock. So this 
>>>>>>>>>>> seems to be entirely
>>>>>>>>>>> unrelated. Tracking a "currently evicting" state is not part 
>>>>>>>>>>> of the GPUVM
>>>>>>>>>>> implementation currently and hence nothing would change for 
>>>>>>>>>>> amdgpu there.
>>>>>>>>>>
>>>>>>>>>> Sorry for the confusion, we use different terminology in amdgpu.
>>>>>>>>>>
>>>>>>>>>> The eviction lock and evicted state are for the VM page 
>>>>>>>>>> tables, e.g. if the whole VM is currently not used and 
>>>>>>>>>> swapped out or even de-allocated.
>>>>>>>>>>
>>>>>>>>>> This is necessary because we have cases where we need to 
>>>>>>>>>> access the VM data without holding the dma-resv lock of this 
>>>>>>>>>> VM. Especially figuring out which parts of an address space 
>>>>>>>>>> contain mappings and which don't.
>>>>>>>>>
>>>>>>>>> I think this is fine; this has nothing to do with lists of 
>>>>>>>>> evicted GEM objects or external GEM
>>>>>>>>> objects, right? Marking mappings (drm_gpuva) as invalidated 
>>>>>>>>> (DRM_GPUVA_INVALIDATED) or accessing
>>>>>>>>> the VA space does not require any dma-resv locks.
>>>>>>>>
>>>>>>>> I hope so, but I'm not 100% sure.
>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> This is a requirement which comes with HMM handling; you 
>>>>>>>>>> won't see this with Vulkan (or OpenGL, VAAPI etc..).
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> The invalidation lock on the other hand is what in this 
>>>>>>>>>> discussion is called eviction lock. This one is needed 
>>>>>>>>>> because of what I wrote above: during the move callback only the 
>>>>>>>>>> dma-resv of the BO which is moved is locked, but not 
>>>>>>>>>> necessarily the dma-resv of the VM.
>>>>>>>>>
>>>>>>>>> That's yet another thing, right? This is used to track whether 
>>>>>>>>> *any* BO that belongs to the VM is
>>>>>>>>> currently being evicted, correct? As mentioned, as of now this 
>>>>>>>>> is not supported in GPUVM and hence
>>>>>>>>> would be the same driver specific code with the same driver 
>>>>>>>>> specific lock.
>>>>>>>>
>>>>>>>> That is most likely a show stopper for using this with OpenGL-based 
>>>>>>>> workloads as far as I can see. For those you need to be able to 
>>>>>>>> figure out which non-VM BOs have been evicted and which parts 
>>>>>>>> of the VM needs updates.
>>>>>>>
>>>>>>> We identify those with a bool in the gpuvm_bo, and that bool is 
>>>>>>> protected by the bo_resv. In essence, the "evicted" list must be 
>>>>>>> made up-to-date with all relevant locks held before traversing 
>>>>>>> in the next exec.
>>>>>>
>>>>>> What I still miss with this idea is how we find all the 
>>>>>> drm_gpuvm_bo structures with the evicted bool set to true? When 
>>>>>> doing the drm_exec dance we come across all external ones and can 
>>>>>> add them to the list if needed, but what about the BOs having the 
>>>>>> VM's dma-resv?
>>>>>
>>>>> Oh, they can be added to the evict list directly (no bool needed) 
>>>>> in the eviction code, like in v3, since for those we indeed hold 
>>>>> the VM's dma_resv, as it's aliased with the object's dma-resv.
>>>>
>>>> Yeah, I wanted to note what Danilo seems to be thinking about as well: 
>>>> how do we figure out which non-VM BOs have been evicted?
>>>>
>>>> We can't walk over the list of all non-VM BOs on every submission, 
>>>> that's too much overhead for cases with lots of non-VM BOs.
>>>>
>>>> And we can't rely on userspace sending all non-VM BOs as a used list 
>>>> down to the kernel with each submission.
>>>>
>>>> Regards,
>>>> Christian.
>>>
>>> No, that's not needed: Mechanism below.
>>>
>>> 1) We maintain an evicted list. Typically protected by the vm resv.
>>> 2) Each gpuvm_bo has a bool "evicted". Protected by the bo resv.
>>>
>>> a) Evicting a vm bo: The vm resv is held by the eviction code. Just 
>>> put it on the evicted list.
>>> b) Evicting a shared/external bo: The bo resv is held by the 
>>> eviction code. Set the "evicted" bool.
>>> c) Validating the evicted list on exec:
>>
>>
>>> Loop through all *external/shared* bos.
>>
>> And this is what you can't do. For Vulkan it probably doesn't matter, 
>> but for OpenGL and especially multimedia we have many more BOs on the 
>> shared list than what's allocated for the VM.
>
> But you need to lock and fence all of those, so you need to loop through 
> them anyway; we're still O(n_shared)? Or is there some clever 
> optimization in amdgpu?

Why should I lock and fence them? Only the BOs in the relocation list 
are locked and fenced.
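In pseudo-code, roughly (hypothetical names, not actual amdgpu code):

	/* Per-submission cost is bounded by the relocation list passed in
	 * by userspace, not by every BO that is shared with the VM.
	 */
	for (i = 0; i < num_reloc_bos; i++) {
		ret = drm_exec_prepare_obj(exec, reloc_bos[i], 1);
		if (ret)
			return ret;
	}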

Regards,
Christian.

>
> I think with some UMDs, xe might end up with similarly large lists...
>
> /Thomas
>
>
>>
>> Regards,
>> Christian.
>>
>>> Lock them. After locking, check the "evicted" bool; if it's true, 
>>> put the bo on the evicted list (we hold the VM resv at this point) 
>>> and clear the "evicted" bool. Note that other vms will have their 
>>> own gpuvm_bo which is marked evicted; see the sketch below.
>>>
>>> I have this coded up in a patch for Xe and it seems to be working 
>>> properly.
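>>>
>>> A rough sketch of the scheme above, which also matches what Danilo
>>> described earlier in the thread (hypothetical field and helper names,
>>> not the actual Xe patch):
>>>
>>>	/* 2) Per VM/BO state; "evicted" is protected by the bo resv. */
>>>	struct gpuvm_bo {
>>>		struct drm_gpuvm *vm;
>>>		struct drm_gem_object *obj;
>>>		struct list_head extobj_link;	/* entry in vm's extobj list */
>>>		struct list_head evict_link;	/* protected by the vm resv */
>>>		bool evicted;			/* protected by the bo resv */
>>>	};
>>>
>>>	/* a) VM-private bo: bo resv == vm resv, link it directly. */
>>>	static void evict_vm_bo(struct gpuvm_bo *vm_bo)
>>>	{
>>>		list_move_tail(&vm_bo->evict_link, &vm_bo->vm->evict.list);
>>>	}
>>>
>>>	/* b) Shared/external bo: only the bo resv is held here. */
>>>	static void evict_ext_bo(struct gpuvm_bo *vm_bo)
>>>	{
>>>		vm_bo->evicted = true;
>>>	}
>>>
>>>	/* c) On exec, after all external bos (and thus the vm resv)
>>>	 * have been locked.
>>>	 */
>>>	static void gather_evicted(struct drm_gpuvm *gpuvm)
>>>	{
>>>		struct gpuvm_bo *vm_bo;
>>>
>>>		list_for_each_entry(vm_bo, &gpuvm->extobj.list, extobj_link) {
>>>			if (vm_bo->evicted) {
>>>				list_move_tail(&vm_bo->evict_link,
>>>					       &gpuvm->evict.list);
>>>				vm_bo->evicted = false;
>>>			}
>>>		}
>>>	}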
>>>
>>> /Thomas
>>>
>>>
>>>>
>>>>>
>>>>> /Thomas
>>>>>
>>>>>
>>>>>
>>>>>>
>>>>>>>
>>>>>>> If you mean that we need to unbind all vmas of all vms of 
>>>>>>> evicted bos before evicting, we don't do that, at least not in 
>>>>>>> Xe, since when evicting we wait for VM idle, and it can't access 
>>>>>>> anything through the stale vmas until they have been revalidated 
>>>>>>> and rebound.
>>>>>>>
>>>>>>> /Thomas
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Christian.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Regards,
>>>>>>>>>>>> Christian.
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Sep 13, 2023 at 11:14:46AM +0200, Thomas Hellström 
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>> Hi!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, 2023-09-13 at 01:36 +0200, Danilo Krummrich wrote:
>>>>>>>>>>>>>>> On Tue, Sep 12, 2023 at 09:23:08PM +0200, Thomas 
>>>>>>>>>>>>>>> Hellström wrote:
>>>>>>>>>>>>>>>> On 9/12/23 18:50, Danilo Krummrich wrote:
>>>>>>>>>>>>>>>>> On Tue, Sep 12, 2023 at 06:20:32PM +0200, Thomas 
>>>>>>>>>>>>>>>>> Hellström wrote:
>>>>>>>>>>>>>>>>>> Hi, Danilo,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On 9/9/23 17:31, Danilo Krummrich wrote:
>>>>>>>>>>>>>>>>>>> So far the DRM GPUVA manager offers common 
>>>>>>>>>>>>>>>>>>> infrastructure to
>>>>>>>>>>>>>>>>>>> track GPU VA
>>>>>>>>>>>>>>>>>>> allocations and mappings, generically connect GPU VA 
>>>>>>>>>>>>>>>>>>> mappings
>>>>>>>>>>>>>>>>>>> to their
>>>>>>>>>>>>>>>>>>> backing buffers and perform more complex mapping 
>>>>>>>>>>>>>>>>>>> operations
>>>>>>>>>>>>>>>>>>> on the GPU VA
>>>>>>>>>>>>>>>>>>> space.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> However, there are more design patterns commonly 
>>>>>>>>>>>>>>>>>>> used by
>>>>>>>>>>>>>>>>>>> drivers, which
>>>>>>>>>>>>>>>>>>> can potentially be generalized in order to make the 
>>>>>>>>>>>>>>>>>>> DRM GPUVA
>>>>>>>>>>>>>>>>>>> manager
>>>>>>>>>>>>>>>>>>> represent a basic GPU-VM implementation. In this 
>>>>>>>>>>>>>>>>>>> context,
>>>>>>>>>>>>>>>>>>> this patch aims
>>>>>>>>>>>>>>>>>>> at generalizing the following elements.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> 1) Provide a common dma-resv for GEM objects not 
>>>>>>>>>>>>>>>>>>> being used
>>>>>>>>>>>>>>>>>>> outside of
>>>>>>>>>>>>>>>>>>>       this GPU-VM.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> 2) Provide tracking of external GEM objects (GEM 
>>>>>>>>>>>>>>>>>>> objects
>>>>>>>>>>>>>>>>>>> which are
>>>>>>>>>>>>>>>>>>>       shared with other GPU-VMs).
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> 3) Provide functions to efficiently lock all GEM 
>>>>>>>>>>>>>>>>>>> objects dma-
>>>>>>>>>>>>>>>>>>> resv the
>>>>>>>>>>>>>>>>>>>       GPU-VM contains mappings of.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> 4) Provide tracking of evicted GEM objects the GPU-VM
>>>>>>>>>>>>>>>>>>> contains mappings
>>>>>>>>>>>>>>>>>>>       of, such that validation of evicted GEM 
>>>>>>>>>>>>>>>>>>> objects is
>>>>>>>>>>>>>>>>>>> accelerated.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> 5) Provide some convinience functions for common 
>>>>>>>>>>>>>>>>>>> patterns.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Rather than being designed as a "framework", the 
>>>>>>>>>>>>>>>>>>> target is to
>>>>>>>>>>>>>>>>>>> make all
>>>>>>>>>>>>>>>>>>> features appear as a collection of optional helper 
>>>>>>>>>>>>>>>>>>> functions,
>>>>>>>>>>>>>>>>>>> such that
>>>>>>>>>>>>>>>>>>> drivers are free to make use of the DRM GPUVA 
>>>>>>>>>>>>>>>>>>> managers basic
>>>>>>>>>>>>>>>>>>> functionality and opt-in for other features without 
>>>>>>>>>>>>>>>>>>> setting
>>>>>>>>>>>>>>>>>>> any feature
>>>>>>>>>>>>>>>>>>> flags, just by making use of the corresponding 
>>>>>>>>>>>>>>>>>>> functions.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Big kudos to Boris Brezillon for his help to figure out
>>>>>>>>>>>>>>>>>>> locking for drivers
>>>>>>>>>>>>>>>>>>> updating the GPU VA space within the fence 
>>>>>>>>>>>>>>>>>>> signalling path.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Suggested-by: Matthew Brost <matthew.brost@intel.com>
>>>>>>>>>>>>>>>>>>> Signed-off-by: Danilo Krummrich <dakr@redhat.com>
>>>>>>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>>>>>> drivers/gpu/drm/drm_gpuvm.c | 516
>>>>>>>>>>>>>>>>>>> ++++++++++++++++++++++++++++++++++++
>>>>>>>>>>>>>>>>>>>     include/drm/drm_gpuvm.h | 197 ++++++++++++++
>>>>>>>>>>>>>>>>>>>     2 files changed, 713 insertions(+)
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> diff --git a/drivers/gpu/drm/drm_gpuvm.c b/drivers/gpu/drm/drm_gpuvm.c
>>>>>>>>>>>>>>>>>>> index f4411047dbb3..8e62a043f719 100644
>>>>>>>>>>>>>>>>>>> --- a/drivers/gpu/drm/drm_gpuvm.c
>>>>>>>>>>>>>>>>>>> +++ b/drivers/gpu/drm/drm_gpuvm.c
>>>>>>>>>>>>>>>>>>> @@ -73,6 +73,21 @@
>>>>>>>>>>>>>>>>>>>   * &drm_gem_object list of &drm_gpuvm_bos for an existing instance of this
>>>>>>>>>>>>>>>>>>>   * particular combination. If not existent, a new instance is created and linked
>>>>>>>>>>>>>>>>>>>   * to the &drm_gem_object.
>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>> + * &drm_gpuvm_bo structures, since unique for a given &drm_gpuvm, are also used
>>>>>>>>>>>>>>>>>>> + * as entry for the &drm_gpuvm's lists of external and evicted objects. Those
>>>>>>>>>>>>>>>>>>> + * lists are maintained in order to accelerate locking of dma-resv locks and
>>>>>>>>>>>>>>>>>>> + * validation of evicted objects bound in a &drm_gpuvm. For instance, all
>>>>>>>>>>>>>>>>>>> + * &drm_gem_object's &dma_resv of a given &drm_gpuvm can be locked by calling
>>>>>>>>>>>>>>>>>>> + * drm_gpuvm_exec_lock(). Once locked drivers can call drm_gpuvm_validate() in
>>>>>>>>>>>>>>>>>>> + * order to validate all evicted &drm_gem_objects. It is also possible to lock
>>>>>>>>>>>>>>>>>>> + * additional &drm_gem_objects by providing the corresponding parameters to
>>>>>>>>>>>>>>>>>>> + * drm_gpuvm_exec_lock() as well as open code the &drm_exec loop while making
>>>>>>>>>>>>>>>>>>> + * use of helper functions such as drm_gpuvm_prepare_range() or
>>>>>>>>>>>>>>>>>>> + * drm_gpuvm_prepare_objects().
>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>> + * Every bound &drm_gem_object is treated as external object when its &dma_resv
>>>>>>>>>>>>>>>>>>> + * structure is different than the &drm_gpuvm's common &dma_resv structure.
>>>>>>>>>>>>>>>>>>>   */
>>>>>>>>>>>>>>>>>>>  /**
>>>>>>>>>>>>>>>>>>> @@ -420,6 +435,20 @@
>>>>>>>>>>>>>>>>>>>   * Subsequent calls to drm_gpuvm_bo_obtain() for the same &drm_gpuvm and
>>>>>>>>>>>>>>>>>>>   * &drm_gem_object must be able to observe previous creations and destructions
>>>>>>>>>>>>>>>>>>>   * of &drm_gpuvm_bos in order to keep instances unique.
>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>> + * The &drm_gpuvm's lists for keeping track of external and evicted objects are
>>>>>>>>>>>>>>>>>>> + * protected against concurrent insertion / removal and iteration internally.
>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>> + * However, drivers still need to ensure to protect concurrent calls to
>>>>>>>>>>>>>>>>>>> + * functions iterating those lists, such as drm_gpuvm_validate() and
>>>>>>>>>>>>>>>>>>> + * drm_gpuvm_prepare_objects(). Every such function contains a particular
>>>>>>>>>>>>>>>>>>> + * comment and lockdep checks if possible.
>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>> + * Functions adding or removing entries from those lists, such as
>>>>>>>>>>>>>>>>>>> + * drm_gpuvm_bo_evict() or drm_gpuvm_bo_extobj_add(), may be called with
>>>>>>>>>>>>>>>>>>> + * external locks being held, e.g. in order to avoid the corresponding list to
>>>>>>>>>>>>>>>>>>> + * be (safely) modified while potentially being iterated by other API functions.
>>>>>>>>>>>>>>>>>>> + * However, this is entirely optional.
>>>>>>>>>>>>>>>>>>>   */
>>>>>>>>>>>>>>>>>>>  /**
>>>>>>>>>>>>>>>>>>> @@ -632,6 +661,131 @@
>>>>>>>>>>>>>>>>>>>   *   }
>>>>>>>>>>>>>>>>>>>   */
>>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>>> + * get_next_vm_bo_from_list() - get the next vm_bo element
>>>>>>>>>>>>>>>>>>> + * @__gpuvm: The GPU VM
>>>>>>>>>>>>>>>>>>> + * @__list_name: The name of the list we're iterating on
>>>>>>>>>>>>>>>>>>> + * @__local_list: A pointer to the local list used to store already iterated items
>>>>>>>>>>>>>>>>>>> + * @__prev_vm_bo: The previous element we got from drm_gpuvm_get_next_cached_vm_bo()
>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>> + * This helper is here to provide lockless list iteration. Lockless as in, the
>>>>>>>>>>>>>>>>>>> + * iterator releases the lock immediately after picking the first element from
>>>>>>>>>>>>>>>>>>> + * the list, so list insertion and deletion can happen concurrently.
>>>>>>>>>>>>>>>>>> Are the list spinlocks needed for that async state update from
>>>>>>>>>>>>>>>>>> within the dma-fence critical section we've discussed previously?
>>>>>>>>>>>>>>>>> Yes, but also for other reasons, see below.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Otherwise it should be sufficient to protect the lists with the
>>>>>>>>>>>>>>>>>> gpuvm's resv (or for the extobj list with an outer lock).
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> If those spinlocks are still needed in some situations, perhaps
>>>>>>>>>>>>>>>>>> could we have an option to set them to NULL (like IIRC the maple
>>>>>>>>>>>>>>>>>> tree allows for)?
>>>>>>>>>>>>>>>>> The evict spinlock is needed in any case, since in
>>>>>>>>>>>>>>>>> drm_gpuvm_bo_evict() we're holding only the dma-resv lock from the
>>>>>>>>>>>>>>>>> BO this function gets called for. Hence, the spinlock protects
>>>>>>>>>>>>>>>>> concurrent drm_gpuvm_bo_evict() calls with different BOs.
>>>>>>>>>>>>>>>> No. Only if you try to add external objects to the vm's evict
>>>>>>>>>>>>>>>> list from within the evict code. That's not necessary since you
>>>>>>>>>>>>>>>> loop through all external objects anyway when locking them, so an
>>>>>>>>>>>>>>>> "evicted" bool in the vm_bo, protected by the bo resv, would be
>>>>>>>>>>>>>>>> sufficient. The extobj locking loop can then add the bo to the
>>>>>>>>>>>>>>>> evicted list (as sketched below).
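
A minimal sketch of that alternative scheme, assuming a hypothetical
vm_bo->evicted flag protected by the BO's dma-resv and a hypothetical extobj
iterator (neither exists in the patch under review):

	/* Evict code: only flips a flag, no list manipulation needed. */
	static void vm_bo_set_evicted(struct drm_gpuvm_bo *vm_bo, bool evicted)
	{
		dma_resv_assert_held(vm_bo->obj->resv);
		vm_bo->evicted = evicted;
	}

	/* The extobj locking loop already holds all dma-resv locks, so it
	 * can safely move flagged BOs onto the evict list afterwards. */
	drm_gpuvm_for_each_extobj(gpuvm, vm_bo) {
		if (vm_bo->evicted)
			list_move_tail(&vm_bo->list.entry.evict,
				       &gpuvm->evict.list);
	}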
>>>>>>>>>>>>>>> And validate() can remove it while still holding all dma-resv
>>>>>>>>>>>>>>> locks, neat!
>>>>>>>>>>>>>>> However, what if two tasks are trying to lock the VA space
>>>>>>>>>>>>>>> concurrently? What do we do when the drm_gpuvm_bo's refcount drops
>>>>>>>>>>>>>>> to zero in drm_gpuva_unlink()? Are we guaranteed that at this point
>>>>>>>>>>>>>>> of time the drm_gpuvm_bo is not on the evicted list? Because
>>>>>>>>>>>>>>> otherwise we would call drm_gpuvm_bo_destroy() with the dma-resv
>>>>>>>>>>>>>>> lock held, which wouldn't be allowed, since drm_gpuvm_bo_destroy()
>>>>>>>>>>>>>>> might drop the last reference to the drm_gem_object and hence we'd
>>>>>>>>>>>>>>> potentially free the dma-resv lock while holding it, at least if
>>>>>>>>>>>>>>> it's an external object.
>>>>>>>>>>>>>> Easiest way in this scheme is to think of the lists as being
>>>>>>>>>>>>>> protected by the vm's resv lock. That means anybody calling unlink()
>>>>>>>>>>>>>> must also hold the vm's resv lock. (Which is OK from a UAF point of
>>>>>>>>>>>>>> view, but perhaps not from a locking inversion POV with an async
>>>>>>>>>>>>>> list update).
>>>>>>>>>>>>> This would mean that on unlink() we'd need to hold the VM's resv
>>>>>>>>>>>>> lock and the corresponding GEM's resv lock (in case they're not the
>>>>>>>>>>>>> same anyways), because the VM's resv lock would protect the external /
>>>>>>>>>>>>> evicted object lists and the GEM object's resv lock protects the
>>>>>>>>>>>>> GEM's list of drm_gpuvm_bos and the drm_gpuvm_bo's list of
>>>>>>>>>>>>> drm_gpuvas.
>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> For extobjs an outer lock would be enough in case of Xe, but I
>>>>>>>>>>>>>>>>> really would not like to add even more complexity just to get the
>>>>>>>>>>>>>>>>> spinlock out of the way in case the driver already has an outer
>>>>>>>>>>>>>>>>> lock protecting this path.
>>>>>>>>>>>>>>>> I must disagree here. These spinlocks and atomic operations are
>>>>>>>>>>>>>>>> pretty costly and as discussed earlier this type of locking was
>>>>>>>>>>>>>>>> the reason (at least according to the commit message) that made
>>>>>>>>>>>>>>>> Christian drop the XArray use in drm_exec for the same set of
>>>>>>>>>>>>>>>> objects: "The locking overhead is unnecessary and measurable".
>>>>>>>>>>>>>>>> IMHO the spinlock is the added complexity and a single wide lock
>>>>>>>>>>>>>>>> following the drm locking guidelines set out by Daniel and David
>>>>>>>>>>>>>>>> should really be the default choice, with an opt-in for a spinlock
>>>>>>>>>>>>>>>> if needed for async and pushing out to a wq is not an option.
>>>>>>>>>>>>>>> For the external object list an outer lock would work as long as
>>>>>>>>>>>>>>> it's not the dma-resv lock of the corresponding GEM object, since
>>>>>>>>>>>>>>> here we actually need to remove the list entry from the external
>>>>>>>>>>>>>>> object list on drm_gpuvm_bo_destroy().
>>>>>>>>>>>>>>> It's just a bit weird design wise that drivers would need to take
>>>>>>>>>>>>>>> this outer lock on:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> - drm_gpuvm_bo_extobj_add()
>>>>>>>>>>>>>>> - drm_gpuvm_bo_destroy()        (and hence also drm_gpuvm_bo_put())
>>>>>>>>>>>>>>> - drm_gpuva_unlink()            (because it needs to call drm_gpuvm_bo_put())
>>>>>>>>>>>>>>> - drm_gpuvm_exec_lock()
>>>>>>>>>>>>>>> - drm_gpuvm_exec_lock_array()
>>>>>>>>>>>>>>> - drm_gpuvm_prepare_range()
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Given that, it seems reasonable to do all the required locking
>>>>>>>>>>>>>>> internally.
>>>>>>>>>>>>>> From a design POV, there has been a clear direction in XE to make
>>>>>>>>>>>>>> things similar to mmap() / munmap(), so this outer lock, which in Xe
>>>>>>>>>>>>>> is an rwsem, is used in a similar way as the mmap_lock. It's
>>>>>>>>>>>>>> protecting the page-table structures and vma rb tree, the userptr
>>>>>>>>>>>>>> structures and the extobj list. Basically it's taken early in the
>>>>>>>>>>>>>> exec IOCTL, the VM_BIND ioctl, the compute rebind worker and the
>>>>>>>>>>>>>> pagefault handler, so all of the above are just asserting that it is
>>>>>>>>>>>>>> taken in the correct mode.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> But strictly with this scheme one could also use the vm's dma_resv
>>>>>>>>>>>>>> for the extobj list since with drm_exec, it's locked before
>>>>>>>>>>>>>> traversing the list.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The whole point of this scheme is to rely on locks that you already
>>>>>>>>>>>>>> are supposed to be holding for various reasons and is simple to
>>>>>>>>>>>>>> comprehend.
>>>>>>>>>>>>> I don't agree that we're supposed to hold the VM's resv lock anyways
>>>>>>>>>>>>> for functions like drm_gpuvm_bo_put() or drm_gpuva_unlink(), but I'm
>>>>>>>>>>>>> fine using it for that purpose nevertheless.
>>>>>>>>>>>>>
>>>>>>>>>>>>>>> In order to at least place lockdep checks, the driver would need
>>>>>>>>>>>>>>> to supply the corresponding lock's lockdep_map, because the GPUVM
>>>>>>>>>>>>>>> otherwise doesn't know about the lock.
>>>>>>>>>>>>>> Yes, that sounds reasonable. One lockdep map per list.
>>>>>>>>>>>>> I'd really like to avoid that, especially now that everything got
>>>>>>>>>>>>> simpler. We should define the actual locks to take instead.
>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Out of curiosity, what is the overhead of a spin_lock() that
>>>>>>>>>>>>>>> doesn't need to spin?
>>>>>>>>>>>>>> I guess it's hard to tell exactly, but it is much lower on modern
>>>>>>>>>>>>>> x86 than what it used to be. Not sure about ARM, which is the other
>>>>>>>>>>>>>> architecture important to us. I figure if there is little cache-line
>>>>>>>>>>>>>> bouncing the main overhead comes from the implied barriers.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> A pretty simple way that would not add much code would be
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> static void gpuvm_cond_spin_lock(const struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>>>>                                  spinlock_t *lock)
>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>         if (!gpuvm->resv_protected_lists)
>>>>>>>>>>>>>>>>                 spin_lock(lock);
>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>
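
For symmetry, the unlock side of that suggestion would be the obvious
counterpart (a sketch only; the resv_protected_lists flag is part of the
proposal above, not of the patch under review):

	static void gpuvm_cond_spin_unlock(const struct drm_gpuvm *gpuvm,
					   spinlock_t *lock)
	{
		if (!gpuvm->resv_protected_lists)
			spin_unlock(lock);
	}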
>>>>>>>>>>>>>>>>>> For such drivers, that would require anybody calling unlink to
>>>>>>>>>>>>>>>>>> hold the vm's resv, though.
>>>>>>>>>>>>>>>>> In V4 I want to go back to having a dedicated lock for the GEM's
>>>>>>>>>>>>>>>>> gpuva list (or VM_BO list to be more precise). We can't just use
>>>>>>>>>>>>>>>>> the dma-resv lock for that with VM_BO abstractions, because on
>>>>>>>>>>>>>>>>> destruction of a VM_BO we otherwise wouldn't be allowed to
>>>>>>>>>>>>>>>>> already hold the dma-resv lock. That's the fix I was referring to
>>>>>>>>>>>>>>>>> earlier.
>>>>>>>>>>>>>>>> Yeah, I can see the need for a dedicated lock for the GEM's gpuva
>>>>>>>>>>>>>>>> list, but holding the vm's dma-resv lock across the unlink
>>>>>>>>>>>>>>>> shouldn't be a problem. We may free the object and a pointer to
>>>>>>>>>>>>>>>> the vm's resv during unlink but we don't free the vm's resv. It'd
>>>>>>>>>>>>>>>> be a matter of ensuring that any calls to unlink from *within*
>>>>>>>>>>>>>>>> drm_gpuvm allow it to be held.
>>>>>>>>>>>>>>> Drivers calling unlink() from the fence signaling path can't use
>>>>>>>>>>>>>>> the VM's dma-resv lock.
>>>>>>>>>>>>>> Yes, that made me a bit curious because in the current version the
>>>>>>>>>>>>>> code required the object's dma_resv for unlink() which can't be
>>>>>>>>>>>>>> grabbed either from the fence signaling path. So are there any
>>>>>>>>>>>>>> drivers actually wanting to do that? If so, they will either need to
>>>>>>>>>>>>>> resort to the current spinlock solution or they will need to call
>>>>>>>>>>>>>> unlink from a workqueue item.
>>>>>>>>>>>>> As Boris already mentioned we have the dma-resv lock by default or a
>>>>>>>>>>>>> driver specific GEM gpuva lock as opt-in. Now, we can get rid of the
>>>>>>>>>>>>> latter.
>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Also, what if the object is an external object? We can't use the
>>>>>>>>>>>>>>> VM's dma-resv lock here.
>>>>>>>>>>>>>> Why? Typically (sync) unlink is only ever called from an unbind-like
>>>>>>>>>>>>>> operation where it should be trivial to grab the vm's resv. Or, for
>>>>>>>>>>>>>> that matter, any outer lock protecting the extobj list. The rule
>>>>>>>>>>>>>> would be that drm_gpuvm_bo::entry::extobj and
>>>>>>>>>>>>>> drm_gpuvm_bo::entry::evict would be protected by either the vm's
>>>>>>>>>>>>>> dma_resv (or possibly an outer lock in the case of the extobj list).
>>>>>>>>>>>>> An outer lock wouldn't have been working for updates in the async
>>>>>>>>>>>>> path, but that shouldn't be relevant anymore. We could use the VM's
>>>>>>>>>>>>> resv for that.
>>>>>>>>>>>>>
>>>>>>>>>>>>>>> And we can't have the GEM object's dma-resv lock held when calling
>>>>>>>>>>>>>>> unlink(), since unlink() calls drm_gpuvm_bo_put(), which if the
>>>>>>>>>>>>>>> refcount drops to zero calls drm_gpuvm_bo_destroy(), and
>>>>>>>>>>>>>>> drm_gpuvm_bo_destroy() might drop the last reference of the GEM
>>>>>>>>>>>>>>> object.
>>>>>>>>>>>>>> Yes, but this is a different problem as to what exactly protects
>>>>>>>>>>>>>> drm_gpuvm_bo::entry::gem. Either, as you suggest, an internal per-bo
>>>>>>>>>>>>>> list lock, or if we want to keep the bo's dma_resv we need to ensure
>>>>>>>>>>>>>> that the caller of dma_resv_unlock(obj->resv) actually refcounts its
>>>>>>>>>>>>>> obj pointer, and doesn't implicitly rely on the gpuvm_bo's refcount
>>>>>>>>>>>>>> (I know Boris didn't like that, but requiring an explicit refcount
>>>>>>>>>>>>>> for a pointer you dereference, unless you're under a lock that
>>>>>>>>>>>>>> ensures keeping the object alive, is pretty much required?) But
>>>>>>>>>>>>>> anyway, for the drm_gpuvm_bo::entry::gem list protection (bo resv or
>>>>>>>>>>>>>> internal spinlock) I don't have a strong preference.
>>>>>>>>>>>>> We can keep the GEM object's dma-resv lock, however as mentioned
>>>>>>>>>>>>> above drm_gpuva_unlink() and drm_gpuvm_bo_put() then require both the
>>>>>>>>>>>>> VM's resv lock and the GEM's resv lock in case they differ.
>>>>>>>>>>>>>
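
With both locks required whenever the refcount can drop to zero, a driver's
synchronous unbind path would look roughly like this sketch (ww-acquire
context and error handling omitted; the two resvs are assumed to differ):

	dma_resv_lock(gpuvm->resv, NULL);
	dma_resv_lock(obj->resv, NULL);

	drm_gpuva_unlink(va);	/* may drop the last vm_bo reference */

	dma_resv_unlock(obj->resv);
	dma_resv_unlock(gpuvm->resv);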
>>>>>>>>>>>>>>> All those problems go away with a dedicated GEM gpuva list lock.
>>>>>>>>>>>>>> I don't think these are real problems.
>>>>>>>>>>>>>> With the exception of the eviction list "trick" where we currently
>>>>>>>>>>>>>> have a slightly different approach to collect external bos needing
>>>>>>>>>>>>>> rebinding, we have this working fine.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> TBH I think pretty much the only situation where the spinlock is
>>>>>>>>>>>>>> needed is for async updates of these lists, unless a wq item can be
>>>>>>>>>>>>>> used for that, but it doesn't really seem like the current code
>>>>>>>>>>>>>> allows for such updates anyway? It complicates the code a lot, adds
>>>>>>>>>>>>>> overhead and also adds the requirement for refcounting during list
>>>>>>>>>>>>>> traversal.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> /Thomas
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> /Thomas
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> It seems that with that also the refcount could be made
>>>>>>>>>>>>>>>>>> non-atomic.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> All in the spirit of the drm locking guidelines "use big locks
>>>>>>>>>>>>>>>>>> when possible". Lower level locks only when necessary for
>>>>>>>>>>>>>>>>>> performance or locking inversion?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> /Thomas
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>> + * Elements popped from the original list are kept in a local list, so removal
>>>>>>>>>>>>>>>>>>> + * and is_empty checks can still happen while we're iterating the list.
>>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>>> +#define get_next_vm_bo_from_list(__gpuvm, __list_name, __local_list, __prev_vm_bo)     \
>>>>>>>>>>>>>>>>>>> +       ({                                                                              \
>>>>>>>>>>>>>>>>>>> +               struct drm_gpuvm_bo *__vm_bo;                                           \
>>>>>>>>>>>>>>>>>>> +                                                                                       \
>>>>>>>>>>>>>>>>>>> +               drm_gpuvm_bo_put(__prev_vm_bo);                                         \
>>>>>>>>>>>>>>>>>>> +                                                                                       \
>>>>>>>>>>>>>>>>>>> +               spin_lock(&(__gpuvm)->__list_name.lock);                                \
>>>>>>>>>>>>>>>>>>> +               while (!list_empty(&(__gpuvm)->__list_name.list)) {                     \
>>>>>>>>>>>>>>>>>>> +                       __vm_bo = list_first_entry(&(__gpuvm)->__list_name.list,        \
>>>>>>>>>>>>>>>>>>> +                                                  struct drm_gpuvm_bo,                 \
>>>>>>>>>>>>>>>>>>> +                                                  list.entry.__list_name);             \
>>>>>>>>>>>>>>>>>>> +                       if (drm_gpuvm_bo_get_unless_zero(__vm_bo)) {                    \
>>>>>>>>>>>>>>>>>>> +                               list_move_tail(&(__vm_bo)->list.entry.__list_name,      \
>>>>>>>>>>>>>>>>>>> +                                              __local_list);                           \
>>>>>>>>>>>>>>>>>>> +                               break;                                                  \
>>>>>>>>>>>>>>>>>>> +                       } else {                                                        \
>>>>>>>>>>>>>>>>>>> +                               list_del_init(&(__vm_bo)->list.entry.__list_name);      \
>>>>>>>>>>>>>>>>>>> +                               __vm_bo = NULL;                                         \
>>>>>>>>>>>>>>>>>>> +                       }                                                               \
>>>>>>>>>>>>>>>>>>> +               }                                                                       \
>>>>>>>>>>>>>>>>>>> +               spin_unlock(&(__gpuvm)->__list_name.lock);                              \
>>>>>>>>>>>>>>>>>>> +                                                                                       \
>>>>>>>>>>>>>>>>>>> +               __vm_bo;                                                                \
>>>>>>>>>>>>>>>>>>> +       })
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>>> + * for_each_vm_bo_in_list() - internal vm_bo list iterator
>>>>>>>>>>>>>>>>>>> + * @__gpuvm: The GPU VM
>>>>>>>>>>>>>>>>>>> + * @__list_name: The name of the list we're iterating on
>>>>>>>>>>>>>>>>>>> + * @__local_list: A pointer to the local list used to store already iterated items
>>>>>>>>>>>>>>>>>>> + * @__vm_bo: The current &drm_gpuvm_bo
>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>> + * This helper is here to provide lockless list iteration. Lockless as in, the
>>>>>>>>>>>>>>>>>>> + * iterator releases the lock immediately after picking the first element from
>>>>>>>>>>>>>>>>>>> + * the list, so list insertion and deletion can happen concurrently.
>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>> + * Typical use:
>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>> + *     struct drm_gpuvm_bo *vm_bo;
>>>>>>>>>>>>>>>>>>> + *     LIST_HEAD(my_local_list);
>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>> + *     ret = 0;
>>>>>>>>>>>>>>>>>>> + *     for_each_vm_bo_in_list(gpuvm, <list_name>, &my_local_list, vm_bo) {
>>>>>>>>>>>>>>>>>>> + *             ret = do_something_with_vm_bo(..., vm_bo);
>>>>>>>>>>>>>>>>>>> + *             if (ret)
>>>>>>>>>>>>>>>>>>> + *                     break;
>>>>>>>>>>>>>>>>>>> + *     }
>>>>>>>>>>>>>>>>>>> + *     drm_gpuvm_bo_put(vm_bo);
>>>>>>>>>>>>>>>>>>> + *     restore_vm_bo_list(gpuvm, <list_name>, &my_local_list);
>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>> + * Only used for internal list iterations, not meant to be exposed to the
>>>>>>>>>>>>>>>>>>> + * outside world.
>>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>>> +#define for_each_vm_bo_in_list(__gpuvm, __list_name, __local_list, __vm_bo)    \
>>>>>>>>>>>>>>>>>>> +       for (__vm_bo = get_next_vm_bo_from_list(__gpuvm, __list_name,           \
>>>>>>>>>>>>>>>>>>> +                                               __local_list, NULL);            \
>>>>>>>>>>>>>>>>>>> +            __vm_bo;                                                           \
>>>>>>>>>>>>>>>>>>> +            __vm_bo = get_next_vm_bo_from_list(__gpuvm, __list_name,           \
>>>>>>>>>>>>>>>>>>> +                                               __local_list, __vm_bo))
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>>> + * restore_vm_bo_list() - move vm_bo elements back to their original list
>>>>>>>>>>>>>>>>>>> + * @__gpuvm: The GPU VM
>>>>>>>>>>>>>>>>>>> + * @__list_name: The name of the list we're iterating on
>>>>>>>>>>>>>>>>>>> + * @__local_list: A pointer to the local list used to store already iterated items
>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>> + * When we're done iterating a vm_bo list, we should call restore_vm_bo_list()
>>>>>>>>>>>>>>>>>>> + * to restore the original state and let new iterations take place.
>>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>>> +#define restore_vm_bo_list(__gpuvm, __list_name, __local_list)                 \
>>>>>>>>>>>>>>>>>>> +       do {                                                                    \
>>>>>>>>>>>>>>>>>>> +               /* Merge back the two lists, moving local list elements to the  \
>>>>>>>>>>>>>>>>>>> +                * head to preserve previous ordering, in case it matters.      \
>>>>>>>>>>>>>>>>>>> +                */                                                             \
>>>>>>>>>>>>>>>>>>> +               spin_lock(&(__gpuvm)->__list_name.lock);                        \
>>>>>>>>>>>>>>>>>>> +               list_splice(__local_list, &(__gpuvm)->__list_name.list);        \
>>>>>>>>>>>>>>>>>>> +               spin_unlock(&(__gpuvm)->__list_name.lock);                      \
>>>>>>>>>>>>>>>>>>> +       } while (0)
>>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>>> + * drm_gpuvm_bo_list_add() - insert a vm_bo into the given list
>>>>>>>>>>>>>>>>>>> + * @__vm_bo: the &drm_gpuvm_bo
>>>>>>>>>>>>>>>>>>> + * @__list_name: the name of the list to insert into
>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>> + * Inserts the given @__vm_bo into the list specified by @__list_name and
>>>>>>>>>>>>>>>>>>> + * increases the vm_bo's reference count.
>>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>>> +#define drm_gpuvm_bo_list_add(__vm_bo, __list_name)                            \
>>>>>>>>>>>>>>>>>>> +       do {                                                                    \
>>>>>>>>>>>>>>>>>>> +               spin_lock(&(__vm_bo)->vm->__list_name.lock);                    \
>>>>>>>>>>>>>>>>>>> +               if (list_empty(&(__vm_bo)->list.entry.__list_name))             \
>>>>>>>>>>>>>>>>>>> +                       list_add_tail(&(__vm_bo)->list.entry.__list_name,       \
>>>>>>>>>>>>>>>>>>> +                                     &(__vm_bo)->vm->__list_name.list);        \
>>>>>>>>>>>>>>>>>>> +               spin_unlock(&(__vm_bo)->vm->__list_name.lock);                  \
>>>>>>>>>>>>>>>>>>> +       } while (0)
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>>> + * drm_gpuvm_bo_list_del() - remove a vm_bo from the given list
>>>>>>>>>>>>>>>>>>> + * @__vm_bo: the &drm_gpuvm_bo
>>>>>>>>>>>>>>>>>>> + * @__list_name: the name of the list to remove from
>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>> + * Removes the given @__vm_bo from the list specified by @__list_name and
>>>>>>>>>>>>>>>>>>> + * decreases the vm_bo's reference count.
>>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>>> +#define drm_gpuvm_bo_list_del(__vm_bo, __list_name)                            \
>>>>>>>>>>>>>>>>>>> +       do {                                                                    \
>>>>>>>>>>>>>>>>>>> +               spin_lock(&(__vm_bo)->vm->__list_name.lock);                    \
>>>>>>>>>>>>>>>>>>> +               if (!list_empty(&(__vm_bo)->list.entry.__list_name))            \
>>>>>>>>>>>>>>>>>>> +                       list_del_init(&(__vm_bo)->list.entry.__list_name);      \
>>>>>>>>>>>>>>>>>>> +               spin_unlock(&(__vm_bo)->vm->__list_name.lock);                  \
>>>>>>>>>>>>>>>>>>> +       } while (0)
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>> +static int __must_check
>>>>>>>>>>>>>>>>>>> +drm_gpuvm_bo_get_unless_zero(struct drm_gpuvm_bo *vm_bo);
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>  #define to_drm_gpuva(__node) container_of((__node), struct drm_gpuva, rb.node)
>>>>>>>>>>>>>>>>>>>  #define GPUVA_START(node) ((node)->va.addr)
>>>>>>>>>>>>>>>>>>> @@ -713,6 +867,12 @@ drm_gpuvm_init(struct drm_gpuvm *gpuvm, struct drm_device *drm,
>>>>>>>>>>>>>>>>>>>         gpuvm->rb.tree = RB_ROOT_CACHED;
>>>>>>>>>>>>>>>>>>>         INIT_LIST_HEAD(&gpuvm->rb.list);
>>>>>>>>>>>>>>>>>>> +       INIT_LIST_HEAD(&gpuvm->extobj.list);
>>>>>>>>>>>>>>>>>>> +       spin_lock_init(&gpuvm->extobj.lock);
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>> +       INIT_LIST_HEAD(&gpuvm->evict.list);
>>>>>>>>>>>>>>>>>>> +       spin_lock_init(&gpuvm->evict.lock);
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>         drm_gpuva_check_overflow(start_offset, range);
>>>>>>>>>>>>>>>>>>>         gpuvm->mm_start = start_offset;
>>>>>>>>>>>>>>>>>>>         gpuvm->mm_range = range;
>>>>>>>>>>>>>>>>>>> @@ -754,10 +914,302 @@ drm_gpuvm_destroy(struct drm_gpuvm *gpuvm)
>>>>>>>>>>>>>>>>>>>         WARN(!RB_EMPTY_ROOT(&gpuvm->rb.tree.rb_root),
>>>>>>>>>>>>>>>>>>>              "GPUVA tree is not empty, potentially leaking memory.\n");
>>>>>>>>>>>>>>>>>>> +       WARN(!list_empty(&gpuvm->extobj.list), "Extobj list should be empty.\n");
>>>>>>>>>>>>>>>>>>> +       WARN(!list_empty(&gpuvm->evict.list), "Evict list should be empty.\n");
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>         drm_gem_private_object_fini(&gpuvm->d_obj);
>>>>>>>>>>>>>>>>>>>  }
>>>>>>>>>>>>>>>>>>>  EXPORT_SYMBOL_GPL(drm_gpuvm_destroy);
>>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>>> + * drm_gpuvm_prepare_objects() - prepare all associated BOs
>>>>>>>>>>>>>>>>>>> + * @gpuvm: the &drm_gpuvm
>>>>>>>>>>>>>>>>>>> + * @exec: the &drm_exec locking context
>>>>>>>>>>>>>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>> + * Calls drm_exec_prepare_obj() for all &drm_gem_objects the given
>>>>>>>>>>>>>>>>>>> + * &drm_gpuvm contains mappings of.
>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>> + * Using this function directly, it is the driver's responsibility to call
>>>>>>>>>>>>>>>>>>> + * drm_exec_init() and drm_exec_fini() accordingly.
>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>> + * Note: This function is safe against concurrent insertion and removal of
>>>>>>>>>>>>>>>>>>> + * external objects, however it is not safe against concurrent usage itself.
>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>> + * Drivers need to make sure to protect this case with either an outer VM lock
>>>>>>>>>>>>>>>>>>> + * or by calling drm_gpuvm_prepare_vm() before this function within the
>>>>>>>>>>>>>>>>>>> + * drm_exec_until_all_locked() loop, such that the GPUVM's dma-resv lock ensures
>>>>>>>>>>>>>>>>>>> + * mutual exclusion.
>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>>> +int
>>>>>>>>>>>>>>>>>>> +drm_gpuvm_prepare_objects(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>>>>>>> +                         struct drm_exec *exec,
>>>>>>>>>>>>>>>>>>> +                         unsigned int num_fences)
>>>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>>>> +       struct drm_gpuvm_bo *vm_bo;
>>>>>>>>>>>>>>>>>>> +       LIST_HEAD(extobjs);
>>>>>>>>>>>>>>>>>>> +       int ret = 0;
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>> +       for_each_vm_bo_in_list(gpuvm, extobj, &extobjs, vm_bo) {
>>>>>>>>>>>>>>>>>>> +               ret = drm_exec_prepare_obj(exec, vm_bo->obj, num_fences);
>>>>>>>>>>>>>>>>>>> +               if (ret)
>>>>>>>>>>>>>>>>>>> +                       break;
>>>>>>>>>>>>>>>>>>> +       }
>>>>>>>>>>>>>>>>>>> +       /* Drop ref in case we break out of the loop. */
>>>>>>>>>>>>>>>>>>> +       drm_gpuvm_bo_put(vm_bo);
>>>>>>>>>>>>>>>>>>> +       restore_vm_bo_list(gpuvm, extobj, &extobjs);
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>> +       return ret;
>>>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_prepare_objects);
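
Per the note in the kernel-doc above, a driver open coding the &drm_exec loop
would combine drm_gpuvm_prepare_vm() and drm_gpuvm_prepare_objects() roughly
like the following sketch (error handling trimmed, one fence slot assumed):

	struct drm_exec exec;
	int ret;

	drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT);
	drm_exec_until_all_locked(&exec) {
		/* Locking the VM's common dma-resv first serializes
		 * concurrent users of drm_gpuvm_prepare_objects(). */
		ret = drm_gpuvm_prepare_vm(gpuvm, &exec, 1);
		drm_exec_retry_on_contention(&exec);
		if (ret)
			goto err;

		ret = drm_gpuvm_prepare_objects(gpuvm, &exec, 1);
		drm_exec_retry_on_contention(&exec);
		if (ret)
			goto err;
	}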
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>>> + * drm_gpuvm_prepare_range() - prepare all BOs mapped within a given range
>>>>>>>>>>>>>>>>>>> + * @gpuvm: the &drm_gpuvm
>>>>>>>>>>>>>>>>>>> + * @exec: the &drm_exec locking context
>>>>>>>>>>>>>>>>>>> + * @addr: the start address within the VA space
>>>>>>>>>>>>>>>>>>> + * @range: the range to iterate within the VA space
>>>>>>>>>>>>>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>> + * Calls drm_exec_prepare_obj() for all &drm_gem_objects mapped between @addr
>>>>>>>>>>>>>>>>>>> + * and @addr + @range.
>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>>> +int
>>>>>>>>>>>>>>>>>>> +drm_gpuvm_prepare_range(struct drm_gpuvm *gpuvm, struct drm_exec *exec,
>>>>>>>>>>>>>>>>>>> +                       u64 addr, u64 range, unsigned int num_fences)
>>>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>>>> +       struct drm_gpuva *va;
>>>>>>>>>>>>>>>>>>> +       u64 end = addr + range;
>>>>>>>>>>>>>>>>>>> +       int ret;
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>> +       drm_gpuvm_for_each_va_range(va, gpuvm, addr, end) {
>>>>>>>>>>>>>>>>>>> +               struct drm_gem_object *obj = va->gem.obj;
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>> +               ret = drm_exec_prepare_obj(exec, obj, num_fences);
>>>>>>>>>>>>>>>>>>> +               if (ret)
>>>>>>>>>>>>>>>>>>> +                       return ret;
>>>>>>>>>>>>>>>>>>> +       }
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>> +       return 0;
>>>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_prepare_range);
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>>> + * drm_gpuvm_exec_lock() - lock all dma-resv of all associated BOs
>>>>>>>>>>>>>>>>>>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>>>>>>>>>>>>>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>>>>>>>>>>>>>> + * @interruptible: sleep interruptible if waiting
>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>> + * Acquires all dma-resv locks of all &drm_gem_objects the given
>>>>>>>>>>>>>>>>>>> + * &drm_gpuvm contains mappings of.
>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>> + * Additionally, when calling this function with struct drm_gpuvm_exec::extra
>>>>>>>>>>>>>>>>>>> + * being set the driver receives the given @fn callback to lock additional
>>>>>>>>>>>>>>>>>>> + * dma-resv in the context of the &drm_gpuvm_exec instance. Typically, drivers
>>>>>>>>>>>>>>>>>>> + * would call drm_exec_prepare_obj() from within this callback.
>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>>> +int
>>>>>>>>>>>>>>>>>>> +drm_gpuvm_exec_lock(struct drm_gpuvm_exec *vm_exec,
>>>>>>>>>>>>>>>>>>> +                   unsigned int num_fences,
>>>>>>>>>>>>>>>>>>> +                   bool interruptible)
>>>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>>>> +       struct drm_gpuvm *gpuvm = vm_exec->vm;
>>>>>>>>>>>>>>>>>>> +       struct drm_exec *exec = &vm_exec->exec;
>>>>>>>>>>>>>>>>>>> +       uint32_t flags;
>>>>>>>>>>>>>>>>>>> +       int ret;
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>> +       flags = (interruptible ? DRM_EXEC_INTERRUPTIBLE_WAIT : 0) |
>>>>>>>>>>>>>>>>>>> +               DRM_EXEC_IGNORE_DUPLICATES;
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>> +       drm_exec_init(exec, flags);
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>> +       drm_exec_until_all_locked(exec) {
>>>>>>>>>>>>>>>>>>> +               ret = drm_gpuvm_prepare_vm(gpuvm, exec, num_fences);
>>>>>>>>>>>>>>>>>>> +               drm_exec_retry_on_contention(exec);
>>>>>>>>>>>>>>>>>>> +               if (ret)
>>>>>>>>>>>>>>>>>>> +                       goto err;
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>> +               ret = drm_gpuvm_prepare_objects(gpuvm, exec, num_fences);
>>>>>>>>>>>>>>>>>>> +               drm_exec_retry_on_contention(exec);
>>>>>>>>>>>>>>>>>>> +               if (ret)
>>>>>>>>>>>>>>>>>>> +                       goto err;
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>> +               if (vm_exec->extra.fn) {
>>>>>>>>>>>>>>>>>>> +                       ret = vm_exec->extra.fn(vm_exec, num_fences);
>>>>>>>>>>>>>>>>>>> +                       drm_exec_retry_on_contention(exec);
>>>>>>>>>>>>>>>>>>> +                       if (ret)
>>>>>>>>>>>>>>>>>>> +                               goto err;
>>>>>>>>>>>>>>>>>>> +               }
>>>>>>>>>>>>>>>>>>> +       }
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>> +       return 0;
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>> +err:
>>>>>>>>>>>>>>>>>>> +       drm_exec_fini(exec);
>>>>>>>>>>>>>>>>>>> +       return ret;
>>>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_exec_lock);
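
Putting the helpers together, a driver's job submission path built on this
abstraction could look roughly like the following sketch (driver_submit_job()
is hypothetical; the fence usages are just plausible picks):

	struct drm_gpuvm_exec vm_exec = {
		.vm = gpuvm,
		/* .extra.fn / .extra.priv may lock additional objects */
	};
	struct dma_fence *fence;
	int ret;

	ret = drm_gpuvm_exec_lock(&vm_exec, 1, true);
	if (ret)
		return ret;

	ret = drm_gpuvm_validate(gpuvm);
	if (ret)
		goto out_unlock;

	fence = driver_submit_job(gpuvm);	/* hypothetical */
	drm_gpuvm_resv_add_fence(gpuvm, &vm_exec.exec, fence,
				 DMA_RESV_USAGE_BOOKKEEP,
				 DMA_RESV_USAGE_BOOKKEEP);

out_unlock:
	drm_exec_fini(&vm_exec.exec);
	return ret;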
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>> +static int
>>>>>>>>>>>>>>>>>>> +fn_lock_array(struct drm_gpuvm_exec *vm_exec, unsigned int num_fences)
>>>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>>>> +       struct {
>>>>>>>>>>>>>>>>>>> +               struct drm_gem_object **objs;
>>>>>>>>>>>>>>>>>>> +               unsigned int num_objs;
>>>>>>>>>>>>>>>>>>> +       } *args = vm_exec->extra.priv;
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>> +       return drm_exec_prepare_array(&vm_exec->exec, args->objs,
>>>>>>>>>>>>>>>>>>> +                                     args->num_objs, num_fences);
>>>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>>> + * drm_gpuvm_exec_lock_array() - lock all dma-resv of all associated BOs
>>>>>>>>>>>>>>>>>>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>>>>>>>>>>>>>>>>>>> + * @objs: additional &drm_gem_objects to lock
>>>>>>>>>>>>>>>>>>> + * @num_objs: the number of additional &drm_gem_objects to lock
>>>>>>>>>>>>>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>>>>>>>>>>>>>> + * @interruptible: sleep interruptible if waiting
>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>> + * Acquires all dma-resv locks of all &drm_gem_objects the given &drm_gpuvm
>>>>>>>>>>>>>>>>>>> + * contains mappings of, plus the ones given through @objs.
>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>>> +int
>>>>>>>>>>>>>>>>>>> +drm_gpuvm_exec_lock_array(struct drm_gpuvm_exec *vm_exec,
>>>>>>>>>>>>>>>>>>> +                         struct drm_gem_object **objs,
>>>>>>>>>>>>>>>>>>> +                         unsigned int num_objs,
>>>>>>>>>>>>>>>>>>> +                         unsigned int num_fences,
>>>>>>>>>>>>>>>>>>> +                         bool interruptible)
>>>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>>>> +       struct {
>>>>>>>>>>>>>>>>>>> +               struct drm_gem_object **objs;
>>>>>>>>>>>>>>>>>>> +               unsigned int num_objs;
>>>>>>>>>>>>>>>>>>> +       } args;
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>> +       args.objs = objs;
>>>>>>>>>>>>>>>>>>> +       args.num_objs = num_objs;
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>> +       vm_exec->extra.fn = fn_lock_array;
>>>>>>>>>>>>>>>>>>> +       vm_exec->extra.priv = &args;
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>> +       return drm_gpuvm_exec_lock(vm_exec, num_fences, interruptible);
>>>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_exec_lock_array);
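
Usage of the array variant is a one-liner from the driver's perspective, e.g.
(a sketch; obj_a and obj_b stand in for driver objects not mapped in the VM):

	struct drm_gem_object *extra[] = { obj_a, obj_b };

	ret = drm_gpuvm_exec_lock_array(&vm_exec, extra, ARRAY_SIZE(extra),
					1, true);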
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>>> + * drm_gpuvm_exec_lock_range() - prepare all BOs mapped within a given range
>>>>>>>>>>>>>>>>>>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>>>>>>>>>>>>>>>>>>> + * @addr: the start address within the VA space
>>>>>>>>>>>>>>>>>>> + * @range: the range to iterate within the VA space
>>>>>>>>>>>>>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>>>>>>>>>>>>>> + * @interruptible: sleep interruptible if waiting
>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>> + * Acquires all dma-resv locks of all &drm_gem_objects mapped between @addr and
>>>>>>>>>>>>>>>>>>> + * @addr + @range.
>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>>> +int
>>>>>>>>>>>>>>>>>>> +drm_gpuvm_exec_lock_range(struct drm_gpuvm_exec *vm_exec,
>>>>>>>>>>>>>>>>>>> +                         u64 addr, u64 range,
>>>>>>>>>>>>>>>>>>> +                         unsigned int num_fences,
>>>>>>>>>>>>>>>>>>> +                         bool interruptible)
>>>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>>>> +       struct drm_gpuvm *gpuvm = vm_exec->vm;
>>>>>>>>>>>>>>>>>>> +       struct drm_exec *exec = &vm_exec->exec;
>>>>>>>>>>>>>>>>>>> +       uint32_t flags;
>>>>>>>>>>>>>>>>>>> +       int ret;
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>> +       flags = (interruptible ? DRM_EXEC_INTERRUPTIBLE_WAIT : 0) |
>>>>>>>>>>>>>>>>>>> +               DRM_EXEC_IGNORE_DUPLICATES;
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>> +       drm_exec_init(exec, flags);
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>> +       drm_exec_until_all_locked(exec) {
>>>>>>>>>>>>>>>>>>> +               ret = drm_gpuvm_prepare_range(gpuvm, exec, addr, range,
>>>>>>>>>>>>>>>>>>> +                                             num_fences);
>>>>>>>>>>>>>>>>>>> +               drm_exec_retry_on_contention(exec);
>>>>>>>>>>>>>>>>>>> +               if (ret)
>>>>>>>>>>>>>>>>>>> +                       goto err;
>>>>>>>>>>>>>>>>>>> +       }
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>> +       return ret;
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>> +err:
>>>>>>>>>>>>>>>>>>> +       drm_exec_fini(exec);
>>>>>>>>>>>>>>>>>>> +       return ret;
>>>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_exec_lock_range);
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>>> + * drm_gpuvm_validate() - validate all BOs marked as evicted
>>>>>>>>>>>>>>>>>>> + * @gpuvm: the &drm_gpuvm to validate evicted BOs
>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>> + * Calls the &drm_gpuvm_ops.bo_validate callback for all evicted buffer
>>>>>>>>>>>>>>>>>>> + * objects being mapped in the given &drm_gpuvm.
>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>>> +int
>>>>>>>>>>>>>>>>>>> +drm_gpuvm_validate(struct drm_gpuvm *gpuvm)
>>>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>>>> +       const struct drm_gpuvm_ops *ops = gpuvm->ops;
>>>>>>>>>>>>>>>>>>> +       struct drm_gpuvm_bo *vm_bo;
>>>>>>>>>>>>>>>>>>> +       LIST_HEAD(evict);
>>>>>>>>>>>>>>>>>>> +       int ret = 0;
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>> +       if (unlikely(!ops || !ops->bo_validate))
>>>>>>>>>>>>>>>>>>> +               return -ENOTSUPP;
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>> +       for_each_vm_bo_in_list(gpuvm, evict, &evict, vm_bo) {
>>>>>>>>>>>>>>>>>>> +               dma_resv_assert_held(vm_bo->obj->resv);
>>>>>>>>>>>>>>>>>>> +               ret = ops->bo_validate(vm_bo->obj);
>>>>>>>>>>>>>>>>>>> +               if (ret)
>>>>>>>>>>>>>>>>>>> +                       break;
>>>>>>>>>>>>>>>>>>> +       }
>>>>>>>>>>>>>>>>>>> +       /* Drop ref in case we break out of the loop. */
>>>>>>>>>>>>>>>>>>> +       drm_gpuvm_bo_put(vm_bo);
>>>>>>>>>>>>>>>>>>> +       restore_vm_bo_list(gpuvm, evict, &evict);
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>> +       return ret;
>>>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_validate);
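
For a TTM-based driver, the &drm_gpuvm_ops.bo_validate implementation would
typically just re-validate the buffer into a GPU-accessible placement, e.g.
(a sketch; driver_vram_placement is a hypothetical driver-defined placement):

	static int driver_bo_validate(struct drm_gem_object *obj)
	{
		struct ttm_buffer_object *bo =
			container_of(obj, struct ttm_buffer_object, base);
		struct ttm_operation_ctx ctx = { .interruptible = true };

		/* The BO's dma-resv is held, as asserted by the caller. */
		return ttm_bo_validate(bo, &driver_vram_placement, &ctx);
	}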
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>>> + * drm_gpuvm_resv_add_fence - add fence to private and all extobj dma-resv
>>>>>>>>>>>>>>>>>>> + * @gpuvm: the &drm_gpuvm to add a fence to
>>>>>>>>>>>>>>>>>>> + * @exec: the &drm_exec locking context
>>>>>>>>>>>>>>>>>>> + * @fence: fence to add
>>>>>>>>>>>>>>>>>>> + * @private_usage: private dma-resv usage
>>>>>>>>>>>>>>>>>>> + * @extobj_usage: extobj dma-resv usage
>>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>>> +void
>>>>>>>>>>>>>>>>>>> +drm_gpuvm_resv_add_fence(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>>>>>>> +                        struct drm_exec *exec,
>>>>>>>>>>>>>>>>>>> +                        struct dma_fence *fence,
>>>>>>>>>>>>>>>>>>> +                        enum dma_resv_usage private_usage,
>>>>>>>>>>>>>>>>>>> +                        enum dma_resv_usage extobj_usage)
>>>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>>>> +       struct drm_gem_object *obj;
>>>>>>>>>>>>>>>>>>> +       unsigned long index;
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>> +       drm_exec_for_each_locked_object(exec, index, obj) {
>>>>>>>>>>>>>>>>>>> +               dma_resv_assert_held(obj->resv);
>>>>>>>>>>>>>>>>>>> +               dma_resv_add_fence(obj->resv, fence,
>>>>>>>>>>>>>>>>>>> +                                  drm_gpuvm_is_extobj(gpuvm, obj) ?
>>>>>>>>>>>>>>>>>>> +                                  extobj_usage : private_usage);
>>>>>>>>>>>>>>>>>>> +       }
>>>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_resv_add_fence);
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>  /**
>>>>>>>>>>>>>>>>>>>   * drm_gpuvm_bo_create() - create a new instance of struct drm_gpuvm_bo
>>>>>>>>>>>>>>>>>>>   * @gpuvm: The &drm_gpuvm the @obj is mapped in.
>>>>>>>>>>>>>>>>>>> @@ -790,6 +1242,9 @@ drm_gpuvm_bo_create(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>>>>>>>         INIT_LIST_HEAD(&vm_bo->list.gpuva);
>>>>>>>>>>>>>>>>>>>         INIT_LIST_HEAD(&vm_bo->list.entry.gem);
>>>>>>>>>>>>>>>>>>> +       INIT_LIST_HEAD(&vm_bo->list.entry.extobj);
>>>>>>>>>>>>>>>>>>> +       INIT_LIST_HEAD(&vm_bo->list.entry.evict);
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>         drm_gem_object_get(obj);
>>>>>>>>>>>>>>>>>>>         return vm_bo;
>>>>>>>>>>>>>>>>>>> @@ -807,6 +1262,14 @@ drm_gpuvm_bo_destroy(struct kref *kref)
>>>>>>>>>>>>>>>>>>>         drm_gem_gpuva_assert_lock_held(vm_bo->obj);
>>>>>>>>>>>>>>>>>>> +       spin_lock(&gpuvm->extobj.lock);
>>>>>>>>>>>>>>>>>>> +       list_del(&vm_bo->list.entry.extobj);
>>>>>>>>>>>>>>>>>>> +       spin_unlock(&gpuvm->extobj.lock);
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>> +       spin_lock(&gpuvm->evict.lock);
>>>>>>>>>>>>>>>>>>> +       list_del(&vm_bo->list.entry.evict);
>>>>>>>>>>>>>>>>>>> +       spin_unlock(&gpuvm->evict.lock);
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>         list_del(&vm_bo->list.entry.gem);
>>>>>>>>>>>>>>>>>>>         drm_gem_object_put(obj);
>>>>>>>>>>>>>>>>>>> @@ -822,6 +1285,11 @@ drm_gpuvm_bo_destroy(struct kref *kref)
>>>>>>>>>>>>>>>>>>>   * @vm_bo: the &drm_gpuvm_bo to release the reference of
>>>>>>>>>>>>>>>>>>>   *
>>>>>>>>>>>>>>>>>>>   * This releases a reference to @vm_bo.
>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>> + * If the reference count drops to zero, the &gpuvm_bo is destroyed, which
>>>>>>>>>>>>>>>>>>> + * includes removing it from the GEM's gpuva list. Hence, if a call to this
>>>>>>>>>>>>>>>>>>> + * function can potentially let the reference count drop to zero the caller
>>>>>>>>>>>>>>>>>>> + * must hold the dma-resv or driver specific GEM gpuva lock.
>>>>>>>>>>>>>>>>>>>   */
>>>>>>>>>>>>>>>>>>>  void
>>>>>>>>>>>>>>>>>>>  drm_gpuvm_bo_put(struct drm_gpuvm_bo *vm_bo)
>>>>>>>>>>>>>>>>>>> @@ -831,6 +1299,12 @@ drm_gpuvm_bo_put(struct drm_gpuvm_bo *vm_bo)
>>>>>>>>>>>>>>>>>>>  }
>>>>>>>>>>>>>>>>>>>  EXPORT_SYMBOL_GPL(drm_gpuvm_bo_put);
>>>>>>>>>>>>>>>>>>> +static int __must_check
>>>>>>>>>>>>>>>>>>> +drm_gpuvm_bo_get_unless_zero(struct drm_gpuvm_bo *vm_bo)
>>>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>>>> +       return kref_get_unless_zero(&vm_bo->kref);
>>>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>  static struct drm_gpuvm_bo *
>>>>>>>>>>>>>>>>>>>  __drm_gpuvm_bo_find(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>>>>>>>                      struct drm_gem_object *obj)
>>>>>>>>>>>>>>>>>>> @@ -938,6 +1412,48 @@ drm_gpuvm_bo_obtain_prealloc(struct drm_gpuvm_bo *__vm_bo)
>>>>>>>>>>>>>>>>>>>  }
>>>>>>>>>>>>>>>>>>>  EXPORT_SYMBOL_GPL(drm_gpuvm_bo_obtain_prealloc);
>>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>>> + * drm_gpuvm_bo_extobj_add() - adds the &drm_gpuvm_bo to its &drm_gpuvm's
>>>>>>>>>>>>>>>>>>> + * extobj list
>>>>>>>>>>>>>>>>>>> + * @vm_bo: The &drm_gpuvm_bo to add to its &drm_gpuvm's extobj list.
>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>> + * Adds the given @vm_bo to its &drm_gpuvm's extobj list if not on the list
>>>>>>>>>>>>>>>>>>> + * already and if the corresponding &drm_gem_object actually is an external
>>>>>>>>>>>>>>>>>>> + * object.
>>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>>> +void
>>>>>>>>>>>>>>>>>>> +drm_gpuvm_bo_extobj_add(struct drm_gpuvm_bo *vm_bo)
>>>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>>>> +       struct drm_gpuvm *gpuvm = vm_bo->vm;
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>> +       if (drm_gpuvm_is_extobj(gpuvm, vm_bo->obj))
>>>>>>>>>>>>>>>>>>> +               drm_gpuvm_bo_list_add(vm_bo, extobj);
>>>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_bo_extobj_add);
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>>> + * drm_gpuvm_bo_evict() - add / remove a &drm_gem_object to / from a
>>>>>>>>>>>>>>>>>>> + * &drm_gpuvm's evicted list
>>>>>>>>>>>>>>>>>>> + * @obj: the &drm_gem_object to add or remove
>>>>>>>>>>>>>>>>>>> + * @evict: indicates whether the object is evicted
>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>> + * Adds a &drm_gem_object to or removes it from the evicted list of all
>>>>>>>>>>>>>>>>>>> + * &drm_gpuvms containing a mapping of this &drm_gem_object.
>>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>>> +void
>>>>>>>>>>>>>>>>>>> +drm_gpuvm_bo_evict(struct drm_gem_object *obj, bool evict)
>>>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>>>> +       struct drm_gpuvm_bo *vm_bo;
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>> +       drm_gem_for_each_gpuvm_bo(vm_bo, obj) {
>>>>>>>>>>>>>>>>>>> +               if (evict)
>>>>>>>>>>>>>>>>>>> +                       drm_gpuvm_bo_list_add(vm_bo, evict);
>>>>>>>>>>>>>>>>>>> +               else
>>>>>>>>>>>>>>>>>>> +                       drm_gpuvm_bo_list_del(vm_bo, evict);
>>>>>>>>>>>>>>>>>>> +       }
>>>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_bo_evict);
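
On the eviction side, a driver would flip the evicted state from wherever it
learns about a buffer move, e.g. a TTM move callback (a sketch; the exact hook
and the VRAM policy are driver specific):

	static void driver_bo_move_notify(struct ttm_buffer_object *bo,
					  struct ttm_resource *new_res)
	{
		/* Leaving VRAM invalidates the GPU mappings of this BO. */
		bool evicted = new_res->mem_type != TTM_PL_VRAM;

		drm_gpuvm_bo_evict(&bo->base, evicted);
	}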
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>  static int
>>>>>>>>>>>>>>>>>>>  __drm_gpuva_insert(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>>>>>>>                     struct drm_gpuva *va)
>>>>>>>>>>>>>>>>>>> diff --git a/include/drm/drm_gpuvm.h b/include/drm/drm_gpuvm.h
>>>>>>>>>>>>>>>>>>> index afa50b9059a2..834bb6d6617e 100644
>>>>>>>>>>>>>>>>>>> --- a/include/drm/drm_gpuvm.h
>>>>>>>>>>>>>>>>>>> +++ b/include/drm/drm_gpuvm.h
>>>>>>>>>>>>>>>>>>> @@ -26,10 +26,12 @@
>>>>>>>>>>>>>>>>>>>   */
>>>>>>>>>>>>>>>>>>>  #include <linux/list.h>
>>>>>>>>>>>>>>>>>>> +#include <linux/dma-resv.h>
>>>>>>>>>>>>>>>>>>>  #include <linux/rbtree.h>
>>>>>>>>>>>>>>>>>>>  #include <linux/types.h>
>>>>>>>>>>>>>>>>>>>  #include <drm/drm_gem.h>
>>>>>>>>>>>>>>>>>>> +#include <drm/drm_exec.h>
>>>>>>>>>>>>>>>>>>>  struct drm_gpuvm;
>>>>>>>>>>>>>>>>>>>  struct drm_gpuvm_bo;
>>>>>>>>>>>>>>>>>>> @@ -259,6 +261,38 @@ struct drm_gpuvm {
>>>>>>>>>>>>>>>>>>>          * space
>>>>>>>>>>>>>>>>>>>          */
>>>>>>>>>>>>>>>>>>>         struct dma_resv *resv;
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>> +       /**
>>>>>>>>>>>>>>>>>>> +        * @extobj: structure holding the extobj list
>>>>>>>>>>>>>>>>>>> +        */
>>>>>>>>>>>>>>>>>>> +       struct {
>>>>>>>>>>>>>>>>>>> +               /**
>>>>>>>>>>>>>>>>>>> +                * @list: &list_head storing &drm_gpuvm_bos serving as
>>>>>>>>>>>>>>>>>>> +                * external object
>>>>>>>>>>>>>>>>>>> +                */
>>>>>>>>>>>>>>>>>>> +               struct list_head list;
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>> +               /**
>>>>>>>>>>>>>>>>>>> +                * @lock: spinlock to protect the extobj list
>>>>>>>>>>>>>>>>>>> +                */
>>>>>>>>>>>>>>>>>>> +               spinlock_t lock;
>>>>>>>>>>>>>>>>>>> +       } extobj;
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>> +       /**
>>>>>>>>>>>>>>>>>>> +        * @evict: structure holding the evict list and evict list lock
>>>>>>>>>>>>>>>>>>> +        */
>>>>>>>>>>>>>>>>>>> +       struct {
>>>>>>>>>>>>>>>>>>> +               /**
>>>>>>>>>>>>>>>>>>> +                * @list: &list_head storing &drm_gpuvm_bos currently being
>>>>>>>>>>>>>>>>>>> +                * evicted
>>>>>>>>>>>>>>>>>>> +                */
>>>>>>>>>>>>>>>>>>> +               struct list_head list;
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>> +               /**
>>>>>>>>>>>>>>>>>>> +                * @lock: spinlock to protect the evict list
>>>>>>>>>>>>>>>>>>> +                */
>>>>>>>>>>>>>>>>>>> +               spinlock_t lock;
>>>>>>>>>>>>>>>>>>> +       } evict;
>>>>>>>>>>>>>>>>>>>  };
>>>>>>>>>>>>>>>>>>>  void drm_gpuvm_init(struct drm_gpuvm *gpuvm, struct drm_device *drm,
>>>>>>>>>>>>>>>>>>> @@ -268,6 +302,21 @@ void drm_gpuvm_init(struct drm_gpuvm *gpuvm, struct drm_device *drm,
>>>>>>>>>>>>>>>>>>>                     const struct drm_gpuvm_ops *ops);
>>>>>>>>>>>>>>>>>>>  void drm_gpuvm_destroy(struct drm_gpuvm *gpuvm);
>>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>>> + * drm_gpuvm_is_extobj() - indicates whether the given
>>>>>>>>>>>>>>>>>>> &drm_gem_object is an
>>>>>>>>>>>>>>>>>>> + * external object
>>>>>>>>>>>>>>>>>>> + * @gpuvm: the &drm_gpuvm to check
>>>>>>>>>>>>>>>>>>> + * @obj: the &drm_gem_object to check
>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>> + * Returns: true if the &drm_gem_object &dma_resv 
>>>>>>>>>>>>>>>>>>> differs
>>>>>>>>>>>>>>>>>>> from the
>>>>>>>>>>>>>>>>>>> + * &drm_gpuvms &dma_resv, false otherwise
>>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>>> +static inline bool drm_gpuvm_is_extobj(struct 
>>>>>>>>>>>>>>>>>>> drm_gpuvm
>>>>>>>>>>>>>>>>>>> *gpuvm,
>>>>>>>>>>>>>>>>>>> + struct drm_gem_object
>>>>>>>>>>>>>>>>>>> *obj)
>>>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>>>> +       return obj && obj->resv != gpuvm->resv;
>>>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>     static inline struct drm_gpuva *
>>>>>>>>>>>>>>>>>>>     __drm_gpuva_next(struct drm_gpuva *va)
>>>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>>> @@ -346,6 +395,128 @@ __drm_gpuva_next(struct drm_gpuva *va)
>>>>>>>>>>>>>>>>>>>     #define drm_gpuvm_for_each_va_safe(va__, next__, gpuvm__) \
>>>>>>>>>>>>>>>>>>>          list_for_each_entry_safe(va__, next__, &(gpuvm__)->rb.list, rb.entry)
>>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>>> + * struct drm_gpuvm_exec - &drm_gpuvm abstraction of &drm_exec
>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>> + * This structure should be created on the stack as &drm_exec should be.
>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>> + * Optionally, @extra can be set in order to lock additional &drm_gem_objects.
>>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>>> +struct drm_gpuvm_exec {
>>>>>>>>>>>>>>>>>>> +       /**
>>>>>>>>>>>>>>>>>>> +        * @exec: the &drm_exec structure
>>>>>>>>>>>>>>>>>>> +        */
>>>>>>>>>>>>>>>>>>> +       struct drm_exec exec;
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>> +       /**
>>>>>>>>>>>>>>>>>>> +        * @vm: the &drm_gpuvm to lock its DMA reservations
>>>>>>>>>>>>>>>>>>> +        */
>>>>>>>>>>>>>>>>>>> +       struct drm_gpuvm *vm;
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>> +       /**
>>>>>>>>>>>>>>>>>>> +        * @extra: Callback and corresponding private data for the driver to
>>>>>>>>>>>>>>>>>>> +        * lock arbitrary additional &drm_gem_objects.
>>>>>>>>>>>>>>>>>>> +        */
>>>>>>>>>>>>>>>>>>> +       struct {
>>>>>>>>>>>>>>>>>>> +               /**
>>>>>>>>>>>>>>>>>>> +                * @fn: The driver callback to lock additional &drm_gem_objects.
>>>>>>>>>>>>>>>>>>> +                */
>>>>>>>>>>>>>>>>>>> +               int (*fn)(struct drm_gpuvm_exec *vm_exec,
>>>>>>>>>>>>>>>>>>> +                         unsigned int num_fences);
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>> +               /**
>>>>>>>>>>>>>>>>>>> +                * @priv: driver private data for the @fn callback
>>>>>>>>>>>>>>>>>>> +                */
>>>>>>>>>>>>>>>>>>> +               void *priv;
>>>>>>>>>>>>>>>>>>> +       } extra;
>>>>>>>>>>>>>>>>>>> +};
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>>> + * drm_gpuvm_prepare_vm() - prepare the GPUVMs common dma-resv
>>>>>>>>>>>>>>>>>>> + * @gpuvm: the &drm_gpuvm
>>>>>>>>>>>>>>>>>>> + * @exec: the &drm_exec context
>>>>>>>>>>>>>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>> + * Calls drm_exec_prepare_obj() for the GPUVMs dummy &drm_gem_object.
>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>> + * Using this function directly, it is the drivers responsibility to call
>>>>>>>>>>>>>>>>>>> + * drm_exec_init() and drm_exec_fini() accordingly.
>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>>> +static inline int
>>>>>>>>>>>>>>>>>>> +drm_gpuvm_prepare_vm(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>>>>>>> +                    struct drm_exec *exec,
>>>>>>>>>>>>>>>>>>> +                    unsigned int num_fences)
>>>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>>>> +       return drm_exec_prepare_obj(exec, &gpuvm->d_obj, num_fences);
>>>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>> +int drm_gpuvm_prepare_objects(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>>>>>>> +                             struct drm_exec *exec,
>>>>>>>>>>>>>>>>>>> +                             unsigned int num_fences);
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>> +int drm_gpuvm_prepare_range(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>>>>>>> +                           struct drm_exec *exec,
>>>>>>>>>>>>>>>>>>> +                           u64 addr, u64 range,
>>>>>>>>>>>>>>>>>>> +                           unsigned int num_fences);
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>> +int drm_gpuvm_exec_lock(struct drm_gpuvm_exec *vm_exec,
>>>>>>>>>>>>>>>>>>> +                       unsigned int num_fences,
>>>>>>>>>>>>>>>>>>> +                       bool interruptible);
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>> +int drm_gpuvm_exec_lock_array(struct drm_gpuvm_exec *vm_exec,
>>>>>>>>>>>>>>>>>>> +                             struct drm_gem_object **objs,
>>>>>>>>>>>>>>>>>>> +                             unsigned int num_objs,
>>>>>>>>>>>>>>>>>>> +                             unsigned int num_fences,
>>>>>>>>>>>>>>>>>>> +                             bool interruptible);
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>> +int drm_gpuvm_exec_lock_range(struct drm_gpuvm_exec *vm_exec,
>>>>>>>>>>>>>>>>>>> +                             u64 addr, u64 range,
>>>>>>>>>>>>>>>>>>> +                             unsigned int num_fences,
>>>>>>>>>>>>>>>>>>> +                             bool interruptible);
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>>> + * drm_gpuvm_exec_unlock() - unlock all dma-resv of all associated BOs
>>>>>>>>>>>>>>>>>>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>> + * Releases all dma-resv locks of all &drm_gem_objects previously acquired
>>>>>>>>>>>>>>>>>>> + * through drm_gpuvm_exec_lock() or its variants.
>>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>>> +static inline void
>>>>>>>>>>>>>>>>>>> +drm_gpuvm_exec_unlock(struct drm_gpuvm_exec *vm_exec)
>>>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>>>> +       drm_exec_fini(&vm_exec->exec);
>>>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>> +int drm_gpuvm_validate(struct drm_gpuvm *gpuvm);
>>>>>>>>>>>>>>>>>>> +void drm_gpuvm_resv_add_fence(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>>>>>>> +                             struct drm_exec *exec,
>>>>>>>>>>>>>>>>>>> +                             struct dma_fence *fence,
>>>>>>>>>>>>>>>>>>> +                             enum dma_resv_usage private_usage,
>>>>>>>>>>>>>>>>>>> +                             enum dma_resv_usage extobj_usage);
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>>> + * drm_gpuvm_exec_resv_add_fence() - add fence to private and all extobj dma-resv
>>>>>>>>>>>>>>>>>>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>>>>>>>>>>>>>>>>>>> + * @fence: fence to add
>>>>>>>>>>>>>>>>>>> + * @private_usage: private dma-resv usage
>>>>>>>>>>>>>>>>>>> + * @extobj_usage: extobj dma-resv usage
>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>> + * See drm_gpuvm_resv_add_fence().
>>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>>> +static inline void
>>>>>>>>>>>>>>>>>>> +drm_gpuvm_exec_resv_add_fence(struct drm_gpuvm_exec *vm_exec,
>>>>>>>>>>>>>>>>>>> +                             struct dma_fence *fence,
>>>>>>>>>>>>>>>>>>> +                             enum dma_resv_usage private_usage,
>>>>>>>>>>>>>>>>>>> +                             enum dma_resv_usage extobj_usage)
>>>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>>>> +       drm_gpuvm_resv_add_fence(vm_exec->vm, &vm_exec->exec, fence,
>>>>>>>>>>>>>>>>>>> +                                private_usage, extobj_usage);
>>>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>     /**
>>>>>>>>>>>>>>>>>>>      * struct drm_gpuvm_bo - structure representing a &drm_gpuvm and
>>>>>>>>>>>>>>>>>>>      * &drm_gem_object combination
>>>>>>>>>>>>>>>>>>> @@ -398,6 +569,18 @@ struct drm_gpuvm_bo {
>>>>>>>>>>>>>>>>>>>                           * gpuva list.
>>>>>>>>>>>>>>>>>>>                           */
>>>>>>>>>>>>>>>>>>>                          struct list_head gem;
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>> +                       /**
>>>>>>>>>>>>>>>>>>> +                        * @extobj: List entry to attach to the &drm_gpuvms
>>>>>>>>>>>>>>>>>>> +                        * extobj list.
>>>>>>>>>>>>>>>>>>> +                        */
>>>>>>>>>>>>>>>>>>> +                       struct list_head extobj;
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>> +                       /**
>>>>>>>>>>>>>>>>>>> +                        * @evict: List entry to attach to the &drm_gpuvms
>>>>>>>>>>>>>>>>>>> +                        * evict list.
>>>>>>>>>>>>>>>>>>> +                        */
>>>>>>>>>>>>>>>>>>> +                       struct list_head evict;
>>>>>>>>>>>>>>>>>>>                  } entry;
>>>>>>>>>>>>>>>>>>>          } list;
>>>>>>>>>>>>>>>>>>>     };
>>>>>>>>>>>>>>>>>>> @@ -432,6 +615,9 @@ struct drm_gpuvm_bo *
>>>>>>>>>>>>>>>>>>>     drm_gpuvm_bo_find(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>>>>>>>                    struct drm_gem_object *obj);
>>>>>>>>>>>>>>>>>>> +void drm_gpuvm_bo_evict(struct drm_gem_object *obj, bool evict);
>>>>>>>>>>>>>>>>>>> +void drm_gpuvm_bo_extobj_add(struct drm_gpuvm_bo *vm_bo);
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>     /**
>>>>>>>>>>>>>>>>>>>      * drm_gpuvm_bo_for_each_va() - iterator to walk over a list of &drm_gpuva
>>>>>>>>>>>>>>>>>>>      * @va__: &drm_gpuva structure to assign to in each iteration step
>>>>>>>>>>>>>>>>>>> @@ -837,6 +1023,17 @@ struct drm_gpuvm_ops {
>>>>>>>>>>>>>>>>>>>           * used.
>>>>>>>>>>>>>>>>>>>           */
>>>>>>>>>>>>>>>>>>>          int (*sm_step_unmap)(struct drm_gpuva_op *op, void *priv);
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>> +       /**
>>>>>>>>>>>>>>>>>>> +        * @bo_validate: called from drm_gpuvm_validate()
>>>>>>>>>>>>>>>>>>> +        *
>>>>>>>>>>>>>>>>>>> +        * Drivers receive this callback for every evicted &drm_gem_object being
>>>>>>>>>>>>>>>>>>> +        * mapped in the corresponding &drm_gpuvm.
>>>>>>>>>>>>>>>>>>> +        *
>>>>>>>>>>>>>>>>>>> +        * Typically, drivers would call their driver specific variant of
>>>>>>>>>>>>>>>>>>> +        * ttm_bo_validate() from within this callback.
>>>>>>>>>>>>>>>>>>> +        */
>>>>>>>>>>>>>>>>>>> +       int (*bo_validate)(struct drm_gem_object *obj);
>>>>>>>>>>>>>>>>>>>     };
>>>>>>>>>>>>>>>>>>>     int drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm, void *priv,
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>
Thomas Hellstrom Sept. 20, 2023, 1:38 p.m. UTC | #55
On 9/20/23 15:06, Christian König wrote:
>
>
> On 20.09.23 at 14:06, Thomas Hellström wrote:
>>
>> On 9/20/23 12:51, Christian König wrote:
>>> On 20.09.23 at 09:44, Thomas Hellström wrote:
>>>> Hi,
>>>>
>>>> On 9/20/23 07:37, Christian König wrote:
>>>>> On 19.09.23 at 17:23, Thomas Hellström wrote:
>>>>>>
>>>>>> On 9/19/23 17:16, Danilo Krummrich wrote:
>>>>>>> On 9/19/23 14:21, Thomas Hellström wrote:
>>>>>>>> Hi Christian
>>>>>>>>
>>>>>>>> On 9/19/23 14:07, Christian König wrote:
>>>>>>>>> Am 13.09.23 um 17:46 schrieb Danilo Krummrich:
>>>>>>>>>> On 9/13/23 17:33, Christian König wrote:
>>>>>>>>>>> Am 13.09.23 um 17:15 schrieb Danilo Krummrich:
>>>>>>>>>>>> On 9/13/23 16:26, Christian König wrote:
>>>>>>>>>>>>> Am 13.09.23 um 14:16 schrieb Danilo Krummrich:
>>>>>>>>>>>>>> As mentioned in a different mail thread, the reply is 
>>>>>>>>>>>>>> based on the assumption
>>>>>>>>>>>>>> that we don't support anything else than GPUVM updates 
>>>>>>>>>>>>>> from the IOCTL.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think that this assumption is incorrect.
>>>>>>>>>>>>
>>>>>>>>>>>> Well, more precisely I should have said "don't support 
>>>>>>>>>>>> GPUVM updates from within
>>>>>>>>>>>> fence signaling critical sections". And looking at the 
>>>>>>>>>>>> code, that doesn't seem what
>>>>>>>>>>>> you're doing there.
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Vulkan is just once specific use case, but this here 
>>>>>>>>>>>>> should probably be able to handle other use cases as well.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Especially with HMM you get the requirement that you need 
>>>>>>>>>>>>> to be able to invalidate GPUVM mappings without grabbing a 
>>>>>>>>>>>>> reservation lock.
>>>>>>>>>>>>
>>>>>>>>>>>> What do you mean with "invalidate GPUVM mappings" in this 
>>>>>>>>>>>> context? drm_gpuvm_bo_evict()
>>>>>>>>>>>> should only be called from a ttm_device_funcs::move 
>>>>>>>>>>>> callback, we should hold the dma-resv
>>>>>>>>>>>> lock there.
>>>>>>>>>>>
>>>>>>>>>>> Well the question is which dma-resv lock do we hold?
>>>>>>>>>>>
>>>>>>>>>>> In the move callback we only hold the dma-resv lock of the 
>>>>>>>>>>> BO which is moved, but when that is a shared BO then that's 
>>>>>>>>>>> not the same as the one for the VM.
>>>>>>>>>>
>>>>>>>>>> Correct, Thomas' idea was to use the GEM's dma_resv lock to 
>>>>>>>>>> protect drm_gpuvm_bo::evicted
>>>>>>>>>> and then actually move the drm_gpuvm_bo to the VM's evicted 
>>>>>>>>>> list once we grabbed all
>>>>>>>>>> dma-resv locks when locking the VM's BOs using drm_exec. We 
>>>>>>>>>> can remove them from the evicted
>>>>>>>>>> list on validate(). This way we never touch the evicted list 
>>>>>>>>>> without holding at least the VM's
>>>>>>>>>> dma-resv lock.
>>>>>>>>>>
>>>>>>>>>> Do you have any concerns about that?
>>>>>>>>>
>>>>>>>>> Scratching my head a bit how that is supposed to work.
>>>>>>>>>
>>>>>>>>> This implies that you go over all the evicted BOs during 
>>>>>>>>> validation and not just the one mentioned in the CS.
>>>>>>>>>
>>>>>>>>> That might work for Vulkan, but is pretty much a no-go for 
>>>>>>>>> OpenGL.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> See what the eviction lock in amdgpu is doing for example.
>>>>>>>>>>>>
>>>>>>>>>>>> The eviction_lock seems to protect a VM state "evicting" of 
>>>>>>>>>>>> whether any BO that
>>>>>>>>>>>> is associated with the VM is currently evicting. At the 
>>>>>>>>>>>> same time amdgpu protects
>>>>>>>>>>>> the eviceted list of the VM with a different lock. So this 
>>>>>>>>>>>> seems to be entirely
>>>>>>>>>>>> unrelated. Tracking a "currently evicting" state is not 
>>>>>>>>>>>> part of the GPUVM
>>>>>>>>>>>> implementation currently and hence nothing would change for 
>>>>>>>>>>>> amdgpu there.
>>>>>>>>>>>
>>>>>>>>>>> Sorry for the confusion we use different terminology in amdgpu.
>>>>>>>>>>>
>>>>>>>>>>> The eviction lock and evicted state is for the VM page 
>>>>>>>>>>> tables, e.g. if the whole VM is currently not used and 
>>>>>>>>>>> swapped out or even de-allocated.
>>>>>>>>>>>
>>>>>>>>>>> This is necessary because we have cases where we need to 
>>>>>>>>>>> access the VM data without holding the dma-resv lock of this 
>>>>>>>>>>> VM. Especially figuring out which parts of an address space 
>>>>>>>>>>> contain mappings and which doesn't.
>>>>>>>>>>
>>>>>>>>>> I think this is fine, this has nothing to do with lists of 
>>>>>>>>>> evicted GEM objects or external GEM
>>>>>>>>>> objects, right? Marking mappings (drm_gpuva) as invalidated 
>>>>>>>>>> (DRM_GPUVA_INVALIDATED) or accessing
>>>>>>>>>> the VA space does not require any dma-resv locks.
>>>>>>>>>
>>>>>>>>> I hope so, but I'm not 100% sure.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> This is a requirement which comes with HMM handling, you 
>>>>>>>>>>> won't see this with Vulkan (or OpenGL, VAAPI etc..).
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> The invalidation lock on the other hand is what in this 
>>>>>>>>>>> discussion is called eviction lock. This one is needed 
>>>>>>>>>>> because what I wrote above, during the move callback only 
>>>>>>>>>>> the dma-resv of the BO which is moved is locked, but not 
>>>>>>>>>>> necessarily the dma-resv of the VM.
>>>>>>>>>>
>>>>>>>>>> That's yet another thing, right? This is used to track 
>>>>>>>>>> whether *any* BO that belongs to the VM is
>>>>>>>>>> currently being evicted, correct? As mentioned, as by now 
>>>>>>>>>> this is not supported in GPUVM and hence
>>>>>>>>>> would be the same driver specific code with the same driver 
>>>>>>>>>> specifc lock.
>>>>>>>>>
>>>>>>>>> That is most likely a show stopper using this for OpenGL based 
>>>>>>>>> workloads as far as I can see. For those you need to able to 
>>>>>>>>> figure out which non-VM BOs have been evicted and which parts 
>>>>>>>>> of the VM needs updates.
>>>>>>>>
>>>>>>>> We identify those with a bool in the gpuvm_bo, and that bool is 
>>>>>>>> protected by the bo_resv. In essence, the "evicted" list must 
>>>>>>>> be made up-to-date with all relevant locks held before 
>>>>>>>> traversing in the next exec.
>>>>>>>
>>>>>>> What I still miss with this idea is how do we find all the 
>>>>>>> drm_gpuvm_bo structures with the evicted bool set to true? When 
>>>>>>> doing the drm_exec dance we come across all external ones and 
>>>>>>> can add them to the list if needed, but what about the BOs 
>>>>>>> having the VM's dma-resv?
>>>>>>
>>>>>> Oh, they can be added to the evict list directly (no bool needed) 
>>>>>> in the eviction code, like in v3. Since for those we indeed hold 
>>>>>> the VM's dma_resv since it's aliased with the object's dma-resv.
>>>>>
>>>>> Yeah, I wanted to note what Danilo seems to think about as well. 
>>>>> How do we figure out which non-VM BOs have been evicted?
>>>>>
>>>>> We can't walk over the list of all non-VM BOs on every submission, 
>>>>> that's too much overhead for cases with lots of non-VM BOs.
>>>>>
>>>>> And we can't rely on userspace sending all non-VM BOs as used list 
>>>>> down to the kernel with each submission.
>>>>>
>>>>> Regards,
>>>>> Christian.
>>>>
>>>> No, that's not needed: Mechanism below.
>>>>
>>>> 1) We maintain an evicted list. Typically protected by the vm resv.
>>>> 2) Each gpuvm_bo has a bool "evicted". Protected by the bo resv.
>>>>
>>>> a) Evicting a vm bo: The vm resv is held by the eviction code. Just 
>>>> put it on the evicted list.
>>>> b) Evicting a shared/external bo: The bo resv is held by the 
>>>> eviction code. Set the "evicted" bool
>>>> c) Validating the evicted list on exec:
>>>
>>>
>>>> Loop through all *external/shared* bos.
>>>
>>> And this is what you can't do. For Vulkan it probably doesn't 
>>> matter, but for OpenGL and especially multimedia we have many more
>>> BOs on the shared list than what's allocated for the VM.
>>
>> But you need to lock and fence all those, so you need to loop through
>> them anyway, so we're still O(n_shared)? Or is there some clever 
>> optimization in amdgpu?
>
> Why should I lock and fence them? Only the BOs in the relocation list 
> are locked and fenced.

By "relocation" list, do you refer to what gpuvm calls the "evict" list, or
to something else? Like the relocation/validation list that used to be sent
from user-space for non-VM_BIND vms?

The vm bos plus the external/shared bos bound to the VM (the external 
list) are the bos being referenced by the current batch. So the bos on 
the VM's external list are the ones being locked and fenced and checked 
for eviction. If they weren't, they could be evicted before the current 
batch completes?
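
For reference, the exec-time side of that scheme looks roughly like the
below. This is only a sketch against the helpers in this series, not the
actual Xe code; the "evicted" bool in drm_gpuvm_bo is the addition
proposed above (not part of v3), and it assumes the extobj and evict
lists are protected by the VM's resv:

	struct drm_exec exec;
	struct drm_gpuvm_bo *vm_bo;
	int ret;

	drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT);
	drm_exec_until_all_locked(&exec) {
		/* Lock the VM's common resv; also serializes list updates. */
		ret = drm_gpuvm_prepare_vm(gpuvm, &exec, num_fences);
		drm_exec_retry_on_contention(&exec);
		if (ret)
			goto err_fini;

		/* Lock and fence-reserve all external objects of the VM. */
		list_for_each_entry(vm_bo, &gpuvm->extobj.list,
				    list.entry.extobj) {
			ret = drm_exec_prepare_obj(&exec, vm_bo->obj,
						   num_fences);
			drm_exec_retry_on_contention(&exec);
			if (ret)
				goto err_fini;

			/* Both resvs are held now; safe to move the bo over
			 * to the evict list and clear the flag.
			 */
			if (vm_bo->evicted) {
				list_move_tail(&vm_bo->list.entry.evict,
					       &gpuvm->evict.list);
				vm_bo->evicted = false;
			}
		}
	}

	/* The evict list is now up-to-date and stable under the VM resv,
	 * so validation can simply walk it. Locks are dropped with
	 * drm_exec_fini() after fencing.
	 */
	return 0;

err_fini:
	drm_exec_fini(&exec);
	return ret;

That way the spinlocks and the local-list dance aren't needed at all for
drivers that can hold the VM resv here.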

Thanks,

Thomas


>
> Regards,
> Christian.
>
>>
>> I think with some UMDs, xe might end up with similar large lists...
>>
>> /Thomas
>>
>>
>>>
>>> Regards,
>>> Christian.
>>>
>>>> Lock them. After locking, check the "evicted" bool; if it's true, 
>>>> put the bo on the evicted list (we hold the VM resv at this point) 
>>>> and clear the "evicted" bool. Note that other vms will have their 
>>>> own gpuvm_bo which is marked evicted.
>>>>
>>>> I have this coded up in a patch for Xe and it seems to be working 
>>>> properly.
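>>>>
>>>> For b), the eviction side is then just (again only a sketch; the
>>>> "evicted" bool is the one proposed above, while the iterator and
>>>> drm_gpuvm_is_extobj() are from this series):
>>>>
>>>> 	/* The bo resv is held by the eviction / move code. */
>>>> 	drm_gem_for_each_gpuvm_bo(vm_bo, obj) {
>>>> 		if (drm_gpuvm_is_extobj(vm_bo->vm, obj))
>>>> 			vm_bo->evicted = true;	/* case b) */
>>>> 		else
>>>> 			/* case a): vm resv == bo resv and is held */
>>>> 			list_move_tail(&vm_bo->list.entry.evict,
>>>> 				       &vm_bo->vm->evict.list);
>>>> 	}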
>>>>
>>>> /Thomas
>>>>
>>>>
>>>>>
>>>>>>
>>>>>> /Thomas
>>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> If you mean that we need to unbind all vmas of all vms of 
>>>>>>>> evicted bos before evicting, we don't do that, at least not in 
>>>>>>>> Xe, since when evicting we wait for VM idle, and it can't access 
>>>>>>>> anything through the stale vmas until they have been 
>>>>>>>> revalidated and rebound.
>>>>>>>>
>>>>>>>> /Thomas
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>> Christian.
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, Sep 13, 2023 at 11:14:46AM +0200, Thomas 
>>>>>>>>>>>>>> Hellström wrote:
>>>>>>>>>>>>>>> Hi!
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Wed, 2023-09-13 at 01:36 +0200, Danilo Krummrich wrote:
>>>>>>>>>>>>>>>> On Tue, Sep 12, 2023 at 09:23:08PM +0200, Thomas 
>>>>>>>>>>>>>>>> Hellström wrote:
>>>>>>>>>>>>>>>>> On 9/12/23 18:50, Danilo Krummrich wrote:
>>>>>>>>>>>>>>>>>> On Tue, Sep 12, 2023 at 06:20:32PM +0200, Thomas 
>>>>>>>>>>>>>>>>>> Hellström wrote:
>>>>>>>>>>>>>>>>>>> Hi, Danilo,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On 9/9/23 17:31, Danilo Krummrich wrote:
>>>>>>>>>>>>>>>>>>>> So far the DRM GPUVA manager offers common 
>>>>>>>>>>>>>>>>>>>> infrastructure to
>>>>>>>>>>>>>>>>>>>> track GPU VA
>>>>>>>>>>>>>>>>>>>> allocations and mappings, generically connect GPU 
>>>>>>>>>>>>>>>>>>>> VA mappings
>>>>>>>>>>>>>>>>>>>> to their
>>>>>>>>>>>>>>>>>>>> backing buffers and perform more complex mapping 
>>>>>>>>>>>>>>>>>>>> operations
>>>>>>>>>>>>>>>>>>>> on the GPU VA
>>>>>>>>>>>>>>>>>>>> space.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> However, there are more design patterns commonly 
>>>>>>>>>>>>>>>>>>>> used by
>>>>>>>>>>>>>>>>>>>> drivers, which
>>>>>>>>>>>>>>>>>>>> can potentially be generalized in order to make the 
>>>>>>>>>>>>>>>>>>>> DRM GPUVA
>>>>>>>>>>>>>>>>>>>> manager
>>>>>>>>>>>>>>>>>>>> represent a basic GPU-VM implementation. In this 
>>>>>>>>>>>>>>>>>>>> context,
>>>>>>>>>>>>>>>>>>>> this patch aims
>>>>>>>>>>>>>>>>>>>> at generalizing the following elements.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> 1) Provide a common dma-resv for GEM objects not 
>>>>>>>>>>>>>>>>>>>> being used
>>>>>>>>>>>>>>>>>>>> outside of
>>>>>>>>>>>>>>>>>>>>       this GPU-VM.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> 2) Provide tracking of external GEM objects (GEM 
>>>>>>>>>>>>>>>>>>>> objects
>>>>>>>>>>>>>>>>>>>> which are
>>>>>>>>>>>>>>>>>>>>       shared with other GPU-VMs).
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> 3) Provide functions to efficiently lock all GEM 
>>>>>>>>>>>>>>>>>>>> objects dma-
>>>>>>>>>>>>>>>>>>>> resv the
>>>>>>>>>>>>>>>>>>>>       GPU-VM contains mappings of.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> 4) Provide tracking of evicted GEM objects the GPU-VM
>>>>>>>>>>>>>>>>>>>> contains mappings
>>>>>>>>>>>>>>>>>>>>       of, such that validation of evicted GEM 
>>>>>>>>>>>>>>>>>>>> objects is
>>>>>>>>>>>>>>>>>>>> accelerated.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> 5) Provide some convinience functions for common 
>>>>>>>>>>>>>>>>>>>> patterns.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Rather than being designed as a "framework", the 
>>>>>>>>>>>>>>>>>>>> target is to
>>>>>>>>>>>>>>>>>>>> make all
>>>>>>>>>>>>>>>>>>>> features appear as a collection of optional helper 
>>>>>>>>>>>>>>>>>>>> functions,
>>>>>>>>>>>>>>>>>>>> such that
>>>>>>>>>>>>>>>>>>>> drivers are free to make use of the DRM GPUVA 
>>>>>>>>>>>>>>>>>>>> managers basic
>>>>>>>>>>>>>>>>>>>> functionality and opt-in for other features without 
>>>>>>>>>>>>>>>>>>>> setting
>>>>>>>>>>>>>>>>>>>> any feature
>>>>>>>>>>>>>>>>>>>> flags, just by making use of the corresponding 
>>>>>>>>>>>>>>>>>>>> functions.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Big kudos to Boris Brezillon for his help to figure 
>>>>>>>>>>>>>>>>>>>> out
>>>>>>>>>>>>>>>>>>>> locking for drivers
>>>>>>>>>>>>>>>>>>>> updating the GPU VA space within the fence 
>>>>>>>>>>>>>>>>>>>> signalling path.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Suggested-by: Matthew Brost <matthew.brost@intel.com>
>>>>>>>>>>>>>>>>>>>> Signed-off-by: Danilo Krummrich <dakr@redhat.com>
>>>>>>>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>>>>>>>     drivers/gpu/drm/drm_gpuvm.c | 516 ++++++++++++++++++++++++++++++++++++
>>>>>>>>>>>>>>>>>>>>     include/drm/drm_gpuvm.h     | 197 ++++++++++++++
>>>>>>>>>>>>>>>>>>>>     2 files changed, 713 insertions(+)
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> diff --git a/drivers/gpu/drm/drm_gpuvm.c b/drivers/gpu/drm/drm_gpuvm.c
>>>>>>>>>>>>>>>>>>>> index f4411047dbb3..8e62a043f719 100644
>>>>>>>>>>>>>>>>>>>> --- a/drivers/gpu/drm/drm_gpuvm.c
>>>>>>>>>>>>>>>>>>>> +++ b/drivers/gpu/drm/drm_gpuvm.c
>>>>>>>>>>>>>>>>>>>> @@ -73,6 +73,21 @@
>>>>>>>>>>>>>>>>>>>>      * &drm_gem_object list of &drm_gpuvm_bos for an existing instance of this
>>>>>>>>>>>>>>>>>>>>      * particular combination. If not existent a new instance is created and linked
>>>>>>>>>>>>>>>>>>>>      * to the &drm_gem_object.
>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>> + * &drm_gpuvm_bo structures, since unique for a given &drm_gpuvm, are also used
>>>>>>>>>>>>>>>>>>>> + * as entry for the &drm_gpuvm's lists of external and evicted objects. Those
>>>>>>>>>>>>>>>>>>>> + * lists are maintained in order to accelerate locking of dma-resv locks and
>>>>>>>>>>>>>>>>>>>> + * validation of evicted objects bound in a &drm_gpuvm. For instance all
>>>>>>>>>>>>>>>>>>>> + * &drm_gem_object's &dma_resv of a given &drm_gpuvm can be locked by calling
>>>>>>>>>>>>>>>>>>>> + * drm_gpuvm_exec_lock(). Once locked drivers can call drm_gpuvm_validate() in
>>>>>>>>>>>>>>>>>>>> + * order to validate all evicted &drm_gem_objects. It is also possible to lock
>>>>>>>>>>>>>>>>>>>> + * additional &drm_gem_objects by providing the corresponding parameters to
>>>>>>>>>>>>>>>>>>>> + * drm_gpuvm_exec_lock() as well as open code the &drm_exec loop while making
>>>>>>>>>>>>>>>>>>>> + * use of helper functions such as drm_gpuvm_prepare_range() or
>>>>>>>>>>>>>>>>>>>> + * drm_gpuvm_prepare_objects().
>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>> + * Every bound &drm_gem_object is treated as external object when its &dma_resv
>>>>>>>>>>>>>>>>>>>> + * structure is different than the &drm_gpuvm's common &dma_resv structure.
>>>>>>>>>>>>>>>>>>>>      */
>>>>>>>>>>>>>>>>>>>>     /**
>>>>>>>>>>>>>>>>>>>> @@ -420,6 +435,20 @@
>>>>>>>>>>>>>>>>>>>>      * Subsequent calls to drm_gpuvm_bo_obtain() for the same &drm_gpuvm and
>>>>>>>>>>>>>>>>>>>>      * &drm_gem_object must be able to observe previous creations and destructions
>>>>>>>>>>>>>>>>>>>>      * of &drm_gpuvm_bos in order to keep instances unique.
>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>> + * The &drm_gpuvm's lists for keeping track of external and evicted objects are
>>>>>>>>>>>>>>>>>>>> + * protected against concurrent insertion / removal and iteration internally.
>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>> + * However, drivers still need to ensure to protect concurrent calls to functions
>>>>>>>>>>>>>>>>>>>> + * iterating those lists, such as drm_gpuvm_validate() and
>>>>>>>>>>>>>>>>>>>> + * drm_gpuvm_prepare_objects(). Every such function contains a particular
>>>>>>>>>>>>>>>>>>>> + * comment and lockdep checks if possible.
>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>> + * Functions adding or removing entries from those lists, such as
>>>>>>>>>>>>>>>>>>>> + * drm_gpuvm_bo_evict() or drm_gpuvm_bo_extobj_add() may be called with external
>>>>>>>>>>>>>>>>>>>> + * locks being held, e.g. in order to avoid the corresponding list to be
>>>>>>>>>>>>>>>>>>>> + * (safely) modified while potentially being iterated by other API functions.
>>>>>>>>>>>>>>>>>>>> + * However, this is entirely optional.
>>>>>>>>>>>>>>>>>>>>      */
>>>>>>>>>>>>>>>>>>>>     /**
>>>>>>>>>>>>>>>>>>>> @@ -632,6 +661,131 @@
>>>>>>>>>>>>>>>>>>>>      *   }
>>>>>>>>>>>>>>>>>>>>      */
>>>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>>>> + * get_next_vm_bo_from_list() - get the next vm_bo element
>>>>>>>>>>>>>>>>>>>> + * @__gpuvm: The GPU VM
>>>>>>>>>>>>>>>>>>>> + * @__list_name: The name of the list we're iterating on
>>>>>>>>>>>>>>>>>>>> + * @__local_list: A pointer to the local list used to store already iterated items
>>>>>>>>>>>>>>>>>>>> + * @__prev_vm_bo: The previous element we got from drm_gpuvm_get_next_cached_vm_bo()
>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>> + * This helper is here to provide lockless list iteration. Lockless as in, the
>>>>>>>>>>>>>>>>>>>> + * iterator releases the lock immediately after picking the first element from
>>>>>>>>>>>>>>>>>>>> + * the list, so list insertion and deletion can happen concurrently.
>>>>>>>>>>>>>>>>>>> Are the list spinlocks needed for that async state 
>>>>>>>>>>>>>>>>>>> update from
>>>>>>>>>>>>>>>>>>> within the
>>>>>>>>>>>>>>>>>>> dma-fence critical section we've discussed previously?
>>>>>>>>>>>>>>>>>> Yes, but also for other reasons, see below.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Otherwise it should be sufficient to protect the 
>>>>>>>>>>>>>>>>>>> lists with the
>>>>>>>>>>>>>>>>>>> gpuvm's resv
>>>>>>>>>>>>>>>>>>> (or for the extobj list with an outer lock).
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> If those spinlocks are still needed in some 
>>>>>>>>>>>>>>>>>>> situations, perhaps
>>>>>>>>>>>>>>>>>>> could we
>>>>>>>>>>>>>>>>>>> have an option to set them to NULL (Like IIRC the 
>>>>>>>>>>>>>>>>>>> maple tree
>>>>>>>>>>>>>>>>>>> allows for)?
>>>>>>>>>>>>>>>>>> The evict spinlock is needed in any case, since in
>>>>>>>>>>>>>>>>>> drm_gpuvm_bo_evict() we're
>>>>>>>>>>>>>>>>>> holding only the dma-resv lock from the BO this 
>>>>>>>>>>>>>>>>>> function gets
>>>>>>>>>>>>>>>>>> called for. Hence,
>>>>>>>>>>>>>>>>>> the spinlock protects concurrent drm_gpuvm_bo_evict() 
>>>>>>>>>>>>>>>>>> calls with
>>>>>>>>>>>>>>>>>> different BOs.
>>>>>>>>>>>>>>>>> No. Only if you try to add external objects to the 
>>>>>>>>>>>>>>>>> vm's evict list
>>>>>>>>>>>>>>>>> from
>>>>>>>>>>>>>>>>> within the evict code. That's not necessary since you 
>>>>>>>>>>>>>>>>> loop through
>>>>>>>>>>>>>>>>> all
>>>>>>>>>>>>>>>>> external objects anyway when locking them so an 
>>>>>>>>>>>>>>>>> "evicted" bool in
>>>>>>>>>>>>>>>>> the vm_bo,
>>>>>>>>>>>>>>>>> protected by the bo resv would be sufficient. The 
>>>>>>>>>>>>>>>>> extobj locking
>>>>>>>>>>>>>>>>> loop can
>>>>>>>>>>>>>>>>> then add the bo to the evicted list.
>>>>>>>>>>>>>>>> And validate() can remove it while still holding all 
>>>>>>>>>>>>>>>> dma-resv locks,
>>>>>>>>>>>>>>>> neat!
>>>>>>>>>>>>>>>> However, what if two tasks are trying to lock the VA space
>>>>>>>>>>>>>>>> concurrently? What
>>>>>>>>>>>>>>>> do we do when the drm_gpuvm_bo's refcount drops to zero in
>>>>>>>>>>>>>>>> drm_gpuva_unlink()?
>>>>>>>>>>>>>>>> Are we guaranteed that at this point of time the 
>>>>>>>>>>>>>>>> drm_gpuvm_bo is not
>>>>>>>>>>>>>>>> on the
>>>>>>>>>>>>>>>> evicted list? Because otherwise we would call 
>>>>>>>>>>>>>>>> drm_gpuvm_bo_destroy()
>>>>>>>>>>>>>>>> with the
>>>>>>>>>>>>>>>> dma-resv lock held, which wouldn't be allowed, since
>>>>>>>>>>>>>>>> drm_gpuvm_bo_destroy()
>>>>>>>>>>>>>>>> might drop the last reference to the drm_gem_object and 
>>>>>>>>>>>>>>>> hence we'd
>>>>>>>>>>>>>>>> potentially
>>>>>>>>>>>>>>>> free the dma-resv lock while holding it, at least if 
>>>>>>>>>>>>>>>> it's an external
>>>>>>>>>>>>>>>> object.
>>>>>>>>>>>>>>> Easiest way in this scheme is to think of the lists as 
>>>>>>>>>>>>>>> being protected
>>>>>>>>>>>>>>> by the vm's resv lock. That means anybody calling 
>>>>>>>>>>>>>>> unlink() must also
>>>>>>>>>>>>>>> hold the vm's resv lock. (Which is OK from an UAF point 
>>>>>>>>>>>>>>> of view, but
>>>>>>>>>>>>>>> perhaps not from a locking inversion POV from an async 
>>>>>>>>>>>>>>> list update).
>>>>>>>>>>>>>> This would mean that on unlink() we'd need to hold the 
>>>>>>>>>>>>>> VM's resv lock and the
>>>>>>>>>>>>>> corresponding GEM's resv lock (in case they're not the 
>>>>>>>>>>>>>> same anyways) because the
>>>>>>>>>>>>>> VM's resv lock would protect the external / evicted 
>>>>>>>>>>>>>> object lists and the GEM
>>>>>>>>>>>>>> objects resv lock protects the GEM's list of 
>>>>>>>>>>>>>> drm_gpuvm_bos and the
>>>>>>>>>>>>>> drm_gpuvm_bo's list of drm_gpuvas.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> For extobjs an outer lock would be enough in case of 
>>>>>>>>>>>>>>>>>> Xe, but I
>>>>>>>>>>>>>>>>>> really would not
>>>>>>>>>>>>>>>>>> like to add even more complexity just to get the 
>>>>>>>>>>>>>>>>>> spinlock out of
>>>>>>>>>>>>>>>>>> the way in case
>>>>>>>>>>>>>>>>>> the driver already has an outer lock protecting this 
>>>>>>>>>>>>>>>>>> path.
>>>>>>>>>>>>>>>>> I must disagree here. These spinlocks and atomic 
>>>>>>>>>>>>>>>>> operations are
>>>>>>>>>>>>>>>>> pretty
>>>>>>>>>>>>>>>>> costly and as discussed earlier this type of locking 
>>>>>>>>>>>>>>>>> was the reason
>>>>>>>>>>>>>>>>> (at
>>>>>>>>>>>>>>>>> least according to the commit message) that made 
>>>>>>>>>>>>>>>>> Christian drop the
>>>>>>>>>>>>>>>>> XArray
>>>>>>>>>>>>>>>>> use in drm_exec for the same set of objects: "The 
>>>>>>>>>>>>>>>>> locking overhead
>>>>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>> unecessary and measurable". IMHO the spinlock is the 
>>>>>>>>>>>>>>>>> added
>>>>>>>>>>>>>>>>> complexity and a
>>>>>>>>>>>>>>>>> single wide lock following the drm locking guidelines 
>>>>>>>>>>>>>>>>> set out by
>>>>>>>>>>>>>>>>> Daniel and
>>>>>>>>>>>>>>>>> David should really be the default choice with an 
>>>>>>>>>>>>>>>>> opt-in for a
>>>>>>>>>>>>>>>>> spinlock if
>>>>>>>>>>>>>>>>> needed for async and pushing out to a wq is not an 
>>>>>>>>>>>>>>>>> option.
>>>>>>>>>>>>>>>> For the external object list an outer lock would work 
>>>>>>>>>>>>>>>> as long as it's
>>>>>>>>>>>>>>>> not the
>>>>>>>>>>>>>>>> dma-resv lock of the corresponding GEM object, since 
>>>>>>>>>>>>>>>> here we actually
>>>>>>>>>>>>>>>> need to
>>>>>>>>>>>>>>>> remove the list entry from the external object list on
>>>>>>>>>>>>>>>> drm_gpuvm_bo_destroy().
>>>>>>>>>>>>>>>> It's just a bit weird design wise that drivers would 
>>>>>>>>>>>>>>>> need to take
>>>>>>>>>>>>>>>> this outer
>>>>>>>>>>>>>>>> lock on:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> - drm_gpuvm_bo_extobj_add()
>>>>>>>>>>>>>>>> - drm_gpuvm_bo_destroy()        (and hence also 
>>>>>>>>>>>>>>>> drm_gpuvm_bo_put())
>>>>>>>>>>>>>>>> - drm_gpuva_unlink()            (because it needs to call
>>>>>>>>>>>>>>>> drm_gpuvm_bo_put())
>>>>>>>>>>>>>>>> - drm_gpuvm_exec_lock()
>>>>>>>>>>>>>>>> - drm_gpuvm_exec_lock_array()
>>>>>>>>>>>>>>>> - drm_gpuvm_prepare_range()
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Given that it seems reasonable to do all the required 
>>>>>>>>>>>>>>>> locking
>>>>>>>>>>>>>>>> internally.
>>>>>>>>>>>>>>> From a design POV, there has been a clear direction in 
>>>>>>>>>>>>>>> XE to make
>>>>>>>>>>>>>>> things similar to mmap() / munmap(), so this outer lock, 
>>>>>>>>>>>>>>> which in Xe is
>>>>>>>>>>>>>>> an rwsem, is used in a similar way as the mmap_lock. 
>>>>>>>>>>>>>>> It's protecting
>>>>>>>>>>>>>>> the page-table structures and vma rb tree, the userptr 
>>>>>>>>>>>>>>> structures and
>>>>>>>>>>>>>>> the extobj list. Basically it's taken early in the exec 
>>>>>>>>>>>>>>> IOCTL, the
>>>>>>>>>>>>>>> VM_BIND ioctl, the compute rebind worker and the 
>>>>>>>>>>>>>>> pagefault handler, so
>>>>>>>>>>>>>>> all of the above are just asserting that it is taken in 
>>>>>>>>>>>>>>> the correct
>>>>>>>>>>>>>>> mode.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> But strictly with this scheme one could also use the 
>>>>>>>>>>>>>>> vm's dma_resv for
>>>>>>>>>>>>>>> the extobj list since with drm_exec, it's locked before 
>>>>>>>>>>>>>>> traversing the
>>>>>>>>>>>>>>> list.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The whole point of this scheme is to rely on locks that 
>>>>>>>>>>>>>>> you already are
>>>>>>>>>>>>>>> supposed to be holding for various reasons and is simple 
>>>>>>>>>>>>>>> to comprehend.
>>>>>>>>>>>>>> I don't agree that we're supposed to hold the VM's resv 
>>>>>>>>>>>>>> lock anyways for
>>>>>>>>>>>>>> functions like drm_gpuvm_bo_put() or drm_gpuva_unlink(), 
>>>>>>>>>>>>>> but I'm fine using it
>>>>>>>>>>>>>> for that purpose nevertheless.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> In order to at least place lockdep checks, the driver 
>>>>>>>>>>>>>>>> would need to
>>>>>>>>>>>>>>>> supply the
>>>>>>>>>>>>>>>> corresponding lock's lockdep_map, because the GPUVM 
>>>>>>>>>>>>>>>> otherwise doesn't
>>>>>>>>>>>>>>>> know about
>>>>>>>>>>>>>>>> the lock.
>>>>>>>>>>>>>>> Yes, that sounds reasonable. One lockdep map per list.
>>>>>>>>>>>>>> I'd really like to avoid that, especially now that 
>>>>>>>>>>>>>> everything got simpler. We
>>>>>>>>>>>>>> should define the actual locks to take instead.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Out of curiosity, what is the overhead of a spin_lock() 
>>>>>>>>>>>>>>>> that doesn't
>>>>>>>>>>>>>>>> need to
>>>>>>>>>>>>>>>> spin?
>>>>>>>>>>>>>>> I guess it's hard to tell exactly, but it is much lower 
>>>>>>>>>>>>>>> on modern x86
>>>>>>>>>>>>>>> than what it used to be. Not sure about ARM, which is 
>>>>>>>>>>>>>>> the other
>>>>>>>>>>>>>>> architecture important to us. I figure if there is 
>>>>>>>>>>>>>>> little cache-line
>>>>>>>>>>>>>>> bouncing the main overhead comes from the implied barriers.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> A pretty simple way that would not add much code would be
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> static void gpuvm_cond_spin_lock(const struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>>>>>                                  spinlock_t *lock)
>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>      if (!gpuvm->resv_protected_lists)
>>>>>>>>>>>>>>>>>          spin_lock(lock);
>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> For such drivers, that would require anybody calling 
>>>>>>>>>>>>>>>>>>> unlink to
>>>>>>>>>>>>>>>>>>> hold the vm's
>>>>>>>>>>>>>>>>>>> resv, though.
>>>>>>>>>>>>>>>>>> In V4 I want to go back to having a dedicated lock 
>>>>>>>>>>>>>>>>>> for the GEMs
>>>>>>>>>>>>>>>>>> gpuva list (or
>>>>>>>>>>>>>>>>>> VM_BO list to be more precise). We can't just use the 
>>>>>>>>>>>>>>>>>> dma-resv
>>>>>>>>>>>>>>>>>> lock for that
>>>>>>>>>>>>>>>>>> with VM_BO abstractions, because on destruction of a 
>>>>>>>>>>>>>>>>>> VM_BO we
>>>>>>>>>>>>>>>>>> otherwise wouldn't
>>>>>>>>>>>>>>>>>> be allowed to already hold the dma-resv lock. That's 
>>>>>>>>>>>>>>>>>> the fix I
>>>>>>>>>>>>>>>>>> was referring to
>>>>>>>>>>>>>>>>>> earlier.
>>>>>>>>>>>>>>>>> Yeah, I can see the need for a dedicated lock for the 
>>>>>>>>>>>>>>>>> GEM's gpuva
>>>>>>>>>>>>>>>>> list, but
>>>>>>>>>>>>>>>>> holding the vm's dma-resv lock across the unlink 
>>>>>>>>>>>>>>>>> shouldn't be a
>>>>>>>>>>>>>>>>> problem. We
>>>>>>>>>>>>>>>>> may free the object and a pointer to the vm's resv 
>>>>>>>>>>>>>>>>> during unlink
>>>>>>>>>>>>>>>>> but we
>>>>>>>>>>>>>>>>> don't free the vm's resv.  It'd be a matter of 
>>>>>>>>>>>>>>>>> ensuring that any
>>>>>>>>>>>>>>>>> calls to
>>>>>>>>>>>>>>>>> unlink from *within* drm_gpuvm allows it to be held.
>>>>>>>>>>>>>>>> Drivers calling unlink() from the fence signaling path 
>>>>>>>>>>>>>>>> can't use the
>>>>>>>>>>>>>>>> VM's
>>>>>>>>>>>>>>>> dma-resv lock.
>>>>>>>>>>>>>>> Yes, that made me a bit curious because in the current 
>>>>>>>>>>>>>>> version the code
>>>>>>>>>>>>>>> required the object's dma_resv for unlink() which can't 
>>>>>>>>>>>>>>> be grabbed
>>>>>>>>>>>>>>> either from the fence signaling path. So are there any 
>>>>>>>>>>>>>>> drivers actually
>>>>>>>>>>>>>>> wanting to do that? If so, they will either need to 
>>>>>>>>>>>>>>> resort to the
>>>>>>>>>>>>>>> current spinlock solution or they will need to call 
>>>>>>>>>>>>>>> unlink from a
>>>>>>>>>>>>>>> workqueue item.
>>>>>>>>>>>>>> As Boris already mentioned we have the dma-resv lock by 
>>>>>>>>>>>>>> default or a driver
>>>>>>>>>>>>>> specific GEM gpuva lock as opt-in. Now, we can get rid of 
>>>>>>>>>>>>>> the latter.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Also, what if the object is an external object? We 
>>>>>>>>>>>>>>>> can't use the VM's
>>>>>>>>>>>>>>>> dma-resv
>>>>>>>>>>>>>>>> lock here.
>>>>>>>>>>>>>>> Why? Typically (sync) unlink is only ever called from an 
>>>>>>>>>>>>>>> unbind-like
>>>>>>>>>>>>>>> operation where it should be trivial to grab the vm's 
>>>>>>>>>>>>>>> resv. Or, for
>>>>>>>>>>>>>>> that matter any outer lock protecting the extobj list. 
>>>>>>>>>>>>>>> Rule would be
>>>>>>>>>>>>>>> the drm_gpuvm_bo::entry::extobj  and 
>>>>>>>>>>>>>>> drm_gpuvm_bo::entry::evict would
>>>>>>>>>>>>>>> be protected by either the vm's dma_resv (or possibly an 
>>>>>>>>>>>>>>> outer lock in
>>>>>>>>>>>>>>> the case of the extobj list).
>>>>>>>>>>>>>> Outer lock wouldn't have been working for updates in the 
>>>>>>>>>>>>>> async path, but
>>>>>>>>>>>>>> shouldn't be relevant anymore. We could use the VM's resv 
>>>>>>>>>>>>>> for that.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>   And we can't have the GEM objs dma-resv lock held 
>>>>>>>>>>>>>>>> when calling
>>>>>>>>>>>>>>>> unlink(), since unlink() calls drm_gpuvm_bo_put(), 
>>>>>>>>>>>>>>>> which if the
>>>>>>>>>>>>>>>> refcount drops
>>>>>>>>>>>>>>>> to zero calls drm_gpuvm_bo_destroy() and 
>>>>>>>>>>>>>>>> drm_gpuvm_bo_destroy() might
>>>>>>>>>>>>>>>> drop the
>>>>>>>>>>>>>>>> last reference of the GEM object.
>>>>>>>>>>>>>>> Yes, but this is a different problem as to what exactly 
>>>>>>>>>>>>>>> protects
>>>>>>>>>>>>>>> drm_gpuvm_bo::entry::gem. Either as you suggest an 
>>>>>>>>>>>>>>> internal per bo list
>>>>>>>>>>>>>>> lock, or if we want to keep the bo's dma_resv we need to 
>>>>>>>>>>>>>>> ensure that
>>>>>>>>>>>>>>> the caller of dma_resv_unlock(obj->resv) actually 
>>>>>>>>>>>>>>> refcounts its obj
>>>>>>>>>>>>>>> pointer, and doesn't implicitly rely on the gpuvm_bo's 
>>>>>>>>>>>>>>> refcount (I know
>>>>>>>>>>>>>>> Boris didn't like that, but requiring an explicit 
>>>>>>>>>>>>>>> refcount for a
>>>>>>>>>>>>>>> pointer you dereference unless you're under a lock that 
>>>>>>>>>>>>>>> ensures keeping
>>>>>>>>>>>>>>> the object alive is pretty much required?) But anyway 
>>>>>>>>>>>>>>> for the
>>>>>>>>>>>>>>> drm_gpuvm_bo::entry::gem list protection (bo resv or 
>>>>>>>>>>>>>>> internal spinlock)
>>>>>>>>>>>>>>> I don't have a strong preference.
>>>>>>>>>>>>>> We can keep the GEM objects dma-resv lock, however as 
>>>>>>>>>>>>>> mentioned above
>>>>>>>>>>>>>> drm_gpuva_unlink() and drm_gpuvm_bo_put() then requires 
>>>>>>>>>>>>>> both the VM's resv lock
>>>>>>>>>>>>>> and the GEM's resv lock in case they differ.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>   All those problems go away with a dedicated
>>>>>>>>>>>>>>>> GEM gpuva list lock.
>>>>>>>>>>>>>>> I don't think these are real problems.
>>>>>>>>>>>>>>> With the excepton of the eviction list "trick" where we 
>>>>>>>>>>>>>>> currently have
>>>>>>>>>>>>>>> slightly different approach to collect external bos 
>>>>>>>>>>>>>>> needing rebinding,
>>>>>>>>>>>>>>> we have this working fine.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> TBH I think pretty much the only situation where the 
>>>>>>>>>>>>>>> spinlock is needed
>>>>>>>>>>>>>>> is for async updates of these lists, unless a wq item 
>>>>>>>>>>>>>>> can be used for
>>>>>>>>>>>>>>> that, but it doesn't really seem like the current code 
>>>>>>>>>>>>>>> allows for such
>>>>>>>>>>>>>>> updates anyway? It complicates the code a lot, adds 
>>>>>>>>>>>>>>> overhead and also
>>>>>>>>>>>>>>> adds the requirement for refcounting during list traversal.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> /Thomas
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> /Thomas
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> It seems that with that also the refcount could be 
>>>>>>>>>>>>>>>>>>> make non-
>>>>>>>>>>>>>>>>>>> atomic.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> All in the spirit of the drm locking guidelines "use 
>>>>>>>>>>>>>>>>>>> big locks
>>>>>>>>>>>>>>>>>>> when
>>>>>>>>>>>>>>>>>>> possible".
>>>>>>>>>>>>>>>>>>> Lower level locks only when necessary for 
>>>>>>>>>>>>>>>>>>> performance or
>>>>>>>>>>>>>>>>>>> locking inversion?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> /Thomas
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>> + * Elements popped from the original list are kept in a local list, so removal
>>>>>>>>>>>>>>>>>>>> + * and is_empty checks can still happen while we're iterating the list.
>>>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>>>> +#define get_next_vm_bo_from_list(__gpuvm, __list_name, __local_list, __prev_vm_bo)     \
>>>>>>>>>>>>>>>>>>>> +       ({                                                                              \
>>>>>>>>>>>>>>>>>>>> +               struct drm_gpuvm_bo *__vm_bo;                                           \
>>>>>>>>>>>>>>>>>>>> +                                                                                       \
>>>>>>>>>>>>>>>>>>>> +               drm_gpuvm_bo_put(__prev_vm_bo);                                         \
>>>>>>>>>>>>>>>>>>>> +                                                                                       \
>>>>>>>>>>>>>>>>>>>> +               spin_lock(&(__gpuvm)->__list_name.lock);                                \
>>>>>>>>>>>>>>>>>>>> +               while (!list_empty(&(__gpuvm)->__list_name.list)) {                     \
>>>>>>>>>>>>>>>>>>>> +                       __vm_bo = list_first_entry(&(__gpuvm)->__list_name.list,        \
>>>>>>>>>>>>>>>>>>>> +                                                  struct drm_gpuvm_bo,                 \
>>>>>>>>>>>>>>>>>>>> +                                                  list.entry.__list_name);             \
>>>>>>>>>>>>>>>>>>>> +                       if (drm_gpuvm_bo_get_unless_zero(__vm_bo)) {                    \
>>>>>>>>>>>>>>>>>>>> +                               list_move_tail(&(__vm_bo)->list.entry.__list_name,      \
>>>>>>>>>>>>>>>>>>>> +                                              __local_list);                           \
>>>>>>>>>>>>>>>>>>>> +                               break;                                                  \
>>>>>>>>>>>>>>>>>>>> +                       } else {                                                        \
>>>>>>>>>>>>>>>>>>>> +                               list_del_init(&(__vm_bo)->list.entry.__list_name);      \
>>>>>>>>>>>>>>>>>>>> +                               __vm_bo = NULL;                                         \
>>>>>>>>>>>>>>>>>>>> +                       }                                                               \
>>>>>>>>>>>>>>>>>>>> +               }                                                                       \
>>>>>>>>>>>>>>>>>>>> +               spin_unlock(&(__gpuvm)->__list_name.lock);                              \
>>>>>>>>>>>>>>>>>>>> +                                                                                       \
>>>>>>>>>>>>>>>>>>>> +               __vm_bo;                                                                \
>>>>>>>>>>>>>>>>>>>> +       })
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>>>> + * for_each_vm_bo_in_list() - internal vm_bo list iterator
>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>> + * This helper is here to provide lockless list iteration. Lockless as in, the
>>>>>>>>>>>>>>>>>>>> + * iterator releases the lock immediately after picking the first element from the
>>>>>>>>>>>>>>>>>>>> + * list, so list insertion and deletion can happen concurrently.
>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>> + * Typical use:
>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>> + *     struct drm_gpuvm_bo *vm_bo;
>>>>>>>>>>>>>>>>>>>> + *     LIST_HEAD(my_local_list);
>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>> + *     ret = 0;
>>>>>>>>>>>>>>>>>>>> + *     drm_gpuvm_for_each_vm_bo(gpuvm, <list_name>, &my_local_list, vm_bo) {
>>>>>>>>>>>>>>>>>>>> + *             ret = do_something_with_vm_bo(..., vm_bo);
>>>>>>>>>>>>>>>>>>>> + *             if (ret)
>>>>>>>>>>>>>>>>>>>> + *                     break;
>>>>>>>>>>>>>>>>>>>> + *     }
>>>>>>>>>>>>>>>>>>>> + *     drm_gpuvm_bo_put(vm_bo);
>>>>>>>>>>>>>>>>>>>> + *     drm_gpuvm_restore_vm_bo_list(gpuvm, <list_name>, &my_local_list);
>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>> + * Only used for internal list iterations, not meant to be exposed to the outside
>>>>>>>>>>>>>>>>>>>> + * world.
>>>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>>>> +#define for_each_vm_bo_in_list(__gpuvm, __list_name, __local_list, __vm_bo)    \
>>>>>>>>>>>>>>>>>>>> +       for (__vm_bo = get_next_vm_bo_from_list(__gpuvm, __list_name,           \
>>>>>>>>>>>>>>>>>>>> +                                               __local_list, NULL);            \
>>>>>>>>>>>>>>>>>>>> +            __vm_bo;                                                           \
>>>>>>>>>>>>>>>>>>>> +            __vm_bo = get_next_vm_bo_from_list(__gpuvm, __list_name,           \
>>>>>>>>>>>>>>>>>>>> +                                               __local_list, __vm_bo))         \
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>>>> + * restore_vm_bo_list() - move vm_bo elements back to their original list
>>>>>>>>>>>>>>>>>>>> + * @__gpuvm: The GPU VM
>>>>>>>>>>>>>>>>>>>> + * @__list_name: The name of the list we're iterating on
>>>>>>>>>>>>>>>>>>>> + * @__local_list: A pointer to the local list used to store already iterated items
>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>> + * When we're done iterating a vm_bo list, we should call restore_vm_bo_list()
>>>>>>>>>>>>>>>>>>>> + * to restore the original state and let new iterations take place.
>>>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>>>> +#define restore_vm_bo_list(__gpuvm, __list_name, __local_list)                         \
>>>>>>>>>>>>>>>>>>>> +       do {                                                                            \
>>>>>>>>>>>>>>>>>>>> +               /* Merge back the two lists, moving local list elements to the          \
>>>>>>>>>>>>>>>>>>>> +                * head to preserve previous ordering, in case it matters.              \
>>>>>>>>>>>>>>>>>>>> +                */                                                                     \
>>>>>>>>>>>>>>>>>>>> +               spin_lock(&(__gpuvm)->__list_name.lock);                                \
>>>>>>>>>>>>>>>>>>>> +               list_splice(__local_list, &(__gpuvm)->__list_name.list);                \
>>>>>>>>>>>>>>>>>>>> +               spin_unlock(&(__gpuvm)->__list_name.lock);                              \
>>>>>>>>>>>>>>>>>>>> +       } while (0)
>>>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>>>> + * drm_gpuvm_bo_list_add() - insert a vm_bo into the given list
>>>>>>>>>>>>>>>>>>>> + * @__vm_bo: the &drm_gpuvm_bo
>>>>>>>>>>>>>>>>>>>> + * @__list_name: the name of the list to insert into
>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>> + * Inserts the given @__vm_bo into the list specified by @__list_name and
>>>>>>>>>>>>>>>>>>>> + * increases the vm_bo's reference count.
>>>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>>>> +#define drm_gpuvm_bo_list_add(__vm_bo, __list_name)                            \
>>>>>>>>>>>>>>>>>>>> +       do {                                                                    \
>>>>>>>>>>>>>>>>>>>> +               spin_lock(&(__vm_bo)->vm->__list_name.lock);                    \
>>>>>>>>>>>>>>>>>>>> +               if (list_empty(&(__vm_bo)->list.entry.__list_name))             \
>>>>>>>>>>>>>>>>>>>> +                       list_add_tail(&(__vm_bo)->list.entry.__list_name,       \
>>>>>>>>>>>>>>>>>>>> +                                     &(__vm_bo)->vm->__list_name.list);        \
>>>>>>>>>>>>>>>>>>>> +               spin_unlock(&(__vm_bo)->vm->__list_name.lock);                  \
>>>>>>>>>>>>>>>>>>>> +       } while (0)
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>>>> + * drm_gpuvm_bo_list_del() - remove a vm_bo from the given list
>>>>>>>>>>>>>>>>>>>> + * @__vm_bo: the &drm_gpuvm_bo
>>>>>>>>>>>>>>>>>>>> + * @__list_name: the name of the list to insert into
>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>> + * Removes the given @__vm_bo from the list specified by @__list_name and
>>>>>>>>>>>>>>>>>>>> + * decreases the vm_bo's reference count.
>>>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>>>> +#define drm_gpuvm_bo_list_del(__vm_bo, __list_name)                            \
>>>>>>>>>>>>>>>>>>>> +       do {                                                                    \
>>>>>>>>>>>>>>>>>>>> +               spin_lock(&(__vm_bo)->vm->__list_name.lock);                    \
>>>>>>>>>>>>>>>>>>>> +               if (!list_empty(&(__vm_bo)->list.entry.__list_name))            \
>>>>>>>>>>>>>>>>>>>> +                       list_del_init(&(__vm_bo)->list.entry.__list_name);      \
>>>>>>>>>>>>>>>>>>>> +               spin_unlock(&(__vm_bo)->vm->__list_name.lock);                  \
>>>>>>>>>>>>>>>>>>>> +       } while (0)
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>> +static int __must_check
>>>>>>>>>>>>>>>>>>>> +drm_gpuvm_bo_get_unless_zero(struct drm_gpuvm_bo *vm_bo);
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>     #define to_drm_gpuva(__node) container_of((__node), struct drm_gpuva, rb.node)
>>>>>>>>>>>>>>>>>>>>     #define GPUVA_START(node) ((node)->va.addr)
>>>>>>>>>>>>>>>>>>>> @@ -713,6 +867,12 @@ drm_gpuvm_init(struct drm_gpuvm *gpuvm, struct drm_device *drm,
>>>>>>>>>>>>>>>>>>>>          gpuvm->rb.tree = RB_ROOT_CACHED;
>>>>>>>>>>>>>>>>>>>>          INIT_LIST_HEAD(&gpuvm->rb.list);
>>>>>>>>>>>>>>>>>>>> +       INIT_LIST_HEAD(&gpuvm->extobj.list);
>>>>>>>>>>>>>>>>>>>> +       spin_lock_init(&gpuvm->extobj.lock);
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>> +       INIT_LIST_HEAD(&gpuvm->evict.list);
>>>>>>>>>>>>>>>>>>>> +       spin_lock_init(&gpuvm->evict.lock);
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>          drm_gpuva_check_overflow(start_offset, range);
>>>>>>>>>>>>>>>>>>>>          gpuvm->mm_start = start_offset;
>>>>>>>>>>>>>>>>>>>>          gpuvm->mm_range = range;
>>>>>>>>>>>>>>>>>>>> @@ -754,10 +914,302 @@ drm_gpuvm_destroy(struct drm_gpuvm *gpuvm)
>>>>>>>>>>>>>>>>>>>>          WARN(!RB_EMPTY_ROOT(&gpuvm->rb.tree.rb_root),
>>>>>>>>>>>>>>>>>>>>               "GPUVA tree is not empty, potentially leaking memory.\n");
>>>>>>>>>>>>>>>>>>>> +       WARN(!list_empty(&gpuvm->extobj.list), "Extobj list should be empty.\n");
>>>>>>>>>>>>>>>>>>>> +       WARN(!list_empty(&gpuvm->evict.list), "Evict list should be empty.\n");
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>          drm_gem_private_object_fini(&gpuvm->d_obj);
>>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>>     EXPORT_SYMBOL_GPL(drm_gpuvm_destroy);
>>>>>>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>>>> + * drm_gpuvm_prepare_objects() - prepare all associated BOs
>>>>>>>>>>>>>>>>>>>> + * @gpuvm: the &drm_gpuvm
>>>>>>>>>>>>>>>>>>>> + * @exec: the &drm_exec locking context
>>>>>>>>>>>>>>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>> + * Calls drm_exec_prepare_obj() for all &drm_gem_objects the given
>>>>>>>>>>>>>>>>>>>> + * &drm_gpuvm contains mappings of.
>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>> + * Using this function directly, it is the driver's responsibility to call
>>>>>>>>>>>>>>>>>>>> + * drm_exec_init() and drm_exec_fini() accordingly.
>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>> + * Note: This function is safe against concurrent insertion and removal of
>>>>>>>>>>>>>>>>>>>> + * external objects, however it is not safe against concurrent usage itself.
>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>> + * Drivers need to make sure to protect this case with either an outer VM lock
>>>>>>>>>>>>>>>>>>>> + * or by calling drm_gpuvm_prepare_vm() before this function within the
>>>>>>>>>>>>>>>>>>>> + * drm_exec_until_all_locked() loop, such that the GPUVM's dma-resv lock ensures
>>>>>>>>>>>>>>>>>>>> + * mutual exclusion.
>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>>>> +int
>>>>>>>>>>>>>>>>>>>> +drm_gpuvm_prepare_objects(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>>>>>>>> +			  struct drm_exec *exec,
>>>>>>>>>>>>>>>>>>>> +			  unsigned int num_fences)
>>>>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>>>>> +	struct drm_gpuvm_bo *vm_bo;
>>>>>>>>>>>>>>>>>>>> +	LIST_HEAD(extobjs);
>>>>>>>>>>>>>>>>>>>> +	int ret = 0;
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>> +	for_each_vm_bo_in_list(gpuvm, extobj, &extobjs, vm_bo) {
>>>>>>>>>>>>>>>>>>>> +		ret = drm_exec_prepare_obj(exec, vm_bo->obj, num_fences);
>>>>>>>>>>>>>>>>>>>> +		if (ret)
>>>>>>>>>>>>>>>>>>>> +			break;
>>>>>>>>>>>>>>>>>>>> +	}
>>>>>>>>>>>>>>>>>>>> +	/* Drop ref in case we break out of the loop. */
>>>>>>>>>>>>>>>>>>>> +	drm_gpuvm_bo_put(vm_bo);
>>>>>>>>>>>>>>>>>>>> +	restore_vm_bo_list(gpuvm, extobj, &extobjs);
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>> +	return ret;
>>>>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_prepare_objects);
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>>>> + * drm_gpuvm_prepare_range() - prepare all BOs mapped within a given range
>>>>>>>>>>>>>>>>>>>> + * @gpuvm: the &drm_gpuvm
>>>>>>>>>>>>>>>>>>>> + * @exec: the &drm_exec locking context
>>>>>>>>>>>>>>>>>>>> + * @addr: the start address within the VA space
>>>>>>>>>>>>>>>>>>>> + * @range: the range to iterate within the VA space
>>>>>>>>>>>>>>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>> + * Calls drm_exec_prepare_obj() for all &drm_gem_objects mapped between @addr
>>>>>>>>>>>>>>>>>>>> + * and @addr + @range.
>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>>>> +int
>>>>>>>>>>>>>>>>>>>> +drm_gpuvm_prepare_range(struct drm_gpuvm *gpuvm, struct drm_exec *exec,
>>>>>>>>>>>>>>>>>>>> +			u64 addr, u64 range, unsigned int num_fences)
>>>>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>>>>> +	struct drm_gpuva *va;
>>>>>>>>>>>>>>>>>>>> +	u64 end = addr + range;
>>>>>>>>>>>>>>>>>>>> +	int ret;
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>> +	drm_gpuvm_for_each_va_range(va, gpuvm, addr, end) {
>>>>>>>>>>>>>>>>>>>> +		struct drm_gem_object *obj = va->gem.obj;
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>> +		ret = drm_exec_prepare_obj(exec, obj, num_fences);
>>>>>>>>>>>>>>>>>>>> +		if (ret)
>>>>>>>>>>>>>>>>>>>> +			return ret;
>>>>>>>>>>>>>>>>>>>> +	}
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>> +	return 0;
>>>>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_prepare_range);
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>>>> + * drm_gpuvm_exec_lock() - lock all dma-resv of all associated BOs
>>>>>>>>>>>>>>>>>>>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>>>>>>>>>>>>>>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>>>>>>>>>>>>>>> + * @interruptible: sleep interruptible if waiting
>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>> + * Acquires all dma-resv locks of all &drm_gem_objects the given
>>>>>>>>>>>>>>>>>>>> + * &drm_gpuvm contains mappings of.
>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>> + * Additionally, when calling this function with struct drm_gpuvm_exec::extra
>>>>>>>>>>>>>>>>>>>> + * being set the driver receives the given @fn callback to lock additional
>>>>>>>>>>>>>>>>>>>> + * dma-resv in the context of the &drm_gpuvm_exec instance. Typically, drivers
>>>>>>>>>>>>>>>>>>>> + * would call drm_exec_prepare_obj() from within this callback.
>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>>>> +int
>>>>>>>>>>>>>>>>>>>> +drm_gpuvm_exec_lock(struct drm_gpuvm_exec *vm_exec,
>>>>>>>>>>>>>>>>>>>> +		    unsigned int num_fences,
>>>>>>>>>>>>>>>>>>>> +		    bool interruptible)
>>>>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>>>>> +	struct drm_gpuvm *gpuvm = vm_exec->vm;
>>>>>>>>>>>>>>>>>>>> +	struct drm_exec *exec = &vm_exec->exec;
>>>>>>>>>>>>>>>>>>>> +	uint32_t flags;
>>>>>>>>>>>>>>>>>>>> +	int ret;
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>> +	flags = interruptible ? DRM_EXEC_INTERRUPTIBLE_WAIT : 0 |
>>>>>>>>>>>>>>>>>>>> +		DRM_EXEC_IGNORE_DUPLICATES;
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>> +	drm_exec_init(exec, flags);
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>> +	drm_exec_until_all_locked(exec) {
>>>>>>>>>>>>>>>>>>>> +		ret = drm_gpuvm_prepare_vm(gpuvm, exec, num_fences);
>>>>>>>>>>>>>>>>>>>> +		drm_exec_retry_on_contention(exec);
>>>>>>>>>>>>>>>>>>>> +		if (ret)
>>>>>>>>>>>>>>>>>>>> +			goto err;
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>> +		ret = drm_gpuvm_prepare_objects(gpuvm, exec, num_fences);
>>>>>>>>>>>>>>>>>>>> +		drm_exec_retry_on_contention(exec);
>>>>>>>>>>>>>>>>>>>> +		if (ret)
>>>>>>>>>>>>>>>>>>>> +			goto err;
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>> +		if (vm_exec->extra.fn) {
>>>>>>>>>>>>>>>>>>>> +			ret = vm_exec->extra.fn(vm_exec, num_fences);
>>>>>>>>>>>>>>>>>>>> +			drm_exec_retry_on_contention(exec);
>>>>>>>>>>>>>>>>>>>> +			if (ret)
>>>>>>>>>>>>>>>>>>>> +				goto err;
>>>>>>>>>>>>>>>>>>>> +		}
>>>>>>>>>>>>>>>>>>>> +	}
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>> +	return 0;
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>> +err:
>>>>>>>>>>>>>>>>>>>> +	drm_exec_fini(exec);
>>>>>>>>>>>>>>>>>>>> +	return ret;
>>>>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_exec_lock);
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>> +static int
>>>>>>>>>>>>>>>>>>>> +fn_lock_array(struct drm_gpuvm_exec *vm_exec, unsigned int num_fences)
>>>>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>>>>> +	struct {
>>>>>>>>>>>>>>>>>>>> +		struct drm_gem_object **objs;
>>>>>>>>>>>>>>>>>>>> +		unsigned int num_objs;
>>>>>>>>>>>>>>>>>>>> +	} *args = vm_exec->extra.priv;
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>> +	return drm_exec_prepare_array(&vm_exec->exec, args->objs,
>>>>>>>>>>>>>>>>>>>> +				      args->num_objs, num_fences);
>>>>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>>>> + * drm_gpuvm_exec_lock_array() - lock all dma-resv of all associated BOs
>>>>>>>>>>>>>>>>>>>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>>>>>>>>>>>>>>>>>>>> + * @objs: additional &drm_gem_objects to lock
>>>>>>>>>>>>>>>>>>>> + * @num_objs: the number of additional &drm_gem_objects to lock
>>>>>>>>>>>>>>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>>>>>>>>>>>>>>> + * @interruptible: sleep interruptible if waiting
>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>> + * Acquires all dma-resv locks of all &drm_gem_objects the given &drm_gpuvm
>>>>>>>>>>>>>>>>>>>> + * contains mappings of, plus the ones given through @objs.
>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>>>> +int
>>>>>>>>>>>>>>>>>>>> +drm_gpuvm_exec_lock_array(struct drm_gpuvm_exec *vm_exec,
>>>>>>>>>>>>>>>>>>>> +			  struct drm_gem_object **objs,
>>>>>>>>>>>>>>>>>>>> +			  unsigned int num_objs,
>>>>>>>>>>>>>>>>>>>> +			  unsigned int num_fences,
>>>>>>>>>>>>>>>>>>>> +			  bool interruptible)
>>>>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>>>>> +	struct {
>>>>>>>>>>>>>>>>>>>> +		struct drm_gem_object **objs;
>>>>>>>>>>>>>>>>>>>> +		unsigned int num_objs;
>>>>>>>>>>>>>>>>>>>> +	} args;
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>> +	args.objs = objs;
>>>>>>>>>>>>>>>>>>>> +	args.num_objs = num_objs;
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>> +	vm_exec->extra.fn = fn_lock_array;
>>>>>>>>>>>>>>>>>>>> +	vm_exec->extra.priv = &args;
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>> +	return drm_gpuvm_exec_lock(vm_exec, num_fences, interruptible);
>>>>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_exec_lock_array);
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>>>> + * drm_gpuvm_exec_lock_range() - prepare all BOs mapped within a given range
>>>>>>>>>>>>>>>>>>>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>>>>>>>>>>>>>>>>>>>> + * @addr: the start address within the VA space
>>>>>>>>>>>>>>>>>>>> + * @range: the range to iterate within the VA space
>>>>>>>>>>>>>>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>>>>>>>>>>>>>>> + * @interruptible: sleep interruptible if waiting
>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>> + * Acquires all dma-resv locks of all &drm_gem_objects mapped between @addr and
>>>>>>>>>>>>>>>>>>>> + * @addr + @range.
>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>>>> +int
>>>>>>>>>>>>>>>>>>>> +drm_gpuvm_exec_lock_range(struct drm_gpuvm_exec *vm_exec,
>>>>>>>>>>>>>>>>>>>> +			  u64 addr, u64 range,
>>>>>>>>>>>>>>>>>>>> +			  unsigned int num_fences,
>>>>>>>>>>>>>>>>>>>> +			  bool interruptible)
>>>>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>>>>> +	struct drm_gpuvm *gpuvm = vm_exec->vm;
>>>>>>>>>>>>>>>>>>>> +	struct drm_exec *exec = &vm_exec->exec;
>>>>>>>>>>>>>>>>>>>> +	uint32_t flags;
>>>>>>>>>>>>>>>>>>>> +	int ret;
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>> +	flags = interruptible ? DRM_EXEC_INTERRUPTIBLE_WAIT : 0 |
>>>>>>>>>>>>>>>>>>>> +		DRM_EXEC_IGNORE_DUPLICATES;
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>> +	drm_exec_init(exec, flags);
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>> +	drm_exec_until_all_locked(exec) {
>>>>>>>>>>>>>>>>>>>> +		ret = drm_gpuvm_prepare_range(gpuvm, exec, addr, range,
>>>>>>>>>>>>>>>>>>>> +					      num_fences);
>>>>>>>>>>>>>>>>>>>> +		drm_exec_retry_on_contention(exec);
>>>>>>>>>>>>>>>>>>>> +		if (ret)
>>>>>>>>>>>>>>>>>>>> +			goto err;
>>>>>>>>>>>>>>>>>>>> +	}
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>> +	return ret;
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>> +err:
>>>>>>>>>>>>>>>>>>>> +	drm_exec_fini(exec);
>>>>>>>>>>>>>>>>>>>> +	return ret;
>>>>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_exec_lock_range);
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>>>> + * drm_gpuvm_validate() - validate all BOs marked as evicted
>>>>>>>>>>>>>>>>>>>> + * @gpuvm: the &drm_gpuvm to validate evicted BOs
>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>> + * Calls the &drm_gpuvm_ops.bo_validate callback for all evicted buffer
>>>>>>>>>>>>>>>>>>>> + * objects being mapped in the given &drm_gpuvm.
>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>>>> +int
>>>>>>>>>>>>>>>>>>>> +drm_gpuvm_validate(struct drm_gpuvm *gpuvm)
>>>>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>>>>> +	const struct drm_gpuvm_ops *ops = gpuvm->ops;
>>>>>>>>>>>>>>>>>>>> +	struct drm_gpuvm_bo *vm_bo;
>>>>>>>>>>>>>>>>>>>> +	LIST_HEAD(evict);
>>>>>>>>>>>>>>>>>>>> +	int ret = 0;
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>> +	if (unlikely(!ops || !ops->bo_validate))
>>>>>>>>>>>>>>>>>>>> +		return -ENOTSUPP;
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>> +	for_each_vm_bo_in_list(gpuvm, evict, &evict, vm_bo) {
>>>>>>>>>>>>>>>>>>>> +		dma_resv_assert_held(vm_bo->obj->resv);
>>>>>>>>>>>>>>>>>>>> +		ret = ops->bo_validate(vm_bo->obj);
>>>>>>>>>>>>>>>>>>>> +		if (ret)
>>>>>>>>>>>>>>>>>>>> +			break;
>>>>>>>>>>>>>>>>>>>> +	}
>>>>>>>>>>>>>>>>>>>> +	/* Drop ref in case we break out of the loop. */
>>>>>>>>>>>>>>>>>>>> +	drm_gpuvm_bo_put(vm_bo);
>>>>>>>>>>>>>>>>>>>> +	restore_vm_bo_list(gpuvm, evict, &evict);
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>> +	return ret;
>>>>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_validate);
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>>>> + * drm_gpuvm_resv_add_fence - add fence to private and all extobj dma-resv
>>>>>>>>>>>>>>>>>>>> + * @gpuvm: the &drm_gpuvm to add a fence to
>>>>>>>>>>>>>>>>>>>> + * @exec: the &drm_exec locking context
>>>>>>>>>>>>>>>>>>>> + * @fence: fence to add
>>>>>>>>>>>>>>>>>>>> + * @private_usage: private dma-resv usage
>>>>>>>>>>>>>>>>>>>> + * @extobj_usage: extobj dma-resv usage
>>>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>>>> +void
>>>>>>>>>>>>>>>>>>>> +drm_gpuvm_resv_add_fence(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>>>>>>>> +			 struct drm_exec *exec,
>>>>>>>>>>>>>>>>>>>> +			 struct dma_fence *fence,
>>>>>>>>>>>>>>>>>>>> +			 enum dma_resv_usage private_usage,
>>>>>>>>>>>>>>>>>>>> +			 enum dma_resv_usage extobj_usage)
>>>>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>>>>> +	struct drm_gem_object *obj;
>>>>>>>>>>>>>>>>>>>> +	unsigned long index;
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>> +	drm_exec_for_each_locked_object(exec, index, obj) {
>>>>>>>>>>>>>>>>>>>> +		dma_resv_assert_held(obj->resv);
>>>>>>>>>>>>>>>>>>>> +		dma_resv_add_fence(obj->resv, fence,
>>>>>>>>>>>>>>>>>>>> +				   drm_gpuvm_is_extobj(gpuvm, obj) ?
>>>>>>>>>>>>>>>>>>>> +				   private_usage : extobj_usage);
>>>>>>>>>>>>>>>>>>>> +	}
>>>>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_resv_add_fence);
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>  /**
>>>>>>>>>>>>>>>>>>>>   * drm_gpuvm_bo_create() - create a new instance of struct drm_gpuvm_bo
>>>>>>>>>>>>>>>>>>>>   * @gpuvm: The &drm_gpuvm the @obj is mapped in.
>>>>>>>>>>>>>>>>>>>> @@ -790,6 +1242,9 @@ drm_gpuvm_bo_create(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>>>>>>>>  	INIT_LIST_HEAD(&vm_bo->list.gpuva);
>>>>>>>>>>>>>>>>>>>>  	INIT_LIST_HEAD(&vm_bo->list.entry.gem);
>>>>>>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>>>>>> +	INIT_LIST_HEAD(&vm_bo->list.entry.extobj);
>>>>>>>>>>>>>>>>>>>> +	INIT_LIST_HEAD(&vm_bo->list.entry.evict);
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>  	drm_gem_object_get(obj);
>>>>>>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>>>>>>  	return vm_bo;
>>>>>>>>>>>>>>>>>>>> @@ -807,6 +1262,14 @@ drm_gpuvm_bo_destroy(struct kref *kref)
>>>>>>>>>>>>>>>>>>>>  	drm_gem_gpuva_assert_lock_held(vm_bo->obj);
>>>>>>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>>>>>> +	spin_lock(&gpuvm->extobj.lock);
>>>>>>>>>>>>>>>>>>>> +	list_del(&vm_bo->list.entry.extobj);
>>>>>>>>>>>>>>>>>>>> +	spin_unlock(&gpuvm->extobj.lock);
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>> +	spin_lock(&gpuvm->evict.lock);
>>>>>>>>>>>>>>>>>>>> +	list_del(&vm_bo->list.entry.evict);
>>>>>>>>>>>>>>>>>>>> +	spin_unlock(&gpuvm->evict.lock);
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>  	list_del(&vm_bo->list.entry.gem);
>>>>>>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>>>>>>  	drm_gem_object_put(obj);
>>>>>>>>>>>>>>>>>>>> @@ -822,6 +1285,11 @@ drm_gpuvm_bo_destroy(struct kref *kref)
>>>>>>>>>>>>>>>>>>>>   * @vm_bo: the &drm_gpuvm_bo to release the reference of
>>>>>>>>>>>>>>>>>>>>   *
>>>>>>>>>>>>>>>>>>>>   * This releases a reference to @vm_bo.
>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>> + * If the reference count drops to zero, the &gpuvm_bo is destroyed, which
>>>>>>>>>>>>>>>>>>>> + * includes removing it from the GEM's gpuva list. Hence, if a call to this
>>>>>>>>>>>>>>>>>>>> + * function can potentially let the reference count drop to zero, the caller
>>>>>>>>>>>>>>>>>>>> + * must hold the dma-resv or driver specific GEM gpuva lock.
>>>>>>>>>>>>>>>>>>>>   */
>>>>>>>>>>>>>>>>>>>>  void
>>>>>>>>>>>>>>>>>>>>  drm_gpuvm_bo_put(struct drm_gpuvm_bo *vm_bo)
>>>>>>>>>>>>>>>>>>>> @@ -831,6 +1299,12 @@ drm_gpuvm_bo_put(struct drm_gpuvm_bo *vm_bo)
>>>>>>>>>>>>>>>>>>>>  }
>>>>>>>>>>>>>>>>>>>>  EXPORT_SYMBOL_GPL(drm_gpuvm_bo_put);
>>>>>>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>>>>>> +static int __must_check
>>>>>>>>>>>>>>>>>>>> +drm_gpuvm_bo_get_unless_zero(struct drm_gpuvm_bo *vm_bo)
>>>>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>>>>> +	return kref_get_unless_zero(&vm_bo->kref);
>>>>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>  static struct drm_gpuvm_bo *
>>>>>>>>>>>>>>>>>>>>  __drm_gpuvm_bo_find(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>>>>>>>>  		    struct drm_gem_object *obj)
>>>>>>>>>>>>>>>>>>>> @@ -938,6 +1412,48 @@ drm_gpuvm_bo_obtain_prealloc(struct drm_gpuvm_bo *__vm_bo)
>>>>>>>>>>>>>>>>>>>>  }
>>>>>>>>>>>>>>>>>>>>  EXPORT_SYMBOL_GPL(drm_gpuvm_bo_obtain_prealloc);
>>>>>>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>>>> + * drm_gpuvm_bo_extobj_add() - adds the &drm_gpuvm_bo to its &drm_gpuvm's
>>>>>>>>>>>>>>>>>>>> + * extobj list
>>>>>>>>>>>>>>>>>>>> + * @vm_bo: The &drm_gpuvm_bo to add to its &drm_gpuvm's extobj list.
>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>> + * Adds the given @vm_bo to its &drm_gpuvm's extobj list if not on the list
>>>>>>>>>>>>>>>>>>>> + * already and if the corresponding &drm_gem_object actually is an external
>>>>>>>>>>>>>>>>>>>> + * object.
>>>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>>>> +void
>>>>>>>>>>>>>>>>>>>> +drm_gpuvm_bo_extobj_add(struct drm_gpuvm_bo *vm_bo)
>>>>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>>>>> +	struct drm_gpuvm *gpuvm = vm_bo->vm;
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>> +	if (drm_gpuvm_is_extobj(gpuvm, vm_bo->obj))
>>>>>>>>>>>>>>>>>>>> +		drm_gpuvm_bo_list_add(vm_bo, extobj);
>>>>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_bo_extobj_add);
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>>>> + * drm_gpuvm_bo_evict() - add / remove a &drm_gem_object to / from a
>>>>>>>>>>>>>>>>>>>> + * &drm_gpuvms evicted list
>>>>>>>>>>>>>>>>>>>> + * @obj: the &drm_gem_object to add or remove
>>>>>>>>>>>>>>>>>>>> + * @evict: indicates whether the object is evicted
>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>> + * Adds a &drm_gem_object to or removes it from all &drm_gpuvms' evicted
>>>>>>>>>>>>>>>>>>>> + * lists containing a mapping of this &drm_gem_object.
>>>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>>>> +void
>>>>>>>>>>>>>>>>>>>> +drm_gpuvm_bo_evict(struct drm_gem_object *obj, bool evict)
>>>>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>>>>> +	struct drm_gpuvm_bo *vm_bo;
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>> +	drm_gem_for_each_gpuvm_bo(vm_bo, obj) {
>>>>>>>>>>>>>>>>>>>> +		if (evict)
>>>>>>>>>>>>>>>>>>>> +			drm_gpuvm_bo_list_add(vm_bo, evict);
>>>>>>>>>>>>>>>>>>>> +		else
>>>>>>>>>>>>>>>>>>>> +			drm_gpuvm_bo_list_del(vm_bo, evict);
>>>>>>>>>>>>>>>>>>>> +	}
>>>>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_bo_evict);
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>  static int
>>>>>>>>>>>>>>>>>>>>  __drm_gpuva_insert(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>>>>>>>>  		   struct drm_gpuva *va)
>>>>>>>>>>>>>>>>>>>> diff --git a/include/drm/drm_gpuvm.h b/include/drm/drm_gpuvm.h
>>>>>>>>>>>>>>>>>>>> index afa50b9059a2..834bb6d6617e 100644
>>>>>>>>>>>>>>>>>>>> --- a/include/drm/drm_gpuvm.h
>>>>>>>>>>>>>>>>>>>> +++ b/include/drm/drm_gpuvm.h
>>>>>>>>>>>>>>>>>>>> @@ -26,10 +26,12 @@
>>>>>>>>>>>>>>>>>>>>   */
>>>>>>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>>>>>>  #include <linux/list.h>
>>>>>>>>>>>>>>>>>>>> +#include <linux/dma-resv.h>
>>>>>>>>>>>>>>>>>>>>  #include <linux/rbtree.h>
>>>>>>>>>>>>>>>>>>>>  #include <linux/types.h>
>>>>>>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>>>>>>  #include <drm/drm_gem.h>
>>>>>>>>>>>>>>>>>>>> +#include <drm/drm_exec.h>
>>>>>>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>>>>>>  struct drm_gpuvm;
>>>>>>>>>>>>>>>>>>>>  struct drm_gpuvm_bo;
>>>>>>>>>>>>>>>>>>>> @@ -259,6 +261,38 @@ struct drm_gpuvm {
>>>>>>>>>>>>>>>>>>>>  	 * space
>>>>>>>>>>>>>>>>>>>>  	 */
>>>>>>>>>>>>>>>>>>>>  	struct dma_resv *resv;
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>> +	/**
>>>>>>>>>>>>>>>>>>>> +	 * @extobj: structure holding the extobj list
>>>>>>>>>>>>>>>>>>>> +	 */
>>>>>>>>>>>>>>>>>>>> +	struct {
>>>>>>>>>>>>>>>>>>>> +		/**
>>>>>>>>>>>>>>>>>>>> +		 * @list: &list_head storing &drm_gpuvm_bos serving as
>>>>>>>>>>>>>>>>>>>> +		 * external object
>>>>>>>>>>>>>>>>>>>> +		 */
>>>>>>>>>>>>>>>>>>>> +		struct list_head list;
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>> +		/**
>>>>>>>>>>>>>>>>>>>> +		 * @lock: spinlock to protect the extobj list
>>>>>>>>>>>>>>>>>>>> +		 */
>>>>>>>>>>>>>>>>>>>> +		spinlock_t lock;
>>>>>>>>>>>>>>>>>>>> +	} extobj;
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>> +	/**
>>>>>>>>>>>>>>>>>>>> +	 * @evict: structure holding the evict list and evict list lock
>>>>>>>>>>>>>>>>>>>> +	 */
>>>>>>>>>>>>>>>>>>>> +	struct {
>>>>>>>>>>>>>>>>>>>> +		/**
>>>>>>>>>>>>>>>>>>>> +		 * @list: &list_head storing &drm_gpuvm_bos currently being
>>>>>>>>>>>>>>>>>>>> +		 * evicted
>>>>>>>>>>>>>>>>>>>> +		 */
>>>>>>>>>>>>>>>>>>>> +		struct list_head list;
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>> +		/**
>>>>>>>>>>>>>>>>>>>> +		 * @lock: spinlock to protect the evict list
>>>>>>>>>>>>>>>>>>>> +		 */
>>>>>>>>>>>>>>>>>>>> +		spinlock_t lock;
>>>>>>>>>>>>>>>>>>>> +	} evict;
>>>>>>>>>>>>>>>>>>>>  };
>>>>>>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>>>>>>  void drm_gpuvm_init(struct drm_gpuvm *gpuvm, struct drm_device *drm,
>>>>>>>>>>>>>>>>>>>> @@ -268,6 +302,21 @@ void drm_gpuvm_init(struct drm_gpuvm *gpuvm, struct drm_device *drm,
>>>>>>>>>>>>>>>>>>>>  		    const struct drm_gpuvm_ops *ops);
>>>>>>>>>>>>>>>>>>>>  void drm_gpuvm_destroy(struct drm_gpuvm *gpuvm);
>>>>>>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>>>> + * drm_gpuvm_is_extobj() - indicates whether the given &drm_gem_object is an
>>>>>>>>>>>>>>>>>>>> + * external object
>>>>>>>>>>>>>>>>>>>> + * @gpuvm: the &drm_gpuvm to check
>>>>>>>>>>>>>>>>>>>> + * @obj: the &drm_gem_object to check
>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>> + * Returns: true if the &drm_gem_object's &dma_resv differs from the
>>>>>>>>>>>>>>>>>>>> + * &drm_gpuvm's &dma_resv, false otherwise
>>>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>>>> +static inline bool drm_gpuvm_is_extobj(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>>>>>>>> +				       struct drm_gem_object *obj)
>>>>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>>>>> +	return obj && obj->resv != gpuvm->resv;
>>>>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>  static inline struct drm_gpuva *
>>>>>>>>>>>>>>>>>>>>  __drm_gpuva_next(struct drm_gpuva *va)
>>>>>>>>>>>>>>>>>>>>  {
>>>>>>>>>>>>>>>>>>>> @@ -346,6 +395,128 @@ __drm_gpuva_next(struct drm_gpuva *va)
>>>>>>>>>>>>>>>>>>>>  #define drm_gpuvm_for_each_va_safe(va__, next__, gpuvm__)		\
>>>>>>>>>>>>>>>>>>>>  	list_for_each_entry_safe(va__, next__, &(gpuvm__)->rb.list, rb.entry)
>>>>>>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>>>> + * struct drm_gpuvm_exec - &drm_gpuvm abstraction of &drm_exec
>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>> + * This structure should be created on the stack as &drm_exec should be.
>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>> + * Optionally, @extra can be set in order to lock additional &drm_gem_objects.
>>>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>>>> +struct drm_gpuvm_exec {
>>>>>>>>>>>>>>>>>>>> +	/**
>>>>>>>>>>>>>>>>>>>> +	 * @exec: the &drm_exec structure
>>>>>>>>>>>>>>>>>>>> +	 */
>>>>>>>>>>>>>>>>>>>> +	struct drm_exec exec;
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>> +	/**
>>>>>>>>>>>>>>>>>>>> +	 * @vm: the &drm_gpuvm to lock its DMA reservations
>>>>>>>>>>>>>>>>>>>> +	 */
>>>>>>>>>>>>>>>>>>>> +	struct drm_gpuvm *vm;
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>> +	/**
>>>>>>>>>>>>>>>>>>>> +	 * @extra: Callback and corresponding private data for the driver to
>>>>>>>>>>>>>>>>>>>> +	 * lock arbitrary additional &drm_gem_objects.
>>>>>>>>>>>>>>>>>>>> +	 */
>>>>>>>>>>>>>>>>>>>> +	struct {
>>>>>>>>>>>>>>>>>>>> +		/**
>>>>>>>>>>>>>>>>>>>> +		 * @fn: The driver callback to lock additional &drm_gem_objects.
>>>>>>>>>>>>>>>>>>>> +		 */
>>>>>>>>>>>>>>>>>>>> +		int (*fn)(struct drm_gpuvm_exec *vm_exec,
>>>>>>>>>>>>>>>>>>>> +			  unsigned int num_fences);
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>> +		/**
>>>>>>>>>>>>>>>>>>>> +		 * @priv: driver private data for the @fn callback
>>>>>>>>>>>>>>>>>>>> +		 */
>>>>>>>>>>>>>>>>>>>> +		void *priv;
>>>>>>>>>>>>>>>>>>>> +	} extra;
>>>>>>>>>>>>>>>>>>>> +};
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>>>> + * drm_gpuvm_prepare_vm() - prepare the GPUVM's common dma-resv
>>>>>>>>>>>>>>>>>>>> + * @gpuvm: the &drm_gpuvm
>>>>>>>>>>>>>>>>>>>> + * @exec: the &drm_exec context
>>>>>>>>>>>>>>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>> + * Calls drm_exec_prepare_obj() for the GPUVM's dummy &drm_gem_object.
>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>> + * Using this function directly, it is the driver's responsibility to call
>>>>>>>>>>>>>>>>>>>> + * drm_exec_init() and drm_exec_fini() accordingly.
>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>>>> +static inline int
>>>>>>>>>>>>>>>>>>>> +drm_gpuvm_prepare_vm(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>>>>>>>> +		     struct drm_exec *exec,
>>>>>>>>>>>>>>>>>>>> +		     unsigned int num_fences)
>>>>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>>>>> +	return drm_exec_prepare_obj(exec, &gpuvm->d_obj, num_fences);
>>>>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>> +int drm_gpuvm_prepare_objects(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>>>>>>>> +			      struct drm_exec *exec,
>>>>>>>>>>>>>>>>>>>> +			      unsigned int num_fences);
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>> +int drm_gpuvm_prepare_range(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>>>>>>>> +			    struct drm_exec *exec,
>>>>>>>>>>>>>>>>>>>> +			    u64 addr, u64 range,
>>>>>>>>>>>>>>>>>>>> +			    unsigned int num_fences);
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>> +int drm_gpuvm_exec_lock(struct drm_gpuvm_exec *vm_exec,
>>>>>>>>>>>>>>>>>>>> +			unsigned int num_fences,
>>>>>>>>>>>>>>>>>>>> +			bool interruptible);
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>> +int drm_gpuvm_exec_lock_array(struct drm_gpuvm_exec *vm_exec,
>>>>>>>>>>>>>>>>>>>> +			      struct drm_gem_object **objs,
>>>>>>>>>>>>>>>>>>>> +			      unsigned int num_objs,
>>>>>>>>>>>>>>>>>>>> +			      unsigned int num_fences,
>>>>>>>>>>>>>>>>>>>> +			      bool interruptible);
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>> +int drm_gpuvm_exec_lock_range(struct drm_gpuvm_exec *vm_exec,
>>>>>>>>>>>>>>>>>>>> +			      u64 addr, u64 range,
>>>>>>>>>>>>>>>>>>>> +			      unsigned int num_fences,
>>>>>>>>>>>>>>>>>>>> +			      bool interruptible);
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>>>> + * drm_gpuvm_exec_unlock() - unlock all dma-resv of all associated BOs
>>>>>>>>>>>>>>>>>>>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>> + * Releases all dma-resv locks of all &drm_gem_objects previously acquired
>>>>>>>>>>>>>>>>>>>> + * through drm_gpuvm_exec_lock() or its variants.
>>>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>>>> +static inline void
>>>>>>>>>>>>>>>>>>>> +drm_gpuvm_exec_unlock(struct drm_gpuvm_exec *vm_exec)
>>>>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>>>>> +	drm_exec_fini(&vm_exec->exec);
>>>>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>> +int drm_gpuvm_validate(struct drm_gpuvm *gpuvm);
>>>>>>>>>>>>>>>>>>>> +void drm_gpuvm_resv_add_fence(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>>>>>>>> +			      struct drm_exec *exec,
>>>>>>>>>>>>>>>>>>>> +			      struct dma_fence *fence,
>>>>>>>>>>>>>>>>>>>> +			      enum dma_resv_usage private_usage,
>>>>>>>>>>>>>>>>>>>> +			      enum dma_resv_usage extobj_usage);
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>>>> + * drm_gpuvm_exec_resv_add_fence() - add fence to private and all extobj
>>>>>>>>>>>>>>>>>>>> + * dma-resv
>>>>>>>>>>>>>>>>>>>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>>>>>>>>>>>>>>>>>>>> + * @fence: fence to add
>>>>>>>>>>>>>>>>>>>> + * @private_usage: private dma-resv usage
>>>>>>>>>>>>>>>>>>>> + * @extobj_usage: extobj dma-resv usage
>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>> + * See drm_gpuvm_resv_add_fence().
>>>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>>>> +static inline void
>>>>>>>>>>>>>>>>>>>> +drm_gpuvm_exec_resv_add_fence(struct drm_gpuvm_exec *vm_exec,
>>>>>>>>>>>>>>>>>>>> +			      struct dma_fence *fence,
>>>>>>>>>>>>>>>>>>>> +			      enum dma_resv_usage private_usage,
>>>>>>>>>>>>>>>>>>>> +			      enum dma_resv_usage extobj_usage)
>>>>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>>>>> +	drm_gpuvm_resv_add_fence(vm_exec->vm, &vm_exec->exec, fence,
>>>>>>>>>>>>>>>>>>>> +				 private_usage, extobj_usage);
>>>>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>  /**
>>>>>>>>>>>>>>>>>>>>   * struct drm_gpuvm_bo - structure representing a &drm_gpuvm and
>>>>>>>>>>>>>>>>>>>>   * &drm_gem_object combination
>>>>>>>>>>>>>>>>>>>> @@ -398,6 +569,18 @@ struct drm_gpuvm_bo {
>>>>>>>>>>>>>>>>>>>>  			 * gpuva list.
>>>>>>>>>>>>>>>>>>>>  			 */
>>>>>>>>>>>>>>>>>>>>  			struct list_head gem;
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>> +			/**
>>>>>>>>>>>>>>>>>>>> +			 * @extobj: List entry to attach to the &drm_gpuvm's
>>>>>>>>>>>>>>>>>>>> +			 * extobj list.
>>>>>>>>>>>>>>>>>>>> +			 */
>>>>>>>>>>>>>>>>>>>> +			struct list_head extobj;
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>> +			/**
>>>>>>>>>>>>>>>>>>>> +			 * @evict: List entry to attach to the &drm_gpuvm's
>>>>>>>>>>>>>>>>>>>> +			 * evict list.
>>>>>>>>>>>>>>>>>>>> +			 */
>>>>>>>>>>>>>>>>>>>> +			struct list_head evict;
>>>>>>>>>>>>>>>>>>>>  		} entry;
>>>>>>>>>>>>>>>>>>>>  	} list;
>>>>>>>>>>>>>>>>>>>>  };
>>>>>>>>>>>>>>>>>>>> @@ -432,6 +615,9 @@ struct drm_gpuvm_bo *
>>>>>>>>>>>>>>>>>>>>  drm_gpuvm_bo_find(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>>>>>>>>  		  struct drm_gem_object *obj);
>>>>>>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>>>>>> +void drm_gpuvm_bo_evict(struct drm_gem_object *obj, bool evict);
>>>>>>>>>>>>>>>>>>>> +void drm_gpuvm_bo_extobj_add(struct drm_gpuvm_bo *vm_bo);
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>  /**
>>>>>>>>>>>>>>>>>>>>   * drm_gpuvm_bo_for_each_va() - iterator to walk over a list of &drm_gpuva
>>>>>>>>>>>>>>>>>>>>   * @va__: &drm_gpuva structure to assign to in each iteration step
>>>>>>>>>>>>>>>>>>>> @@ -837,6 +1023,17 @@ struct drm_gpuvm_ops {
>>>>>>>>>>>>>>>>>>>>  	 * used.
>>>>>>>>>>>>>>>>>>>>  	 */
>>>>>>>>>>>>>>>>>>>>  	int (*sm_step_unmap)(struct drm_gpuva_op *op, void *priv);
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>> +	/**
>>>>>>>>>>>>>>>>>>>> +	 * @bo_validate: called from drm_gpuvm_validate()
>>>>>>>>>>>>>>>>>>>> +	 *
>>>>>>>>>>>>>>>>>>>> +	 * Drivers receive this callback for every evicted &drm_gem_object being
>>>>>>>>>>>>>>>>>>>> +	 * mapped in the corresponding &drm_gpuvm.
>>>>>>>>>>>>>>>>>>>> +	 *
>>>>>>>>>>>>>>>>>>>> +	 * Typically, drivers would call their driver specific variant of
>>>>>>>>>>>>>>>>>>>> +	 * ttm_bo_validate() from within this callback.
>>>>>>>>>>>>>>>>>>>> +	 */
>>>>>>>>>>>>>>>>>>>> +	int (*bo_validate)(struct drm_gem_object *obj);
>>>>>>>>>>>>>>>>>>>>  };
>>>>>>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>>>>>>  int drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm, void *priv,
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>
>>>
>
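For orientation, the driver-side pattern the helpers quoted above are designed for would look roughly like the sketch below. This is illustrative only; struct driver_job, driver_run_job() and the chosen dma-resv usage flags are made-up placeholders, not part of the series.

static int driver_exec_submit(struct driver_job *job)
{
	struct drm_gpuvm_exec vm_exec = {
		.vm = job->gpuvm,	/* no extra BOs to lock in this example */
	};
	int ret;

	/* Locks the VM's dummy GEM object plus all external objects. */
	ret = drm_gpuvm_exec_lock(&vm_exec, 1, true);
	if (ret)
		return ret;

	/* Re-validate all BOs marked evicted since the last submission. */
	ret = drm_gpuvm_validate(job->gpuvm);
	if (ret)
		goto out_unlock;

	driver_run_job(job);	/* hypothetical driver submission */

	/* Publish the job's fence to the private and all extobj dma-resv. */
	drm_gpuvm_resv_add_fence(job->gpuvm, &vm_exec.exec, job->fence,
				 DMA_RESV_USAGE_BOOKKEEP,
				 DMA_RESV_USAGE_BOOKKEEP);

out_unlock:
	drm_gpuvm_exec_unlock(&vm_exec);
	return ret;
}
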
Christian König Sept. 20, 2023, 1:48 p.m. UTC | #56
On 9/20/23 15:38, Thomas Hellström wrote:
>
> On 9/20/23 15:06, Christian König wrote:
>>
>>
>> On 9/20/23 14:06, Thomas Hellström wrote:
>>>
>>> On 9/20/23 12:51, Christian König wrote:
>>>> On 9/20/23 09:44, Thomas Hellström wrote:
>>>>> Hi,
>>>>>
>>>>> On 9/20/23 07:37, Christian König wrote:
>>>>>> On 9/19/23 17:23, Thomas Hellström wrote:
>>>>>>>
>>>>>>> On 9/19/23 17:16, Danilo Krummrich wrote:
>>>>>>>> On 9/19/23 14:21, Thomas Hellström wrote:
>>>>>>>>> Hi Christian
>>>>>>>>>
>>>>>>>>> On 9/19/23 14:07, Christian König wrote:
>>>>>>>>>> On 9/13/23 17:46, Danilo Krummrich wrote:
>>>>>>>>>>> On 9/13/23 17:33, Christian König wrote:
>>>>>>>>>>>> On 9/13/23 17:15, Danilo Krummrich wrote:
>>>>>>>>>>>>> On 9/13/23 16:26, Christian König wrote:
>>>>>>>>>>>>>> On 9/13/23 14:16, Danilo Krummrich wrote:
>>>>>>>>>>>>>>> As mentioned in a different mail thread, the reply is 
>>>>>>>>>>>>>>> based on the assumption
>>>>>>>>>>>>>>> that we don't support anything else than GPUVM updates 
>>>>>>>>>>>>>>> from the IOCTL.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I think that this assumption is incorrect.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Well, more precisely I should have said "don't support GPUVM
>>>>>>>>>>>>> updates from within fence signaling critical sections". And
>>>>>>>>>>>>> looking at the code, that doesn't seem to be what you're doing
>>>>>>>>>>>>> there.
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Vulkan is just one specific use case, but this here
>>>>>>>>>>>>>> should probably be able to handle other use cases as well.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Especially with HMM you get the requirement that you need 
>>>>>>>>>>>>>> to be able to invalidate GPUVM mappings without grabbing 
>>>>>>>>>>>>>> a reservation lock.
>>>>>>>>>>>>>
>>>>>>>>>>>>> What do you mean by "invalidate GPUVM mappings" in this context?
>>>>>>>>>>>>> drm_gpuvm_bo_evict() should only be called from a
>>>>>>>>>>>>> ttm_device_funcs::move callback; we should hold the dma-resv lock
>>>>>>>>>>>>> there.
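>>>>>>>>>>>>>
>>>>>>>>>>>>> I.e. something along these lines (rough sketch only; assuming the
>>>>>>>>>>>>> driver embeds the GEM object as ttm_buffer_object::base, and with
>>>>>>>>>>>>> the actual move omitted):
>>>>>>>>>>>>>
>>>>>>>>>>>>> static int driver_bo_move(struct ttm_buffer_object *bo, bool evict,
>>>>>>>>>>>>> 			  struct ttm_operation_ctx *ctx,
>>>>>>>>>>>>> 			  struct ttm_resource *new_mem,
>>>>>>>>>>>>> 			  struct ttm_place *hop)
>>>>>>>>>>>>> {
>>>>>>>>>>>>> 	/* TTM holds the BO's dma-resv lock here. */
>>>>>>>>>>>>> 	dma_resv_assert_held(bo->base.resv);
>>>>>>>>>>>>>
>>>>>>>>>>>>> 	if (evict)
>>>>>>>>>>>>> 		drm_gpuvm_bo_evict(&bo->base, true);
>>>>>>>>>>>>>
>>>>>>>>>>>>> 	/* ... perform the actual move ... */
>>>>>>>>>>>>> 	return 0;
>>>>>>>>>>>>> }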
>>>>>>>>>>>>
>>>>>>>>>>>> Well the question is which dma-resv lock do we hold?
>>>>>>>>>>>>
>>>>>>>>>>>> In the move callback we only hold the dma-resv lock of the 
>>>>>>>>>>>> BO which is moved, but when that is a shared BO then that's 
>>>>>>>>>>>> not the same as the one for the VM.
>>>>>>>>>>>
>>>>>>>>>>> Correct, Thomas' idea was to use the GEM's dma_resv lock to 
>>>>>>>>>>> protect drm_gpuvm_bo::evicted
>>>>>>>>>>> and then actually move the drm_gpuvm_bo to the VM's evicted 
>>>>>>>>>>> list once we grabbed all
>>>>>>>>>>> dma-resv locks when locking the VM's BOs using drm_exec. We 
>>>>>>>>>>> can remove them from the evicted
>>>>>>>>>>> list on validate(). This way we never touch the evicted list 
>>>>>>>>>>> without holding at least the VM's
>>>>>>>>>>> dma-resv lock.
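>>>>>>>>>>>
>>>>>>>>>>> As a rough sketch of that idea (note that the "evicted" bool does
>>>>>>>>>>> not exist in v3 yet, it's hypothetical here):
>>>>>>>>>>>
>>>>>>>>>>> /* Eviction of an external BO: only the BO's dma-resv is held. */
>>>>>>>>>>> static void driver_gem_evicted(struct drm_gem_object *obj)
>>>>>>>>>>> {
>>>>>>>>>>> 	struct drm_gpuvm_bo *vm_bo;
>>>>>>>>>>>
>>>>>>>>>>> 	dma_resv_assert_held(obj->resv);
>>>>>>>>>>>
>>>>>>>>>>> 	drm_gem_for_each_gpuvm_bo(vm_bo, obj)
>>>>>>>>>>> 		vm_bo->evicted = true;
>>>>>>>>>>> }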
>>>>>>>>>>>
>>>>>>>>>>> Do you have any concerns about that?
>>>>>>>>>>
>>>>>>>>>> Scratching my head a bit how that is supposed to work.
>>>>>>>>>>
>>>>>>>>>> This implies that you go over all the evicted BOs during 
>>>>>>>>>> validation and not just the one mentioned in the CS.
>>>>>>>>>>
>>>>>>>>>> That might work for Vulkan, but is pretty much a no-go for 
>>>>>>>>>> OpenGL.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> See what the eviction lock in amdgpu is doing for example.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The eviction_lock seems to protect a VM state "evicting" 
>>>>>>>>>>>>> of whether any BO that
>>>>>>>>>>>>> is associated with the VM is currently evicting. At the 
>>>>>>>>>>>>> same time amdgpu protects
>>>>>>>>>>>>> the evicted list of the VM with a different lock. So this
>>>>>>>>>>>>> seems to be entirely
>>>>>>>>>>>>> unrelated. Tracking a "currently evicting" state is not 
>>>>>>>>>>>>> part of the GPUVM
>>>>>>>>>>>>> implementation currently and hence nothing would change 
>>>>>>>>>>>>> for amdgpu there.
>>>>>>>>>>>>
>>>>>>>>>>>> Sorry for the confusion, we use different terminology in
>>>>>>>>>>>> amdgpu.
>>>>>>>>>>>>
>>>>>>>>>>>> The eviction lock and evicted state is for the VM page 
>>>>>>>>>>>> tables, e.g. if the whole VM is currently not used and 
>>>>>>>>>>>> swapped out or even de-allocated.
>>>>>>>>>>>>
>>>>>>>>>>>> This is necessary because we have cases where we need to 
>>>>>>>>>>>> access the VM data without holding the dma-resv lock of 
>>>>>>>>>>>> this VM. Especially figuring out which parts of an address 
>>>>>>>>>>>> space contain mappings and which don't.
>>>>>>>>>>>
>>>>>>>>>>> I think this is fine; this has nothing to do with lists of
>>>>>>>>>>> evicted GEM objects or external GEM
>>>>>>>>>>> objects, right? Marking mappings (drm_gpuva) as invalidated 
>>>>>>>>>>> (DRM_GPUVA_INVALIDATED) or accessing
>>>>>>>>>>> the VA space does not require any dma-resv locks.
>>>>>>>>>>
>>>>>>>>>> I hope so, but I'm not 100% sure.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> This is a requirement which comes with HMM handling; you
>>>>>>>>>>>> won't see this with Vulkan (or OpenGL, VAAPI, etc.).
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> The invalidation lock on the other hand is what in this
>>>>>>>>>>>> discussion is called the eviction lock. This one is needed
>>>>>>>>>>>> because of what I wrote above: during the move callback only
>>>>>>>>>>>> the dma-resv of the BO which is moved is locked, but not
>>>>>>>>>>>> necessarily the dma-resv of the VM.
>>>>>>>>>>>
>>>>>>>>>>> That's yet another thing, right? This is used to track whether
>>>>>>>>>>> *any* BO that belongs to the VM is currently being evicted,
>>>>>>>>>>> correct? As mentioned, as of now this is not supported in GPUVM
>>>>>>>>>>> and hence would be the same driver specific code with the same
>>>>>>>>>>> driver specific lock.
>>>>>>>>>>
>>>>>>>>>> That is most likely a show stopper for using this for OpenGL
>>>>>>>>>> based workloads as far as I can see. For those you need to be
>>>>>>>>>> able to figure out which non-VM BOs have been evicted and
>>>>>>>>>> which parts of the VM need updates.
>>>>>>>>>
>>>>>>>>> We identify those with a bool in the gpuvm_bo, and that bool 
>>>>>>>>> is protected by the bo_resv. In essence, the "evicted" list 
>>>>>>>>> must be made up-to-date with all relevant locks held before 
>>>>>>>>> traversing in the next exec.
>>>>>>>>
>>>>>>>> What I still miss with this idea is how do we find all the 
>>>>>>>> drm_gpuvm_bo structures with the evicted bool set to true? When 
>>>>>>>> doing the drm_exec dance we come across all external ones and 
>>>>>>>> can add them to the list if needed, but what about the BOs 
>>>>>>>> having the VM's dma-resv?
>>>>>>>
>>>>>>> Oh, they can be added to the evict list directly (no bool
>>>>>>> needed) in the eviction code, like in v3, since for those we
>>>>>>> indeed hold the VM's dma_resv, as it's aliased with the
>>>>>>> object's dma-resv.
>>>>>>
>>>>>> Yeah, I wanted to note what Danilo seems to think about as well. 
>>>>>> How do we figure out which non-VM BOs have been evicted?
>>>>>>
>>>>>> We can't walk over the list of all non-VM BOs on every 
>>>>>> submission, that's too much overhead for cases with lots of non-VM
>>>>>> BOs.
>>>>>>
>>>>>> And we can't rely on userspace sending all non-VM BOs in a used
>>>>>> list down to the kernel with each submission.
>>>>>>
>>>>>> Regards,
>>>>>> Christian.
>>>>>
>>>>> No, that's not needed: Mechanism below.
>>>>>
>>>>> 1) We maintain an evicted list. Typically protected by the vm resv.
>>>>> 2) Each gpuvm_bo has a bool "evicted". Protected by the bo resv.
>>>>>
>>>>> a) Evicting a vm bo: The vm resv is held by the eviction code. 
>>>>> Just put it on the evicted list.
>>>>> b) Evicting a shared/external bo: The bo resv is held by the 
>>>>> eviction code. Set the "evicted" bool.
>>>>> c) Validating the evicted list on exec:
>>>>
>>>>
>>>>> Loop through all *external/shared* bos.
>>>>
>>>> And this is what you can't do. For Vulkan it probably doesn't 
>>>> matter, but for OpenGL and especially multimedia we have many more
>>>> BOs on the shared list than what's allocated for the VM.
>>>
>>> But you need to lock and fence all those, so you need to loop
>>> through them anyway, so we're still O(n_shared)? Or is there some 
>>> clever optimization in amdgpu?
>>
>> Why should I lock and fence them? Only the BOs in the relocation list 
>> are locked and fenced.
>
> Do you by "relocation" list refer to what gpuvm calls "evict" list or 
> something else? Like the relocaton/validation list that used to be 
> sent from user-space for non-VM_BIND vms?

The BOs sent into the kernel with each command submission on the classic
IOCTLs.

>
> The vm bos plus the external/shared bos bound to the VM (the external 
> list) are the bos being referenced by the current batch. So the bos on 
> the VM's external list are the ones being locked and fenced and 
> checked for eviction. If they weren't, they could be evicted before the
> current batch completes?

That only applies to a certain use case, e.g. Vulkan or user mode queues.

Multimedia APIs and especially OpenGL work differently; here only the
BOs mentioned in the relocation list are guaranteed to not be evicted.

This is intentional because those APIs tend to over-allocate memory all
the time, so for good performance you need to be able to evict BOs from 
the VM while other parts of the VM are currently in use.

Without that, OpenGL performance in particular would be completely
crippled, at least on amdgpu.

Regards,
Christian.

>
> Thanks,
>
> Thomas
>
>
>>
>> Regards,
>> Christian.
>>
>>>
>>> I think with some UMDs, xe might end up with similar large lists...
>>>
>>> /Thomas
>>>
>>>
>>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>>> Lock them. After locking, check the "evicted" bool; if it's true,
>>>>> put the bo on the evicted list (we hold the VM resv at this point)
>>>>> and clear the "evicted" bool. Note that other vms will have their
>>>>> own gpuvm_bo which is marked evicted.
>>>>>
>>>>> I have this coded up in a patch for Xe and it seems to be working 
>>>>> properly.
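>>>>>
>>>>> In rough pseudo-C, step c) would be something like the following
>>>>> (untested sketch; meant to run inside the drm_exec_until_all_locked()
>>>>> loop with the VM resv already held, contention handling left to the
>>>>> caller, and the "evicted" bool from 2) above being hypothetical):
>>>>>
>>>>> static int vm_prepare_extobjs(struct drm_gpuvm *gpuvm,
>>>>> 			      struct drm_exec *exec)
>>>>> {
>>>>> 	struct drm_gpuvm_bo *vm_bo;
>>>>> 	int ret;
>>>>>
>>>>> 	list_for_each_entry(vm_bo, &gpuvm->extobj.list, list.entry.extobj) {
>>>>> 		ret = drm_exec_prepare_obj(exec, vm_bo->obj, 1);
>>>>> 		if (ret)
>>>>> 			return ret;
>>>>>
>>>>> 		/* The BO resv is now held; safe to test and clear the bool. */
>>>>> 		if (vm_bo->evicted) {
>>>>> 			list_move_tail(&vm_bo->list.entry.evict,
>>>>> 				       &gpuvm->evict.list);
>>>>> 			vm_bo->evicted = false;
>>>>> 		}
>>>>> 	}
>>>>>
>>>>> 	return 0;
>>>>> }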
>>>>>
>>>>> /Thomas
>>>>>
>>>>>
>>>>>>
>>>>>>>
>>>>>>> /Thomas
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> If you mean that we need to unbind all vmas of all vms of
>>>>>>>>> evicted bos before evicting, we don't do that, at least not in
>>>>>>>>> Xe, since when evicting we wait for VM idle, and it can't access
>>>>>>>>> anything through the stale vmas until they have been
>>>>>>>>> revalidated and rebound.
>>>>>>>>>
>>>>>>>>> /Thomas
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Regards,
>>>>>>>>>>>> Christian.
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Wed, Sep 13, 2023 at 11:14:46AM +0200, Thomas 
>>>>>>>>>>>>>>> Hellström wrote:
>>>>>>>>>>>>>>>> Hi!
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Wed, 2023-09-13 at 01:36 +0200, Danilo Krummrich wrote:
>>>>>>>>>>>>>>>>> On Tue, Sep 12, 2023 at 09:23:08PM +0200, Thomas 
>>>>>>>>>>>>>>>>> Hellström wrote:
>>>>>>>>>>>>>>>>>> On 9/12/23 18:50, Danilo Krummrich wrote:
>>>>>>>>>>>>>>>>>>> On Tue, Sep 12, 2023 at 06:20:32PM +0200, Thomas 
>>>>>>>>>>>>>>>>>>> Hellström wrote:
>>>>>>>>>>>>>>>>>>>> Hi, Danilo,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On 9/9/23 17:31, Danilo Krummrich wrote:
>>>>>>>>>>>>>>>>>>>>> So far the DRM GPUVA manager offers common 
>>>>>>>>>>>>>>>>>>>>> infrastructure to
>>>>>>>>>>>>>>>>>>>>> track GPU VA
>>>>>>>>>>>>>>>>>>>>> allocations and mappings, generically connect GPU 
>>>>>>>>>>>>>>>>>>>>> VA mappings
>>>>>>>>>>>>>>>>>>>>> to their
>>>>>>>>>>>>>>>>>>>>> backing buffers and perform more complex mapping 
>>>>>>>>>>>>>>>>>>>>> operations
>>>>>>>>>>>>>>>>>>>>> on the GPU VA
>>>>>>>>>>>>>>>>>>>>> space.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> However, there are more design patterns commonly 
>>>>>>>>>>>>>>>>>>>>> used by
>>>>>>>>>>>>>>>>>>>>> drivers, which
>>>>>>>>>>>>>>>>>>>>> can potentially be generalized in order to make 
>>>>>>>>>>>>>>>>>>>>> the DRM GPUVA
>>>>>>>>>>>>>>>>>>>>> manager
>>>>>>>>>>>>>>>>>>>>> represent a basic GPU-VM implementation. In this 
>>>>>>>>>>>>>>>>>>>>> context,
>>>>>>>>>>>>>>>>>>>>> this patch aims
>>>>>>>>>>>>>>>>>>>>> at generalizing the following elements.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> 1) Provide a common dma-resv for GEM objects not 
>>>>>>>>>>>>>>>>>>>>> being used
>>>>>>>>>>>>>>>>>>>>> outside of
>>>>>>>>>>>>>>>>>>>>>       this GPU-VM.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> 2) Provide tracking of external GEM objects (GEM 
>>>>>>>>>>>>>>>>>>>>> objects
>>>>>>>>>>>>>>>>>>>>> which are
>>>>>>>>>>>>>>>>>>>>>       shared with other GPU-VMs).
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> 3) Provide functions to efficiently lock all GEM 
>>>>>>>>>>>>>>>>>>>>> objects dma-
>>>>>>>>>>>>>>>>>>>>> resv the
>>>>>>>>>>>>>>>>>>>>>       GPU-VM contains mappings of.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> 4) Provide tracking of evicted GEM objects the GPU-VM
>>>>>>>>>>>>>>>>>>>>> contains mappings
>>>>>>>>>>>>>>>>>>>>>       of, such that validation of evicted GEM 
>>>>>>>>>>>>>>>>>>>>> objects is
>>>>>>>>>>>>>>>>>>>>> accelerated.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> 5) Provide some convenience functions for common
>>>>>>>>>>>>>>>>>>>>> patterns.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Rather than being designed as a "framework", the 
>>>>>>>>>>>>>>>>>>>>> target is to
>>>>>>>>>>>>>>>>>>>>> make all
>>>>>>>>>>>>>>>>>>>>> features appear as a collection of optional helper 
>>>>>>>>>>>>>>>>>>>>> functions,
>>>>>>>>>>>>>>>>>>>>> such that
>>>>>>>>>>>>>>>>>>>>> drivers are free to make use of the DRM GPUVA 
>>>>>>>>>>>>>>>>>>>>> managers basic
>>>>>>>>>>>>>>>>>>>>> functionality and opt-in for other features 
>>>>>>>>>>>>>>>>>>>>> without setting
>>>>>>>>>>>>>>>>>>>>> any feature
>>>>>>>>>>>>>>>>>>>>> flags, just by making use of the corresponding 
>>>>>>>>>>>>>>>>>>>>> functions.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Big kudos to Boris Brezillon for his help to 
>>>>>>>>>>>>>>>>>>>>> figure out
>>>>>>>>>>>>>>>>>>>>> locking for drivers
>>>>>>>>>>>>>>>>>>>>> updating the GPU VA space within the fence 
>>>>>>>>>>>>>>>>>>>>> signalling path.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Suggested-by: Matthew Brost <matthew.brost@intel.com>
>>>>>>>>>>>>>>>>>>>>> Signed-off-by: Danilo Krummrich <dakr@redhat.com>
>>>>>>>>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>>>>>>>>  drivers/gpu/drm/drm_gpuvm.c | 516 ++++++++++++++++++++++++++++++++++++
>>>>>>>>>>>>>>>>>>>>>  include/drm/drm_gpuvm.h     | 197 ++++++++++++++
>>>>>>>>>>>>>>>>>>>>>  2 files changed, 713 insertions(+)
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> diff --git a/drivers/gpu/drm/drm_gpuvm.c
>>>>>>>>>>>>>>>>>>>>> b/drivers/gpu/drm/drm_gpuvm.c
>>>>>>>>>>>>>>>>>>>>> index f4411047dbb3..8e62a043f719 100644
>>>>>>>>>>>>>>>>>>>>> --- a/drivers/gpu/drm/drm_gpuvm.c
>>>>>>>>>>>>>>>>>>>>> +++ b/drivers/gpu/drm/drm_gpuvm.c
>>>>>>>>>>>>>>>>>>>>> @@ -73,6 +73,21 @@
>>>>>>>>>>>>>>>>>>>>>      * &drm_gem_object list of &drm_gpuvm_bos for 
>>>>>>>>>>>>>>>>>>>>> an existing
>>>>>>>>>>>>>>>>>>>>> instance of this
>>>>>>>>>>>>>>>>>>>>>      * particular combination. If not existent a 
>>>>>>>>>>>>>>>>>>>>> new instance
>>>>>>>>>>>>>>>>>>>>> is created and linked
>>>>>>>>>>>>>>>>>>>>>      * to the &drm_gem_object.
>>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>>> + * &drm_gpuvm_bo structures, since unique for a 
>>>>>>>>>>>>>>>>>>>>> given
>>>>>>>>>>>>>>>>>>>>> &drm_gpuvm, are also used
>>>>>>>>>>>>>>>>>>>>> + * as entry for the &drm_gpuvm's lists of 
>>>>>>>>>>>>>>>>>>>>> external and
>>>>>>>>>>>>>>>>>>>>> evicted objects. Those
>>>>>>>>>>>>>>>>>>>>> + * list are maintained in order to accelerate 
>>>>>>>>>>>>>>>>>>>>> locking of
>>>>>>>>>>>>>>>>>>>>> dma-resv locks and
>>>>>>>>>>>>>>>>>>>>> + * validation of evicted objects bound in a 
>>>>>>>>>>>>>>>>>>>>> &drm_gpuvm. For
>>>>>>>>>>>>>>>>>>>>> instance the all
>>>>>>>>>>>>>>>>>>>>> + * &drm_gem_object's &dma_resv of a given 
>>>>>>>>>>>>>>>>>>>>> &drm_gpuvm can be
>>>>>>>>>>>>>>>>>>>>> locked by calling
>>>>>>>>>>>>>>>>>>>>> + * drm_gpuvm_exec_lock(). Once locked drivers can 
>>>>>>>>>>>>>>>>>>>>> call
>>>>>>>>>>>>>>>>>>>>> drm_gpuvm_validate() in
>>>>>>>>>>>>>>>>>>>>> + * order to validate all evicted 
>>>>>>>>>>>>>>>>>>>>> &drm_gem_objects. It is
>>>>>>>>>>>>>>>>>>>>> also possible to lock
>>>>>>>>>>>>>>>>>>>>> + * additional &drm_gem_objects by providing the
>>>>>>>>>>>>>>>>>>>>> corresponding parameters to
>>>>>>>>>>>>>>>>>>>>> + * drm_gpuvm_exec_lock() as well as open code the 
>>>>>>>>>>>>>>>>>>>>> &drm_exec
>>>>>>>>>>>>>>>>>>>>> loop while making
>>>>>>>>>>>>>>>>>>>>> + * use of helper functions such as 
>>>>>>>>>>>>>>>>>>>>> drm_gpuvm_prepare_range()
>>>>>>>>>>>>>>>>>>>>> or
>>>>>>>>>>>>>>>>>>>>> + * drm_gpuvm_prepare_objects().
>>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>>> + * Every bound &drm_gem_object is treated as 
>>>>>>>>>>>>>>>>>>>>> external object
>>>>>>>>>>>>>>>>>>>>> when its &dma_resv
>>>>>>>>>>>>>>>>>>>>> + * structure is different than the &drm_gpuvm's 
>>>>>>>>>>>>>>>>>>>>> common
>>>>>>>>>>>>>>>>>>>>> &dma_resv structure.
>>>>>>>>>>>>>>>>>>>>>      */
>>>>>>>>>>>>>>>>>>>>>     /**
>>>>>>>>>>>>>>>>>>>>> @@ -420,6 +435,20 @@
>>>>>>>>>>>>>>>>>>>>>      * Subsequent calls to drm_gpuvm_bo_obtain() for the same &drm_gpuvm and
>>>>>>>>>>>>>>>>>>>>>      * &drm_gem_object must be able to observe previous creations and destructions
>>>>>>>>>>>>>>>>>>>>>      * of &drm_gpuvm_bos in order to keep instances unique.
>>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>>> + * The &drm_gpuvm's lists for keeping track of external and evicted objects are
>>>>>>>>>>>>>>>>>>>>> + * protected against concurrent insertion / removal and iteration internally.
>>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>>> + * However, drivers still need to ensure that concurrent calls to functions
>>>>>>>>>>>>>>>>>>>>> + * iterating those lists, such as drm_gpuvm_validate() and
>>>>>>>>>>>>>>>>>>>>> + * drm_gpuvm_prepare_objects(), are protected. Every such function carries a
>>>>>>>>>>>>>>>>>>>>> + * corresponding comment and, where possible, lockdep checks.
>>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>>> + * Functions adding or removing entries from those lists, such as
>>>>>>>>>>>>>>>>>>>>> + * drm_gpuvm_bo_evict() or drm_gpuvm_bo_extobj_add(), may be called with
>>>>>>>>>>>>>>>>>>>>> + * external locks being held, e.g. in order to prevent the corresponding list
>>>>>>>>>>>>>>>>>>>>> + * from being modified while potentially being iterated by other API functions.
>>>>>>>>>>>>>>>>>>>>> + * However, this is entirely optional.
>>>>>>>>>>>>>>>>>>>>>      */
>>>>>>>>>>>>>>>>>>>>>     /**
>>>>>>>>>>>>>>>>>>>>> @@ -632,6 +661,131 @@
>>>>>>>>>>>>>>>>>>>>>      *   }
>>>>>>>>>>>>>>>>>>>>>      */
>>>>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>>>>> + * get_next_vm_bo_from_list() - get the next vm_bo element
>>>>>>>>>>>>>>>>>>>>> + * @__gpuvm: The GPU VM
>>>>>>>>>>>>>>>>>>>>> + * @__list_name: The name of the list we're iterating on
>>>>>>>>>>>>>>>>>>>>> + * @__local_list: A pointer to the local list used to store already iterated items
>>>>>>>>>>>>>>>>>>>>> + * @__prev_vm_bo: The previous element we got from get_next_vm_bo_from_list()
>>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>>> + * This helper is here to provide lockless list iteration. Lockless as in, the
>>>>>>>>>>>>>>>>>>>>> + * iterator releases the lock immediately after picking the first element from
>>>>>>>>>>>>>>>>>>>>> + * the list, so list insertion and deletion can happen concurrently.
>>>>>>>>>>>>>>>>>>>> Are the list spinlocks needed for that async state 
>>>>>>>>>>>>>>>>>>>> update from
>>>>>>>>>>>>>>>>>>>> within the
>>>>>>>>>>>>>>>>>>>> dma-fence critical section we've discussed previously?
>>>>>>>>>>>>>>>>>>> Yes, but also for other reasons, see below.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Otherwise it should be sufficient to protect the 
>>>>>>>>>>>>>>>>>>>> lists with the
>>>>>>>>>>>>>>>>>>>> gpuvm's resv
>>>>>>>>>>>>>>>>>>>> (or for the extobj list with an outer lock).
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> If those spinlocks are still needed in some 
>>>>>>>>>>>>>>>>>>>> situations, perhaps
>>>>>>>>>>>>>>>>>>>> could we
>>>>>>>>>>>>>>>>>>>> have an option to set them to NULL (Like IIRC the 
>>>>>>>>>>>>>>>>>>>> maple tree
>>>>>>>>>>>>>>>>>>>> allows for)?
>>>>>>>>>>>>>>>>>>> The evict spinlock is needed in any case, since in
>>>>>>>>>>>>>>>>>>> drm_gpuvm_bo_evict() we're holding only the dma-resv lock of
>>>>>>>>>>>>>>>>>>> the BO this function gets called for. Hence, the spinlock
>>>>>>>>>>>>>>>>>>> protects concurrent drm_gpuvm_bo_evict() calls with
>>>>>>>>>>>>>>>>>>> different BOs.
>>>>>>>>>>>>>>>>>> No. Only if you try to add external objects to the vm's
>>>>>>>>>>>>>>>>>> evict list from within the evict code. That's not necessary,
>>>>>>>>>>>>>>>>>> since you loop through all external objects anyway when
>>>>>>>>>>>>>>>>>> locking them, so an "evicted" bool in the vm_bo, protected by
>>>>>>>>>>>>>>>>>> the bo resv, would be sufficient. The extobj locking loop can
>>>>>>>>>>>>>>>>>> then add the bo to the evicted list.
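
Side note, for illustration: a minimal sketch of the scheme suggested above.
The "evicted" field and the helper name are hypothetical, not part of this
patch:

        /* Hypothetical: set instead of touching a shared list from the
         * eviction path; protected by the BO's dma-resv lock.
         */
        static void vm_bo_set_evicted(struct drm_gpuvm_bo *vm_bo, bool evict)
        {
                dma_resv_assert_held(vm_bo->obj->resv);
                vm_bo->evicted = evict;
        }

The extobj locking loop, which already holds all the relevant dma-resv locks,
would then move each vm_bo with evicted == true to the VM-local evict list.
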
>>>>>>>>>>>>>>>>> And validate() can remove it while still holding all 
>>>>>>>>>>>>>>>>> dma-resv locks,
>>>>>>>>>>>>>>>>> neat!
>>>>>>>>>>>>>>>>> However, what if two tasks are trying to lock the VA 
>>>>>>>>>>>>>>>>> space
>>>>>>>>>>>>>>>>> concurrently? What
>>>>>>>>>>>>>>>>> do we do when the drm_gpuvm_bo's refcount drops to 
>>>>>>>>>>>>>>>>> zero in
>>>>>>>>>>>>>>>>> drm_gpuva_unlink()?
>>>>>>>>>>>>>>>>> Are we guaranteed that at this point of time the 
>>>>>>>>>>>>>>>>> drm_gpuvm_bo is not
>>>>>>>>>>>>>>>>> on the
>>>>>>>>>>>>>>>>> evicted list? Because otherwise we would call 
>>>>>>>>>>>>>>>>> drm_gpuvm_bo_destroy()
>>>>>>>>>>>>>>>>> with the
>>>>>>>>>>>>>>>>> dma-resv lock held, which wouldn't be allowed, since
>>>>>>>>>>>>>>>>> drm_gpuvm_bo_destroy()
>>>>>>>>>>>>>>>>> might drop the last reference to the drm_gem_object 
>>>>>>>>>>>>>>>>> and hence we'd
>>>>>>>>>>>>>>>>> potentially
>>>>>>>>>>>>>>>>> free the dma-resv lock while holding it, at least if 
>>>>>>>>>>>>>>>>> it's an external
>>>>>>>>>>>>>>>>> object.
>>>>>>>>>>>>>>>> Easiest way in this scheme is to think of the lists as 
>>>>>>>>>>>>>>>> being protected
>>>>>>>>>>>>>>>> by the vm's resv lock. That means anybody calling 
>>>>>>>>>>>>>>>> unlink() must also
>>>>>>>>>>>>>>>> hold the vm's resv lock. (Which is OK from a UAF point of
>>>>>>>>>>>>>>>> view, but perhaps not from a locking-inversion POV for an
>>>>>>>>>>>>>>>> async list update.)
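
To make that concrete: under this convention the list helpers could reduce to
something like the following sketch (assuming resv-protected lists; the
function name is made up):

        static void vm_bo_list_add_locked(struct drm_gpuvm_bo *vm_bo,
                                          struct list_head *entry,
                                          struct list_head *list)
        {
                /* The VM's resv also serves as the list lock here. */
                dma_resv_assert_held(vm_bo->vm->resv);
                if (list_empty(entry))
                        list_add_tail(entry, list);
        }

No spinlock and no kref_get_unless_zero() dance: iteration, insertion and
removal would all be serialized by the VM's resv lock.
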
>>>>>>>>>>>>>>> This would mean that on unlink() we'd need to hold the 
>>>>>>>>>>>>>>> VM's resv lock and the
>>>>>>>>>>>>>>> corresponding GEM's resv lock (in case they're not the
>>>>>>>>>>>>>>> same anyway) because the
>>>>>>>>>>>>>>> VM's resv lock would protect the external / evicted 
>>>>>>>>>>>>>>> object lists and the GEM
>>>>>>>>>>>>>>> objects resv lock protects the GEM's list of 
>>>>>>>>>>>>>>> drm_gpuvm_bos and the
>>>>>>>>>>>>>>> drm_gpuvm_bo's list of drm_gpuvas.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> For extobjs an outer lock would be enough in case of 
>>>>>>>>>>>>>>>>>>> Xe, but I
>>>>>>>>>>>>>>>>>>> really would not
>>>>>>>>>>>>>>>>>>> like to add even more complexity just to get the 
>>>>>>>>>>>>>>>>>>> spinlock out of
>>>>>>>>>>>>>>>>>>> the way in case
>>>>>>>>>>>>>>>>>>> the driver already has an outer lock protecting this 
>>>>>>>>>>>>>>>>>>> path.
>>>>>>>>>>>>>>>>>> I must disagree here. These spinlocks and atomic operations
>>>>>>>>>>>>>>>>>> are pretty costly, and as discussed earlier this type of
>>>>>>>>>>>>>>>>>> locking was the reason (at least according to the commit
>>>>>>>>>>>>>>>>>> message) that made Christian drop the XArray use in drm_exec
>>>>>>>>>>>>>>>>>> for the same set of objects: "The locking overhead is
>>>>>>>>>>>>>>>>>> unecessary and measurable". IMHO the spinlock is the added
>>>>>>>>>>>>>>>>>> complexity, and a single wide lock following the drm locking
>>>>>>>>>>>>>>>>>> guidelines set out by Daniel and David should really be the
>>>>>>>>>>>>>>>>>> default choice, with an opt-in for a spinlock if it's needed
>>>>>>>>>>>>>>>>>> for async and pushing out to a wq is not an option.
>>>>>>>>>>>>>>>>> For the external object list an outer lock would work 
>>>>>>>>>>>>>>>>> as long as it's
>>>>>>>>>>>>>>>>> not the
>>>>>>>>>>>>>>>>> dma-resv lock of the corresponding GEM object, since 
>>>>>>>>>>>>>>>>> here we actually
>>>>>>>>>>>>>>>>> need to
>>>>>>>>>>>>>>>>> remove the list entry from the external object list on
>>>>>>>>>>>>>>>>> drm_gpuvm_bo_destroy().
>>>>>>>>>>>>>>>>> It's just a bit weird, design-wise, that drivers would
>>>>>>>>>>>>>>>>> need to take this outer
>>>>>>>>>>>>>>>>> lock on:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> - drm_gpuvm_bo_extobj_add()
>>>>>>>>>>>>>>>>> - drm_gpuvm_bo_destroy()        (and hence also 
>>>>>>>>>>>>>>>>> drm_gpuvm_bo_put())
>>>>>>>>>>>>>>>>> - drm_gpuva_unlink()            (because it needs to call
>>>>>>>>>>>>>>>>> drm_gpuvm_bo_put())
>>>>>>>>>>>>>>>>> - drm_gpuvm_exec_lock()
>>>>>>>>>>>>>>>>> - drm_gpuvm_exec_lock_array()
>>>>>>>>>>>>>>>>> - drm_gpuvm_prepare_range()
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Given that, it seems reasonable to do all the required
>>>>>>>>>>>>>>>>> locking internally.
>>>>>>>>>>>>>>>> From a design POV, there has been a clear direction in
>>>>>>>>>>>>>>>> XE to make
>>>>>>>>>>>>>>>> things similar to mmap() / munmap(), so this outer 
>>>>>>>>>>>>>>>> lock, which in Xe is
>>>>>>>>>>>>>>>> an rwsem, is used in a similar way as the mmap_lock. 
>>>>>>>>>>>>>>>> It's protecting
>>>>>>>>>>>>>>>> the page-table structures and vma rb tree, the userptr 
>>>>>>>>>>>>>>>> structures and
>>>>>>>>>>>>>>>> the extobj list. Basically it's taken early in the exec 
>>>>>>>>>>>>>>>> IOCTL, the
>>>>>>>>>>>>>>>> VM_BIND ioctl, the compute rebind worker and the 
>>>>>>>>>>>>>>>> pagefault handler, so
>>>>>>>>>>>>>>>> all of the above are just asserting that it is taken in 
>>>>>>>>>>>>>>>> the correct
>>>>>>>>>>>>>>>> mode.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> But strictly with this scheme one could also use the 
>>>>>>>>>>>>>>>> vm's dma_resv for
>>>>>>>>>>>>>>>> the extobj list since with drm_exec, it's locked before 
>>>>>>>>>>>>>>>> traversing the
>>>>>>>>>>>>>>>> list.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The whole point of this scheme is to rely on locks that 
>>>>>>>>>>>>>>>> you already are
>>>>>>>>>>>>>>>> supposed to be holding for various reasons and is 
>>>>>>>>>>>>>>>> simple to comprehend.
>>>>>>>>>>>>>>> I don't agree that we're supposed to hold the VM's resv 
>>>>>>>>>>>>>>> lock anyway for
>>>>>>>>>>>>>>> functions like drm_gpuvm_bo_put() or drm_gpuva_unlink(), 
>>>>>>>>>>>>>>> but I'm fine using it
>>>>>>>>>>>>>>> for that purpose nevertheless.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> In order to at least place lockdep checks, the driver 
>>>>>>>>>>>>>>>>> would need to
>>>>>>>>>>>>>>>>> supply the
>>>>>>>>>>>>>>>>> corresponding lock's lockdep_map, because the GPUVM 
>>>>>>>>>>>>>>>>> otherwise doesn't
>>>>>>>>>>>>>>>>> know about
>>>>>>>>>>>>>>>>> the lock.
>>>>>>>>>>>>>>>> Yes, that sounds reasonable. One lockdep map per list.
>>>>>>>>>>>>>>> I'd really like to avoid that, especially now that 
>>>>>>>>>>>>>>> everything got simpler. We
>>>>>>>>>>>>>>> should define the actual locks to take instead.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Out of curiosity, what is the overhead of a 
>>>>>>>>>>>>>>>>> spin_lock() that doesn't
>>>>>>>>>>>>>>>>> need to
>>>>>>>>>>>>>>>>> spin?
>>>>>>>>>>>>>>>> I guess it's hard to tell exactly, but it is much lower 
>>>>>>>>>>>>>>>> on modern x86
>>>>>>>>>>>>>>>> than what it used to be. Not sure about ARM, which is 
>>>>>>>>>>>>>>>> the other
>>>>>>>>>>>>>>>> architecture important to us. I figure if there is 
>>>>>>>>>>>>>>>> little cache-line
>>>>>>>>>>>>>>>> bouncing the main overhead comes from the implied 
>>>>>>>>>>>>>>>> barriers.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> A pretty simple way that would not add much code would be
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> static void gpuvm_cond_spin_lock(const struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>>>>>>                                  spinlock_t *lock)
>>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>>         if (!gpuvm->resv_protected_lists)
>>>>>>>>>>>>>>>>>>                 spin_lock(lock);
>>>>>>>>>>>>>>>>>> }
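
Applied to the helpers in this patch, drm_gpuvm_bo_list_add() would then
become something like this sketch (gpuvm_cond_spin_unlock() is the assumed
mirror of the helper above, and resv_protected_lists the assumed opt-in flag):

        gpuvm_cond_spin_lock(gpuvm, &(__vm_bo)->vm->__list_name.lock);
        if (list_empty(&(__vm_bo)->list.entry.__list_name))
                list_add_tail(&(__vm_bo)->list.entry.__list_name,
                              &(__vm_bo)->vm->__list_name.list);
        gpuvm_cond_spin_unlock(gpuvm, &(__vm_bo)->vm->__list_name.lock);
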
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> For such drivers, that would require anybody 
>>>>>>>>>>>>>>>>>>>> calling unlink to
>>>>>>>>>>>>>>>>>>>> hold the vm's
>>>>>>>>>>>>>>>>>>>> resv, though.
>>>>>>>>>>>>>>>>>>> In V4 I want to go back to having a dedicated lock for the
>>>>>>>>>>>>>>>>>>> GEM's gpuva list (or
>>>>>>>>>>>>>>>>>>> VM_BO list to be more precise). We can't just use 
>>>>>>>>>>>>>>>>>>> the dma-resv
>>>>>>>>>>>>>>>>>>> lock for that
>>>>>>>>>>>>>>>>>>> with VM_BO abstractions, because on destruction of a 
>>>>>>>>>>>>>>>>>>> VM_BO we
>>>>>>>>>>>>>>>>>>> otherwise wouldn't
>>>>>>>>>>>>>>>>>>> be allowed to already hold the dma-resv lock. That's 
>>>>>>>>>>>>>>>>>>> the fix I
>>>>>>>>>>>>>>>>>>> was referring to
>>>>>>>>>>>>>>>>>>> earlier.
>>>>>>>>>>>>>>>>>> Yeah, I can see the need for a dedicated lock for the 
>>>>>>>>>>>>>>>>>> GEM's gpuva
>>>>>>>>>>>>>>>>>> list, but
>>>>>>>>>>>>>>>>>> holding the vm's dma-resv lock across the unlink 
>>>>>>>>>>>>>>>>>> shouldn't be a
>>>>>>>>>>>>>>>>>> problem. We
>>>>>>>>>>>>>>>>>> may free the object and a pointer to the vm's resv 
>>>>>>>>>>>>>>>>>> during unlink
>>>>>>>>>>>>>>>>>> but we
>>>>>>>>>>>>>>>>>> don't free the vm's resv.  It'd be a matter of 
>>>>>>>>>>>>>>>>>> ensuring that any
>>>>>>>>>>>>>>>>>> calls to
>>>>>>>>>>>>>>>>>> unlink from *within* drm_gpuvm allows it to be held.
>>>>>>>>>>>>>>>>> Drivers calling unlink() from the fence signaling path 
>>>>>>>>>>>>>>>>> can't use the
>>>>>>>>>>>>>>>>> VM's
>>>>>>>>>>>>>>>>> dma-resv lock.
>>>>>>>>>>>>>>>> Yes, that made me a bit curious because in the current 
>>>>>>>>>>>>>>>> version the code
>>>>>>>>>>>>>>>> required the object's dma_resv for unlink() which can't 
>>>>>>>>>>>>>>>> be grabbed
>>>>>>>>>>>>>>>> either from the fence signaling path. So are there any 
>>>>>>>>>>>>>>>> drivers actually
>>>>>>>>>>>>>>>> wanting to do that? If so, they will either need to 
>>>>>>>>>>>>>>>> resort to the
>>>>>>>>>>>>>>>> current spinlock solution or they will need to call 
>>>>>>>>>>>>>>>> unlink from a
>>>>>>>>>>>>>>>> workqueue item.
>>>>>>>>>>>>>>> As Boris already mentioned, we have the dma-resv lock by
>>>>>>>>>>>>>>> default or a driver
>>>>>>>>>>>>>>> specific GEM gpuva lock as opt-in. Now, we can get rid 
>>>>>>>>>>>>>>> of the latter.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Also, what if the object is an external object? We 
>>>>>>>>>>>>>>>>> can't use the VM's
>>>>>>>>>>>>>>>>> dma-resv
>>>>>>>>>>>>>>>>> lock here.
>>>>>>>>>>>>>>>> Why? Typically (sync) unlink is only ever called from 
>>>>>>>>>>>>>>>> an unbind-like
>>>>>>>>>>>>>>>> operation where it should be trivial to grab the vm's 
>>>>>>>>>>>>>>>> resv. Or, for
>>>>>>>>>>>>>>>> that matter, any outer lock protecting the extobj list. The
>>>>>>>>>>>>>>>> rule would be that drm_gpuvm_bo::entry::extobj and
>>>>>>>>>>>>>>>> drm_gpuvm_bo::entry::evict are protected by either the vm's
>>>>>>>>>>>>>>>> dma_resv (or possibly an outer lock in the case of the extobj
>>>>>>>>>>>>>>>> list).
>>>>>>>>>>>>>>> An outer lock wouldn't have worked for updates in the
>>>>>>>>>>>>>>> async path, but that
>>>>>>>>>>>>>>> shouldn't be relevant anymore. We could use the VM's
>>>>>>>>>>>>>>> resv for that.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>   And we can't have the GEM obj's dma-resv lock held
>>>>>>>>>>>>>>>>> when calling
>>>>>>>>>>>>>>>>> unlink(), since unlink() calls drm_gpuvm_bo_put(), 
>>>>>>>>>>>>>>>>> which if the
>>>>>>>>>>>>>>>>> refcount drops
>>>>>>>>>>>>>>>>> to zero calls drm_gpuvm_bo_destroy() and 
>>>>>>>>>>>>>>>>> drm_gpuvm_bo_destroy() might
>>>>>>>>>>>>>>>>> drop the
>>>>>>>>>>>>>>>>> last reference of the GEM object.
>>>>>>>>>>>>>>>> Yes, but this is a different problem as to what exactly 
>>>>>>>>>>>>>>>> protects
>>>>>>>>>>>>>>>> drm_gpuvm_bo::entry::gem. Either as you suggest an 
>>>>>>>>>>>>>>>> internal per bo list
>>>>>>>>>>>>>>>> lock, or if we want to keep the bo's dma_resv we need 
>>>>>>>>>>>>>>>> to ensure that
>>>>>>>>>>>>>>>> the caller of dma_resv_unlock(obj->resv) actually 
>>>>>>>>>>>>>>>> refcounts its obj
>>>>>>>>>>>>>>>> pointer, and doesn't implicitly rely on the gpuvm_bo's 
>>>>>>>>>>>>>>>> refcount (I know
>>>>>>>>>>>>>>>> Boris didn't like that, but requiring an explicit 
>>>>>>>>>>>>>>>> refcount for a
>>>>>>>>>>>>>>>> pointer you dereference unless you're under a lock that 
>>>>>>>>>>>>>>>> ensures keeping
>>>>>>>>>>>>>>>> the object alive is pretty much required?) But anyway 
>>>>>>>>>>>>>>>> for the
>>>>>>>>>>>>>>>> drm_gpuvm_bo::entry::gem list protection (bo resv or 
>>>>>>>>>>>>>>>> internal spinlock)
>>>>>>>>>>>>>>>> I don't have a strong preference.
>>>>>>>>>>>>>>> We can keep the GEM objects dma-resv lock, however as 
>>>>>>>>>>>>>>> mentioned above
>>>>>>>>>>>>>>> drm_gpuva_unlink() and drm_gpuvm_bo_put() then require
>>>>>>>>>>>>>>> both the VM's resv lock
>>>>>>>>>>>>>>> and the GEM's resv lock in case they differ.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>   All those problems go away with a dedicated
>>>>>>>>>>>>>>>>> GEM gpuva list lock.
>>>>>>>>>>>>>>>> I don't think these are real problems.
>>>>>>>>>>>>>>>> With the exception of the eviction list "trick", where we
>>>>>>>>>>>>>>>> currently have a slightly different approach to collect
>>>>>>>>>>>>>>>> external bos needing rebinding, we have this working fine.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> TBH I think pretty much the only situation where the 
>>>>>>>>>>>>>>>> spinlock is needed
>>>>>>>>>>>>>>>> is for async updates of these lists, unless a wq item 
>>>>>>>>>>>>>>>> can be used for
>>>>>>>>>>>>>>>> that, but it doesn't really seem like the current code 
>>>>>>>>>>>>>>>> allows for such
>>>>>>>>>>>>>>>> updates anyway? It complicates the code a lot, adds 
>>>>>>>>>>>>>>>> overhead and also
>>>>>>>>>>>>>>>> adds the requirement for refcounting during list 
>>>>>>>>>>>>>>>> traversal.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> /Thomas
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> /Thomas
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> It seems that with that, the refcount could also be
>>>>>>>>>>>>>>>>>>>> made non-atomic.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> All in the spirit of the drm locking guidelines 
>>>>>>>>>>>>>>>>>>>> "use big locks
>>>>>>>>>>>>>>>>>>>> when
>>>>>>>>>>>>>>>>>>>> possible".
>>>>>>>>>>>>>>>>>>>> Lower level locks only when necessary for 
>>>>>>>>>>>>>>>>>>>> performance or
>>>>>>>>>>>>>>>>>>>> locking inversion?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> /Thomas
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>>> + * Elements popped from the original list are kept in a local list, so removal
>>>>>>>>>>>>>>>>>>>>> + * and is_empty checks can still happen while we're iterating the list.
>>>>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>>>>> +#define get_next_vm_bo_from_list(__gpuvm, __list_name, __local_list, __prev_vm_bo)  \
>>>>>>>>>>>>>>>>>>>>> +       ({                                                                      \
>>>>>>>>>>>>>>>>>>>>> +               struct drm_gpuvm_bo *__vm_bo;                                   \
>>>>>>>>>>>>>>>>>>>>> +                                                                               \
>>>>>>>>>>>>>>>>>>>>> +               drm_gpuvm_bo_put(__prev_vm_bo);                                 \
>>>>>>>>>>>>>>>>>>>>> +                                                                               \
>>>>>>>>>>>>>>>>>>>>> +               spin_lock(&(__gpuvm)->__list_name.lock);                        \
>>>>>>>>>>>>>>>>>>>>> +               while (!list_empty(&(__gpuvm)->__list_name.list)) {             \
>>>>>>>>>>>>>>>>>>>>> +                       __vm_bo = list_first_entry(&(__gpuvm)->__list_name.list, \
>>>>>>>>>>>>>>>>>>>>> +                                                  struct drm_gpuvm_bo,         \
>>>>>>>>>>>>>>>>>>>>> +                                                  list.entry.__list_name);     \
>>>>>>>>>>>>>>>>>>>>> +                       if (drm_gpuvm_bo_get_unless_zero(__vm_bo)) {            \
>>>>>>>>>>>>>>>>>>>>> +                               list_move_tail(&(__vm_bo)->list.entry.__list_name, \
>>>>>>>>>>>>>>>>>>>>> +                                              __local_list);                   \
>>>>>>>>>>>>>>>>>>>>> +                               break;                                          \
>>>>>>>>>>>>>>>>>>>>> +                       } else {                                                \
>>>>>>>>>>>>>>>>>>>>> +                               list_del_init(&(__vm_bo)->list.entry.__list_name); \
>>>>>>>>>>>>>>>>>>>>> +                               __vm_bo = NULL;                                 \
>>>>>>>>>>>>>>>>>>>>> +                       }                                                       \
>>>>>>>>>>>>>>>>>>>>> +               }                                                               \
>>>>>>>>>>>>>>>>>>>>> +               spin_unlock(&(__gpuvm)->__list_name.lock);                      \
>>>>>>>>>>>>>>>>>>>>> +                                                                               \
>>>>>>>>>>>>>>>>>>>>> +               __vm_bo;                                                        \
>>>>>>>>>>>>>>>>>>>>> +       })
>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>>>>> + * for_each_vm_bo_in_list() - internal vm_bo list iterator
>>>>>>>>>>>>>>>>>>>>> + * @__gpuvm: The GPU VM
>>>>>>>>>>>>>>>>>>>>> + * @__list_name: The name of the list we're iterating on
>>>>>>>>>>>>>>>>>>>>> + * @__local_list: A pointer to the local list used to store already iterated items
>>>>>>>>>>>>>>>>>>>>> + * @__vm_bo: The current &drm_gpuvm_bo
>>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>>> + * This helper is here to provide lockless list iteration. Lockless as in, the
>>>>>>>>>>>>>>>>>>>>> + * iterator releases the lock immediately after picking the first element from the
>>>>>>>>>>>>>>>>>>>>> + * list, so list insertion and deletion can happen concurrently.
>>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>>> + * Typical use:
>>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>>> + *     struct drm_gpuvm_bo *vm_bo;
>>>>>>>>>>>>>>>>>>>>> + *     LIST_HEAD(my_local_list);
>>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>>> + *     ret = 0;
>>>>>>>>>>>>>>>>>>>>> + *     for_each_vm_bo_in_list(gpuvm, <list_name>, &my_local_list, vm_bo) {
>>>>>>>>>>>>>>>>>>>>> + *             ret = do_something_with_vm_bo(..., vm_bo);
>>>>>>>>>>>>>>>>>>>>> + *             if (ret)
>>>>>>>>>>>>>>>>>>>>> + *                     break;
>>>>>>>>>>>>>>>>>>>>> + *     }
>>>>>>>>>>>>>>>>>>>>> + *     drm_gpuvm_bo_put(vm_bo);
>>>>>>>>>>>>>>>>>>>>> + *     restore_vm_bo_list(gpuvm, <list_name>, &my_local_list);
>>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>>> + * Only used for internal list iterations, not meant to be exposed to the outside
>>>>>>>>>>>>>>>>>>>>> + * world.
>>>>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>>>>> +#define for_each_vm_bo_in_list(__gpuvm, __list_name, __local_list, __vm_bo)    \
>>>>>>>>>>>>>>>>>>>>> +       for (__vm_bo = get_next_vm_bo_from_list(__gpuvm, __list_name,           \
>>>>>>>>>>>>>>>>>>>>> +                                               __local_list, NULL);            \
>>>>>>>>>>>>>>>>>>>>> +            __vm_bo;                                                           \
>>>>>>>>>>>>>>>>>>>>> +            __vm_bo = get_next_vm_bo_from_list(__gpuvm, __list_name,           \
>>>>>>>>>>>>>>>>>>>>> +                                               __local_list, __vm_bo))         \
>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>>>>> + * restore_vm_bo_list() - move vm_bo elements back to their original list
>>>>>>>>>>>>>>>>>>>>> + * @__gpuvm: The GPU VM
>>>>>>>>>>>>>>>>>>>>> + * @__list_name: The name of the list we're iterating on
>>>>>>>>>>>>>>>>>>>>> + * @__local_list: A pointer to the local list used to store already iterated items
>>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>>> + * When we're done iterating a vm_bo list, we should call restore_vm_bo_list()
>>>>>>>>>>>>>>>>>>>>> + * to restore the original state and let new iterations take place.
>>>>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>>>>> +#define restore_vm_bo_list(__gpuvm, __list_name, __local_list)                 \
>>>>>>>>>>>>>>>>>>>>> +       do {                                                                    \
>>>>>>>>>>>>>>>>>>>>> +               /* Merge back the two lists, moving local list elements to the  \
>>>>>>>>>>>>>>>>>>>>> +                * head to preserve previous ordering, in case it matters.      \
>>>>>>>>>>>>>>>>>>>>> +                */                                                             \
>>>>>>>>>>>>>>>>>>>>> +               spin_lock(&(__gpuvm)->__list_name.lock);                        \
>>>>>>>>>>>>>>>>>>>>> +               list_splice(__local_list, &(__gpuvm)->__list_name.list);        \
>>>>>>>>>>>>>>>>>>>>> +               spin_unlock(&(__gpuvm)->__list_name.lock);                      \
>>>>>>>>>>>>>>>>>>>>> +       } while (0)
>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>>>>> + * drm_gpuvm_bo_list_add() - insert a vm_bo into the given list
>>>>>>>>>>>>>>>>>>>>> + * @__vm_bo: the &drm_gpuvm_bo
>>>>>>>>>>>>>>>>>>>>> + * @__list_name: the name of the list to insert into
>>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>>> + * Inserts the given @__vm_bo into the list specified by @__list_name and
>>>>>>>>>>>>>>>>>>>>> + * increases the vm_bo's reference count.
>>>>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>>>>> +#define drm_gpuvm_bo_list_add(__vm_bo, __list_name)                            \
>>>>>>>>>>>>>>>>>>>>> +       do {                                                                    \
>>>>>>>>>>>>>>>>>>>>> +               spin_lock(&(__vm_bo)->vm->__list_name.lock);                    \
>>>>>>>>>>>>>>>>>>>>> +               if (list_empty(&(__vm_bo)->list.entry.__list_name))             \
>>>>>>>>>>>>>>>>>>>>> +                       list_add_tail(&(__vm_bo)->list.entry.__list_name,       \
>>>>>>>>>>>>>>>>>>>>> +                                     &(__vm_bo)->vm->__list_name.list);        \
>>>>>>>>>>>>>>>>>>>>> +               spin_unlock(&(__vm_bo)->vm->__list_name.lock);                  \
>>>>>>>>>>>>>>>>>>>>> +       } while (0)
>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>>>>> + * drm_gpuvm_bo_list_del() - remove a vm_bo from the given list
>>>>>>>>>>>>>>>>>>>>> + * @__vm_bo: the &drm_gpuvm_bo
>>>>>>>>>>>>>>>>>>>>> + * @__list_name: the name of the list to remove from
>>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>>> + * Removes the given @__vm_bo from the list specified by @__list_name and
>>>>>>>>>>>>>>>>>>>>> + * decreases the vm_bo's reference count.
>>>>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>>>>> +#define drm_gpuvm_bo_list_del(__vm_bo, __list_name)                            \
>>>>>>>>>>>>>>>>>>>>> +       do {                                                                    \
>>>>>>>>>>>>>>>>>>>>> +               spin_lock(&(__vm_bo)->vm->__list_name.lock);                    \
>>>>>>>>>>>>>>>>>>>>> +               if (!list_empty(&(__vm_bo)->list.entry.__list_name))            \
>>>>>>>>>>>>>>>>>>>>> +                       list_del_init(&(__vm_bo)->list.entry.__list_name);      \
>>>>>>>>>>>>>>>>>>>>> +               spin_unlock(&(__vm_bo)->vm->__list_name.lock);                  \
>>>>>>>>>>>>>>>>>>>>> +       } while (0)
>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>> +static int __must_check
>>>>>>>>>>>>>>>>>>>>> +drm_gpuvm_bo_get_unless_zero(struct drm_gpuvm_bo *vm_bo);
>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>>     #define to_drm_gpuva(__node) container_of((__node), struct drm_gpuva, rb.node)
>>>>>>>>>>>>>>>>>>>>>     #define GPUVA_START(node) ((node)->va.addr)
>>>>>>>>>>>>>>>>>>>>> @@ -713,6 +867,12 @@ drm_gpuvm_init(struct drm_gpuvm *gpuvm, struct drm_device *drm,
>>>>>>>>>>>>>>>>>>>>>          gpuvm->rb.tree = RB_ROOT_CACHED;
>>>>>>>>>>>>>>>>>>>>>          INIT_LIST_HEAD(&gpuvm->rb.list);
>>>>>>>>>>>>>>>>>>>>> +       INIT_LIST_HEAD(&gpuvm->extobj.list);
>>>>>>>>>>>>>>>>>>>>> +       spin_lock_init(&gpuvm->extobj.lock);
>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>> +       INIT_LIST_HEAD(&gpuvm->evict.list);
>>>>>>>>>>>>>>>>>>>>> +       spin_lock_init(&gpuvm->evict.lock);
>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>>          drm_gpuva_check_overflow(start_offset, range);
>>>>>>>>>>>>>>>>>>>>>          gpuvm->mm_start = start_offset;
>>>>>>>>>>>>>>>>>>>>>          gpuvm->mm_range = range;
>>>>>>>>>>>>>>>>>>>>> @@ -754,10 +914,302 @@ drm_gpuvm_destroy(struct drm_gpuvm *gpuvm)
>>>>>>>>>>>>>>>>>>>>>          WARN(!RB_EMPTY_ROOT(&gpuvm->rb.tree.rb_root),
>>>>>>>>>>>>>>>>>>>>>               "GPUVA tree is not empty, potentially leaking memory.\n");
>>>>>>>>>>>>>>>>>>>>> +       WARN(!list_empty(&gpuvm->extobj.list), "Extobj list should be empty.\n");
>>>>>>>>>>>>>>>>>>>>> +       WARN(!list_empty(&gpuvm->evict.list), "Evict list should be empty.\n");
>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>>          drm_gem_private_object_fini(&gpuvm->d_obj);
>>>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>>>     EXPORT_SYMBOL_GPL(drm_gpuvm_destroy);
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>>>>> + * drm_gpuvm_prepare_objects() - prepare all associated BOs
>>>>>>>>>>>>>>>>>>>>> + * @gpuvm: the &drm_gpuvm
>>>>>>>>>>>>>>>>>>>>> + * @exec: the &drm_exec locking context
>>>>>>>>>>>>>>>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>>> + * Calls drm_exec_prepare_obj() for all &drm_gem_objects the given
>>>>>>>>>>>>>>>>>>>>> + * &drm_gpuvm contains mappings of.
>>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>>> + * Using this function directly, it is the driver's responsibility to call
>>>>>>>>>>>>>>>>>>>>> + * drm_exec_init() and drm_exec_fini() accordingly.
>>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>>> + * Note: This function is safe against concurrent insertion and removal of
>>>>>>>>>>>>>>>>>>>>> + * external objects, however it is not safe against concurrent usage itself.
>>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>>> + * Drivers need to make sure to protect this case with either an outer VM lock
>>>>>>>>>>>>>>>>>>>>> + * or by calling drm_gpuvm_prepare_vm() before this function within the
>>>>>>>>>>>>>>>>>>>>> + * drm_exec_until_all_locked() loop, such that the GPUVM's dma-resv lock
>>>>>>>>>>>>>>>>>>>>> + * ensures mutual exclusion.
>>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>>>>> +int
>>>>>>>>>>>>>>>>>>>>> +drm_gpuvm_prepare_objects(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>>>>>>>>> +                         struct drm_exec *exec,
>>>>>>>>>>>>>>>>>>>>> +                         unsigned int num_fences)
>>>>>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>>>>>> +       struct drm_gpuvm_bo *vm_bo;
>>>>>>>>>>>>>>>>>>>>> +       LIST_HEAD(extobjs);
>>>>>>>>>>>>>>>>>>>>> +       int ret = 0;
>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>> +       for_each_vm_bo_in_list(gpuvm, extobj, &extobjs, vm_bo) {
>>>>>>>>>>>>>>>>>>>>> +               ret = drm_exec_prepare_obj(exec, vm_bo->obj, num_fences);
>>>>>>>>>>>>>>>>>>>>> +               if (ret)
>>>>>>>>>>>>>>>>>>>>> +                       break;
>>>>>>>>>>>>>>>>>>>>> +       }
>>>>>>>>>>>>>>>>>>>>> +       /* Drop ref in case we break out of the loop. */
>>>>>>>>>>>>>>>>>>>>> +       drm_gpuvm_bo_put(vm_bo);
>>>>>>>>>>>>>>>>>>>>> +       restore_vm_bo_list(gpuvm, extobj, &extobjs);
>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>> +       return ret;
>>>>>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_prepare_objects);
>>>>>>>>>>>>>>>>>>>>> +
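
For reference, an open-coded drm_exec loop satisfying the locking requirement
documented above could look roughly like this (a sketch built only from the
helpers in this patch; teardown omitted):

        struct drm_exec exec;
        int ret;

        drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT);
        drm_exec_until_all_locked(&exec) {
                /* Lock the VM's common dma-resv first; this serializes
                 * concurrent callers of drm_gpuvm_prepare_objects().
                 */
                ret = drm_gpuvm_prepare_vm(gpuvm, &exec, 1);
                drm_exec_retry_on_contention(&exec);
                if (ret)
                        break;

                ret = drm_gpuvm_prepare_objects(gpuvm, &exec, 1);
                drm_exec_retry_on_contention(&exec);
                if (ret)
                        break;
        }
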
>>>>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>>>>> + * drm_gpuvm_prepare_range() - prepare all BOs mapped within a given range
>>>>>>>>>>>>>>>>>>>>> + * @gpuvm: the &drm_gpuvm
>>>>>>>>>>>>>>>>>>>>> + * @exec: the &drm_exec locking context
>>>>>>>>>>>>>>>>>>>>> + * @addr: the start address within the VA space
>>>>>>>>>>>>>>>>>>>>> + * @range: the range to iterate within the VA space
>>>>>>>>>>>>>>>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>>> + * Calls drm_exec_prepare_obj() for all &drm_gem_objects mapped between @addr
>>>>>>>>>>>>>>>>>>>>> + * and @addr + @range.
>>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>>>>> +int
>>>>>>>>>>>>>>>>>>>>> +drm_gpuvm_prepare_range(struct drm_gpuvm *gpuvm, struct drm_exec *exec,
>>>>>>>>>>>>>>>>>>>>> +                       u64 addr, u64 range, unsigned int num_fences)
>>>>>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>>>>>> +       struct drm_gpuva *va;
>>>>>>>>>>>>>>>>>>>>> +       u64 end = addr + range;
>>>>>>>>>>>>>>>>>>>>> +       int ret;
>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>> +       drm_gpuvm_for_each_va_range(va, gpuvm, addr, end) {
>>>>>>>>>>>>>>>>>>>>> +               struct drm_gem_object *obj = va->gem.obj;
>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>> +               ret = drm_exec_prepare_obj(exec, obj, num_fences);
>>>>>>>>>>>>>>>>>>>>> +               if (ret)
>>>>>>>>>>>>>>>>>>>>> +                       return ret;
>>>>>>>>>>>>>>>>>>>>> +       }
>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>> +       return 0;
>>>>>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_prepare_range);
>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>>>>> + * drm_gpuvm_exec_lock() - lock all dma-resv of all associated BOs
>>>>>>>>>>>>>>>>>>>>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>>>>>>>>>>>>>>>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>>>>>>>>>>>>>>>> + * @interruptible: sleep interruptible if waiting
>>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>>> + * Acquires all dma-resv locks of all &drm_gem_objects the given
>>>>>>>>>>>>>>>>>>>>> + * &drm_gpuvm contains mappings of.
>>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>>> + * Additionally, when calling this function with struct drm_gpuvm_exec::extra
>>>>>>>>>>>>>>>>>>>>> + * being set the driver receives the given @fn callback to lock additional
>>>>>>>>>>>>>>>>>>>>> + * dma-resv in the context of the &drm_gpuvm_exec instance. Typically, drivers
>>>>>>>>>>>>>>>>>>>>> + * would call drm_exec_prepare_obj() from within this callback.
>>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>>>>> +int
>>>>>>>>>>>>>>>>>>>>> +drm_gpuvm_exec_lock(struct drm_gpuvm_exec *vm_exec,
>>>>>>>>>>>>>>>>>>>>> +                   unsigned int num_fences,
>>>>>>>>>>>>>>>>>>>>> +                   bool interruptible)
>>>>>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>>>>>> +       struct drm_gpuvm *gpuvm = vm_exec->vm;
>>>>>>>>>>>>>>>>>>>>> +       struct drm_exec *exec = &vm_exec->exec;
>>>>>>>>>>>>>>>>>>>>> +       uint32_t flags;
>>>>>>>>>>>>>>>>>>>>> +       int ret;
>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>> +       flags = (interruptible ? DRM_EXEC_INTERRUPTIBLE_WAIT : 0) |
>>>>>>>>>>>>>>>>>>>>> +               DRM_EXEC_IGNORE_DUPLICATES;
>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>> +       drm_exec_init(exec, flags);
>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>> +       drm_exec_until_all_locked(exec) {
>>>>>>>>>>>>>>>>>>>>> +               ret = drm_gpuvm_prepare_vm(gpuvm, exec, num_fences);
>>>>>>>>>>>>>>>>>>>>> +               drm_exec_retry_on_contention(exec);
>>>>>>>>>>>>>>>>>>>>> +               if (ret)
>>>>>>>>>>>>>>>>>>>>> +                       goto err;
>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>> +               ret = drm_gpuvm_prepare_objects(gpuvm, exec, num_fences);
>>>>>>>>>>>>>>>>>>>>> +               drm_exec_retry_on_contention(exec);
>>>>>>>>>>>>>>>>>>>>> +               if (ret)
>>>>>>>>>>>>>>>>>>>>> +                       goto err;
>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>> +               if (vm_exec->extra.fn) {
>>>>>>>>>>>>>>>>>>>>> +                       ret = vm_exec->extra.fn(vm_exec, num_fences);
>>>>>>>>>>>>>>>>>>>>> +                       drm_exec_retry_on_contention(exec);
>>>>>>>>>>>>>>>>>>>>> +                       if (ret)
>>>>>>>>>>>>>>>>>>>>> +                               goto err;
>>>>>>>>>>>>>>>>>>>>> +               }
>>>>>>>>>>>>>>>>>>>>> +       }
>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>> +       return 0;
>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>> +err:
>>>>>>>>>>>>>>>>>>>>> +       drm_exec_fini(exec);
>>>>>>>>>>>>>>>>>>>>> +       return ret;
>>>>>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_exec_lock);
>>>>>>>>>>>>>>>>>>>>> +
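
A typical caller using the extra callback could look like this sketch
(my_job and extra_obj are made-up driver names):

        static int lock_extra_obj(struct drm_gpuvm_exec *vm_exec,
                                  unsigned int num_fences)
        {
                struct my_job *job = vm_exec->extra.priv;

                return drm_exec_prepare_obj(&vm_exec->exec, job->extra_obj,
                                            num_fences);
        }

        struct drm_gpuvm_exec vm_exec = {
                .vm = gpuvm,
                .extra.fn = lock_extra_obj,
                .extra.priv = job,
        };

        ret = drm_gpuvm_exec_lock(&vm_exec, 1, true);
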
>>>>>>>>>>>>>>>>>>>>> +static int
>>>>>>>>>>>>>>>>>>>>> +fn_lock_array(struct drm_gpuvm_exec *vm_exec, unsigned int num_fences)
>>>>>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>>>>>> +       struct {
>>>>>>>>>>>>>>>>>>>>> +               struct drm_gem_object **objs;
>>>>>>>>>>>>>>>>>>>>> +               unsigned int num_objs;
>>>>>>>>>>>>>>>>>>>>> +       } *args = vm_exec->extra.priv;
>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>> +       return drm_exec_prepare_array(&vm_exec->exec, args->objs,
>>>>>>>>>>>>>>>>>>>>> +                                     args->num_objs, num_fences);
>>>>>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>>>>> + * drm_gpuvm_exec_lock_array() - lock all dma-resv of all associated BOs
>>>>>>>>>>>>>>>>>>>>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>>>>>>>>>>>>>>>>>>>>> + * @objs: additional &drm_gem_objects to lock
>>>>>>>>>>>>>>>>>>>>> + * @num_objs: the number of additional &drm_gem_objects to lock
>>>>>>>>>>>>>>>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>>>>>>>>>>>>>>>> + * @interruptible: sleep interruptible if waiting
>>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>>> + * Acquires all dma-resv locks of all &drm_gem_objects the given &drm_gpuvm
>>>>>>>>>>>>>>>>>>>>> + * contains mappings of, plus the ones given through @objs.
>>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>>>>> +int
>>>>>>>>>>>>>>>>>>>>> +drm_gpuvm_exec_lock_array(struct drm_gpuvm_exec *vm_exec,
>>>>>>>>>>>>>>>>>>>>> +                         struct drm_gem_object **objs,
>>>>>>>>>>>>>>>>>>>>> +                         unsigned int num_objs,
>>>>>>>>>>>>>>>>>>>>> +                         unsigned int num_fences,
>>>>>>>>>>>>>>>>>>>>> +                         bool interruptible)
>>>>>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>>>>>> +       struct {
>>>>>>>>>>>>>>>>>>>>> +               struct drm_gem_object **objs;
>>>>>>>>>>>>>>>>>>>>> +               unsigned int num_objs;
>>>>>>>>>>>>>>>>>>>>> +       } args;
>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>> +       args.objs = objs;
>>>>>>>>>>>>>>>>>>>>> +       args.num_objs = num_objs;
>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>> +       vm_exec->extra.fn = fn_lock_array;
>>>>>>>>>>>>>>>>>>>>> +       vm_exec->extra.priv = &args;
>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>> +       return drm_gpuvm_exec_lock(vm_exec, num_fences, interruptible);
>>>>>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_exec_lock_array);
>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>>>>> + * drm_gpuvm_exec_lock_range() - prepare all BOs mapped within a given range
>>>>>>>>>>>>>>>>>>>>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>>>>>>>>>>>>>>>>>>>>> + * @addr: the start address within the VA space
>>>>>>>>>>>>>>>>>>>>> + * @range: the range to iterate within the VA space
>>>>>>>>>>>>>>>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>>>>>>>>>>>>>>>> + * @interruptible: sleep interruptible if waiting
>>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>>> + * Acquires all dma-resv locks of all &drm_gem_objects mapped between @addr and
>>>>>>>>>>>>>>>>>>>>> + * @addr + @range.
>>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>>>>> +int
>>>>>>>>>>>>>>>>>>>>> +drm_gpuvm_exec_lock_range(struct drm_gpuvm_exec *vm_exec,
>>>>>>>>>>>>>>>>>>>>> +                         u64 addr, u64 range,
>>>>>>>>>>>>>>>>>>>>> +                         unsigned int num_fences,
>>>>>>>>>>>>>>>>>>>>> +                         bool interruptible)
>>>>>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>>>>>> +       struct drm_gpuvm *gpuvm = vm_exec->vm;
>>>>>>>>>>>>>>>>>>>>> +       struct drm_exec *exec = &vm_exec->exec;
>>>>>>>>>>>>>>>>>>>>> +       uint32_t flags;
>>>>>>>>>>>>>>>>>>>>> +       int ret;
>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>> +       flags = (interruptible ? DRM_EXEC_INTERRUPTIBLE_WAIT : 0) |
>>>>>>>>>>>>>>>>>>>>> +               DRM_EXEC_IGNORE_DUPLICATES;
>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>> +       drm_exec_init(exec, flags);
>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>> +       drm_exec_until_all_locked(exec) {
>>>>>>>>>>>>>>>>>>>>> +               ret = drm_gpuvm_prepare_range(gpuvm, exec, addr, range,
>>>>>>>>>>>>>>>>>>>>> +                                             num_fences);
>>>>>>>>>>>>>>>>>>>>> +               drm_exec_retry_on_contention(exec);
>>>>>>>>>>>>>>>>>>>>> +               if (ret)
>>>>>>>>>>>>>>>>>>>>> +                       goto err;
>>>>>>>>>>>>>>>>>>>>> +       }
>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>> +       return ret;
>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>> +err:
>>>>>>>>>>>>>>>>>>>>> +       drm_exec_fini(exec);
>>>>>>>>>>>>>>>>>>>>> +       return ret;
>>>>>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_exec_lock_range);
>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>>>>> + * drm_gpuvm_validate() - validate all BOs marked as evicted
>>>>>>>>>>>>>>>>>>>>> + * @gpuvm: the &drm_gpuvm to validate evicted BOs
>>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>>> + * Calls the &drm_gpuvm_ops.bo_validate callback for all evicted buffer
>>>>>>>>>>>>>>>>>>>>> + * objects being mapped in the given &drm_gpuvm.
>>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>>> + * Returns: 0 on success, negative error code on failure.
>>>>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>>>>> +int
>>>>>>>>>>>>>>>>>>>>> +drm_gpuvm_validate(struct drm_gpuvm *gpuvm)
>>>>>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>>>>>> +       const struct drm_gpuvm_ops *ops = gpuvm->ops;
>>>>>>>>>>>>>>>>>>>>> +       struct drm_gpuvm_bo *vm_bo;
>>>>>>>>>>>>>>>>>>>>> +       LIST_HEAD(evict);
>>>>>>>>>>>>>>>>>>>>> +       int ret = 0;
>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>> +       if (unlikely(!ops || !ops->bo_validate))
>>>>>>>>>>>>>>>>>>>>> +               return -ENOTSUPP;
>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>> +       for_each_vm_bo_in_list(gpuvm, evict, &evict, vm_bo) {
>>>>>>>>>>>>>>>>>>>>> +               dma_resv_assert_held(vm_bo->obj->resv);
>>>>>>>>>>>>>>>>>>>>> +               ret = ops->bo_validate(vm_bo->obj);
>>>>>>>>>>>>>>>>>>>>> +               if (ret)
>>>>>>>>>>>>>>>>>>>>> +                       break;
>>>>>>>>>>>>>>>>>>>>> +       }
>>>>>>>>>>>>>>>>>>>>> +       /* Drop ref in case we break out of the loop. */
>>>>>>>>>>>>>>>>>>>>> +       drm_gpuvm_bo_put(vm_bo);
>>>>>>>>>>>>>>>>>>>>> +       restore_vm_bo_list(gpuvm, evict, &evict);
>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>> +       return ret;
>>>>>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_validate);
>>>>>>>>>>>>>>>>>>>>> +
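
For a TTM-based driver, the bo_validate callback could be a thin wrapper
around ttm_bo_validate(); a sketch, with the placement being a made-up
driver-specific one:

        static int my_bo_validate(struct drm_gem_object *obj)
        {
                struct ttm_buffer_object *bo =
                        container_of(obj, struct ttm_buffer_object, base);
                struct ttm_operation_ctx ctx = {
                        .interruptible = true,
                };

                /* Move the BO back to a GPU-accessible placement. */
                return ttm_bo_validate(bo, &my_placement, &ctx);
        }

All relevant dma-resv locks are already held at this point, since
drm_gpuvm_validate() is meant to run inside the drm_exec loop.
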
>>>>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>>>>> + * drm_gpuvm_resv_add_fence - add fence to private and all extobj
>>>>>>>>>>>>>>>>>>>>> + * dma-resv
>>>>>>>>>>>>>>>>>>>>> + * @gpuvm: the &drm_gpuvm to add a fence to
>>>>>>>>>>>>>>>>>>>>> + * @exec: the &drm_exec locking context
>>>>>>>>>>>>>>>>>>>>> + * @fence: fence to add
>>>>>>>>>>>>>>>>>>>>> + * @private_usage: private dma-resv usage
>>>>>>>>>>>>>>>>>>>>> + * @extobj_usage: extobj dma-resv usage
>>>>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>>>>> +void
>>>>>>>>>>>>>>>>>>>>> +drm_gpuvm_resv_add_fence(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>>>>>>>>> +                        struct drm_exec *exec,
>>>>>>>>>>>>>>>>>>>>> +                        struct dma_fence *fence,
>>>>>>>>>>>>>>>>>>>>> +                        enum dma_resv_usage private_usage,
>>>>>>>>>>>>>>>>>>>>> +                        enum dma_resv_usage extobj_usage)
>>>>>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>>>>>> +       struct drm_gem_object *obj;
>>>>>>>>>>>>>>>>>>>>> +       unsigned long index;
>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>> +       drm_exec_for_each_locked_object(exec, index, obj) {
>>>>>>>>>>>>>>>>>>>>> +               dma_resv_assert_held(obj->resv);
>>>>>>>>>>>>>>>>>>>>> +               dma_resv_add_fence(obj->resv, fence,
>>>>>>>>>>>>>>>>>>>>> +                                  drm_gpuvm_is_extobj(gpuvm, obj) ?
>>>>>>>>>>>>>>>>>>>>> +                                  extobj_usage : private_usage);
>>>>>>>>>>>>>>>>>>>>> +       }
>>>>>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_resv_add_fence);
>>>>>>>>>>>>>>>>>>>>> +
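
Putting the pieces together, a submission path would publish its job fence
once for all locked objects and then drop the locks; a sketch (the usage
arguments below are just an illustration, the policy is per driver):

        drm_gpuvm_resv_add_fence(gpuvm, &vm_exec.exec, job_fence,
                                 DMA_RESV_USAGE_BOOKKEEP,
                                 DMA_RESV_USAGE_WRITE);
        drm_exec_fini(&vm_exec.exec);
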
>>>>>>>>>>>>>>>>>>>>>     /**
>>>>>>>>>>>>>>>>>>>>>      * drm_gpuvm_bo_create() - create a new instance of struct drm_gpuvm_bo
>>>>>>>>>>>>>>>>>>>>>      * @gpuvm: The &drm_gpuvm the @obj is mapped in.
>>>>>>>>>>>>>>>>>>>>> @@ -790,6 +1242,9 @@ drm_gpuvm_bo_create(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>>>>>>>>>          INIT_LIST_HEAD(&vm_bo->list.gpuva);
>>>>>>>>>>>>>>>>>>>>>          INIT_LIST_HEAD(&vm_bo->list.entry.gem);
>>>>>>>>>>>>>>>>>>>>> +       INIT_LIST_HEAD(&vm_bo->list.entry.extobj);
>>>>>>>>>>>>>>>>>>>>> +       INIT_LIST_HEAD(&vm_bo->list.entry.evict);
>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>>          drm_gem_object_get(obj);
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>          return vm_bo;
>>>>>>>>>>>>>>>>>>>>> @@ -807,6 +1262,14 @@ drm_gpuvm_bo_destroy(struct kref *kref)
>>>>>>>>>>>>>>>>>>>>>          drm_gem_gpuva_assert_lock_held(vm_bo->obj);
>>>>>>>>>>>>>>>>>>>>> +       spin_lock(&gpuvm->extobj.lock);
>>>>>>>>>>>>>>>>>>>>> +       list_del(&vm_bo->list.entry.extobj);
>>>>>>>>>>>>>>>>>>>>> +       spin_unlock(&gpuvm->extobj.lock);
>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>> +       spin_lock(&gpuvm->evict.lock);
>>>>>>>>>>>>>>>>>>>>> +       list_del(&vm_bo->list.entry.evict);
>>>>>>>>>>>>>>>>>>>>> +       spin_unlock(&gpuvm->evict.lock);
>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>>          list_del(&vm_bo->list.entry.gem);
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>          drm_gem_object_put(obj);
>>>>>>>>>>>>>>>>>>>>> @@ -822,6 +1285,11 @@ drm_gpuvm_bo_destroy(struct kref *kref)
>>>>>>>>>>>>>>>>>>>>>      * @vm_bo: the &drm_gpuvm_bo to release the reference of
>>>>>>>>>>>>>>>>>>>>>      *
>>>>>>>>>>>>>>>>>>>>>      * This releases a reference to @vm_bo.
>>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>>> + * If the reference count drops to zero, the &gpuvm_bo is destroyed, which
>>>>>>>>>>>>>>>>>>>>> + * includes removing it from the GEM's gpuva list. Hence, if a call to this
>>>>>>>>>>>>>>>>>>>>> + * function can potentially let the reference count drop to zero, the caller
>>>>>>>>>>>>>>>>>>>>> + * must hold the dma-resv or driver specific GEM gpuva lock.
>>>>>>>>>>>>>>>>>>>>>      */
>>>>>>>>>>>>>>>>>>>>>     void
>>>>>>>>>>>>>>>>>>>>>     drm_gpuvm_bo_put(struct drm_gpuvm_bo *vm_bo)
>>>>>>>>>>>>>>>>>>>>> @@ -831,6 +1299,12 @@ drm_gpuvm_bo_put(struct drm_gpuvm_bo *vm_bo)
>>>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>>>     EXPORT_SYMBOL_GPL(drm_gpuvm_bo_put);
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> +static int __must_check
>>>>>>>>>>>>>>>>>>>>> +drm_gpuvm_bo_get_unless_zero(struct drm_gpuvm_bo *vm_bo)
>>>>>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>>>>>> +       return kref_get_unless_zero(&vm_bo->kref);
>>>>>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>>     static struct drm_gpuvm_bo *
>>>>>>>>>>>>>>>>>>>>>     __drm_gpuvm_bo_find(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>>>>>>>>>                         struct drm_gem_object *obj)
>>>>>>>>>>>>>>>>>>>>> @@ -938,6 +1412,48 @@ drm_gpuvm_bo_obtain_prealloc(struct drm_gpuvm_bo *__vm_bo)
>>>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>>>     EXPORT_SYMBOL_GPL(drm_gpuvm_bo_obtain_prealloc);
>>>>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>>>>> + * drm_gpuvm_bo_extobj_add() - adds the &drm_gpuvm_bo to its &drm_gpuvm's
>>>>>>>>>>>>>>>>>>>>> + * extobj list
>>>>>>>>>>>>>>>>>>>>> + * @vm_bo: The &drm_gpuvm_bo to add to its &drm_gpuvm's extobj list.
>>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>>> + * Adds the given @vm_bo to its &drm_gpuvm's extobj list if not on the list
>>>>>>>>>>>>>>>>>>>>> + * already and if the corresponding &drm_gem_object actually is an external
>>>>>>>>>>>>>>>>>>>>> + * object.
>>>>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>>>>> +void
>>>>>>>>>>>>>>>>>>>>> +drm_gpuvm_bo_extobj_add(struct drm_gpuvm_bo *vm_bo)
>>>>>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>>>>>> +       struct drm_gpuvm *gpuvm = vm_bo->vm;
>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>> +       if (drm_gpuvm_is_extobj(gpuvm, vm_bo->obj))
>>>>>>>>>>>>>>>>>>>>> +               drm_gpuvm_bo_list_add(vm_bo, extobj);
>>>>>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_bo_extobj_add);
>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>>>>> + * drm_gpuvm_bo_evict() - add / remove a &drm_gem_object to / from a
>>>>>>>>>>>>>>>>>>>>> + * &drm_gpuvm's evicted list
>>>>>>>>>>>>>>>>>>>>> + * @obj: the &drm_gem_object to add or remove
>>>>>>>>>>>>>>>>>>>>> + * @evict: indicates whether the object is evicted
>>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>>> + * Adds a &drm_gem_object to or removes it from the evicted list of all
>>>>>>>>>>>>>>>>>>>>> + * &drm_gpuvms containing a mapping of this &drm_gem_object.
>>>>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>>>>> +void
>>>>>>>>>>>>>>>>>>>>> +drm_gpuvm_bo_evict(struct drm_gem_object *obj, bool evict)
>>>>>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>>>>>> +       struct drm_gpuvm_bo *vm_bo;
>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>> +       drm_gem_for_each_gpuvm_bo(vm_bo, obj) {
>>>>>>>>>>>>>>>>>>>>> +               if (evict)
>>>>>>>>>>>>>>>>>>>>> +                       drm_gpuvm_bo_list_add(vm_bo, evict);
>>>>>>>>>>>>>>>>>>>>> +               else
>>>>>>>>>>>>>>>>>>>>> +                       drm_gpuvm_bo_list_del(vm_bo, evict);
>>>>>>>>>>>>>>>>>>>>> +       }
>>>>>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_bo_evict);
>>>>>>>>>>>>>>>>>>>>> +
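
Drivers would typically call this from their eviction path; for TTM, a move
notification hook could look like this sketch (the hook name and placement
check are assumptions, not part of this patch):

        static void my_bo_move_notify(struct ttm_buffer_object *bo,
                                      struct ttm_resource *new_res)
        {
                struct drm_gem_object *obj = &bo->base;

                /* Mark all VMs mapping this BO as needing revalidation
                 * whenever it is moved to a placement the GPU can't use.
                 */
                drm_gpuvm_bo_evict(obj, new_res->mem_type == TTM_PL_SYSTEM);
        }
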
>>>>>>>>>>>>>>>>>>>>>     static int
>>>>>>>>>>>>>>>>>>>>> __drm_gpuva_insert(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>>>>>>>>>                     struct drm_gpuva *va)
>>>>>>>>>>>>>>>>>>>>> diff --git a/include/drm/drm_gpuvm.h
>>>>>>>>>>>>>>>>>>>>> b/include/drm/drm_gpuvm.h
>>>>>>>>>>>>>>>>>>>>> index afa50b9059a2..834bb6d6617e 100644
>>>>>>>>>>>>>>>>>>>>> --- a/include/drm/drm_gpuvm.h
>>>>>>>>>>>>>>>>>>>>> +++ b/include/drm/drm_gpuvm.h
>>>>>>>>>>>>>>>>>>>>> @@ -26,10 +26,12 @@
>>>>>>>>>>>>>>>>>>>>>      */
>>>>>>>>>>>>>>>>>>>>>     #include <linux/list.h>
>>>>>>>>>>>>>>>>>>>>> +#include <linux/dma-resv.h>
>>>>>>>>>>>>>>>>>>>>>     #include <linux/rbtree.h>
>>>>>>>>>>>>>>>>>>>>>     #include <linux/types.h>
>>>>>>>>>>>>>>>>>>>>>     #include <drm/drm_gem.h>
>>>>>>>>>>>>>>>>>>>>> +#include <drm/drm_exec.h>
>>>>>>>>>>>>>>>>>>>>>     struct drm_gpuvm;
>>>>>>>>>>>>>>>>>>>>>     struct drm_gpuvm_bo;
>>>>>>>>>>>>>>>>>>>>> @@ -259,6 +261,38 @@ struct drm_gpuvm {
>>>>>>>>>>>>>>>>>>>>>           * space
>>>>>>>>>>>>>>>>>>>>>           */
>>>>>>>>>>>>>>>>>>>>>          struct dma_resv *resv;
>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>> +       /**
>>>>>>>>>>>>>>>>>>>>> +        * @extobj: structure holding the extobj list
>>>>>>>>>>>>>>>>>>>>> +        */
>>>>>>>>>>>>>>>>>>>>> +       struct {
>>>>>>>>>>>>>>>>>>>>> +               /**
>>>>>>>>>>>>>>>>>>>>> +                * @list: &list_head storing &drm_gpuvm_bos serving as
>>>>>>>>>>>>>>>>>>>>> +                * external objects
>>>>>>>>>>>>>>>>>>>>> +                */
>>>>>>>>>>>>>>>>>>>>> +               struct list_head list;
>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>> +               /**
>>>>>>>>>>>>>>>>>>>>> +                * @lock: spinlock to protect the extobj list
>>>>>>>>>>>>>>>>>>>>> +                */
>>>>>>>>>>>>>>>>>>>>> +               spinlock_t lock;
>>>>>>>>>>>>>>>>>>>>> +       } extobj;
>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>> +       /**
>>>>>>>>>>>>>>>>>>>>> +        * @evict: structure holding the evict list and evict list lock
>>>>>>>>>>>>>>>>>>>>> +        */
>>>>>>>>>>>>>>>>>>>>> +       struct {
>>>>>>>>>>>>>>>>>>>>> +               /**
>>>>>>>>>>>>>>>>>>>>> +                * @list: &list_head storing &drm_gpuvm_bos currently being
>>>>>>>>>>>>>>>>>>>>> +                * evicted
>>>>>>>>>>>>>>>>>>>>> +                */
>>>>>>>>>>>>>>>>>>>>> +               struct list_head list;
>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>> +               /**
>>>>>>>>>>>>>>>>>>>>> +                * @lock: spinlock to protect the evict list
>>>>>>>>>>>>>>>>>>>>> +                */
>>>>>>>>>>>>>>>>>>>>> +               spinlock_t lock;
>>>>>>>>>>>>>>>>>>>>> +       } evict;
>>>>>>>>>>>>>>>>>>>>>     };
>>>>>>>>>>>>>>>>>>>>>     void drm_gpuvm_init(struct drm_gpuvm *gpuvm, 
>>>>>>>>>>>>>>>>>>>>> struct
>>>>>>>>>>>>>>>>>>>>> drm_device *drm,
>>>>>>>>>>>>>>>>>>>>> @@ -268,6 +302,21 @@ void drm_gpuvm_init(struct 
>>>>>>>>>>>>>>>>>>>>> drm_gpuvm
>>>>>>>>>>>>>>>>>>>>> *gpuvm, struct drm_device *drm,
>>>>>>>>>>>>>>>>>>>>>                      const struct drm_gpuvm_ops 
>>>>>>>>>>>>>>>>>>>>> *ops);
>>>>>>>>>>>>>>>>>>>>>     void drm_gpuvm_destroy(struct drm_gpuvm *gpuvm);
>>>>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>>>>> + * drm_gpuvm_is_extobj() - indicates whether the 
>>>>>>>>>>>>>>>>>>>>> given
>>>>>>>>>>>>>>>>>>>>> &drm_gem_object is an
>>>>>>>>>>>>>>>>>>>>> + * external object
>>>>>>>>>>>>>>>>>>>>> + * @gpuvm: the &drm_gpuvm to check
>>>>>>>>>>>>>>>>>>>>> + * @obj: the &drm_gem_object to check
>>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>>> + * Returns: true if the &drm_gem_object &dma_resv 
>>>>>>>>>>>>>>>>>>>>> differs
>>>>>>>>>>>>>>>>>>>>> from the
>>>>>>>>>>>>>>>>>>>>> + * &drm_gpuvms &dma_resv, false otherwise
>>>>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>>>>> +static inline bool drm_gpuvm_is_extobj(struct 
>>>>>>>>>>>>>>>>>>>>> drm_gpuvm
>>>>>>>>>>>>>>>>>>>>> *gpuvm,
>>>>>>>>>>>>>>>>>>>>> + struct drm_gem_object
>>>>>>>>>>>>>>>>>>>>> *obj)
>>>>>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>>>>>> +       return obj && obj->resv != gpuvm->resv;
>>>>>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>>     static inline struct drm_gpuva *
>>>>>>>>>>>>>>>>>>>>> __drm_gpuva_next(struct drm_gpuva *va)
>>>>>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>>>>> @@ -346,6 +395,128 @@ __drm_gpuva_next(struct 
>>>>>>>>>>>>>>>>>>>>> drm_gpuva *va)
>>>>>>>>>>>>>>>>>>>>>     #define drm_gpuvm_for_each_va_safe(va__, 
>>>>>>>>>>>>>>>>>>>>> next__, gpuvm__)
>>>>>>>>>>>>>>>>>>>>> \
>>>>>>>>>>>>>>>>>>>>>          list_for_each_entry_safe(va__, next__, 
>>>>>>>>>>>>>>>>>>>>> &(gpuvm__)-
>>>>>>>>>>>>>>>>>>>>>> rb.list, rb.entry)
>>>>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>>>>> + * struct drm_gpuvm_exec - &drm_gpuvm abstraction of
>>>>>>>>>>>>>>>>>>>>> &drm_exec
>>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>>> + * This structure should be created on the stack as
>>>>>>>>>>>>>>>>>>>>> &drm_exec should be.
>>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>>> + * Optionally, @extra can be set in order to lock 
>>>>>>>>>>>>>>>>>>>>> additional
>>>>>>>>>>>>>>>>>>>>> &drm_gem_objects.
>>>>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>>>>> +struct drm_gpuvm_exec {
>>>>>>>>>>>>>>>>>>>>> +       /**
>>>>>>>>>>>>>>>>>>>>> +        * @exec: the &drm_exec structure
>>>>>>>>>>>>>>>>>>>>> +        */
>>>>>>>>>>>>>>>>>>>>> +       struct drm_exec exec;
>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>> +       /**
>>>>>>>>>>>>>>>>>>>>> +        * @vm: the &drm_gpuvm to lock its DMA 
>>>>>>>>>>>>>>>>>>>>> reservations
>>>>>>>>>>>>>>>>>>>>> +        */
>>>>>>>>>>>>>>>>>>>>> +       struct drm_gpuvm *vm;
>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>> +       /**
>>>>>>>>>>>>>>>>>>>>> +        * @extra: Callback and corresponding 
>>>>>>>>>>>>>>>>>>>>> private data
>>>>>>>>>>>>>>>>>>>>> for the driver to
>>>>>>>>>>>>>>>>>>>>> +        * lock arbitrary additional 
>>>>>>>>>>>>>>>>>>>>> &drm_gem_objects.
>>>>>>>>>>>>>>>>>>>>> +        */
>>>>>>>>>>>>>>>>>>>>> +       struct {
>>>>>>>>>>>>>>>>>>>>> +               /**
>>>>>>>>>>>>>>>>>>>>> +                * @fn: The driver callback to lock
>>>>>>>>>>>>>>>>>>>>> additional &drm_gem_objects.
>>>>>>>>>>>>>>>>>>>>> +                */
>>>>>>>>>>>>>>>>>>>>> +               int (*fn)(struct drm_gpuvm_exec 
>>>>>>>>>>>>>>>>>>>>> *vm_exec,
>>>>>>>>>>>>>>>>>>>>> + unsigned int num_fences);
>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>> +               /**
>>>>>>>>>>>>>>>>>>>>> +                * @priv: driver private data for 
>>>>>>>>>>>>>>>>>>>>> the @fn
>>>>>>>>>>>>>>>>>>>>> callback
>>>>>>>>>>>>>>>>>>>>> +                */
>>>>>>>>>>>>>>>>>>>>> +               void *priv;
>>>>>>>>>>>>>>>>>>>>> +       } extra;
>>>>>>>>>>>>>>>>>>>>> +};
>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>>>>> + * drm_gpuvm_prepare_vm() - prepare the GPUVMs 
>>>>>>>>>>>>>>>>>>>>> common dma-
>>>>>>>>>>>>>>>>>>>>> resv
>>>>>>>>>>>>>>>>>>>>> + * @gpuvm: the &drm_gpuvm
>>>>>>>>>>>>>>>>>>>>> + * @exec: the &drm_exec context
>>>>>>>>>>>>>>>>>>>>> + * @num_fences: the amount of &dma_fences to reserve
>>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>>> + * Calls drm_exec_prepare_obj() for the GPUVMs dummy
>>>>>>>>>>>>>>>>>>>>> &drm_gem_object.
>>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>>> + * Using this function directly, it is the drivers
>>>>>>>>>>>>>>>>>>>>> responsibility to call
>>>>>>>>>>>>>>>>>>>>> + * drm_exec_init() and drm_exec_fini() accordingly.
>>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>>> + * Returns: 0 on success, negative error code on 
>>>>>>>>>>>>>>>>>>>>> failure.
>>>>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>>>>> +static inline int
>>>>>>>>>>>>>>>>>>>>> +drm_gpuvm_prepare_vm(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>>>>>>>>> + struct drm_exec *exec,
>>>>>>>>>>>>>>>>>>>>> + unsigned int num_fences)
>>>>>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>>>>>> +       return drm_exec_prepare_obj(exec, 
>>>>>>>>>>>>>>>>>>>>> &gpuvm->d_obj,
>>>>>>>>>>>>>>>>>>>>> num_fences);
>>>>>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>> +int drm_gpuvm_prepare_objects(struct drm_gpuvm 
>>>>>>>>>>>>>>>>>>>>> *gpuvm,
>>>>>>>>>>>>>>>>>>>>> + struct drm_exec *exec,
>>>>>>>>>>>>>>>>>>>>> + unsigned int num_fences);
>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>> +int drm_gpuvm_prepare_range(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>>>>>>>>> + struct drm_exec *exec,
>>>>>>>>>>>>>>>>>>>>> + u64 addr, u64 range,
>>>>>>>>>>>>>>>>>>>>> + unsigned int num_fences);
>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>> +int drm_gpuvm_exec_lock(struct drm_gpuvm_exec 
>>>>>>>>>>>>>>>>>>>>> *vm_exec,
>>>>>>>>>>>>>>>>>>>>> +                       unsigned int num_fences,
>>>>>>>>>>>>>>>>>>>>> +                       bool interruptible);
>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>> +int drm_gpuvm_exec_lock_array(struct drm_gpuvm_exec
>>>>>>>>>>>>>>>>>>>>> *vm_exec,
>>>>>>>>>>>>>>>>>>>>> + struct drm_gem_object **objs,
>>>>>>>>>>>>>>>>>>>>> + unsigned int num_objs,
>>>>>>>>>>>>>>>>>>>>> + unsigned int num_fences,
>>>>>>>>>>>>>>>>>>>>> + bool interruptible);
>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>> +int drm_gpuvm_exec_lock_range(struct drm_gpuvm_exec
>>>>>>>>>>>>>>>>>>>>> *vm_exec,
>>>>>>>>>>>>>>>>>>>>> + u64 addr, u64 range,
>>>>>>>>>>>>>>>>>>>>> + unsigned int num_fences,
>>>>>>>>>>>>>>>>>>>>> + bool interruptible);
>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>>>>> + * drm_gpuvm_lock() - lock all dma-resv of all 
>>>>>>>>>>>>>>>>>>>>> assoiciated
>>>>>>>>>>>>>>>>>>>>> BOs
>>>>>>>>>>>>>>>>>>>>> + * @gpuvm: the &drm_gpuvm
>>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>>> + * Releases all dma-resv locks of all 
>>>>>>>>>>>>>>>>>>>>> &drm_gem_objects
>>>>>>>>>>>>>>>>>>>>> previously acquired
>>>>>>>>>>>>>>>>>>>>> + * through drm_gpuvm_lock() or its variants.
>>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>>> + * Returns: 0 on success, negative error code on 
>>>>>>>>>>>>>>>>>>>>> failure.
>>>>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>>>>> +static inline void
>>>>>>>>>>>>>>>>>>>>> +drm_gpuvm_exec_unlock(struct drm_gpuvm_exec 
>>>>>>>>>>>>>>>>>>>>> *vm_exec)
>>>>>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>>>>>> +       drm_exec_fini(&vm_exec->exec);
>>>>>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>> +int drm_gpuvm_validate(struct drm_gpuvm *gpuvm);
>>>>>>>>>>>>>>>>>>>>> +void drm_gpuvm_resv_add_fence(struct drm_gpuvm 
>>>>>>>>>>>>>>>>>>>>> *gpuvm,
>>>>>>>>>>>>>>>>>>>>> + struct drm_exec *exec,
>>>>>>>>>>>>>>>>>>>>> + struct dma_fence *fence,
>>>>>>>>>>>>>>>>>>>>> + enum dma_resv_usage
>>>>>>>>>>>>>>>>>>>>> private_usage,
>>>>>>>>>>>>>>>>>>>>> + enum dma_resv_usage
>>>>>>>>>>>>>>>>>>>>> extobj_usage);
>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>>>>> + * drm_gpuvm_exec_resv_add_fence()
>>>>>>>>>>>>>>>>>>>>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>>>>>>>>>>>>>>>>>>>>> + * @fence: fence to add
>>>>>>>>>>>>>>>>>>>>> + * @private_usage: private dma-resv usage
>>>>>>>>>>>>>>>>>>>>> + * @extobj_usage: extobj dma-resv usage
>>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>>> + * See drm_gpuvm_resv_add_fence().
>>>>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>>>>> +static inline void
>>>>>>>>>>>>>>>>>>>>> +drm_gpuvm_exec_resv_add_fence(struct drm_gpuvm_exec
>>>>>>>>>>>>>>>>>>>>> *vm_exec,
>>>>>>>>>>>>>>>>>>>>> + struct dma_fence *fence,
>>>>>>>>>>>>>>>>>>>>> + enum dma_resv_usage
>>>>>>>>>>>>>>>>>>>>> private_usage,
>>>>>>>>>>>>>>>>>>>>> + enum dma_resv_usage
>>>>>>>>>>>>>>>>>>>>> extobj_usage)
>>>>>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>>>>>> +       drm_gpuvm_resv_add_fence(vm_exec->vm, 
>>>>>>>>>>>>>>>>>>>>> &vm_exec->exec,
>>>>>>>>>>>>>>>>>>>>> fence,
>>>>>>>>>>>>>>>>>>>>> + private_usage,
>>>>>>>>>>>>>>>>>>>>> extobj_usage);
>>>>>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>>     /**
>>>>>>>>>>>>>>>>>>>>>      * struct drm_gpuvm_bo - structure representing a
>>>>>>>>>>>>>>>>>>>>> &drm_gpuvm and
>>>>>>>>>>>>>>>>>>>>>      * &drm_gem_object combination
>>>>>>>>>>>>>>>>>>>>> @@ -398,6 +569,18 @@ struct drm_gpuvm_bo {
>>>>>>>>>>>>>>>>>>>>> * gpuva list.
>>>>>>>>>>>>>>>>>>>>> */
>>>>>>>>>>>>>>>>>>>>>                          struct list_head gem;
>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>> +                       /**
>>>>>>>>>>>>>>>>>>>>> +                        * @evict: List entry to 
>>>>>>>>>>>>>>>>>>>>> attach to
>>>>>>>>>>>>>>>>>>>>> the &drm_gpuvms
>>>>>>>>>>>>>>>>>>>>> +                        * extobj list.
>>>>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>>>>> +                       struct list_head extobj;
>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>> +                       /**
>>>>>>>>>>>>>>>>>>>>> +                        * @evict: List entry to 
>>>>>>>>>>>>>>>>>>>>> attach to
>>>>>>>>>>>>>>>>>>>>> the &drm_gpuvms evict
>>>>>>>>>>>>>>>>>>>>> +                        * list.
>>>>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>>>>> +                       struct list_head evict;
>>>>>>>>>>>>>>>>>>>>>                  } entry;
>>>>>>>>>>>>>>>>>>>>>          } list;
>>>>>>>>>>>>>>>>>>>>>     };
>>>>>>>>>>>>>>>>>>>>> @@ -432,6 +615,9 @@ struct drm_gpuvm_bo *
>>>>>>>>>>>>>>>>>>>>> drm_gpuvm_bo_find(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>>>>>>>>>                    struct drm_gem_object *obj);
>>>>>>>>>>>>>>>>>>>>> +void drm_gpuvm_bo_evict(struct drm_gem_object 
>>>>>>>>>>>>>>>>>>>>> *obj, bool
>>>>>>>>>>>>>>>>>>>>> evict);
>>>>>>>>>>>>>>>>>>>>> +void drm_gpuvm_bo_extobj_add(struct drm_gpuvm_bo 
>>>>>>>>>>>>>>>>>>>>> *vm_bo);
>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>>     /**
>>>>>>>>>>>>>>>>>>>>>      * drm_gpuvm_bo_for_each_va() - iterator to 
>>>>>>>>>>>>>>>>>>>>> walk over a
>>>>>>>>>>>>>>>>>>>>> list of &drm_gpuva
>>>>>>>>>>>>>>>>>>>>>      * @va__: &drm_gpuva structure to assign to in 
>>>>>>>>>>>>>>>>>>>>> each
>>>>>>>>>>>>>>>>>>>>> iteration step
>>>>>>>>>>>>>>>>>>>>> @@ -837,6 +1023,17 @@ struct drm_gpuvm_ops {
>>>>>>>>>>>>>>>>>>>>>           * used.
>>>>>>>>>>>>>>>>>>>>>           */
>>>>>>>>>>>>>>>>>>>>>          int (*sm_step_unmap)(struct drm_gpuva_op 
>>>>>>>>>>>>>>>>>>>>> *op, void
>>>>>>>>>>>>>>>>>>>>> *priv);
>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>> +       /**
>>>>>>>>>>>>>>>>>>>>> +        * @bo_validate: called from 
>>>>>>>>>>>>>>>>>>>>> drm_gpuvm_validate()
>>>>>>>>>>>>>>>>>>>>> +        *
>>>>>>>>>>>>>>>>>>>>> +        * Drivers receive this callback for every 
>>>>>>>>>>>>>>>>>>>>> evicted
>>>>>>>>>>>>>>>>>>>>> &drm_gem_object being
>>>>>>>>>>>>>>>>>>>>> +        * mapped in the corresponding &drm_gpuvm.
>>>>>>>>>>>>>>>>>>>>> +        *
>>>>>>>>>>>>>>>>>>>>> +        * Typically, drivers would call their driver
>>>>>>>>>>>>>>>>>>>>> specific variant of
>>>>>>>>>>>>>>>>>>>>> +        * ttm_bo_validate() from within this 
>>>>>>>>>>>>>>>>>>>>> callback.
>>>>>>>>>>>>>>>>>>>>> +        */
>>>>>>>>>>>>>>>>>>>>> +       int (*bo_validate)(struct drm_gem_object 
>>>>>>>>>>>>>>>>>>>>> *obj);
>>>>>>>>>>>>>>>>>>>>>     };
>>>>>>>>>>>>>>>>>>>>>     int drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm, 
>>>>>>>>>>>>>>>>>>>>> void *priv,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>
>>
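To make the locking helpers quoted above concrete, here is a minimal driver-side submission sketch. It is based only on the declarations quoted above; struct my_job, my_run_job() and my_submit() are hypothetical driver code, not part of the series:

#include <linux/dma-fence.h>
#include <drm/drm_gpuvm.h>

/* Hypothetical driver job structure; only ->vm matters here. */
struct my_job {
	struct drm_gpuvm *vm;
	/* ... */
};

struct dma_fence *my_submit(struct my_job *job); /* hypothetical */

static int my_run_job(struct my_job *job)
{
	struct drm_gpuvm_exec vm_exec = {
		.vm = job->vm,
	};
	struct dma_fence *fence;
	int ret;

	/* Locks the VM's common dma-resv and all extobj dma-resvs. */
	ret = drm_gpuvm_exec_lock(&vm_exec, 1, true);
	if (ret)
		return ret;

	/* Calls the driver's bo_validate() hook for each evicted BO. */
	ret = drm_gpuvm_validate(vm_exec.vm);
	if (ret)
		goto out_unlock;

	fence = my_submit(job); /* hypothetical submission helper */

	/* Private BOs get the private usage, extobjs the extobj usage. */
	drm_gpuvm_exec_resv_add_fence(&vm_exec, fence,
				      DMA_RESV_USAGE_BOOKKEEP,
				      DMA_RESV_USAGE_WRITE);
out_unlock:
	drm_gpuvm_exec_unlock(&vm_exec);
	return ret;
}

The two usage arguments mirror the drm_gpuvm_resv_add_fence() split between the VM's private dma-resv and the external objects' dma-resvs.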
Thomas Hellstrom Sept. 20, 2023, 2:02 p.m. UTC | #57
Hi

On 9/20/23 15:48, Christian König wrote:
> On 9/20/23 15:38, Thomas Hellström wrote:
>>
>> On 9/20/23 15:06, Christian König wrote:
>>>
>>>
>>> On 9/20/23 14:06, Thomas Hellström wrote:
>>>>
>>>> On 9/20/23 12:51, Christian König wrote:
>>>>> On 9/20/23 09:44, Thomas Hellström wrote:
>>>>>> Hi,
>>>>>>
>>>>>> On 9/20/23 07:37, Christian König wrote:
>>>>>>> On 9/19/23 17:23, Thomas Hellström wrote:
>>>>>>>>
>>>>>>>> On 9/19/23 17:16, Danilo Krummrich wrote:
>>>>>>>>> On 9/19/23 14:21, Thomas Hellström wrote:
>>>>>>>>>> Hi Christian
>>>>>>>>>>
>>>>>>>>>> On 9/19/23 14:07, Christian König wrote:
>>>>>>>>>>> On 9/13/23 17:46, Danilo Krummrich wrote:
>>>>>>>>>>>> On 9/13/23 17:33, Christian König wrote:
>>>>>>>>>>>>> On 9/13/23 17:15, Danilo Krummrich wrote:
>>>>>>>>>>>>>> On 9/13/23 16:26, Christian König wrote:
>>>>>>>>>>>>>>> On 9/13/23 14:16, Danilo Krummrich wrote:
>>>>>>>>>>>>>>>> As mentioned in a different mail thread, the reply is 
>>>>>>>>>>>>>>>> based on the assumption
>>>>>>>>>>>>>>>> that we don't support anything other than GPUVM updates 
>>>>>>>>>>>>>>>> from the IOCTL.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I think that this assumption is incorrect.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Well, more precisely I should have said "don't support GPUVM updates
>>>>>>>>>>>>>> from within fence signaling critical sections". And looking at the
>>>>>>>>>>>>>> code, that doesn't seem to be what you're doing there.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Vulkan is just one specific use case, but this here should
>>>>>>>>>>>>>>> probably be able to handle other use cases as well.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Especially with HMM you get the requirement that you 
>>>>>>>>>>>>>>> need to be able to invalidate GPUVM mappings without 
>>>>>>>>>>>>>>> grabbing a reservation lock.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> What do you mean with "invalidate GPUVM mappings" in this 
>>>>>>>>>>>>>> context? drm_gpuvm_bo_evict()
>>>>>>>>>>>>>> should only be called from a ttm_device_funcs::move 
>>>>>>>>>>>>>> callback, we should hold the dma-resv
>>>>>>>>>>>>>> lock there.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Well the question is which dma-resv lock do we hold?
>>>>>>>>>>>>>
>>>>>>>>>>>>> In the move callback we only hold the dma-resv lock of the 
>>>>>>>>>>>>> BO which is moved, but when that is a shared BO then 
>>>>>>>>>>>>> that's not the same as the one for the VM.
>>>>>>>>>>>>
>>>>>>>>>>>> Correct, Thomas' idea was to use the GEM's dma_resv lock to 
>>>>>>>>>>>> protect drm_gpuvm_bo::evicted
>>>>>>>>>>>> and then actually move the drm_gpuvm_bo to the VM's evicted 
>>>>>>>>>>>> list once we grabbed all
>>>>>>>>>>>> dma-resv locks when locking the VM's BOs using drm_exec. We 
>>>>>>>>>>>> can remove them from the evicted
>>>>>>>>>>>> list on validate(). This way we never touch the evicted 
>>>>>>>>>>>> list without holding at least the VM's
>>>>>>>>>>>> dma-resv lock.
>>>>>>>>>>>>
>>>>>>>>>>>> Do you have any concerns about that?
>>>>>>>>>>>
>>>>>>>>>>> Scratching my head a bit how that is supposed to work.
>>>>>>>>>>>
>>>>>>>>>>> This implies that you go over all the evicted BOs during 
>>>>>>>>>>> validation and not just the one mentioned in the CS.
>>>>>>>>>>>
>>>>>>>>>>> That might work for Vulkan, but is pretty much a no-go for 
>>>>>>>>>>> OpenGL.
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> See what the eviction lock in amdgpu is doing for example.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The eviction_lock seems to protect a VM state "evicting", i.e. whether
>>>>>>>>>>>>>> any BO that is associated with the VM is currently being evicted. At the
>>>>>>>>>>>>>> same time amdgpu protects the evicted list of the VM with a different
>>>>>>>>>>>>>> lock. So this seems to be entirely unrelated. Tracking a "currently
>>>>>>>>>>>>>> evicting" state is not part of the GPUVM implementation currently and
>>>>>>>>>>>>>> hence nothing would change for amdgpu there.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Sorry for the confusion we use different terminology in 
>>>>>>>>>>>>> amdgpu.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The eviction lock and evicted state is for the VM page 
>>>>>>>>>>>>> tables, e.g. if the whole VM is currently not used and 
>>>>>>>>>>>>> swapped out or even de-allocated.
>>>>>>>>>>>>>
>>>>>>>>>>>>> This is necessary because we have cases where we need to 
>>>>>>>>>>>>> access the VM data without holding the dma-resv lock of 
>>>>>>>>>>>>> this VM. Especially figuring out which parts of an address 
>>>>>>>>>>>>> space contain mappings and which don't.
>>>>>>>>>>>>
>>>>>>>>>>>> I think this is fine, this has nothing to do with lists of 
>>>>>>>>>>>> evicted GEM objects or external GEM
>>>>>>>>>>>> objects, right? Marking mappings (drm_gpuva) as invalidated 
>>>>>>>>>>>> (DRM_GPUVA_INVALIDATED) or accessing
>>>>>>>>>>>> the VA space does not require any dma-resv locks.
>>>>>>>>>>>
>>>>>>>>>>> I hope so, but I'm not 100% sure.
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> This is a requirement which comes with HMM handling, you 
>>>>>>>>>>>>> won't see this with Vulkan (or OpenGL, VAAPI etc.).
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> The invalidation lock on the other hand is what in this 
>>>>>>>>>>>>> discussion is called eviction lock. This one is needed 
>>>>>>>>>>>>> because of what I wrote above; during the move callback only 
>>>>>>>>>>>>> the dma-resv of the BO which is moved is locked, but not 
>>>>>>>>>>>>> necessarily the dma-resv of the VM.
>>>>>>>>>>>>
>>>>>>>>>>>> That's yet another thing, right? This is used to track 
>>>>>>>>>>>> whether *any* BO that belongs to the VM is
>>>>>>>>>>>> currently being evicted, correct? As mentioned, as of now this is not
>>>>>>>>>>>> supported in GPUVM and hence it would be the same driver-specific code
>>>>>>>>>>>> with the same driver-specific lock.
>>>>>>>>>>>
>>>>>>>>>>> That is most likely a show stopper using this for OpenGL 
>>>>>>>>>>> based workloads as far as I can see. For those you need to be
>>>>>>>>>>> able to figure out which non-VM BOs have been evicted and
>>>>>>>>>>> which parts of the VM need updates.
>>>>>>>>>>
>>>>>>>>>> We identify those with a bool in the gpuvm_bo, and that bool 
>>>>>>>>>> is protected by the bo_resv. In essence, the "evicted" list 
>>>>>>>>>> must be made up-to-date with all relevant locks held before 
>>>>>>>>>> traversing in the next exec.
>>>>>>>>>
>>>>>>>>> What I still miss with this idea is how do we find all the 
>>>>>>>>> drm_gpuvm_bo structures with the evicted bool set to true? 
>>>>>>>>> When doing the drm_exec dance we come across all external ones 
>>>>>>>>> and can add them to the list if needed, but what about the BOs 
>>>>>>>>> having the VM's dma-resv?
>>>>>>>>
>>>>>>>> Oh, they can be added to the evict list directly (no bool 
>>>>>>>> needed) in the eviction code, like in v3. Since for those we 
>>>>>>>> indeed hold the VM's dma_resv since it's aliased with the 
>>>>>>>> object's dma-resv.
>>>>>>>
>>>>>>> Yeah, I wanted to note what Danilo seems to think about as well. 
>>>>>>> How do we figure out the non-VM BOs evicted?
>>>>>>>
>>>>>>> We can't walk over the list of all non-VM BOs on every 
>>>>>>> submission, that's too much overhead for cases with lots of 
>>>>>>> non-VM BOs.
>>>>>>>
>>>>>>> And we can't rely on userspace sending all non-VM BOs as used 
>>>>>>> list down to the kernel with each submission.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Christian.
>>>>>>
>>>>>> No, that's not needed: Mechanism below.
>>>>>>
>>>>>> 1) We maintain an evicted list. Typically protected by the vm resv.
>>>>>> 2) Each gpuvm_bo has a bool "evicted". Protected by the bo resv.
>>>>>>
>>>>>> a) Evicting a vm bo: The vm resv is held by the eviction code. 
>>>>>> Just put it on the evicted list.
>>>>>> b) Evicting a shared/external bo: The bo resv is held by the 
>>>>>> eviction code. Set the "evicted" bool
>>>>>> c) Validating the evicted list on exec:
>>>>>
>>>>>
>>>>>> Loop through all *external/shared* bos.
>>>>>
>>>>> And this is what you can't do. For Vulkan it probably doesn't 
>>>>> matter, but for OpenGL and especially multimedia we have many more 
>>>>> BOs on the shared list than what's allocated for the VM.
>>>>
>>>> But you need to lock and fence all those, so you need to loop 
>>>> through them anyway, so we're still O(n_shared)? Or is there some 
>>>> clever optimization in amdgpu?
>>>
>>> Why should I lock and fence them? Only the BOs in the relocation 
>>> list are locked and fenced.
>>
>> Do you by "relocation" list refer to what gpuvm calls "evict" list or 
>> something else? Like the relocaton/validation list that used to be 
>> sent from user-space for non-VM_BIND vms?
>
> The BOs sent into the kernel with each command submission on the 
> classic IOCTLs.
>
>>
>> The vm bos plus the external/shared bos bound to the VM (the external 
>> list) are the bos being referenced by the current batch. So the bos 
>> on the VM's external list are the ones being locked and fenced and 
>> checked for eviction. If they weren't they could be evicted before 
>> the current batch completes?
>
> That only applies to a certain use case, e.g. Vulkan or user mode queues.
>
> Multimedia APIs and especially OpenGL work differently, here only the 
> BOs mentioned in the relocation list are guaranteed to not be evicted.
>
> This is intentional because those APIs tend to over allocate memory 
> all the time, so for good performance you need to be able to evict BOs 
> from the VM while other parts of the VM are currently in use.
>
> Without that especially OpenGL performance would be completely 
> crippled at least on amdgpu.

OK, I've always wondered how overcommitting a local VM would be handled 
on VM_BIND, where we don't have the relocation list, at least not in xe, 
so we have what you refer to as the user mode queues.

I figure those APIs that suffer from overcommitting would maintain a 
"current working set" in user-space and send changes as deltas to the 
kernel as unbinds/binds. Or at least "can be unbound / can no longer be 
unbound" advises.

This may turn out interesting.

/Thomas




>
>
> Regards,
> Christian.
>
>>
>> Thanks,
>>
>> Thomas
>>
>>
>>>
>>> Regards,
>>> Christian.
>>>
>>>>
>>>> I think with some UMDs, xe might end up with similar large lists...
>>>>
>>>> /Thomas
>>>>
>>>>
>>>>>
>>>>> Regards,
>>>>> Christian.
>>>>>
>>>>>> Lock them. After locking, check the "evicted" bool; if it's true,
>>>>>> put the bo on the evicted list (we hold the VM resv at this 
>>>>>> point) and clear the "evicted" bool. Note that other vms will 
>>>>>> have their own gpuvm_bo which is marked evicted.
>>>>>>
>>>>>> I have this coded up in a patch for Xe and it seems to be working 
>>>>>> properly.
>>>>>>
>>>>>> /Thomas
>>>>>>
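A hedged sketch of steps b) and c) described above; the evicted bool and the obj backpointer on struct drm_gpuvm_bo are assumptions taken from this discussion, not from the posted patch, and drm_exec contention handling (drm_exec_until_all_locked() / drm_exec_retry_on_contention()) is omitted for brevity:

#include <drm/drm_exec.h>
#include <drm/drm_gpuvm.h>

/* Step c): while locking the external BOs with drm_exec we end up
 * holding both the BO resv (assumed to protect vm_bo->evicted) and the
 * VM resv (protecting gpuvm->evict.list), so marked vm_bos can be
 * moved onto the VM's evicted list here.
 */
static int my_lock_and_collect_extobjs(struct drm_gpuvm *gpuvm,
				       struct drm_exec *exec)
{
	struct drm_gpuvm_bo *vm_bo;
	int ret;

	list_for_each_entry(vm_bo, &gpuvm->extobj.list, list.entry.extobj) {
		ret = drm_exec_prepare_obj(exec, vm_bo->obj, 1);
		if (ret)
			return ret;

		/* Assumed bool, set in step b) under the BO resv. */
		if (vm_bo->evicted) {
			list_move_tail(&vm_bo->list.entry.evict,
				       &gpuvm->evict.list);
			vm_bo->evicted = false;
		}
	}

	return 0;
}
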
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> /Thomas
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> If you mean that we need to unbind all vmas of all vms of
>>>>>>>>>> evicted bos before evicting, we don't do that, at least not
>>>>>>>>>> in Xe, since when evicting we wait for VM idle, and it can't
>>>>>>>>>> access anything through the stale vmas until they have been
>>>>>>>>>> revalidated and rebound.
>>>>>>>>>>
>>>>>>>>>> /Thomas
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Wed, Sep 13, 2023 at 11:14:46AM +0200, Thomas Hellström wrote:
>>>>>>>>>>>>>>>>> Hi!
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Wed, 2023-09-13 at 01:36 +0200, Danilo Krummrich wrote:
>>>>>>>>>>>>>>>>>> On Tue, Sep 12, 2023 at 09:23:08PM +0200, Thomas Hellström wrote:
>>>>>>>>>>>>>>>>>>> On 9/12/23 18:50, Danilo Krummrich wrote:
>>>>>>>>>>>>>>>>>>>> On Tue, Sep 12, 2023 at 06:20:32PM +0200, Thomas Hellström wrote:
>>>>>>>>>>>>>>>>>>>>> Hi, Danilo,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On 9/9/23 17:31, Danilo Krummrich wrote:
>>>>>>>>>>>>>>>>>>>>>> So far the DRM GPUVA manager offers common 
>>>>>>>>>>>>>>>>>>>>>> infrastructure to
>>>>>>>>>>>>>>>>>>>>>> track GPU VA
>>>>>>>>>>>>>>>>>>>>>> allocations and mappings, generically connect GPU 
>>>>>>>>>>>>>>>>>>>>>> VA mappings
>>>>>>>>>>>>>>>>>>>>>> to their
>>>>>>>>>>>>>>>>>>>>>> backing buffers and perform more complex mapping 
>>>>>>>>>>>>>>>>>>>>>> operations
>>>>>>>>>>>>>>>>>>>>>> on the GPU VA
>>>>>>>>>>>>>>>>>>>>>> space.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> However, there are more design patterns commonly 
>>>>>>>>>>>>>>>>>>>>>> used by
>>>>>>>>>>>>>>>>>>>>>> drivers, which
>>>>>>>>>>>>>>>>>>>>>> can potentially be generalized in order to make 
>>>>>>>>>>>>>>>>>>>>>> the DRM GPUVA
>>>>>>>>>>>>>>>>>>>>>> manager
>>>>>>>>>>>>>>>>>>>>>> represent a basic GPU-VM implementation. In this 
>>>>>>>>>>>>>>>>>>>>>> context,
>>>>>>>>>>>>>>>>>>>>>> this patch aims
>>>>>>>>>>>>>>>>>>>>>> at generalizing the following elements.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> 1) Provide a common dma-resv for GEM objects not 
>>>>>>>>>>>>>>>>>>>>>> being used
>>>>>>>>>>>>>>>>>>>>>> outside of
>>>>>>>>>>>>>>>>>>>>>>       this GPU-VM.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> 2) Provide tracking of external GEM objects (GEM 
>>>>>>>>>>>>>>>>>>>>>> objects
>>>>>>>>>>>>>>>>>>>>>> which are
>>>>>>>>>>>>>>>>>>>>>>       shared with other GPU-VMs).
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> 3) Provide functions to efficiently lock all GEM 
>>>>>>>>>>>>>>>>>>>>>> objects dma-
>>>>>>>>>>>>>>>>>>>>>> resv the
>>>>>>>>>>>>>>>>>>>>>>       GPU-VM contains mappings of.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> 4) Provide tracking of evicted GEM objects the 
>>>>>>>>>>>>>>>>>>>>>> GPU-VM
>>>>>>>>>>>>>>>>>>>>>> contains mappings
>>>>>>>>>>>>>>>>>>>>>>       of, such that validation of evicted GEM 
>>>>>>>>>>>>>>>>>>>>>> objects is
>>>>>>>>>>>>>>>>>>>>>> accelerated.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> 5) Provide some convinience functions for common 
>>>>>>>>>>>>>>>>>>>>>> patterns.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Rather than being designed as a "framework", the 
>>>>>>>>>>>>>>>>>>>>>> target is to
>>>>>>>>>>>>>>>>>>>>>> make all
>>>>>>>>>>>>>>>>>>>>>> features appear as a collection of optional 
>>>>>>>>>>>>>>>>>>>>>> helper functions,
>>>>>>>>>>>>>>>>>>>>>> such that
>>>>>>>>>>>>>>>>>>>>>> drivers are free to make use of the DRM GPUVA 
>>>>>>>>>>>>>>>>>>>>>> managers basic
>>>>>>>>>>>>>>>>>>>>>> functionality and opt-in for other features 
>>>>>>>>>>>>>>>>>>>>>> without setting
>>>>>>>>>>>>>>>>>>>>>> any feature
>>>>>>>>>>>>>>>>>>>>>> flags, just by making use of the corresponding 
>>>>>>>>>>>>>>>>>>>>>> functions.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Big kudos to Boris Brezillon for his help to 
>>>>>>>>>>>>>>>>>>>>>> figure out
>>>>>>>>>>>>>>>>>>>>>> locking for drivers
>>>>>>>>>>>>>>>>>>>>>> updating the GPU VA space within the fence 
>>>>>>>>>>>>>>>>>>>>>> signalling path.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Suggested-by: Matthew Brost 
>>>>>>>>>>>>>>>>>>>>>> <matthew.brost@intel.com>
>>>>>>>>>>>>>>>>>>>>>> Signed-off-by: Danilo Krummrich <dakr@redhat.com>
>>>>>>>>>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>>>>>>>>> drivers/gpu/drm/drm_gpuvm.c | 516
>>>>>>>>>>>>>>>>>>>>>> ++++++++++++++++++++++++++++++++++++
>>>>>>>>>>>>>>>>>>>>>> include/drm/drm_gpuvm.h | 197 ++++++++++++++
>>>>>>>>>>>>>>>>>>>>>>     2 files changed, 713 insertions(+)
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> diff --git a/drivers/gpu/drm/drm_gpuvm.c b/drivers/gpu/drm/drm_gpuvm.c
>>>>>>>>>>>>>>>>>>>>>> index f4411047dbb3..8e62a043f719 100644
>>>>>>>>>>>>>>>>>>>>>> --- a/drivers/gpu/drm/drm_gpuvm.c
>>>>>>>>>>>>>>>>>>>>>> +++ b/drivers/gpu/drm/drm_gpuvm.c
>>>>>>>>>>>>>>>>>>>>>> @@ -73,6 +73,21 @@
>>>>>>>>>>>>>>>>>>>>>>   * &drm_gem_object list of &drm_gpuvm_bos for an existing instance of this
>>>>>>>>>>>>>>>>>>>>>>   * particular combination. If not existent a new instance is created and linked
>>>>>>>>>>>>>>>>>>>>>>   * to the &drm_gem_object.
>>>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>>>> + * &drm_gpuvm_bo structures, since unique for a given &drm_gpuvm, are also used
>>>>>>>>>>>>>>>>>>>>>> + * as entry for the &drm_gpuvm's lists of external and evicted objects. Those
>>>>>>>>>>>>>>>>>>>>>> + * lists are maintained in order to accelerate locking of dma-resv locks and
>>>>>>>>>>>>>>>>>>>>>> + * validation of evicted objects bound in a &drm_gpuvm. For instance, all
>>>>>>>>>>>>>>>>>>>>>> + * &drm_gem_object's &dma_resv of a given &drm_gpuvm can be locked by calling
>>>>>>>>>>>>>>>>>>>>>> + * drm_gpuvm_exec_lock(). Once locked drivers can call drm_gpuvm_validate() in
>>>>>>>>>>>>>>>>>>>>>> + * order to validate all evicted &drm_gem_objects. It is also possible to lock
>>>>>>>>>>>>>>>>>>>>>> + * additional &drm_gem_objects by providing the corresponding parameters to
>>>>>>>>>>>>>>>>>>>>>> + * drm_gpuvm_exec_lock() as well as open code the &drm_exec loop while making
>>>>>>>>>>>>>>>>>>>>>> + * use of helper functions such as drm_gpuvm_prepare_range() or
>>>>>>>>>>>>>>>>>>>>>> + * drm_gpuvm_prepare_objects().
>>>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>>>> + * Every bound &drm_gem_object is treated as external object when its &dma_resv
>>>>>>>>>>>>>>>>>>>>>> + * structure is different than the &drm_gpuvm's common &dma_resv structure.
>>>>>>>>>>>>>>>>>>>>>>   */
>>>>>>>>>>>>>>>>>>>>>>  /**
>>>>>>>>>>>>>>>>>>>>>> @@ -420,6 +435,20 @@
>>>>>>>>>>>>>>>>>>>>>>   * Subsequent calls to drm_gpuvm_bo_obtain() for the same &drm_gpuvm and
>>>>>>>>>>>>>>>>>>>>>>   * &drm_gem_object must be able to observe previous creations and destructions
>>>>>>>>>>>>>>>>>>>>>>   * of &drm_gpuvm_bos in order to keep instances unique.
>>>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>>>> + * The &drm_gpuvm's lists for keeping track of external and evicted objects are
>>>>>>>>>>>>>>>>>>>>>> + * protected against concurrent insertion / removal and iteration internally.
>>>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>>>> + * However, drivers still need to ensure that concurrent calls to functions
>>>>>>>>>>>>>>>>>>>>>> + * iterating those lists, such as drm_gpuvm_validate() and
>>>>>>>>>>>>>>>>>>>>>> + * drm_gpuvm_prepare_objects(), are protected. Every such function contains
>>>>>>>>>>>>>>>>>>>>>> + * a particular comment and lockdep checks if possible.
>>>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>>>> + * Functions adding or removing entries from those lists, such as
>>>>>>>>>>>>>>>>>>>>>> + * drm_gpuvm_bo_evict() or drm_gpuvm_bo_extobj_add(), may be called with
>>>>>>>>>>>>>>>>>>>>>> + * external locks being held, e.g. in order to avoid the corresponding list
>>>>>>>>>>>>>>>>>>>>>> + * being modified while potentially being iterated by other API functions.
>>>>>>>>>>>>>>>>>>>>>> + * However, this is entirely optional.
>>>>>>>>>>>>>>>>>>>>>>   */
>>>>>>>>>>>>>>>>>>>>>>  /**
>>>>>>>>>>>>>>>>>>>>>> @@ -632,6 +661,131 @@
>>>>>>>>>>>>>>>>>>>>>>   *   }
>>>>>>>>>>>>>>>>>>>>>>   */
>>>>>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>>>>>> + * get_next_vm_bo_from_list() - get the next vm_bo element
>>>>>>>>>>>>>>>>>>>>>> + * @__gpuvm: The GPU VM
>>>>>>>>>>>>>>>>>>>>>> + * @__list_name: The name of the list we're iterating on
>>>>>>>>>>>>>>>>>>>>>> + * @__local_list: A pointer to the local list used to store already iterated items
>>>>>>>>>>>>>>>>>>>>>> + * @__prev_vm_bo: The previous element we got from drm_gpuvm_get_next_cached_vm_bo()
>>>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>>>> + * This helper is here to provide lockless list iteration. Lockless as in, the
>>>>>>>>>>>>>>>>>>>>>> + * iterator releases the lock immediately after picking the first element from
>>>>>>>>>>>>>>>>>>>>>> + * the list, so list insertion and deletion can happen concurrently.
>>>>>>>>>>>>>>>>>>>>> Are the list spinlocks needed for that async state 
>>>>>>>>>>>>>>>>>>>>> update from
>>>>>>>>>>>>>>>>>>>>> within the
>>>>>>>>>>>>>>>>>>>>> dma-fence critical section we've discussed 
>>>>>>>>>>>>>>>>>>>>> previously?
>>>>>>>>>>>>>>>>>>>> Yes, but also for other reasons, see below.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Otherwise it should be sufficient to protect the 
>>>>>>>>>>>>>>>>>>>>> lists with the
>>>>>>>>>>>>>>>>>>>>> gpuvm's resv
>>>>>>>>>>>>>>>>>>>>> (or for the extobj list with an outer lock).
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> If those spinlocks are still needed in some 
>>>>>>>>>>>>>>>>>>>>> situations, perhaps
>>>>>>>>>>>>>>>>>>>>> could we
>>>>>>>>>>>>>>>>>>>>> have an option to set them to NULL (Like IIRC the 
>>>>>>>>>>>>>>>>>>>>> maple tree
>>>>>>>>>>>>>>>>>>>>> allows for)?
>>>>>>>>>>>>>>>>>>>> The evict spinlock is needed in any case, since in
>>>>>>>>>>>>>>>>>>>> drm_gpuvm_bo_evict() we're
>>>>>>>>>>>>>>>>>>>> holding only the dma-resv lock from the BO this 
>>>>>>>>>>>>>>>>>>>> function gets
>>>>>>>>>>>>>>>>>>>> called for. Hence,
>>>>>>>>>>>>>>>>>>>> the spinlock protects concurrent 
>>>>>>>>>>>>>>>>>>>> drm_gpuvm_bo_evict() calls with
>>>>>>>>>>>>>>>>>>>> different BOs.
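For context, a hedged sketch of the calling situation described here, holding only the evicted BO's resv; my_bo_move_notify() and its wiring into the driver's eviction path are hypothetical:

#include <linux/dma-resv.h>
#include <drm/drm_gpuvm.h>

/* Hypothetical driver hook, reached e.g. from a ttm_device_funcs::move
 * callback; only the resv of @obj is held here, which is why the VM
 * evict lists need their own internal spinlock.
 */
static void my_bo_move_notify(struct drm_gem_object *obj, bool evicted)
{
	dma_resv_assert_held(obj->resv);
	drm_gpuvm_bo_evict(obj, evicted);
}
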
>>>>>>>>>>>>>>>>>>> No. Only if you try to add external objects to the 
>>>>>>>>>>>>>>>>>>> vm's evict list
>>>>>>>>>>>>>>>>>>> from
>>>>>>>>>>>>>>>>>>> within the evict code. That's not necessary since 
>>>>>>>>>>>>>>>>>>> you loop through
>>>>>>>>>>>>>>>>>>> all
>>>>>>>>>>>>>>>>>>> external objects anyway when locking them so an 
>>>>>>>>>>>>>>>>>>> "evicted" bool in
>>>>>>>>>>>>>>>>>>> the vm_bo,
>>>>>>>>>>>>>>>>>>> protected by the bo resv would be sufficient. The 
>>>>>>>>>>>>>>>>>>> extobj locking
>>>>>>>>>>>>>>>>>>> loop can
>>>>>>>>>>>>>>>>>>> then add the bo to the evicted list.
>>>>>>>>>>>>>>>>>> And validate() can remove it while still holding all 
>>>>>>>>>>>>>>>>>> dma-resv locks,
>>>>>>>>>>>>>>>>>> neat!
>>>>>>>>>>>>>>>>>> However, what if two tasks are trying to lock the VA 
>>>>>>>>>>>>>>>>>> space
>>>>>>>>>>>>>>>>>> concurrently? What
>>>>>>>>>>>>>>>>>> do we do when the drm_gpuvm_bo's refcount drops to 
>>>>>>>>>>>>>>>>>> zero in
>>>>>>>>>>>>>>>>>> drm_gpuva_unlink()?
>>>>>>>>>>>>>>>>>> Are we guaranteed that at this point of time the 
>>>>>>>>>>>>>>>>>> drm_gpuvm_bo is not
>>>>>>>>>>>>>>>>>> on the
>>>>>>>>>>>>>>>>>> evicted list? Because otherwise we would call 
>>>>>>>>>>>>>>>>>> drm_gpuvm_bo_destroy()
>>>>>>>>>>>>>>>>>> with the
>>>>>>>>>>>>>>>>>> dma-resv lock held, which wouldn't be allowed, since
>>>>>>>>>>>>>>>>>> drm_gpuvm_bo_destroy()
>>>>>>>>>>>>>>>>>> might drop the last reference to the drm_gem_object 
>>>>>>>>>>>>>>>>>> and hence we'd
>>>>>>>>>>>>>>>>>> potentially
>>>>>>>>>>>>>>>>>> free the dma-resv lock while holding it, at least if 
>>>>>>>>>>>>>>>>>> it's an external
>>>>>>>>>>>>>>>>>> object.
>>>>>>>>>>>>>>>>> Easiest way in this scheme is to think of the lists as 
>>>>>>>>>>>>>>>>> being protected
>>>>>>>>>>>>>>>>> by the vm's resv lock. That means anybody calling 
>>>>>>>>>>>>>>>>> unlink() must also
>>>>>>>>>>>>>>>>> hold the vm's resv lock. (Which is OK from a UAF 
>>>>>>>>>>>>>>>>> point of view, but
>>>>>>>>>>>>>>>>> perhaps not from a locking inversion POV with an async 
>>>>>>>>>>>>>>>>> list update).
>>>>>>>>>>>>>>>> This would mean that on unlink() we'd need to hold the 
>>>>>>>>>>>>>>>> VM's resv lock and the
>>>>>>>>>>>>>>>> corresponding GEM's resv lock (in case they're not the 
>>>>>>>>>>>>>>>> same anyways) because the
>>>>>>>>>>>>>>>> VM's resv lock would protect the external / evicted 
>>>>>>>>>>>>>>>> object lists and the GEM
>>>>>>>>>>>>>>>> objects resv lock protects the GEM's list of 
>>>>>>>>>>>>>>>> drm_gpuvm_bos and the
>>>>>>>>>>>>>>>> drm_gpuvm_bo's list of drm_gpuvas.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> For extobjs an outer lock would be enough in case 
>>>>>>>>>>>>>>>>>>>> of Xe, but I
>>>>>>>>>>>>>>>>>>>> really would not
>>>>>>>>>>>>>>>>>>>> like to add even more complexity just to get the 
>>>>>>>>>>>>>>>>>>>> spinlock out of
>>>>>>>>>>>>>>>>>>>> the way in case
>>>>>>>>>>>>>>>>>>>> the driver already has an outer lock protecting 
>>>>>>>>>>>>>>>>>>>> this path.
>>>>>>>>>>>>>>>>>>> I must disagree here. These spinlocks and atomic 
>>>>>>>>>>>>>>>>>>> operations are
>>>>>>>>>>>>>>>>>>> pretty
>>>>>>>>>>>>>>>>>>> costly and as discussed earlier this type of locking 
>>>>>>>>>>>>>>>>>>> was the reason
>>>>>>>>>>>>>>>>>>> (at
>>>>>>>>>>>>>>>>>>> least according to the commit message) that made 
>>>>>>>>>>>>>>>>>>> Christian drop the
>>>>>>>>>>>>>>>>>>> XArray
>>>>>>>>>>>>>>>>>>> use in drm_exec for the same set of objects: "The 
>>>>>>>>>>>>>>>>>>> locking overhead
>>>>>>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>>>> unecessary and measurable". IMHO the spinlock is the 
>>>>>>>>>>>>>>>>>>> added
>>>>>>>>>>>>>>>>>>> complexity and a
>>>>>>>>>>>>>>>>>>> single wide lock following the drm locking 
>>>>>>>>>>>>>>>>>>> guidelines set out by
>>>>>>>>>>>>>>>>>>> Daniel and
>>>>>>>>>>>>>>>>>>> David should really be the default choice with an 
>>>>>>>>>>>>>>>>>>> opt-in for a
>>>>>>>>>>>>>>>>>>> spinlock if
>>>>>>>>>>>>>>>>>>> needed for async and pushing out to a wq is not an 
>>>>>>>>>>>>>>>>>>> option.
>>>>>>>>>>>>>>>>>> For the external object list an outer lock would work 
>>>>>>>>>>>>>>>>>> as long as it's
>>>>>>>>>>>>>>>>>> not the
>>>>>>>>>>>>>>>>>> dma-resv lock of the corresponding GEM object, since 
>>>>>>>>>>>>>>>>>> here we actually
>>>>>>>>>>>>>>>>>> need to
>>>>>>>>>>>>>>>>>> remove the list entry from the external object list on
>>>>>>>>>>>>>>>>>> drm_gpuvm_bo_destroy().
>>>>>>>>>>>>>>>>>> It's just a bit weird design wise that drivers would 
>>>>>>>>>>>>>>>>>> need to take
>>>>>>>>>>>>>>>>>> this outer
>>>>>>>>>>>>>>>>>> lock on:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> - drm_gpuvm_bo_extobj_add()
>>>>>>>>>>>>>>>>>> - drm_gpuvm_bo_destroy()        (and hence also 
>>>>>>>>>>>>>>>>>> drm_gpuvm_bo_put())
>>>>>>>>>>>>>>>>>> - drm_gpuva_unlink()            (because it needs to 
>>>>>>>>>>>>>>>>>> call
>>>>>>>>>>>>>>>>>> drm_gpuvm_bo_put())
>>>>>>>>>>>>>>>>>> - drm_gpuvm_exec_lock()
>>>>>>>>>>>>>>>>>> - drm_gpuvm_exec_lock_array()
>>>>>>>>>>>>>>>>>> - drm_gpuvm_prepare_range()
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Given that it seems reasonable to do all the required 
>>>>>>>>>>>>>>>>>> locking
>>>>>>>>>>>>>>>>>> internally.
>>>>>>>>>>>>>>>>>  From a design POV, there has been a clear direction 
>>>>>>>>>>>>>>>>> in XE to make
>>>>>>>>>>>>>>>>> things similar to mmap() / munmap(), so this outer 
>>>>>>>>>>>>>>>>> lock, which in Xe is
>>>>>>>>>>>>>>>>> an rwsem, is used in a similar way as the mmap_lock. 
>>>>>>>>>>>>>>>>> It's protecting
>>>>>>>>>>>>>>>>> the page-table structures and vma rb tree, the userptr 
>>>>>>>>>>>>>>>>> structures and
>>>>>>>>>>>>>>>>> the extobj list. Basically it's taken early in the 
>>>>>>>>>>>>>>>>> exec IOCTL, the
>>>>>>>>>>>>>>>>> VM_BIND ioctl, the compute rebind worker and the 
>>>>>>>>>>>>>>>>> pagefault handler, so
>>>>>>>>>>>>>>>>> all of the above are just asserting that it is taken 
>>>>>>>>>>>>>>>>> in the correct
>>>>>>>>>>>>>>>>> mode.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> But strictly with this scheme one could also use the 
>>>>>>>>>>>>>>>>> vm's dma_resv for
>>>>>>>>>>>>>>>>> the extobj list since with drm_exec, it's locked 
>>>>>>>>>>>>>>>>> before traversing the
>>>>>>>>>>>>>>>>> list.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The whole point of this scheme is to rely on locks 
>>>>>>>>>>>>>>>>> that you already are
>>>>>>>>>>>>>>>>> supposed to be holding for various reasons and is 
>>>>>>>>>>>>>>>>> simple to comprehend.
>>>>>>>>>>>>>>>> I don't agree that we're supposed to hold the VM's resv 
>>>>>>>>>>>>>>>> lock anyways for
>>>>>>>>>>>>>>>> functions like drm_gpuvm_bo_put() or 
>>>>>>>>>>>>>>>> drm_gpuva_unlink(), but I'm fine using it
>>>>>>>>>>>>>>>> for that purpose nevertheless.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> In order to at least place lockdep checks, the driver 
>>>>>>>>>>>>>>>>>> would need to
>>>>>>>>>>>>>>>>>> supply the
>>>>>>>>>>>>>>>>>> corresponding lock's lockdep_map, because the GPUVM 
>>>>>>>>>>>>>>>>>> otherwise doesn't
>>>>>>>>>>>>>>>>>> know about
>>>>>>>>>>>>>>>>>> the lock.
>>>>>>>>>>>>>>>>> Yes, that sounds reasonable. One lockdep map per list.
>>>>>>>>>>>>>>>> I'd really like to avoid that, especially now that 
>>>>>>>>>>>>>>>> everything got simpler. We
>>>>>>>>>>>>>>>> should define the actual locks to take instead.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Out of curiosity, what is the overhead of a 
>>>>>>>>>>>>>>>>>> spin_lock() that doesn't
>>>>>>>>>>>>>>>>>> need to
>>>>>>>>>>>>>>>>>> spin?
>>>>>>>>>>>>>>>>> I guess it's hard to tell exactly, but it is much 
>>>>>>>>>>>>>>>>> lower on modern x86
>>>>>>>>>>>>>>>>> than what it used to be. Not sure about ARM, which is 
>>>>>>>>>>>>>>>>> the other
>>>>>>>>>>>>>>>>> architecture important to us. I figure if there is 
>>>>>>>>>>>>>>>>> little cache-line
>>>>>>>>>>>>>>>>> bouncing the main overhead comes from the implied 
>>>>>>>>>>>>>>>>> barriers.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> A pretty simple way that would not add much code 
>>>>>>>>>>>>>>>>>>> would be
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> static void gpuvm_cond_spin_lock(const struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>>>>>>>                                  spinlock_t *lock)
>>>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>>>         if (!gpuvm->resv_protected_lists)
>>>>>>>>>>>>>>>>>>>                 spin_lock(lock);
>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> For such drivers, that would require anybody 
>>>>>>>>>>>>>>>>>>>>> calling unlink to
>>>>>>>>>>>>>>>>>>>>> hold the vm's
>>>>>>>>>>>>>>>>>>>>> resv, though.
>>>>>>>>>>>>>>>>>>>> In V4 I want to go back to having a dedicated lock 
>>>>>>>>>>>>>>>>>>>> for the GEMs
>>>>>>>>>>>>>>>>>>>> gpuva list (or
>>>>>>>>>>>>>>>>>>>> VM_BO list to be more precise). We can't just use 
>>>>>>>>>>>>>>>>>>>> the dma-resv
>>>>>>>>>>>>>>>>>>>> lock for that
>>>>>>>>>>>>>>>>>>>> with VM_BO abstractions, because on destruction of 
>>>>>>>>>>>>>>>>>>>> a VM_BO we
>>>>>>>>>>>>>>>>>>>> otherwise wouldn't
>>>>>>>>>>>>>>>>>>>> be allowed to already hold the dma-resv lock. 
>>>>>>>>>>>>>>>>>>>> That's the fix I
>>>>>>>>>>>>>>>>>>>> was referring to
>>>>>>>>>>>>>>>>>>>> earlier.
>>>>>>>>>>>>>>>>>>> Yeah, I can see the need for a dedicated lock for 
>>>>>>>>>>>>>>>>>>> the GEM's gpuva
>>>>>>>>>>>>>>>>>>> list, but
>>>>>>>>>>>>>>>>>>> holding the vm's dma-resv lock across the unlink 
>>>>>>>>>>>>>>>>>>> shouldn't be a
>>>>>>>>>>>>>>>>>>> problem. We
>>>>>>>>>>>>>>>>>>> may free the object and a pointer to the vm's resv 
>>>>>>>>>>>>>>>>>>> during unlink
>>>>>>>>>>>>>>>>>>> but we
>>>>>>>>>>>>>>>>>>> don't free the vm's resv. It'd be a matter of 
>>>>>>>>>>>>>>>>>>> ensuring that any
>>>>>>>>>>>>>>>>>>> calls to
>>>>>>>>>>>>>>>>>>> unlink from *within* drm_gpuvm allows it to be held.
>>>>>>>>>>>>>>>>>> Drivers calling unlink() from the fence signaling 
>>>>>>>>>>>>>>>>>> path can't use the
>>>>>>>>>>>>>>>>>> VM's
>>>>>>>>>>>>>>>>>> dma-resv lock.
>>>>>>>>>>>>>>>>> Yes, that made me a bit curious because in the current 
>>>>>>>>>>>>>>>>> version the code
>>>>>>>>>>>>>>>>> required the object's dma_resv for unlink() which 
>>>>>>>>>>>>>>>>> can't be grabbed
>>>>>>>>>>>>>>>>> either from the fence signaling path. So are there any 
>>>>>>>>>>>>>>>>> drivers actually
>>>>>>>>>>>>>>>>> wanting to do that? If so, they will either need to 
>>>>>>>>>>>>>>>>> resort to the
>>>>>>>>>>>>>>>>> current spinlock solution or they will need to call 
>>>>>>>>>>>>>>>>> unlink from a
>>>>>>>>>>>>>>>>> workqueue item.
>>>>>>>>>>>>>>>> As Boris already mentioned we have the dma-resv lock by 
>>>>>>>>>>>>>>>> default or a driver
>>>>>>>>>>>>>>>> specific GEM gpuva lock as opt-in. Now, we can get rid 
>>>>>>>>>>>>>>>> of the latter.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Also, what if the object is an external object? We 
>>>>>>>>>>>>>>>>>> can't use the VM's
>>>>>>>>>>>>>>>>>> dma-resv
>>>>>>>>>>>>>>>>>> lock here.
>>>>>>>>>>>>>>>>> Why? Typically (sync) unlink is only ever called from 
>>>>>>>>>>>>>>>>> an unbind-like
>>>>>>>>>>>>>>>>> operation where it should be trivial to grab the vm's 
>>>>>>>>>>>>>>>>> resv. Or, for
>>>>>>>>>>>>>>>>> that matter any outer lock protecting the extobj list. 
>>>>>>>>>>>>>>>>> Rule would be
>>>>>>>>>>>>>>>>> the drm_gpuvm_bo::entry::extobj and 
>>>>>>>>>>>>>>>>> drm_gpuvm_bo::entry::evict would
>>>>>>>>>>>>>>>>> be protected by either the vm's dma_resv (or possibly 
>>>>>>>>>>>>>>>>> an outer lock in
>>>>>>>>>>>>>>>>> the case of the extobj list).
>>>>>>>>>>>>>>>> Outer lock wouldn't have been working for updates in 
>>>>>>>>>>>>>>>> the async path, but
>>>>>>>>>>>>>>>> shouldn't be relevant anymore. We could use the VM's 
>>>>>>>>>>>>>>>> resv for that.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>   And we can't have the GEM objs dma-resv lock held 
>>>>>>>>>>>>>>>>>> when calling
>>>>>>>>>>>>>>>>>> unlink(), since unlink() calls drm_gpuvm_bo_put(), 
>>>>>>>>>>>>>>>>>> which if the
>>>>>>>>>>>>>>>>>> refcount drops
>>>>>>>>>>>>>>>>>> to zero calls drm_gpuvm_bo_destroy() and 
>>>>>>>>>>>>>>>>>> drm_gpuvm_bo_destroy() might
>>>>>>>>>>>>>>>>>> drop the
>>>>>>>>>>>>>>>>>> last reference of the GEM object.
>>>>>>>>>>>>>>>>> Yes, but this is a different problem as to what 
>>>>>>>>>>>>>>>>> exactly protects
>>>>>>>>>>>>>>>>> drm_gpuvm_bo::entry::gem. Either as you suggest an 
>>>>>>>>>>>>>>>>> internal per bo list
>>>>>>>>>>>>>>>>> lock, or if we want to keep the bo's dma_resv we need 
>>>>>>>>>>>>>>>>> to ensure that
>>>>>>>>>>>>>>>>> the caller of dma_resv_unlock(obj->resv) actually 
>>>>>>>>>>>>>>>>> refcounts its obj
>>>>>>>>>>>>>>>>> pointer, and doesn't implicitly rely on the gpuvm_bo's 
>>>>>>>>>>>>>>>>> refcount (I know
>>>>>>>>>>>>>>>>> Boris didn't like that, but requiring an explicit 
>>>>>>>>>>>>>>>>> refcount for a
>>>>>>>>>>>>>>>>> pointer you dereference unless you're under a lock 
>>>>>>>>>>>>>>>>> that ensures keeping
>>>>>>>>>>>>>>>>> the object alive is pretty much required?) But anyway 
>>>>>>>>>>>>>>>>> for the
>>>>>>>>>>>>>>>>> drm_gpuvm_bo::entry::gem list protection (bo resv or 
>>>>>>>>>>>>>>>>> internal spinlock)
>>>>>>>>>>>>>>>>> I don't have a strong preference.
>>>>>>>>>>>>>>>> We can keep the GEM objects dma-resv lock, however as 
>>>>>>>>>>>>>>>> mentioned above
>>>>>>>>>>>>>>>> drm_gpuva_unlink() and drm_gpuvm_bo_put() then requires 
>>>>>>>>>>>>>>>> both the VM's resv lock
>>>>>>>>>>>>>>>> and the GEM's resv lock in case they differ.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>   All those problems go away with a dedicated
>>>>>>>>>>>>>>>>>> GEM gpuva list lock.
>>>>>>>>>>>>>>>>> I don't think these are real problems.
>>>>>>>>>>>>>>>>> With the exception of the eviction list "trick" where we currently have 
>>>>>>>>>>>>>>>>> we currently have
>>>>>>>>>>>>>>>>> slightly different approach to collect external bos 
>>>>>>>>>>>>>>>>> needing rebinding,
>>>>>>>>>>>>>>>>> we have this working fine.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> TBH I think pretty much the only situation where the 
>>>>>>>>>>>>>>>>> spinlock is needed
>>>>>>>>>>>>>>>>> is for async updates of these lists, unless a wq item 
>>>>>>>>>>>>>>>>> can be used for
>>>>>>>>>>>>>>>>> that, but it doesn't really seem like the current code 
>>>>>>>>>>>>>>>>> allows for such
>>>>>>>>>>>>>>>>> updates anyway? It complicates the code a lot, adds 
>>>>>>>>>>>>>>>>> overhead and also
>>>>>>>>>>>>>>>>> adds the requirement for refcounting during list 
>>>>>>>>>>>>>>>>> traversal.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> /Thomas
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> /Thomas
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> It seems that with that also the refcount could be
>>>>>>>>>>>>>>>>>>>>> made non-atomic.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> All in the spirit of the drm locking guidelines 
>>>>>>>>>>>>>>>>>>>>> "use big locks
>>>>>>>>>>>>>>>>>>>>> when
>>>>>>>>>>>>>>>>>>>>> possible".
>>>>>>>>>>>>>>>>>>>>> Lower level locks only when necessary for 
>>>>>>>>>>>>>>>>>>>>> performance or
>>>>>>>>>>>>>>>>>>>>> locking inversion?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> /Thomas
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>>>> + * Elements popped from the original list are kept in a local list, so
>>>>>>>>>>>>>>>>>>>>>> + * removal and is_empty checks can still happen while we're iterating
>>>>>>>>>>>>>>>>>>>>>> + * the list.
>>>>>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>>>>>> +#define get_next_vm_bo_from_list(__gpuvm, __list_name, __local_list, __prev_vm_bo)	\
>>>>>>>>>>>>>>>>>>>>>> +	({										\
>>>>>>>>>>>>>>>>>>>>>> +		struct drm_gpuvm_bo *__vm_bo;						\
>>>>>>>>>>>>>>>>>>>>>> +											\
>>>>>>>>>>>>>>>>>>>>>> +		drm_gpuvm_bo_put(__prev_vm_bo);						\
>>>>>>>>>>>>>>>>>>>>>> +											\
>>>>>>>>>>>>>>>>>>>>>> +		spin_lock(&(__gpuvm)->__list_name.lock);				\
>>>>>>>>>>>>>>>>>>>>>> +		while (!list_empty(&(__gpuvm)->__list_name.list)) {			\
>>>>>>>>>>>>>>>>>>>>>> +			__vm_bo = list_first_entry(&(__gpuvm)->__list_name.list,	\
>>>>>>>>>>>>>>>>>>>>>> +						   struct drm_gpuvm_bo,			\
>>>>>>>>>>>>>>>>>>>>>> +						   list.entry.__list_name);		\
>>>>>>>>>>>>>>>>>>>>>> +			if (drm_gpuvm_bo_get_unless_zero(__vm_bo)) {			\
>>>>>>>>>>>>>>>>>>>>>> +				list_move_tail(&(__vm_bo)->list.entry.__list_name,	\
>>>>>>>>>>>>>>>>>>>>>> +					       __local_list);				\
>>>>>>>>>>>>>>>>>>>>>> +				break;							\
>>>>>>>>>>>>>>>>>>>>>> +			} else {							\
>>>>>>>>>>>>>>>>>>>>>> +				list_del_init(&(__vm_bo)->list.entry.__list_name);	\
>>>>>>>>>>>>>>>>>>>>>> +				__vm_bo = NULL;						\
>>>>>>>>>>>>>>>>>>>>>> +			}								\
>>>>>>>>>>>>>>>>>>>>>> +		}									\
>>>>>>>>>>>>>>>>>>>>>> +		spin_unlock(&(__gpuvm)->__list_name.lock);				\
>>>>>>>>>>>>>>>>>>>>>> +											\
>>>>>>>>>>>>>>>>>>>>>> +		__vm_bo;								\
>>>>>>>>>>>>>>>>>>>>>> +	})
>>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>>>>>> + * for_each_vm_bo_in_list() - internal vm_bo list iterator
>>>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>>>> + * This helper is here to provide lockless list iteration. Lockless as in, the
>>>>>>>>>>>>>>>>>>>>>> + * iterator releases the lock immediately after picking the first element from the
>>>>>>>>>>>>>>>>>>>>>> + * list, so list insertion and deletion can happen concurrently.
>>>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>>>> + * Typical use:
>>>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>>>> + *	struct drm_gpuvm_bo *vm_bo;
>>>>>>>>>>>>>>>>>>>>>> + *	LIST_HEAD(my_local_list);
>>>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>>>> + *	ret = 0;
>>>>>>>>>>>>>>>>>>>>>> + *	drm_gpuvm_for_each_vm_bo(gpuvm, <list_name>, &my_local_list, vm_bo) {
>>>>>>>>>>>>>>>>>>>>>> + *		ret = do_something_with_vm_bo(..., vm_bo);
>>>>>>>>>>>>>>>>>>>>>> + *		if (ret)
>>>>>>>>>>>>>>>>>>>>>> + *			break;
>>>>>>>>>>>>>>>>>>>>>> + *	}
>>>>>>>>>>>>>>>>>>>>>> + *	drm_gpuvm_bo_put(vm_bo);
>>>>>>>>>>>>>>>>>>>>>> + *	drm_gpuvm_restore_vm_bo_list(gpuvm, <list_name>, &my_local_list);
>>>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>>>> + * Only used for internal list iterations, not meant to be exposed to the outside
>>>>>>>>>>>>>>>>>>>>>> + * world.
>>>>>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>>>>>> +#define for_each_vm_bo_in_list(__gpuvm, __list_name, __local_list, __vm_bo)	\
>>>>>>>>>>>>>>>>>>>>>> +	for (__vm_bo = get_next_vm_bo_from_list(__gpuvm, __list_name,		\
>>>>>>>>>>>>>>>>>>>>>> +						__local_list, NULL);		\
>>>>>>>>>>>>>>>>>>>>>> +	     __vm_bo;								\
>>>>>>>>>>>>>>>>>>>>>> +	     __vm_bo = get_next_vm_bo_from_list(__gpuvm, __list_name,		\
>>>>>>>>>>>>>>>>>>>>>> +						__local_list, __vm_bo))		\
>>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>>>>>> + * restore_vm_bo_list() - move vm_bo elements back to their original list
>>>>>>>>>>>>>>>>>>>>>> + * @__gpuvm: The GPU VM
>>>>>>>>>>>>>>>>>>>>>> + * @__list_name: The name of the list we're iterating on
>>>>>>>>>>>>>>>>>>>>>> + * @__local_list: A pointer to the local list used to store already iterated items
>>>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>>>> + * When we're done iterating a vm_bo list, we should call restore_vm_bo_list()
>>>>>>>>>>>>>>>>>>>>>> + * to restore the original state and let new iterations take place.
>>>>>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>>>>>> +#define restore_vm_bo_list(__gpuvm, __list_name, __local_list)			\
>>>>>>>>>>>>>>>>>>>>>> +	do {									\
>>>>>>>>>>>>>>>>>>>>>> +		/* Merge back the two lists, moving local list elements to the	\
>>>>>>>>>>>>>>>>>>>>>> +		 * head to preserve previous ordering, in case it matters.	\
>>>>>>>>>>>>>>>>>>>>>> +		 */								\
>>>>>>>>>>>>>>>>>>>>>> +		spin_lock(&(__gpuvm)->__list_name.lock);			\
>>>>>>>>>>>>>>>>>>>>>> +		list_splice(__local_list, &(__gpuvm)->__list_name.list);	\
>>>>>>>>>>>>>>>>>>>>>> +		spin_unlock(&(__gpuvm)->__list_name.lock);			\
>>>>>>>>>>>>>>>>>>>>>> +	} while (0)
>>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>>>>>> + * drm_gpuvm_bo_list_add() - insert a vm_bo into the given list
>>>>>>>>>>>>>>>>>>>>>> + * @__vm_bo: the &drm_gpuvm_bo
>>>>>>>>>>>>>>>>>>>>>> + * @__list_name: the name of the list to insert into
>>>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>>>> + * Inserts the given @__vm_bo into the list specified by @__list_name and
>>>>>>>>>>>>>>>>>>>>>> + * increases the vm_bo's reference count.
>>>>>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>>>>>> +#define drm_gpuvm_bo_list_add(__vm_bo, __list_name)				\
>>>>>>>>>>>>>>>>>>>>>> +	do {									\
>>>>>>>>>>>>>>>>>>>>>> +		spin_lock(&(__vm_bo)->vm->__list_name.lock);			\
>>>>>>>>>>>>>>>>>>>>>> +		if (list_empty(&(__vm_bo)->list.entry.__list_name))		\
>>>>>>>>>>>>>>>>>>>>>> +			list_add_tail(&(__vm_bo)->list.entry.__list_name,	\
>>>>>>>>>>>>>>>>>>>>>> +				      &(__vm_bo)->vm->__list_name.list);	\
>>>>>>>>>>>>>>>>>>>>>> +		spin_unlock(&(__vm_bo)->vm->__list_name.lock);			\
>>>>>>>>>>>>>>>>>>>>>> +	} while (0)
>>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>>>>>> + * drm_gpuvm_bo_list_del() - remove a vm_bo from the given list
>>>>>>>>>>>>>>>>>>>>>> + * @__vm_bo: the &drm_gpuvm_bo
>>>>>>>>>>>>>>>>>>>>>> + * @__list_name: the name of the list to remove from
>>>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>>>> + * Removes the given @__vm_bo from the list specified by @__list_name and
>>>>>>>>>>>>>>>>>>>>>> + * decreases the vm_bo's reference count.
>>>>>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>>>>>> +#define drm_gpuvm_bo_list_del(__vm_bo,
>>>>>>>>>>>>>>>>>>>>>> __list_name)                            \
>>>>>>>>>>>>>>>>>>>>>> +       do
>>>>>>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>>>>>>          \
>>>>>>>>>>>>>>>>>>>>>> +               spin_lock(&(__vm_bo)->vm-
>>>>>>>>>>>>>>>>>>>>>>> __list_name.lock);                    \
>>>>>>>>>>>>>>>>>>>>>> +               if (!list_empty(&(__vm_bo)-
>>>>>>>>>>>>>>>>>>>>>>> list.entry.__list_name))            \
>>>>>>>>>>>>>>>>>>>>>> +                       list_del_init(&(__vm_bo)-
>>>>>>>>>>>>>>>>>>>>>>> list.entry.__list_name);      \
>>>>>>>>>>>>>>>>>>>>>> +               spin_unlock(&(__vm_bo)->vm-
>>>>>>>>>>>>>>>>>>>>>>> __list_name.lock);                  \
>>>>>>>>>>>>>>>>>>>>>> +       } while (0)
>>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>>> +static int __must_check
>>>>>>>>>>>>>>>>>>>>>> +drm_gpuvm_bo_get_unless_zero(struct drm_gpuvm_bo 
>>>>>>>>>>>>>>>>>>>>>> *vm_bo);
>>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>>>     #define 
>>>>>>>>>>>>>>>>>>>>>> to_drm_gpuva(__node) container_of((__node), struct
>>>>>>>>>>>>>>>>>>>>>> drm_gpuva, rb.node)
>>>>>>>>>>>>>>>>>>>>>>     #define GPUVA_START(node) ((node)->va.addr)
>>>>>>>>>>>>>>>>>>>>>> @@ -713,6 +867,12 @@ drm_gpuvm_init(struct 
>>>>>>>>>>>>>>>>>>>>>> drm_gpuvm *gpuvm,
>>>>>>>>>>>>>>>>>>>>>> struct drm_device *drm,
>>>>>>>>>>>>>>>>>>>>>>          gpuvm->rb.tree = RB_ROOT_CACHED;
>>>>>>>>>>>>>>>>>>>>>>          INIT_LIST_HEAD(&gpuvm->rb.list);
>>>>>>>>>>>>>>>>>>>>>> +       INIT_LIST_HEAD(&gpuvm->extobj.list);
>>>>>>>>>>>>>>>>>>>>>> +       spin_lock_init(&gpuvm->extobj.lock);
>>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>>> +       INIT_LIST_HEAD(&gpuvm->evict.list);
>>>>>>>>>>>>>>>>>>>>>> +       spin_lock_init(&gpuvm->evict.lock);
>>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>>>         ��drm_gpuva_check_overflow(start_offset, 
>>>>>>>>>>>>>>>>>>>>>> range);
>>>>>>>>>>>>>>>>>>>>>>          gpuvm->mm_start = start_offset;
>>>>>>>>>>>>>>>>>>>>>>          gpuvm->mm_range = range;
>>>>>>>>>>>>>>>>>>>>>> @@ -754,10 +914,302 @@ drm_gpuvm_destroy(struct 
>>>>>>>>>>>>>>>>>>>>>> drm_gpuvm
>>>>>>>>>>>>>>>>>>>>>> *gpuvm)
>>>>>>>>>>>>>>>>>>>>>>          WARN(!RB_EMPTY_ROOT(&gpuvm->rb.tree.rb_root), 
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>               "GPUVA tree is not empty, 
>>>>>>>>>>>>>>>>>>>>>> potentially leaking
>>>>>>>>>>>>>>>>>>>>>> memory.\n");
>>>>>>>>>>>>>>>>>>>>>> +       WARN(!list_empty(&gpuvm->extobj.list), 
>>>>>>>>>>>>>>>>>>>>>> "Extobj list
>>>>>>>>>>>>>>>>>>>>>> should be empty.\n");
>>>>>>>>>>>>>>>>>>>>>> +       WARN(!list_empty(&gpuvm->evict.list), 
>>>>>>>>>>>>>>>>>>>>>> "Evict list
>>>>>>>>>>>>>>>>>>>>>> should be empty.\n");
>>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>>>          drm_gem_private_object_fini(&gpuvm->d_obj);
>>>>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>>>> EXPORT_SYMBOL_GPL(drm_gpuvm_destroy);
>>>>>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>>>>>> + * drm_gpuvm_prepare_objects() - prepare all 
>>>>>>>>>>>>>>>>>>>>>> assoiciated BOs
>>>>>>>>>>>>>>>>>>>>>> + * @gpuvm: the &drm_gpuvm
>>>>>>>>>>>>>>>>>>>>>> + * @exec: the &drm_exec locking context
>>>>>>>>>>>>>>>>>>>>>> + * @num_fences: the amount of &dma_fences to 
>>>>>>>>>>>>>>>>>>>>>> reserve
>>>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>>>> + * Calls drm_exec_prepare_obj() for all 
>>>>>>>>>>>>>>>>>>>>>> &drm_gem_objects the
>>>>>>>>>>>>>>>>>>>>>> given
>>>>>>>>>>>>>>>>>>>>>> + * &drm_gpuvm contains mappings of.
>>>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>>>> + * Using this function directly, it is the drivers
>>>>>>>>>>>>>>>>>>>>>> responsibility to call
>>>>>>>>>>>>>>>>>>>>>> + * drm_exec_init() and drm_exec_fini() accordingly.
>>>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>>>> + * Note: This function is safe against 
>>>>>>>>>>>>>>>>>>>>>> concurrent insertion
>>>>>>>>>>>>>>>>>>>>>> and removal of
>>>>>>>>>>>>>>>>>>>>>> + * external objects, however it is not safe against
>>>>>>>>>>>>>>>>>>>>>> concurrent usage itself.
>>>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>>>> + * Drivers need to make sure to protect this 
>>>>>>>>>>>>>>>>>>>>>> case with
>>>>>>>>>>>>>>>>>>>>>> either an outer VM lock
>>>>>>>>>>>>>>>>>>>>>> + * or by calling drm_gpuvm_prepare_vm() before 
>>>>>>>>>>>>>>>>>>>>>> this function
>>>>>>>>>>>>>>>>>>>>>> within the
>>>>>>>>>>>>>>>>>>>>>> + * drm_exec_until_all_locked() loop, such that 
>>>>>>>>>>>>>>>>>>>>>> the GPUVM's
>>>>>>>>>>>>>>>>>>>>>> dma-resv lock ensures
>>>>>>>>>>>>>>>>>>>>>> + * mutual exclusion.
>>>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>>>> + * Returns: 0 on success, negative error code on 
>>>>>>>>>>>>>>>>>>>>>> failure.
>>>>>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>>>>>> +int
>>>>>>>>>>>>>>>>>>>>>> +drm_gpuvm_prepare_objects(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>>>>>>>>>> + struct drm_exec *exec,
>>>>>>>>>>>>>>>>>>>>>> + unsigned int num_fences)
>>>>>>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>>>>>>> +       struct drm_gpuvm_bo *vm_bo;
>>>>>>>>>>>>>>>>>>>>>> +       LIST_HEAD(extobjs);
>>>>>>>>>>>>>>>>>>>>>> +       int ret = 0;
>>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>>> +       for_each_vm_bo_in_list(gpuvm, extobj, 
>>>>>>>>>>>>>>>>>>>>>> &extobjs,
>>>>>>>>>>>>>>>>>>>>>> vm_bo) {
>>>>>>>>>>>>>>>>>>>>>> +               ret = drm_exec_prepare_obj(exec, 
>>>>>>>>>>>>>>>>>>>>>> vm_bo->obj,
>>>>>>>>>>>>>>>>>>>>>> num_fences);
>>>>>>>>>>>>>>>>>>>>>> +               if (ret)
>>>>>>>>>>>>>>>>>>>>>> +                       break;
>>>>>>>>>>>>>>>>>>>>>> +       }
>>>>>>>>>>>>>>>>>>>>>> +       /* Drop ref in case we break out of the 
>>>>>>>>>>>>>>>>>>>>>> loop. */
>>>>>>>>>>>>>>>>>>>>>> +       drm_gpuvm_bo_put(vm_bo);
>>>>>>>>>>>>>>>>>>>>>> +       restore_vm_bo_list(gpuvm, extobj, &extobjs);
>>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>>> +       return ret;
>>>>>>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_prepare_objects);
>>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>>>>>> + * drm_gpuvm_prepare_range() - prepare all BOs 
>>>>>>>>>>>>>>>>>>>>>> mapped within
>>>>>>>>>>>>>>>>>>>>>> a given range
>>>>>>>>>>>>>>>>>>>>>> + * @gpuvm: the &drm_gpuvm
>>>>>>>>>>>>>>>>>>>>>> + * @exec: the &drm_exec locking context
>>>>>>>>>>>>>>>>>>>>>> + * @addr: the start address within the VA space
>>>>>>>>>>>>>>>>>>>>>> + * @range: the range to iterate within the VA space
>>>>>>>>>>>>>>>>>>>>>> + * @num_fences: the amount of &dma_fences to 
>>>>>>>>>>>>>>>>>>>>>> reserve
>>>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>>>> + * Calls drm_exec_prepare_obj() for all 
>>>>>>>>>>>>>>>>>>>>>> &drm_gem_objects
>>>>>>>>>>>>>>>>>>>>>> mapped between @addr
>>>>>>>>>>>>>>>>>>>>>> + * and @addr + @range.
>>>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>>>> + * Returns: 0 on success, negative error code on 
>>>>>>>>>>>>>>>>>>>>>> failure.
>>>>>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>>>>>> +int
>>>>>>>>>>>>>>>>>>>>>> +drm_gpuvm_prepare_range(struct drm_gpuvm *gpuvm, 
>>>>>>>>>>>>>>>>>>>>>> struct
>>>>>>>>>>>>>>>>>>>>>> drm_exec *exec,
>>>>>>>>>>>>>>>>>>>>>> +                       u64 addr, u64 range, 
>>>>>>>>>>>>>>>>>>>>>> unsigned int
>>>>>>>>>>>>>>>>>>>>>> num_fences)
>>>>>>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>>>>>>> +       struct drm_gpuva *va;
>>>>>>>>>>>>>>>>>>>>>> +       u64 end = addr + range;
>>>>>>>>>>>>>>>>>>>>>> +       int ret;
>>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>>> +       drm_gpuvm_for_each_va_range(va, gpuvm, 
>>>>>>>>>>>>>>>>>>>>>> addr, end) {
>>>>>>>>>>>>>>>>>>>>>> +               struct drm_gem_object *obj = 
>>>>>>>>>>>>>>>>>>>>>> va->gem.obj;
>>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>>> +               ret = drm_exec_prepare_obj(exec, 
>>>>>>>>>>>>>>>>>>>>>> obj,
>>>>>>>>>>>>>>>>>>>>>> num_fences);
>>>>>>>>>>>>>>>>>>>>>> +               if (ret)
>>>>>>>>>>>>>>>>>>>>>> +                       return ret;
>>>>>>>>>>>>>>>>>>>>>> +       }
>>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>>> +       return 0;
>>>>>>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_prepare_range);
>>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>>>>>> + * drm_gpuvm_exec_lock() - lock all dma-resv of all
>>>>>>>>>>>>>>>>>>>>>> assoiciated BOs
>>>>>>>>>>>>>>>>>>>>>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>>>>>>>>>>>>>>>>>>>>>> + * @num_fences: the amount of &dma_fences to 
>>>>>>>>>>>>>>>>>>>>>> reserve
>>>>>>>>>>>>>>>>>>>>>> + * @interruptible: sleep interruptible if waiting
>>>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>>>> + * Acquires all dma-resv locks of all 
>>>>>>>>>>>>>>>>>>>>>> &drm_gem_objects the
>>>>>>>>>>>>>>>>>>>>>> given
>>>>>>>>>>>>>>>>>>>>>> + * &drm_gpuvm contains mappings of.
>>>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>>>> + * Addionally, when calling this function with 
>>>>>>>>>>>>>>>>>>>>>> struct
>>>>>>>>>>>>>>>>>>>>>> drm_gpuvm_exec::extra
>>>>>>>>>>>>>>>>>>>>>> + * being set the driver receives the given @fn 
>>>>>>>>>>>>>>>>>>>>>> callback to
>>>>>>>>>>>>>>>>>>>>>> lock additional
>>>>>>>>>>>>>>>>>>>>>> + * dma-resv in the context of the 
>>>>>>>>>>>>>>>>>>>>>> &drm_gpuvm_exec instance.
>>>>>>>>>>>>>>>>>>>>>> Typically, drivers
>>>>>>>>>>>>>>>>>>>>>> + * would call drm_exec_prepare_obj() from within 
>>>>>>>>>>>>>>>>>>>>>> this
>>>>>>>>>>>>>>>>>>>>>> callback.
>>>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>>>> + * Returns: 0 on success, negative error code on 
>>>>>>>>>>>>>>>>>>>>>> failure.
>>>>>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>>>>>> +int
>>>>>>>>>>>>>>>>>>>>>> +drm_gpuvm_exec_lock(struct drm_gpuvm_exec *vm_exec,
>>>>>>>>>>>>>>>>>>>>>> + unsigned int num_fences,
>>>>>>>>>>>>>>>>>>>>>> +                   bool interruptible)
>>>>>>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>>>>>>> +       struct drm_gpuvm *gpuvm = vm_exec->vm;
>>>>>>>>>>>>>>>>>>>>>> +       struct drm_exec *exec = &vm_exec->exec;
>>>>>>>>>>>>>>>>>>>>>> +       uint32_t flags;
>>>>>>>>>>>>>>>>>>>>>> +       int ret;
>>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>>> +       flags = interruptible ? 
>>>>>>>>>>>>>>>>>>>>>> DRM_EXEC_INTERRUPTIBLE_WAIT :
>>>>>>>>>>>>>>>>>>>>>> 0 |
>>>>>>>>>>>>>>>>>>>>>> +               DRM_EXEC_IGNORE_DUPLICATES;
>>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>>> +       drm_exec_init(exec, flags);
>>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>>> +       drm_exec_until_all_locked(exec) {
>>>>>>>>>>>>>>>>>>>>>> +               ret = drm_gpuvm_prepare_vm(gpuvm, 
>>>>>>>>>>>>>>>>>>>>>> exec,
>>>>>>>>>>>>>>>>>>>>>> num_fences);
>>>>>>>>>>>>>>>>>>>>>> +               drm_exec_retry_on_contention(exec);
>>>>>>>>>>>>>>>>>>>>>> +               if (ret)
>>>>>>>>>>>>>>>>>>>>>> +                       goto err;
>>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>>> +               ret = 
>>>>>>>>>>>>>>>>>>>>>> drm_gpuvm_prepare_objects(gpuvm, exec,
>>>>>>>>>>>>>>>>>>>>>> num_fences);
>>>>>>>>>>>>>>>>>>>>>> +               drm_exec_retry_on_contention(exec);
>>>>>>>>>>>>>>>>>>>>>> +               if (ret)
>>>>>>>>>>>>>>>>>>>>>> +                       goto err;
>>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>>> +               if (vm_exec->extra.fn) {
>>>>>>>>>>>>>>>>>>>>>> +                       ret = 
>>>>>>>>>>>>>>>>>>>>>> vm_exec->extra.fn(vm_exec,
>>>>>>>>>>>>>>>>>>>>>> num_fences);
>>>>>>>>>>>>>>>>>>>>>> +                       drm_exec_retry_on_contention(exec); 
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> +                       if (ret)
>>>>>>>>>>>>>>>>>>>>>> +                               goto err;
>>>>>>>>>>>>>>>>>>>>>> +               }
>>>>>>>>>>>>>>>>>>>>>> +       }
>>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>>> +       return 0;
>>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>>> +err:
>>>>>>>>>>>>>>>>>>>>>> +       drm_exec_fini(exec);
>>>>>>>>>>>>>>>>>>>>>> +       return ret;
>>>>>>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_exec_lock);
>>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>>> +static int
>>>>>>>>>>>>>>>>>>>>>> +fn_lock_array(struct drm_gpuvm_exec *vm_exec, 
>>>>>>>>>>>>>>>>>>>>>> unsigned int
>>>>>>>>>>>>>>>>>>>>>> num_fences)
>>>>>>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>>>>>>> +       struct {
>>>>>>>>>>>>>>>>>>>>>> +               struct drm_gem_object **objs;
>>>>>>>>>>>>>>>>>>>>>> +               unsigned int num_objs;
>>>>>>>>>>>>>>>>>>>>>> +       } *args = vm_exec->extra.priv;
>>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>>> +       return 
>>>>>>>>>>>>>>>>>>>>>> drm_exec_prepare_array(&vm_exec->exec, args-
>>>>>>>>>>>>>>>>>>>>>>> objs,
>>>>>>>>>>>>>>>>>>>>>> + args->num_objs,
>>>>>>>>>>>>>>>>>>>>>> num_fences);
>>>>>>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>>>>>> + * drm_gpuvm_exec_lock_array() - lock all 
>>>>>>>>>>>>>>>>>>>>>> dma-resv of all
>>>>>>>>>>>>>>>>>>>>>> assoiciated BOs
>>>>>>>>>>>>>>>>>>>>>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>>>>>>>>>>>>>>>>>>>>>> + * @objs: additional &drm_gem_objects to lock
>>>>>>>>>>>>>>>>>>>>>> + * @num_objs: the number of additional 
>>>>>>>>>>>>>>>>>>>>>> &drm_gem_objects to
>>>>>>>>>>>>>>>>>>>>>> lock
>>>>>>>>>>>>>>>>>>>>>> + * @num_fences: the amount of &dma_fences to 
>>>>>>>>>>>>>>>>>>>>>> reserve
>>>>>>>>>>>>>>>>>>>>>> + * @interruptible: sleep interruptible if waiting
>>>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>>>> + * Acquires all dma-resv locks of all 
>>>>>>>>>>>>>>>>>>>>>> &drm_gem_objects the
>>>>>>>>>>>>>>>>>>>>>> given &drm_gpuvm
>>>>>>>>>>>>>>>>>>>>>> + * contains mappings of, plus the ones given 
>>>>>>>>>>>>>>>>>>>>>> through @objs.
>>>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>>>> + * Returns: 0 on success, negative error code on 
>>>>>>>>>>>>>>>>>>>>>> failure.
>>>>>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>>>>>> +int
>>>>>>>>>>>>>>>>>>>>>> +drm_gpuvm_exec_lock_array(struct drm_gpuvm_exec 
>>>>>>>>>>>>>>>>>>>>>> *vm_exec,
>>>>>>>>>>>>>>>>>>>>>> + struct drm_gem_object **objs,
>>>>>>>>>>>>>>>>>>>>>> + unsigned int num_objs,
>>>>>>>>>>>>>>>>>>>>>> + unsigned int num_fences,
>>>>>>>>>>>>>>>>>>>>>> + bool interruptible)
>>>>>>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>>>>>>> +       struct {
>>>>>>>>>>>>>>>>>>>>>> +               struct drm_gem_object **objs;
>>>>>>>>>>>>>>>>>>>>>> +               unsigned int num_objs;
>>>>>>>>>>>>>>>>>>>>>> +       } args;
>>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>>> +       args.objs = objs;
>>>>>>>>>>>>>>>>>>>>>> +       args.num_objs = num_objs;
>>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>>> +       vm_exec->extra.fn = fn_lock_array;
>>>>>>>>>>>>>>>>>>>>>> +       vm_exec->extra.priv = &args;
>>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>>> +       return drm_gpuvm_exec_lock(vm_exec, 
>>>>>>>>>>>>>>>>>>>>>> num_fences,
>>>>>>>>>>>>>>>>>>>>>> interruptible);
>>>>>>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_exec_lock_array);
>>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>>>>>> + * drm_gpuvm_exec_lock_range() - prepare all BOs 
>>>>>>>>>>>>>>>>>>>>>> mapped
>>>>>>>>>>>>>>>>>>>>>> within a given range
>>>>>>>>>>>>>>>>>>>>>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>>>>>>>>>>>>>>>>>>>>>> + * @addr: the start address within the VA space
>>>>>>>>>>>>>>>>>>>>>> + * @range: the range to iterate within the VA space
>>>>>>>>>>>>>>>>>>>>>> + * @num_fences: the amount of &dma_fences to 
>>>>>>>>>>>>>>>>>>>>>> reserve
>>>>>>>>>>>>>>>>>>>>>> + * @interruptible: sleep interruptible if waiting
>>>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>>>> + * Acquires all dma-resv locks of all 
>>>>>>>>>>>>>>>>>>>>>> &drm_gem_objects
>>>>>>>>>>>>>>>>>>>>>> mapped between @addr and
>>>>>>>>>>>>>>>>>>>>>> + * @addr + @range.
>>>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>>>> + * Returns: 0 on success, negative error code on 
>>>>>>>>>>>>>>>>>>>>>> failure.
>>>>>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>>>>>> +int
>>>>>>>>>>>>>>>>>>>>>> +drm_gpuvm_exec_lock_range(struct drm_gpuvm_exec 
>>>>>>>>>>>>>>>>>>>>>> *vm_exec,
>>>>>>>>>>>>>>>>>>>>>> + u64 addr, u64 range,
>>>>>>>>>>>>>>>>>>>>>> + unsigned int num_fences,
>>>>>>>>>>>>>>>>>>>>>> + bool interruptible)
>>>>>>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>>>>>>> +       struct drm_gpuvm *gpuvm = vm_exec->vm;
>>>>>>>>>>>>>>>>>>>>>> +       struct drm_exec *exec = &vm_exec->exec;
>>>>>>>>>>>>>>>>>>>>>> +       uint32_t flags;
>>>>>>>>>>>>>>>>>>>>>> +       int ret;
>>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>>> +       flags = interruptible ? 
>>>>>>>>>>>>>>>>>>>>>> DRM_EXEC_INTERRUPTIBLE_WAIT :
>>>>>>>>>>>>>>>>>>>>>> 0 |
>>>>>>>>>>>>>>>>>>>>>> +               DRM_EXEC_IGNORE_DUPLICATES;
>>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>>> +       drm_exec_init(exec, flags);
>>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>>> +       drm_exec_until_all_locked(exec) {
>>>>>>>>>>>>>>>>>>>>>> +               ret = 
>>>>>>>>>>>>>>>>>>>>>> drm_gpuvm_prepare_range(gpuvm, exec,
>>>>>>>>>>>>>>>>>>>>>> addr, range,
>>>>>>>>>>>>>>>>>>>>>> + num_fences);
>>>>>>>>>>>>>>>>>>>>>> +               drm_exec_retry_on_contention(exec);
>>>>>>>>>>>>>>>>>>>>>> +               if (ret)
>>>>>>>>>>>>>>>>>>>>>> +                       goto err;
>>>>>>>>>>>>>>>>>>>>>> +       }
>>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>>> +       return ret;
>>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>>> +err:
>>>>>>>>>>>>>>>>>>>>>> +       drm_exec_fini(exec);
>>>>>>>>>>>>>>>>>>>>>> +       return ret;
>>>>>>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_exec_lock_range);
>>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>>>>>> + * drm_gpuvm_validate() - validate all BOs 
>>>>>>>>>>>>>>>>>>>>>> marked as evicted
>>>>>>>>>>>>>>>>>>>>>> + * @gpuvm: the &drm_gpuvm to validate evicted BOs
>>>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>>>> + * Calls the &drm_gpuvm_ops.bo_validate callback 
>>>>>>>>>>>>>>>>>>>>>> for all
>>>>>>>>>>>>>>>>>>>>>> evicted buffer
>>>>>>>>>>>>>>>>>>>>>> + * objects being mapped in the given &drm_gpuvm.
>>>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>>>> + * Returns: 0 on success, negative error code on 
>>>>>>>>>>>>>>>>>>>>>> failure.
>>>>>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>>>>>> +int
>>>>>>>>>>>>>>>>>>>>>> +drm_gpuvm_validate(struct drm_gpuvm *gpuvm)
>>>>>>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>>>>>>> +       const struct drm_gpuvm_ops *ops = 
>>>>>>>>>>>>>>>>>>>>>> gpuvm->ops;
>>>>>>>>>>>>>>>>>>>>>> +       struct drm_gpuvm_bo *vm_bo;
>>>>>>>>>>>>>>>>>>>>>> +       LIST_HEAD(evict);
>>>>>>>>>>>>>>>>>>>>>> +       int ret = 0;
>>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>>> +       if (unlikely(!ops || !ops->bo_validate))
>>>>>>>>>>>>>>>>>>>>>> +               return -ENOTSUPP;
>>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>>> +       for_each_vm_bo_in_list(gpuvm, evict, 
>>>>>>>>>>>>>>>>>>>>>> &evict, vm_bo) {
>>>>>>>>>>>>>>>>>>>>>> +               dma_resv_assert_held(vm_bo->obj->resv); 
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> +               ret = ops->bo_validate(vm_bo->obj);
>>>>>>>>>>>>>>>>>>>>>> +               if (ret)
>>>>>>>>>>>>>>>>>>>>>> +                       break;
>>>>>>>>>>>>>>>>>>>>>> +       }
>>>>>>>>>>>>>>>>>>>>>> +       /* Drop ref in case we break out of the 
>>>>>>>>>>>>>>>>>>>>>> loop. */
>>>>>>>>>>>>>>>>>>>>>> +       drm_gpuvm_bo_put(vm_bo);
>>>>>>>>>>>>>>>>>>>>>> +       restore_vm_bo_list(gpuvm, evict, &evict);
>>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>>> +       return ret;
>>>>>>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_validate);
>>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>>>>>> + * drm_gpuvm_resv_add_fence - add fence to 
>>>>>>>>>>>>>>>>>>>>>> private and all
>>>>>>>>>>>>>>>>>>>>>> extobj
>>>>>>>>>>>>>>>>>>>>>> + * dma-resv
>>>>>>>>>>>>>>>>>>>>>> + * @gpuvm: the &drm_gpuvm to add a fence to
>>>>>>>>>>>>>>>>>>>>>> + * @exec: the &drm_exec locking context
>>>>>>>>>>>>>>>>>>>>>> + * @fence: fence to add
>>>>>>>>>>>>>>>>>>>>>> + * @private_usage: private dma-resv usage
>>>>>>>>>>>>>>>>>>>>>> + * @extobj_usage: extobj dma-resv usage
>>>>>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>>>>>> +void
>>>>>>>>>>>>>>>>>>>>>> +drm_gpuvm_resv_add_fence(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>>>>>>>>>> + struct drm_exec *exec,
>>>>>>>>>>>>>>>>>>>>>> + struct dma_fence *fence,
>>>>>>>>>>>>>>>>>>>>>> + enum dma_resv_usage private_usage,
>>>>>>>>>>>>>>>>>>>>>> + enum dma_resv_usage extobj_usage)
>>>>>>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>>>>>>> +       struct drm_gem_object *obj;
>>>>>>>>>>>>>>>>>>>>>> +       unsigned long index;
>>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>>> +       drm_exec_for_each_locked_object(exec, 
>>>>>>>>>>>>>>>>>>>>>> index, obj) {
>>>>>>>>>>>>>>>>>>>>>> +               dma_resv_assert_held(obj->resv);
>>>>>>>>>>>>>>>>>>>>>> +               dma_resv_add_fence(obj->resv, fence,
>>>>>>>>>>>>>>>>>>>>>> + drm_gpuvm_is_extobj(gpuvm,
>>>>>>>>>>>>>>>>>>>>>> obj) ?
>>>>>>>>>>>>>>>>>>>>>> + private_usage :
>>>>>>>>>>>>>>>>>>>>>> extobj_usage);
>>>>>>>>>>>>>>>>>>>>>> +       }
>>>>>>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_resv_add_fence);
>>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>>>     /**
>>>>>>>>>>>>>>>>>>>>>>      * drm_gpuvm_bo_create() - create a new 
>>>>>>>>>>>>>>>>>>>>>> instance of struct
>>>>>>>>>>>>>>>>>>>>>> drm_gpuvm_bo
>>>>>>>>>>>>>>>>>>>>>>      * @gpuvm: The &drm_gpuvm the @obj is mapped in.
>>>>>>>>>>>>>>>>>>>>>> @@ -790,6 +1242,9 @@ drm_gpuvm_bo_create(struct 
>>>>>>>>>>>>>>>>>>>>>> drm_gpuvm
>>>>>>>>>>>>>>>>>>>>>> *gpuvm,
>>>>>>>>>>>>>>>>>>>>>>          INIT_LIST_HEAD(&vm_bo->list.gpuva);
>>>>>>>>>>>>>>>>>>>>>>          INIT_LIST_HEAD(&vm_bo->list.entry.gem);
>>>>>>>>>>>>>>>>>>>>>> +       INIT_LIST_HEAD(&vm_bo->list.entry.extobj);
>>>>>>>>>>>>>>>>>>>>>> +       INIT_LIST_HEAD(&vm_bo->list.entry.evict);
>>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>>>          drm_gem_object_get(obj);
>>>>>>>>>>>>>>>>>>>>>>          return vm_bo;
>>>>>>>>>>>>>>>>>>>>>> @@ -807,6 +1262,14 @@ drm_gpuvm_bo_destroy(struct 
>>>>>>>>>>>>>>>>>>>>>> kref *kref)
>>>>>>>>>>>>>>>>>>>>>>          drm_gem_gpuva_assert_lock_held(vm_bo->obj);
>>>>>>>>>>>>>>>>>>>>>> +       spin_lock(&gpuvm->extobj.lock);
>>>>>>>>>>>>>>>>>>>>>> +       list_del(&vm_bo->list.entry.extobj);
>>>>>>>>>>>>>>>>>>>>>> +       spin_unlock(&gpuvm->extobj.lock);
>>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>>> +       spin_lock(&gpuvm->evict.lock);
>>>>>>>>>>>>>>>>>>>>>> +       list_del(&vm_bo->list.entry.evict);
>>>>>>>>>>>>>>>>>>>>>> +       spin_unlock(&gpuvm->evict.lock);
>>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>>>          list_del(&vm_bo->list.entry.gem);
>>>>>>>>>>>>>>>>>>>>>>          drm_gem_object_put(obj);
>>>>>>>>>>>>>>>>>>>>>> @@ -822,6 +1285,11 @@ drm_gpuvm_bo_destroy(struct 
>>>>>>>>>>>>>>>>>>>>>> kref *kref)
>>>>>>>>>>>>>>>>>>>>>>      * @vm_bo: the &drm_gpuvm_bo to release the 
>>>>>>>>>>>>>>>>>>>>>> reference of
>>>>>>>>>>>>>>>>>>>>>>      *
>>>>>>>>>>>>>>>>>>>>>>      * This releases a reference to @vm_bo.
>>>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>>>> + * If the reference count drops to zero, the 
>>>>>>>>>>>>>>>>>>>>>> &gpuvm_bo is
>>>>>>>>>>>>>>>>>>>>>> destroyed, which
>>>>>>>>>>>>>>>>>>>>>> + * includes removing it from the GEMs gpuva 
>>>>>>>>>>>>>>>>>>>>>> list. Hence, if
>>>>>>>>>>>>>>>>>>>>>> a call to this
>>>>>>>>>>>>>>>>>>>>>> + * function can potentially let the reference 
>>>>>>>>>>>>>>>>>>>>>> count to zero
>>>>>>>>>>>>>>>>>>>>>> the caller must
>>>>>>>>>>>>>>>>>>>>>> + * hold the dma-resv or driver specific GEM 
>>>>>>>>>>>>>>>>>>>>>> gpuva lock.
>>>>>>>>>>>>>>>>>>>>>>      */
>>>>>>>>>>>>>>>>>>>>>>     void
>>>>>>>>>>>>>>>>>>>>>> drm_gpuvm_bo_put(struct drm_gpuvm_bo *vm_bo)
>>>>>>>>>>>>>>>>>>>>>> @@ -831,6 +1299,12 @@ drm_gpuvm_bo_put(struct 
>>>>>>>>>>>>>>>>>>>>>> drm_gpuvm_bo
>>>>>>>>>>>>>>>>>>>>>> *vm_bo)
>>>>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>>>> EXPORT_SYMBOL_GPL(drm_gpuvm_bo_put);
>>>>>>>>>>>>>>>>>>>>>> +static int __must_check
>>>>>>>>>>>>>>>>>>>>>> +drm_gpuvm_bo_get_unless_zero(struct drm_gpuvm_bo 
>>>>>>>>>>>>>>>>>>>>>> *vm_bo)
>>>>>>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>>>>>>> +       return kref_get_unless_zero(&vm_bo->kref);
>>>>>>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>>>     static struct drm_gpuvm_bo *
>>>>>>>>>>>>>>>>>>>>>> __drm_gpuvm_bo_find(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>>>>>>>>>> struct drm_gem_object *obj)
>>>>>>>>>>>>>>>>>>>>>> @@ -938,6 +1412,48 @@ 
>>>>>>>>>>>>>>>>>>>>>> drm_gpuvm_bo_obtain_prealloc(struct
>>>>>>>>>>>>>>>>>>>>>> drm_gpuvm_bo *__vm_bo)
>>>>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>>>> EXPORT_SYMBOL_GPL(drm_gpuvm_bo_obtain_prealloc);
>>>>>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>>>>>> + * drm_gpuvm_bo_extobj_add() - adds the 
>>>>>>>>>>>>>>>>>>>>>> &drm_gpuvm_bo to its
>>>>>>>>>>>>>>>>>>>>>> &drm_gpuvm's
>>>>>>>>>>>>>>>>>>>>>> + * extobj list
>>>>>>>>>>>>>>>>>>>>>> + * @vm_bo: The &drm_gpuvm_bo to add to its 
>>>>>>>>>>>>>>>>>>>>>> &drm_gpuvm's the
>>>>>>>>>>>>>>>>>>>>>> extobj list.
>>>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>>>> + * Adds the given @vm_bo to its &drm_gpuvm's 
>>>>>>>>>>>>>>>>>>>>>> extobj list if
>>>>>>>>>>>>>>>>>>>>>> not on the list
>>>>>>>>>>>>>>>>>>>>>> + * already and if the corresponding 
>>>>>>>>>>>>>>>>>>>>>> &drm_gem_object is an
>>>>>>>>>>>>>>>>>>>>>> external object,
>>>>>>>>>>>>>>>>>>>>>> + * actually.
>>>>>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>>>>>> +void
>>>>>>>>>>>>>>>>>>>>>> +drm_gpuvm_bo_extobj_add(struct drm_gpuvm_bo *vm_bo)
>>>>>>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>>>>>>> +       struct drm_gpuvm *gpuvm = vm_bo->vm;
>>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>>> +       if (drm_gpuvm_is_extobj(gpuvm, vm_bo->obj))
>>>>>>>>>>>>>>>>>>>>>> +               drm_gpuvm_bo_list_add(vm_bo, 
>>>>>>>>>>>>>>>>>>>>>> extobj);
>>>>>>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_bo_extobj_add);
>>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>>>>>> + * drm_gpuvm_bo_evict() - add / remove a 
>>>>>>>>>>>>>>>>>>>>>> &drm_gem_object to
>>>>>>>>>>>>>>>>>>>>>> / from a
>>>>>>>>>>>>>>>>>>>>>> + * &drm_gpuvms evicted list
>>>>>>>>>>>>>>>>>>>>>> + * @obj: the &drm_gem_object to add or remove
>>>>>>>>>>>>>>>>>>>>>> + * @evict: indicates whether the object is evicted
>>>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>>>> + * Adds a &drm_gem_object to or removes it from all
>>>>>>>>>>>>>>>>>>>>>> &drm_gpuvms evicted
>>>>>>>>>>>>>>>>>>>>>> + * list containing a mapping of this 
>>>>>>>>>>>>>>>>>>>>>> &drm_gem_object.
>>>>>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>>>>>> +void
>>>>>>>>>>>>>>>>>>>>>> +drm_gpuvm_bo_evict(struct drm_gem_object *obj, 
>>>>>>>>>>>>>>>>>>>>>> bool evict)
>>>>>>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>>>>>>> +       struct drm_gpuvm_bo *vm_bo;
>>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>>> +       drm_gem_for_each_gpuvm_bo(vm_bo, obj) {
>>>>>>>>>>>>>>>>>>>>>> +               if (evict)
>>>>>>>>>>>>>>>>>>>>>> +                       drm_gpuvm_bo_list_add(vm_bo, 
>>>>>>>>>>>>>>>>>>>>>> evict);
>>>>>>>>>>>>>>>>>>>>>> +               else
>>>>>>>>>>>>>>>>>>>>>> +                       drm_gpuvm_bo_list_del(vm_bo, 
>>>>>>>>>>>>>>>>>>>>>> evict);
>>>>>>>>>>>>>>>>>>>>>> +       }
>>>>>>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(drm_gpuvm_bo_evict);
>>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>>>     static int
>>>>>>>>>>>>>>>>>>>>>> __drm_gpuva_insert(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>>>>>>>>>> struct drm_gpuva *va)
>>>>>>>>>>>>>>>>>>>>>> diff --git a/include/drm/drm_gpuvm.h
>>>>>>>>>>>>>>>>>>>>>> b/include/drm/drm_gpuvm.h
>>>>>>>>>>>>>>>>>>>>>> index afa50b9059a2..834bb6d6617e 100644
>>>>>>>>>>>>>>>>>>>>>> --- a/include/drm/drm_gpuvm.h
>>>>>>>>>>>>>>>>>>>>>> +++ b/include/drm/drm_gpuvm.h
>>>>>>>>>>>>>>>>>>>>>> @@ -26,10 +26,12 @@
>>>>>>>>>>>>>>>>>>>>>>      */
>>>>>>>>>>>>>>>>>>>>>>     #include <linux/list.h>
>>>>>>>>>>>>>>>>>>>>>> +#include <linux/dma-resv.h>
>>>>>>>>>>>>>>>>>>>>>>     #include <linux/rbtree.h>
>>>>>>>>>>>>>>>>>>>>>>     #include <linux/types.h>
>>>>>>>>>>>>>>>>>>>>>>     #include <drm/drm_gem.h>
>>>>>>>>>>>>>>>>>>>>>> +#include <drm/drm_exec.h>
>>>>>>>>>>>>>>>>>>>>>>     struct drm_gpuvm;
>>>>>>>>>>>>>>>>>>>>>>     struct drm_gpuvm_bo;
>>>>>>>>>>>>>>>>>>>>>> @@ -259,6 +261,38 @@ struct drm_gpuvm {
>>>>>>>>>>>>>>>>>>>>>>           * space
>>>>>>>>>>>>>>>>>>>>>>           */
>>>>>>>>>>>>>>>>>>>>>>          struct dma_resv *resv;
>>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>>> +       /**
>>>>>>>>>>>>>>>>>>>>>> +        * @extobj: structure holding the extobj 
>>>>>>>>>>>>>>>>>>>>>> list
>>>>>>>>>>>>>>>>>>>>>> +        */
>>>>>>>>>>>>>>>>>>>>>> +       struct {
>>>>>>>>>>>>>>>>>>>>>> +               /**
>>>>>>>>>>>>>>>>>>>>>> +                * @list: &list_head storing 
>>>>>>>>>>>>>>>>>>>>>> &drm_gpuvm_bos
>>>>>>>>>>>>>>>>>>>>>> serving as
>>>>>>>>>>>>>>>>>>>>>> +                * external object
>>>>>>>>>>>>>>>>>>>>>> +                */
>>>>>>>>>>>>>>>>>>>>>> +               struct list_head list;
>>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>>> +               /**
>>>>>>>>>>>>>>>>>>>>>> +                * @lock: spinlock to protect the 
>>>>>>>>>>>>>>>>>>>>>> extobj list
>>>>>>>>>>>>>>>>>>>>>> +                */
>>>>>>>>>>>>>>>>>>>>>> +               spinlock_t lock;
>>>>>>>>>>>>>>>>>>>>>> +       } extobj;
>>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>>> +       /**
>>>>>>>>>>>>>>>>>>>>>> +        * @evict: structure holding the evict 
>>>>>>>>>>>>>>>>>>>>>> list and evict
>>>>>>>>>>>>>>>>>>>>>> list lock
>>>>>>>>>>>>>>>>>>>>>> +        */
>>>>>>>>>>>>>>>>>>>>>> +       struct {
>>>>>>>>>>>>>>>>>>>>>> +               /**
>>>>>>>>>>>>>>>>>>>>>> +                * @list: &list_head storing 
>>>>>>>>>>>>>>>>>>>>>> &drm_gpuvm_bos
>>>>>>>>>>>>>>>>>>>>>> currently being
>>>>>>>>>>>>>>>>>>>>>> +                * evicted
>>>>>>>>>>>>>>>>>>>>>> +                */
>>>>>>>>>>>>>>>>>>>>>> +               struct list_head list;
>>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>>> +               /**
>>>>>>>>>>>>>>>>>>>>>> +                * @lock: spinlock to protect the 
>>>>>>>>>>>>>>>>>>>>>> evict list
>>>>>>>>>>>>>>>>>>>>>> +                */
>>>>>>>>>>>>>>>>>>>>>> +               spinlock_t lock;
>>>>>>>>>>>>>>>>>>>>>> +       } evict;
>>>>>>>>>>>>>>>>>>>>>>     };
>>>>>>>>>>>>>>>>>>>>>>     void drm_gpuvm_init(struct drm_gpuvm *gpuvm, 
>>>>>>>>>>>>>>>>>>>>>> struct
>>>>>>>>>>>>>>>>>>>>>> drm_device *drm,
>>>>>>>>>>>>>>>>>>>>>> @@ -268,6 +302,21 @@ void drm_gpuvm_init(struct 
>>>>>>>>>>>>>>>>>>>>>> drm_gpuvm
>>>>>>>>>>>>>>>>>>>>>> *gpuvm, struct drm_device *drm,
>>>>>>>>>>>>>>>>>>>>>> const struct drm_gpuvm_ops *ops);
>>>>>>>>>>>>>>>>>>>>>>     void drm_gpuvm_destroy(struct drm_gpuvm *gpuvm);
>>>>>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>>>>>> + * drm_gpuvm_is_extobj() - indicates whether the 
>>>>>>>>>>>>>>>>>>>>>> given
>>>>>>>>>>>>>>>>>>>>>> &drm_gem_object is an
>>>>>>>>>>>>>>>>>>>>>> + * external object
>>>>>>>>>>>>>>>>>>>>>> + * @gpuvm: the &drm_gpuvm to check
>>>>>>>>>>>>>>>>>>>>>> + * @obj: the &drm_gem_object to check
>>>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>>>> + * Returns: true if the &drm_gem_object 
>>>>>>>>>>>>>>>>>>>>>> &dma_resv differs
>>>>>>>>>>>>>>>>>>>>>> from the
>>>>>>>>>>>>>>>>>>>>>> + * &drm_gpuvms &dma_resv, false otherwise
>>>>>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>>>>>> +static inline bool drm_gpuvm_is_extobj(struct 
>>>>>>>>>>>>>>>>>>>>>> drm_gpuvm
>>>>>>>>>>>>>>>>>>>>>> *gpuvm,
>>>>>>>>>>>>>>>>>>>>>> + struct drm_gem_object
>>>>>>>>>>>>>>>>>>>>>> *obj)
>>>>>>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>>>>>>> +       return obj && obj->resv != gpuvm->resv;
>>>>>>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>>>     static inline struct drm_gpuva *
>>>>>>>>>>>>>>>>>>>>>> __drm_gpuva_next(struct drm_gpuva *va)
>>>>>>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>>>>>> @@ -346,6 +395,128 @@ __drm_gpuva_next(struct 
>>>>>>>>>>>>>>>>>>>>>> drm_gpuva *va)
>>>>>>>>>>>>>>>>>>>>>>     #define drm_gpuvm_for_each_va_safe(va__, 
>>>>>>>>>>>>>>>>>>>>>> next__, gpuvm__)
>>>>>>>>>>>>>>>>>>>>>> \
>>>>>>>>>>>>>>>>>>>>>>          list_for_each_entry_safe(va__, next__, 
>>>>>>>>>>>>>>>>>>>>>> &(gpuvm__)-
>>>>>>>>>>>>>>>>>>>>>>> rb.list, rb.entry)
>>>>>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>>>>>> + * struct drm_gpuvm_exec - &drm_gpuvm 
>>>>>>>>>>>>>>>>>>>>>> abstraction of
>>>>>>>>>>>>>>>>>>>>>> &drm_exec
>>>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>>>> + * This structure should be created on the stack as
>>>>>>>>>>>>>>>>>>>>>> &drm_exec should be.
>>>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>>>> + * Optionally, @extra can be set in order to 
>>>>>>>>>>>>>>>>>>>>>> lock additional
>>>>>>>>>>>>>>>>>>>>>> &drm_gem_objects.
>>>>>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>>>>>> +struct drm_gpuvm_exec {
>>>>>>>>>>>>>>>>>>>>>> +       /**
>>>>>>>>>>>>>>>>>>>>>> +        * @exec: the &drm_exec structure
>>>>>>>>>>>>>>>>>>>>>> +        */
>>>>>>>>>>>>>>>>>>>>>> +       struct drm_exec exec;
>>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>>> +       /**
>>>>>>>>>>>>>>>>>>>>>> +        * @vm: the &drm_gpuvm to lock its DMA 
>>>>>>>>>>>>>>>>>>>>>> reservations
>>>>>>>>>>>>>>>>>>>>>> +        */
>>>>>>>>>>>>>>>>>>>>>> +       struct drm_gpuvm *vm;
>>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>>> +       /**
>>>>>>>>>>>>>>>>>>>>>> +        * @extra: Callback and corresponding 
>>>>>>>>>>>>>>>>>>>>>> private data
>>>>>>>>>>>>>>>>>>>>>> for the driver to
>>>>>>>>>>>>>>>>>>>>>> +        * lock arbitrary additional 
>>>>>>>>>>>>>>>>>>>>>> &drm_gem_objects.
>>>>>>>>>>>>>>>>>>>>>> +        */
>>>>>>>>>>>>>>>>>>>>>> +       struct {
>>>>>>>>>>>>>>>>>>>>>> +               /**
>>>>>>>>>>>>>>>>>>>>>> +                * @fn: The driver callback to lock
>>>>>>>>>>>>>>>>>>>>>> additional &drm_gem_objects.
>>>>>>>>>>>>>>>>>>>>>> +                */
>>>>>>>>>>>>>>>>>>>>>> +               int (*fn)(struct drm_gpuvm_exec 
>>>>>>>>>>>>>>>>>>>>>> *vm_exec,
>>>>>>>>>>>>>>>>>>>>>> + unsigned int num_fences);
>>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>>> +               /**
>>>>>>>>>>>>>>>>>>>>>> +                * @priv: driver private data for 
>>>>>>>>>>>>>>>>>>>>>> the @fn
>>>>>>>>>>>>>>>>>>>>>> callback
>>>>>>>>>>>>>>>>>>>>>> +                */
>>>>>>>>>>>>>>>>>>>>>> +               void *priv;
>>>>>>>>>>>>>>>>>>>>>> +       } extra;
>>>>>>>>>>>>>>>>>>>>>> +};
>>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>>>>>> + * drm_gpuvm_prepare_vm() - prepare the GPUVMs 
>>>>>>>>>>>>>>>>>>>>>> common dma-
>>>>>>>>>>>>>>>>>>>>>> resv
>>>>>>>>>>>>>>>>>>>>>> + * @gpuvm: the &drm_gpuvm
>>>>>>>>>>>>>>>>>>>>>> + * @exec: the &drm_exec context
>>>>>>>>>>>>>>>>>>>>>> + * @num_fences: the amount of &dma_fences to 
>>>>>>>>>>>>>>>>>>>>>> reserve
>>>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>>>> + * Calls drm_exec_prepare_obj() for the GPUVMs 
>>>>>>>>>>>>>>>>>>>>>> dummy
>>>>>>>>>>>>>>>>>>>>>> &drm_gem_object.
>>>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>>>> + * Using this function directly, it is the drivers
>>>>>>>>>>>>>>>>>>>>>> responsibility to call
>>>>>>>>>>>>>>>>>>>>>> + * drm_exec_init() and drm_exec_fini() accordingly.
>>>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>>>> + * Returns: 0 on success, negative error code on 
>>>>>>>>>>>>>>>>>>>>>> failure.
>>>>>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>>>>>> +static inline int
>>>>>>>>>>>>>>>>>>>>>> +drm_gpuvm_prepare_vm(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>>>>>>>>>> + struct drm_exec *exec,
>>>>>>>>>>>>>>>>>>>>>> + unsigned int num_fences)
>>>>>>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>>>>>>> +       return drm_exec_prepare_obj(exec, 
>>>>>>>>>>>>>>>>>>>>>> &gpuvm->d_obj,
>>>>>>>>>>>>>>>>>>>>>> num_fences);
>>>>>>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>>> +int drm_gpuvm_prepare_objects(struct drm_gpuvm 
>>>>>>>>>>>>>>>>>>>>>> *gpuvm,
>>>>>>>>>>>>>>>>>>>>>> + struct drm_exec *exec,
>>>>>>>>>>>>>>>>>>>>>> + unsigned int num_fences);
>>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>>> +int drm_gpuvm_prepare_range(struct drm_gpuvm 
>>>>>>>>>>>>>>>>>>>>>> *gpuvm,
>>>>>>>>>>>>>>>>>>>>>> + struct drm_exec *exec,
>>>>>>>>>>>>>>>>>>>>>> + u64 addr, u64 range,
>>>>>>>>>>>>>>>>>>>>>> + unsigned int num_fences);
>>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>>> +int drm_gpuvm_exec_lock(struct drm_gpuvm_exec 
>>>>>>>>>>>>>>>>>>>>>> *vm_exec,
>>>>>>>>>>>>>>>>>>>>>> +                       unsigned int num_fences,
>>>>>>>>>>>>>>>>>>>>>> +                       bool interruptible);
>>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>>> +int drm_gpuvm_exec_lock_array(struct drm_gpuvm_exec
>>>>>>>>>>>>>>>>>>>>>> *vm_exec,
>>>>>>>>>>>>>>>>>>>>>> + struct drm_gem_object **objs,
>>>>>>>>>>>>>>>>>>>>>> + unsigned int num_objs,
>>>>>>>>>>>>>>>>>>>>>> + unsigned int num_fences,
>>>>>>>>>>>>>>>>>>>>>> + bool interruptible);
>>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>>> +int drm_gpuvm_exec_lock_range(struct drm_gpuvm_exec
>>>>>>>>>>>>>>>>>>>>>> *vm_exec,
>>>>>>>>>>>>>>>>>>>>>> + u64 addr, u64 range,
>>>>>>>>>>>>>>>>>>>>>> + unsigned int num_fences,
>>>>>>>>>>>>>>>>>>>>>> + bool interruptible);
>>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>>>>>> + * drm_gpuvm_lock() - lock all dma-resv of all 
>>>>>>>>>>>>>>>>>>>>>> assoiciated
>>>>>>>>>>>>>>>>>>>>>> BOs
>>>>>>>>>>>>>>>>>>>>>> + * @gpuvm: the &drm_gpuvm
>>>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>>>> + * Releases all dma-resv locks of all 
>>>>>>>>>>>>>>>>>>>>>> &drm_gem_objects
>>>>>>>>>>>>>>>>>>>>>> previously acquired
>>>>>>>>>>>>>>>>>>>>>> + * through drm_gpuvm_lock() or its variants.
>>>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>>>> + * Returns: 0 on success, negative error code on 
>>>>>>>>>>>>>>>>>>>>>> failure.
>>>>>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>>>>>> +static inline void
>>>>>>>>>>>>>>>>>>>>>> +drm_gpuvm_exec_unlock(struct drm_gpuvm_exec 
>>>>>>>>>>>>>>>>>>>>>> *vm_exec)
>>>>>>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>>>>>>> +       drm_exec_fini(&vm_exec->exec);
>>>>>>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>>> +int drm_gpuvm_validate(struct drm_gpuvm *gpuvm);
>>>>>>>>>>>>>>>>>>>>>> +void drm_gpuvm_resv_add_fence(struct drm_gpuvm 
>>>>>>>>>>>>>>>>>>>>>> *gpuvm,
>>>>>>>>>>>>>>>>>>>>>> + struct drm_exec *exec,
>>>>>>>>>>>>>>>>>>>>>> + struct dma_fence *fence,
>>>>>>>>>>>>>>>>>>>>>> + enum dma_resv_usage
>>>>>>>>>>>>>>>>>>>>>> private_usage,
>>>>>>>>>>>>>>>>>>>>>> + enum dma_resv_usage
>>>>>>>>>>>>>>>>>>>>>> extobj_usage);
>>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>>>>>> + * drm_gpuvm_exec_resv_add_fence()
>>>>>>>>>>>>>>>>>>>>>> + * @vm_exec: the &drm_gpuvm_exec abstraction
>>>>>>>>>>>>>>>>>>>>>> + * @fence: fence to add
>>>>>>>>>>>>>>>>>>>>>> + * @private_usage: private dma-resv usage
>>>>>>>>>>>>>>>>>>>>>> + * @extobj_usage: extobj dma-resv usage
>>>>>>>>>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>>>>>>>>> + * See drm_gpuvm_resv_add_fence().
>>>>>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>>>>>> +static inline void
>>>>>>>>>>>>>>>>>>>>>> +drm_gpuvm_exec_resv_add_fence(struct drm_gpuvm_exec
>>>>>>>>>>>>>>>>>>>>>> *vm_exec,
>>>>>>>>>>>>>>>>>>>>>> + struct dma_fence *fence,
>>>>>>>>>>>>>>>>>>>>>> + enum dma_resv_usage
>>>>>>>>>>>>>>>>>>>>>> private_usage,
>>>>>>>>>>>>>>>>>>>>>> + enum dma_resv_usage
>>>>>>>>>>>>>>>>>>>>>> extobj_usage)
>>>>>>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>>>>>>> +       drm_gpuvm_resv_add_fence(vm_exec->vm, 
>>>>>>>>>>>>>>>>>>>>>> &vm_exec->exec,
>>>>>>>>>>>>>>>>>>>>>> fence,
>>>>>>>>>>>>>>>>>>>>>> + private_usage,
>>>>>>>>>>>>>>>>>>>>>> extobj_usage);
>>>>>>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>>>     /**
>>>>>>>>>>>>>>>>>>>>>>      * struct drm_gpuvm_bo - structure 
>>>>>>>>>>>>>>>>>>>>>> representing a
>>>>>>>>>>>>>>>>>>>>>> &drm_gpuvm and
>>>>>>>>>>>>>>>>>>>>>>      * &drm_gem_object combination
>>>>>>>>>>>>>>>>>>>>>> @@ -398,6 +569,18 @@ struct drm_gpuvm_bo {
>>>>>>>>>>>>>>>>>>>>>> * gpuva list.
>>>>>>>>>>>>>>>>>>>>>> */
>>>>>>>>>>>>>>>>>>>>>>                          struct list_head gem;
>>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>>> +                       /**
>>>>>>>>>>>>>>>>>>>>>> + * @evict: List entry to attach to
>>>>>>>>>>>>>>>>>>>>>> the &drm_gpuvms
>>>>>>>>>>>>>>>>>>>>>> + * extobj list.
>>>>>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>>>>>> +                       struct list_head extobj;
>>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>>> +                       /**
>>>>>>>>>>>>>>>>>>>>>> + * @evict: List entry to attach to
>>>>>>>>>>>>>>>>>>>>>> the &drm_gpuvms evict
>>>>>>>>>>>>>>>>>>>>>> + * list.
>>>>>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>>>>>> +                       struct list_head evict;
>>>>>>>>>>>>>>>>>>>>>>                  } entry;
>>>>>>>>>>>>>>>>>>>>>>          } list;
>>>>>>>>>>>>>>>>>>>>>>     };
>>>>>>>>>>>>>>>>>>>>>> @@ -432,6 +615,9 @@ struct drm_gpuvm_bo *
>>>>>>>>>>>>>>>>>>>>>> drm_gpuvm_bo_find(struct drm_gpuvm *gpuvm,
>>>>>>>>>>>>>>>>>>>>>> struct drm_gem_object *obj);
>>>>>>>>>>>>>>>>>>>>>> +void drm_gpuvm_bo_evict(struct drm_gem_object 
>>>>>>>>>>>>>>>>>>>>>> *obj, bool
>>>>>>>>>>>>>>>>>>>>>> evict);
>>>>>>>>>>>>>>>>>>>>>> +void drm_gpuvm_bo_extobj_add(struct drm_gpuvm_bo 
>>>>>>>>>>>>>>>>>>>>>> *vm_bo);
>>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>>>     /**
>>>>>>>>>>>>>>>>>>>>>>      * drm_gpuvm_bo_for_each_va() - iterator to 
>>>>>>>>>>>>>>>>>>>>>> walk over a
>>>>>>>>>>>>>>>>>>>>>> list of &drm_gpuva
>>>>>>>>>>>>>>>>>>>>>>      * @va__: &drm_gpuva structure to assign to 
>>>>>>>>>>>>>>>>>>>>>> in each
>>>>>>>>>>>>>>>>>>>>>> iteration step
>>>>>>>>>>>>>>>>>>>>>> @@ -837,6 +1023,17 @@ struct drm_gpuvm_ops {
>>>>>>>>>>>>>>>>>>>>>>           * used.
>>>>>>>>>>>>>>>>>>>>>>           */
>>>>>>>>>>>>>>>>>>>>>>          int (*sm_step_unmap)(struct drm_gpuva_op 
>>>>>>>>>>>>>>>>>>>>>> *op, void
>>>>>>>>>>>>>>>>>>>>>> *priv);
>>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>>> +       /**
>>>>>>>>>>>>>>>>>>>>>> +        * @bo_validate: called from 
>>>>>>>>>>>>>>>>>>>>>> drm_gpuvm_validate()
>>>>>>>>>>>>>>>>>>>>>> +        *
>>>>>>>>>>>>>>>>>>>>>> +        * Drivers receive this callback for 
>>>>>>>>>>>>>>>>>>>>>> every evicted
>>>>>>>>>>>>>>>>>>>>>> &drm_gem_object being
>>>>>>>>>>>>>>>>>>>>>> +        * mapped in the corresponding &drm_gpuvm.
>>>>>>>>>>>>>>>>>>>>>> +        *
>>>>>>>>>>>>>>>>>>>>>> +        * Typically, drivers would call their 
>>>>>>>>>>>>>>>>>>>>>> driver
>>>>>>>>>>>>>>>>>>>>>> specific variant of
>>>>>>>>>>>>>>>>>>>>>> +        * ttm_bo_validate() from within this 
>>>>>>>>>>>>>>>>>>>>>> callback.
>>>>>>>>>>>>>>>>>>>>>> +        */
>>>>>>>>>>>>>>>>>>>>>> +       int (*bo_validate)(struct drm_gem_object 
>>>>>>>>>>>>>>>>>>>>>> *obj);
>>>>>>>>>>>>>>>>>>>>>>     };
>>>>>>>>>>>>>>>>>>>>>>     int drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm, 
>>>>>>>>>>>>>>>>>>>>>> void *priv,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>
>>>
>
Christian König Sept. 20, 2023, 2:11 p.m. UTC | #58
Am 20.09.23 um 16:02 schrieb Thomas Hellström:
> [SNIP]
>>> Do you by "relocation" list refer to what gpuvm calls "evict" list 
>>> or something else? Like the relocation/validation list that used to 
>>> be sent from user-space for non-VM_BIND vms?
>>
>> The BOs send into the kernel with each command submission on the 
>> classic IOCTLs.
>>
>>>
>>> The vm bos plus the external/shared bos bound to the VM (the 
>>> external list) are the bos being referenced by the current batch. So 
>>> the bos on the VM's external list are the ones being locked and 
>>> fenced and checked for eviction. If they weren't they could be 
>>> evicted before the current batch completes?
>>
>> That only applies to a certain use case, e.g. Vulkan or user mode 
>> queues.
>>
>> Multimedia APIs and especially OpenGL work differently; here only the 
>> BOs mentioned in the relocation list are guaranteed to not be evicted.
>>
>> This is intentional because those APIs tend to over-allocate memory 
>> all the time, so for good performance you need to be able to evict 
>> BOs from the VM while other parts of the VM are currently in use.
>>
>> Without that especially OpenGL performance would be completely 
>> crippled at least on amdgpu.
>
> OK, I've always wondered how overcommitting a local VM would be handled 
> on VM_BIND, where we don't have the relocation list, at least not in 
> xe, so we have what you refer to as the user mode queues.
>
> I figure those APIs that suffer from overcommitting would maintain a 
> "current working set" in user-space and send changes as deltas to the 
> kernel as unbinds/binds. Or at least "can be unbound / can no longer 
> be unbound" advises.
>
> This may turn out interesting.

Essentially this is how Windows used to work till (I think) Windows 8.

Basically the kernel is responsible for figuring out which BO to move 
in/out of VRAM for each submission an application does. And it is 
perfectly acceptable for an application to allocate 8GiB of VRAM when 
only 4GiB is physically available.

To be honest I think it's one of the worst things ever invented, but we 
somehow have to support it for some use cases.

Christian.

>
> /Thomas
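
To make the intended call flow concrete, below is a minimal sketch of a
driver submission path chaining the helpers introduced by this patch. It is
not part of the series; the my_driver_* names, the job structure and the
chosen dma-resv usages are assumptions for illustration, only the
drm_gpuvm_* calls are taken from the patch.

	static int my_driver_submit(struct my_driver_job *job)
	{
		struct drm_gpuvm_exec vm_exec = {
			.vm = job->gpuvm,
			/* .extra.fn could lock further, job-local BOs (assumption). */
		};
		int ret;

		/* Lock the VM's dummy GEM dma-resv plus every extobj dma-resv,
		 * reserving one fence slot each.
		 */
		ret = drm_gpuvm_exec_lock(&vm_exec, 1, true);
		if (ret)
			return ret;

		/* Call ops->bo_validate for every BO on the evict list. */
		ret = drm_gpuvm_validate(job->gpuvm);
		if (ret)
			goto out_unlock;

		ret = my_driver_run_job(job); /* driver specific, assumed */
		if (ret)
			goto out_unlock;

		/* Attach the job fence; per the kernel-doc, private BOs get
		 * @private_usage, external BOs get @extobj_usage.
		 */
		drm_gpuvm_exec_resv_add_fence(&vm_exec, job->done_fence,
					      DMA_RESV_USAGE_BOOKKEEP,
					      DMA_RESV_USAGE_WRITE);

	out_unlock:
		drm_gpuvm_exec_unlock(&vm_exec);
		return ret;
	}

With struct drm_gpuvm_exec::extra.fn set, or by using
drm_gpuvm_exec_lock_array(), the same locking loop additionally takes the
dma-resv locks of job-local BOs before validation.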
diff mbox series

Patch

diff --git a/drivers/gpu/drm/drm_gpuvm.c b/drivers/gpu/drm/drm_gpuvm.c
index f4411047dbb3..8e62a043f719 100644
--- a/drivers/gpu/drm/drm_gpuvm.c
+++ b/drivers/gpu/drm/drm_gpuvm.c
@@ -73,6 +73,21 @@ 
  * &drm_gem_object list of &drm_gpuvm_bos for an existing instance of this
  * particular combination. If not existent a new instance is created and linked
  * to the &drm_gem_object.
+ *
+ * &drm_gpuvm_bo structures, since unique for a given &drm_gpuvm, are also used
+ * as entries for the &drm_gpuvm's lists of external and evicted objects. Those
+ * lists are maintained in order to accelerate locking of dma-resv locks and
+ * validation of evicted objects bound in a &drm_gpuvm. For instance, all
+ * &drm_gem_object's &dma_resv of a given &drm_gpuvm can be locked by calling
+ * drm_gpuvm_exec_lock(). Once locked drivers can call drm_gpuvm_validate() in
+ * order to validate all evicted &drm_gem_objects. It is also possible to lock
+ * additional &drm_gem_objects by providing the corresponding parameters to
+ * drm_gpuvm_exec_lock() as well as open code the &drm_exec loop while making
+ * use of helper functions such as drm_gpuvm_prepare_range() or
+ * drm_gpuvm_prepare_objects().
+ *
+ * Every bound &drm_gem_object is treated as an external object when its
+ * &dma_resv structure differs from the &drm_gpuvm's common &dma_resv structure.
  */
 
 /**
@@ -420,6 +435,20 @@ 
  * Subsequent calls to drm_gpuvm_bo_obtain() for the same &drm_gpuvm and
  * &drm_gem_object must be able to observe previous creations and destructions
  * of &drm_gpuvm_bos in order to keep instances unique.
+ *
+ * The &drm_gpuvm's lists for keeping track of external and evicted objects are
+ * protected against concurrent insertion / removal and iteration internally.
+ *
+ * However, drivers still need to make sure to protect concurrent calls to
+ * functions iterating those lists, such as drm_gpuvm_validate() and
+ * drm_gpuvm_prepare_objects(). Every such function contains a particular
+ * comment and lockdep checks if possible.
+ *
+ * Functions adding or removing entries from those lists, such as
+ * drm_gpuvm_bo_evict() or drm_gpuvm_bo_extobj_add() may be called with external
+ * locks being held, e.g. in order to avoid the corresponding list being
+ * (safely) modified while potentially being iterated by other API functions.
+ * However, this is entirely optional.
  */
 
 /**
@@ -632,6 +661,136 @@ 
  *	}
  */
 
+/**
+ * get_next_vm_bo_from_list() - get the next vm_bo element
+ * @__gpuvm: The GPU VM
+ * @__list_name: The name of the list we're iterating on
+ * @__local_list: A pointer to the local list used to store already iterated items
+ * @__prev_vm_bo: The previous element we got from get_next_vm_bo_from_list()
+ *
+ * This helper is here to provide lockless list iteration. Lockless as in, the
+ * iterator releases the lock immediately after picking the first element from
+ * the list, so list insertion and deletion can happen concurrently.
+ *
+ * Elements popped from the original list are kept in a local list, so removal
+ * and is_empty checks can still happen while we're iterating the list.
+ */
+#define get_next_vm_bo_from_list(__gpuvm, __list_name, __local_list, __prev_vm_bo)	\
+	({										\
+		struct drm_gpuvm_bo *__vm_bo;						\
+											\
+		drm_gpuvm_bo_put(__prev_vm_bo);						\
+											\
+		spin_lock(&(__gpuvm)->__list_name.lock);				\
+		while (!list_empty(&(__gpuvm)->__list_name.list)) {			\
+			__vm_bo = list_first_entry(&(__gpuvm)->__list_name.list,	\
+						   struct drm_gpuvm_bo,			\
+						   list.entry.__list_name);		\
+			if (drm_gpuvm_bo_get_unless_zero(__vm_bo)) {			\
+				list_move_tail(&(__vm_bo)->list.entry.__list_name,	\
+					       __local_list);				\
+				break;							\
+			} else {							\
+				list_del_init(&(__vm_bo)->list.entry.__list_name);	\
+				__vm_bo = NULL;						\
+			}								\
+		}									\
+		spin_unlock(&(__gpuvm)->__list_name.lock);				\
+											\
+		__vm_bo;								\
+	})
+
+/**
+ * for_each_vm_bo_in_list() - internal vm_bo list iterator
+ * @__gpuvm: The GPU VM
+ * @__list_name: The name of the list we're iterating on
+ * @__local_list: A pointer to the local list used to store already iterated items
+ * @__vm_bo: The &drm_gpuvm_bo to assign in each iteration step
+ *
+ * This helper is here to provide lockless list iteration. Lockless as in, the
+ * iterator releases the lock immediately after picking the first element from the
+ * list, so list insertion and deletion can happen concurrently.
+ *
+ * Typical use:
+ *
+ *	struct drm_gpuvm_bo *vm_bo;
+ *	LIST_HEAD(my_local_list);
+ *
+ *	ret = 0;
+ *	for_each_vm_bo_in_list(gpuvm, <list_name>, &my_local_list, vm_bo) {
+ *		ret = do_something_with_vm_bo(..., vm_bo);
+ *		if (ret)
+ *			break;
+ *	}
+ *	drm_gpuvm_bo_put(vm_bo);
+ *	restore_vm_bo_list(gpuvm, <list_name>, &my_local_list);
+ *
+ *
+ * Only used for internal list iterations, not meant to be exposed to the outside
+ * world.
+ */
+#define for_each_vm_bo_in_list(__gpuvm, __list_name, __local_list, __vm_bo)	\
+	for (__vm_bo = get_next_vm_bo_from_list(__gpuvm, __list_name,		\
+						__local_list, NULL);		\
+	     __vm_bo;								\
+	     __vm_bo = get_next_vm_bo_from_list(__gpuvm, __list_name,		\
+						__local_list, __vm_bo))
+
+/**
+ * restore_vm_bo_list() - move vm_bo elements back to their original list
+ * @__gpuvm: The GPU VM
+ * @__list_name: The name of the list we're iterating on
+ * @__local_list: A pointer to the local list used to store already iterated items
+ *
+ * When we're done iterating a vm_bo list, we should call restore_vm_bo_list()
+ * to restore the original state and let new iterations take place.
+ */
+#define restore_vm_bo_list(__gpuvm, __list_name, __local_list)				\
+	do {										\
+		/* Merge back the two lists, moving local list elements to the		\
+		 * head to preserve previous ordering, in case it matters.		\
+		 */									\
+		spin_lock(&(__gpuvm)->__list_name.lock);				\
+		list_splice(__local_list, &(__gpuvm)->__list_name.list);		\
+		spin_unlock(&(__gpuvm)->__list_name.lock);				\
+	} while (0)
+
+/**
+ * drm_gpuvm_bo_list_add() - insert a vm_bo into the given list
+ * @__vm_bo: the &drm_gpuvm_bo
+ * @__list_name: the name of the list to insert into
+ *
+ * Inserts the given @__vm_bo into the list specified by @__list_name and
+ * increases the vm_bo's reference count.
+ */
+#define drm_gpuvm_bo_list_add(__vm_bo, __list_name)				\
+	do {									\
+		spin_lock(&(__vm_bo)->vm->__list_name.lock);			\
+		if (list_empty(&(__vm_bo)->list.entry.__list_name))		\
+			list_add_tail(&(__vm_bo)->list.entry.__list_name,	\
+				      &(__vm_bo)->vm->__list_name.list);	\
+		spin_unlock(&(__vm_bo)->vm->__list_name.lock);			\
+	} while (0)
+
+/**
+ * drm_gpuvm_bo_list_del() - remove a vm_bo from the given list
+ * @__vm_bo: the &drm_gpuvm_bo
+ * @__list_name: the name of the list to insert into
+ *
+ * Removes the given @__vm_bo from the list specified by @__list_name and
+ * decreases the vm_bo's reference count.
+ */
+#define drm_gpuvm_bo_list_del(__vm_bo, __list_name)				\
+	do {									\
+		spin_lock(&(__vm_bo)->vm->__list_name.lock);			\
+		if (!list_empty(&(__vm_bo)->list.entry.__list_name))		\
+			list_del_init(&(__vm_bo)->list.entry.__list_name);	\
+		spin_unlock(&(__vm_bo)->vm->__list_name.lock);			\
+	} while (0)
+
+static int __must_check
+drm_gpuvm_bo_get_unless_zero(struct drm_gpuvm_bo *vm_bo);
+
 #define to_drm_gpuva(__node)	container_of((__node), struct drm_gpuva, rb.node)
 
 #define GPUVA_START(node) ((node)->va.addr)
@@ -713,6 +867,12 @@  drm_gpuvm_init(struct drm_gpuvm *gpuvm, struct drm_device *drm,
 	gpuvm->rb.tree = RB_ROOT_CACHED;
 	INIT_LIST_HEAD(&gpuvm->rb.list);
 
+	INIT_LIST_HEAD(&gpuvm->extobj.list);
+	spin_lock_init(&gpuvm->extobj.lock);
+
+	INIT_LIST_HEAD(&gpuvm->evict.list);
+	spin_lock_init(&gpuvm->evict.lock);
+
 	drm_gpuva_check_overflow(start_offset, range);
 	gpuvm->mm_start = start_offset;
 	gpuvm->mm_range = range;
@@ -754,10 +914,302 @@  drm_gpuvm_destroy(struct drm_gpuvm *gpuvm)
 	WARN(!RB_EMPTY_ROOT(&gpuvm->rb.tree.rb_root),
 	     "GPUVA tree is not empty, potentially leaking memory.\n");
 
+	WARN(!list_empty(&gpuvm->extobj.list), "Extobj list should be empty.\n");
+	WARN(!list_empty(&gpuvm->evict.list), "Evict list should be empty.\n");
+
 	drm_gem_private_object_fini(&gpuvm->d_obj);
 }
 EXPORT_SYMBOL_GPL(drm_gpuvm_destroy);
 
+/**
+ * drm_gpuvm_prepare_objects() - prepare all associated BOs
+ * @gpuvm: the &drm_gpuvm
+ * @exec: the &drm_exec locking context
+ * @num_fences: the amount of &dma_fences to reserve
+ *
+ * Calls drm_exec_prepare_obj() for all &drm_gem_objects the given
+ * &drm_gpuvm contains mappings of.
+ *
+ * Using this function directly, it is the driver's responsibility to call
+ * drm_exec_init() and drm_exec_fini() accordingly.
+ *
+ * Note: This function is safe against concurrent insertion and removal of
+ * external objects; however, it is not safe against concurrent usage itself.
+ *
+ * Drivers need to make sure to protect this case with either an outer VM lock
+ * or by calling drm_gpuvm_prepare_vm() before this function within the
+ * drm_exec_until_all_locked() loop, such that the GPUVM's dma-resv lock ensures
+ * mutual exclusion.
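+ *
+ * A hedged usage sketch within a &drm_exec loop (error unwinding elided;
+ * reserving a single fence slot is an arbitrary choice):
+ *
+ *	struct drm_exec exec;
+ *	int ret;
+ *
+ *	drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT);
+ *	drm_exec_until_all_locked(&exec) {
+ *		ret = drm_gpuvm_prepare_vm(gpuvm, &exec, 1);
+ *		drm_exec_retry_on_contention(&exec);
+ *		if (ret)
+ *			break;
+ *
+ *		ret = drm_gpuvm_prepare_objects(gpuvm, &exec, 1);
+ *		drm_exec_retry_on_contention(&exec);
+ *		if (ret)
+ *			break;
+ *	}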
+ *
+ * Returns: 0 on success, negative error code on failure.
+ */
+int
+drm_gpuvm_prepare_objects(struct drm_gpuvm *gpuvm,
+			  struct drm_exec *exec,
+			  unsigned int num_fences)
+{
+	struct drm_gpuvm_bo *vm_bo;
+	LIST_HEAD(extobjs);
+	int ret = 0;
+
+	for_each_vm_bo_in_list(gpuvm, extobj, &extobjs, vm_bo) {
+		ret = drm_exec_prepare_obj(exec, vm_bo->obj, num_fences);
+		if (ret)
+			break;
+	}
+	/* Drop ref in case we break out of the loop. */
+	drm_gpuvm_bo_put(vm_bo);
+	restore_vm_bo_list(gpuvm, extobj, &extobjs);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(drm_gpuvm_prepare_objects);
+
+/**
+ * drm_gpuvm_prepare_range() - prepare all BOs mapped within a given range
+ * @gpuvm: the &drm_gpuvm
+ * @exec: the &drm_exec locking context
+ * @addr: the start address within the VA space
+ * @range: the range to iterate within the VA space
+ * @num_fences: the amount of &dma_fences to reserve
+ *
+ * Calls drm_exec_prepare_obj() for all &drm_gem_objects mapped between @addr
+ * and @addr + @range.
+ *
+ * Returns: 0 on success, negative error code on failure.
+ */
+int
+drm_gpuvm_prepare_range(struct drm_gpuvm *gpuvm, struct drm_exec *exec,
+			u64 addr, u64 range, unsigned int num_fences)
+{
+	struct drm_gpuva *va;
+	u64 end = addr + range;
+	int ret;
+
+	drm_gpuvm_for_each_va_range(va, gpuvm, addr, end) {
+		struct drm_gem_object *obj = va->gem.obj;
+
+		ret = drm_exec_prepare_obj(exec, obj, num_fences);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(drm_gpuvm_prepare_range);
+
+/**
+ * drm_gpuvm_exec_lock() - lock all dma-resv of all associated BOs
+ * @vm_exec: the &drm_gpuvm_exec abstraction
+ * @num_fences: the amount of &dma_fences to reserve
+ * @interruptible: sleep interruptible if waiting
+ *
+ * Acquires all dma-resv locks of all &drm_gem_objects the given
+ * &drm_gpuvm contains mappings of.
+ *
+ * Additionally, when calling this function with struct drm_gpuvm_exec::extra
+ * set, the driver receives the given @fn callback to lock additional
+ * dma-resv in the context of the &drm_gpuvm_exec instance. Typically, drivers
+ * would call drm_exec_prepare_obj() from within this callback.
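+ *
+ * A hedged sketch of such a callback; my_lock_extra() and the use of
+ * &drm_gpuvm_exec.extra.priv as a single &drm_gem_object are illustrative
+ * assumptions:
+ *
+ *	static int my_lock_extra(struct drm_gpuvm_exec *vm_exec,
+ *				 unsigned int num_fences)
+ *	{
+ *		struct drm_gem_object *obj = vm_exec->extra.priv;
+ *
+ *		return drm_exec_prepare_obj(&vm_exec->exec, obj, num_fences);
+ *	}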
+ *
+ * Returns: 0 on success, negative error code on failure.
+ */
+int
+drm_gpuvm_exec_lock(struct drm_gpuvm_exec *vm_exec,
+		    unsigned int num_fences,
+		    bool interruptible)
+{
+	struct drm_gpuvm *gpuvm = vm_exec->vm;
+	struct drm_exec *exec = &vm_exec->exec;
+	uint32_t flags;
+	int ret;
+
+	flags = (interruptible ? DRM_EXEC_INTERRUPTIBLE_WAIT : 0) |
+		DRM_EXEC_IGNORE_DUPLICATES;
+
+	drm_exec_init(exec, flags);
+
+	drm_exec_until_all_locked(exec) {
+		ret = drm_gpuvm_prepare_vm(gpuvm, exec, num_fences);
+		drm_exec_retry_on_contention(exec);
+		if (ret)
+			goto err;
+
+		ret = drm_gpuvm_prepare_objects(gpuvm, exec, num_fences);
+		drm_exec_retry_on_contention(exec);
+		if (ret)
+			goto err;
+
+		if (vm_exec->extra.fn) {
+			ret = vm_exec->extra.fn(vm_exec, num_fences);
+			drm_exec_retry_on_contention(exec);
+			if (ret)
+				goto err;
+		}
+	}
+
+	return 0;
+
+err:
+	drm_exec_fini(exec);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(drm_gpuvm_exec_lock);
+
+static int
+fn_lock_array(struct drm_gpuvm_exec *vm_exec, unsigned int num_fences)
+{
+	struct {
+		struct drm_gem_object **objs;
+		unsigned int num_objs;
+	} *args = vm_exec->extra.priv;
+
+	return drm_exec_prepare_array(&vm_exec->exec, args->objs,
+				      args->num_objs, num_fences);
+}
+
+/**
+ * drm_gpuvm_exec_lock_array() - lock all dma-resv of all associated BOs
+ * @vm_exec: the &drm_gpuvm_exec abstraction
+ * @objs: additional &drm_gem_objects to lock
+ * @num_objs: the number of additional &drm_gem_objects to lock
+ * @num_fences: the amount of &dma_fences to reserve
+ * @interruptible: sleep interruptible if waiting
+ *
+ * Acquires all dma-resv locks of all &drm_gem_objects the given &drm_gpuvm
+ * contains mappings of, plus the ones given through @objs.
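+ *
+ * A hedged call sketch; objs[] and num_objs are assumed to be provided by the
+ * driver:
+ *
+ *	ret = drm_gpuvm_exec_lock_array(&vm_exec, objs, num_objs, 1, true);
+ *	if (!ret) {
+ *		...
+ *		drm_gpuvm_exec_unlock(&vm_exec);
+ *	}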
+ *
+ * Returns: 0 on success, negative error code on failure.
+ */
+int
+drm_gpuvm_exec_lock_array(struct drm_gpuvm_exec *vm_exec,
+			  struct drm_gem_object **objs,
+			  unsigned int num_objs,
+			  unsigned int num_fences,
+			  bool interruptible)
+{
+	struct {
+		struct drm_gem_object **objs;
+		unsigned int num_objs;
+	} args;
+
+	args.objs = objs;
+	args.num_objs = num_objs;
+
+	vm_exec->extra.fn = fn_lock_array;
+	vm_exec->extra.priv = &args;
+
+	return drm_gpuvm_exec_lock(vm_exec, num_fences, interruptible);
+}
+EXPORT_SYMBOL_GPL(drm_gpuvm_exec_lock_array);
+
+/**
+ * drm_gpuvm_exec_lock_range() - lock all BOs mapped within a given range
+ * @vm_exec: the &drm_gpuvm_exec abstraction
+ * @addr: the start address within the VA space
+ * @range: the range to iterate within the VA space
+ * @num_fences: the amount of &dma_fences to reserve
+ * @interruptible: sleep interruptible if waiting
+ *
+ * Acquires all dma-resv locks of all &drm_gem_objects mapped between @addr and
+ * @addr + @range.
+ *
+ * Returns: 0 on success, negative error code on failure.
+ */
+int
+drm_gpuvm_exec_lock_range(struct drm_gpuvm_exec *vm_exec,
+			  u64 addr, u64 range,
+			  unsigned int num_fences,
+			  bool interruptible)
+{
+	struct drm_gpuvm *gpuvm = vm_exec->vm;
+	struct drm_exec *exec = &vm_exec->exec;
+	uint32_t flags;
+	int ret;
+
+	flags = (interruptible ? DRM_EXEC_INTERRUPTIBLE_WAIT : 0) |
+		DRM_EXEC_IGNORE_DUPLICATES;
+
+	drm_exec_init(exec, flags);
+
+	drm_exec_until_all_locked(exec) {
+		ret = drm_gpuvm_prepare_range(gpuvm, exec, addr, range,
+					      num_fences);
+		drm_exec_retry_on_contention(exec);
+		if (ret)
+			goto err;
+	}
+
+	return 0;
+
+err:
+	drm_exec_fini(exec);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(drm_gpuvm_exec_lock_range);
+
+/**
+ * drm_gpuvm_validate() - validate all BOs marked as evicted
+ * @gpuvm: the &drm_gpuvm whose evicted BOs to validate
+ *
+ * Calls the &drm_gpuvm_ops.bo_validate callback for all evicted buffer
+ * objects being mapped in the given &drm_gpuvm.
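+ *
+ * A hedged sketch of where this typically sits, assuming the dma-resv locks
+ * were taken through drm_gpuvm_exec_lock() (error handling elided):
+ *
+ *	ret = drm_gpuvm_exec_lock(&vm_exec, 1, true);
+ *	if (ret)
+ *		return ret;
+ *
+ *	ret = drm_gpuvm_validate(gpuvm);
+ *	...
+ *	drm_gpuvm_exec_unlock(&vm_exec);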
+ *
+ * Returns: 0 on success, negative error code on failure.
+ */
+int
+drm_gpuvm_validate(struct drm_gpuvm *gpuvm)
+{
+	const struct drm_gpuvm_ops *ops = gpuvm->ops;
+	struct drm_gpuvm_bo *vm_bo;
+	LIST_HEAD(evict);
+	int ret = 0;
+
+	if (unlikely(!ops || !ops->bo_validate))
+		return -EOPNOTSUPP;
+
+	for_each_vm_bo_in_list(gpuvm, evict, &evict, vm_bo) {
+		dma_resv_assert_held(vm_bo->obj->resv);
+		ret = ops->bo_validate(vm_bo->obj);
+		if (ret)
+			break;
+	}
+	/* Drop ref in case we break out of the loop. */
+	drm_gpuvm_bo_put(vm_bo);
+	restore_vm_bo_list(gpuvm, evict, &evict);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(drm_gpuvm_validate);
+
+/**
+ * drm_gpuvm_resv_add_fence() - add fence to private and all extobj dma-resv
+ * @gpuvm: the &drm_gpuvm to add a fence to
+ * @exec: the &drm_exec locking context
+ * @fence: fence to add
+ * @private_usage: private dma-resv usage
+ * @extobj_usage: extobj dma-resv usage
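+ *
+ * A hedged usage sketch after job submission; "job_fence" is an assumed
+ * driver-side fence and the usage values are one plausible choice:
+ *
+ *	drm_gpuvm_resv_add_fence(gpuvm, exec, job_fence,
+ *				 DMA_RESV_USAGE_BOOKKEEP,
+ *				 DMA_RESV_USAGE_WRITE);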
+ */
+void
+drm_gpuvm_resv_add_fence(struct drm_gpuvm *gpuvm,
+			 struct drm_exec *exec,
+			 struct dma_fence *fence,
+			 enum dma_resv_usage private_usage,
+			 enum dma_resv_usage extobj_usage)
+{
+	struct drm_gem_object *obj;
+	unsigned long index;
+
+	drm_exec_for_each_locked_object(exec, index, obj) {
+		dma_resv_assert_held(obj->resv);
+		dma_resv_add_fence(obj->resv, fence,
+				   drm_gpuvm_is_extobj(gpuvm, obj) ?
+				   extobj_usage : private_usage);
+	}
+}
+EXPORT_SYMBOL_GPL(drm_gpuvm_resv_add_fence);
+
 /**
  * drm_gpuvm_bo_create() - create a new instance of struct drm_gpuvm_bo
  * @gpuvm: The &drm_gpuvm the @obj is mapped in.
@@ -790,6 +1242,9 @@  drm_gpuvm_bo_create(struct drm_gpuvm *gpuvm,
 	INIT_LIST_HEAD(&vm_bo->list.gpuva);
 	INIT_LIST_HEAD(&vm_bo->list.entry.gem);
 
+	INIT_LIST_HEAD(&vm_bo->list.entry.extobj);
+	INIT_LIST_HEAD(&vm_bo->list.entry.evict);
+
 	drm_gem_object_get(obj);
 
 	return vm_bo;
@@ -807,6 +1262,14 @@  drm_gpuvm_bo_destroy(struct kref *kref)
 
 	drm_gem_gpuva_assert_lock_held(vm_bo->obj);
 
+	spin_lock(&gpuvm->extobj.lock);
+	list_del(&vm_bo->list.entry.extobj);
+	spin_unlock(&gpuvm->extobj.lock);
+
+	spin_lock(&gpuvm->evict.lock);
+	list_del(&vm_bo->list.entry.evict);
+	spin_unlock(&gpuvm->evict.lock);
+
 	list_del(&vm_bo->list.entry.gem);
 
 	drm_gem_object_put(obj);
@@ -822,6 +1285,11 @@  drm_gpuvm_bo_destroy(struct kref *kref)
  * @vm_bo: the &drm_gpuvm_bo to release the reference of
  *
  * This releases a reference to @vm_bo.
+ *
+ * If the reference count drops to zero, the &drm_gpuvm_bo is destroyed, which
+ * includes removing it from the GEM's gpuva list. Hence, if a call to this
+ * function can potentially drop the reference count to zero, the caller must
+ * hold the dma-resv or driver-specific GEM gpuva lock.
  */
 void
 drm_gpuvm_bo_put(struct drm_gpuvm_bo *vm_bo)
@@ -831,6 +1299,12 @@  drm_gpuvm_bo_put(struct drm_gpuvm_bo *vm_bo)
 }
 EXPORT_SYMBOL_GPL(drm_gpuvm_bo_put);
 
+static int __must_check
+drm_gpuvm_bo_get_unless_zero(struct drm_gpuvm_bo *vm_bo)
+{
+	return kref_get_unless_zero(&vm_bo->kref);
+}
+
 static struct drm_gpuvm_bo *
 __drm_gpuvm_bo_find(struct drm_gpuvm *gpuvm,
 		    struct drm_gem_object *obj)
@@ -938,6 +1412,48 @@  drm_gpuvm_bo_obtain_prealloc(struct drm_gpuvm_bo *__vm_bo)
 }
 EXPORT_SYMBOL_GPL(drm_gpuvm_bo_obtain_prealloc);
 
+/**
+ * drm_gpuvm_bo_extobj_add() - adds the &drm_gpuvm_bo to its &drm_gpuvm's
+ * extobj list
+ * @vm_bo: The &drm_gpuvm_bo to add to its &drm_gpuvm's extobj list.
+ *
+ * Adds the given @vm_bo to its &drm_gpuvm's extobj list if it is not on the
+ * list already and if the corresponding &drm_gem_object actually is an
+ * external object.
+ */
+void
+drm_gpuvm_bo_extobj_add(struct drm_gpuvm_bo *vm_bo)
+{
+	struct drm_gpuvm *gpuvm = vm_bo->vm;
+
+	if (drm_gpuvm_is_extobj(gpuvm, vm_bo->obj))
+		drm_gpuvm_bo_list_add(vm_bo, extobj);
+}
+EXPORT_SYMBOL_GPL(drm_gpuvm_bo_extobj_add);
+
+/**
+ * drm_gpuvm_bo_evict() - add / remove a &drm_gem_object to / from the
+ * evict list of all &drm_gpuvms mapping it
+ * @obj: the &drm_gem_object to add or remove
+ * @evict: indicates whether the object is evicted
+ *
+ * Adds a &drm_gem_object to or removes it from the evict list of each
+ * &drm_gpuvm containing a mapping of this &drm_gem_object.
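+ *
+ * A hedged sketch of a call site; invoking this from a driver's TTM
+ * move/eviction notifier is an assumption, not something this helper mandates:
+ *
+ *	drm_gpuvm_bo_evict(obj, true);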
+ */
+void
+drm_gpuvm_bo_evict(struct drm_gem_object *obj, bool evict)
+{
+	struct drm_gpuvm_bo *vm_bo;
+
+	drm_gem_for_each_gpuvm_bo(vm_bo, obj) {
+		if (evict)
+			drm_gpuvm_bo_list_add(vm_bo, evict);
+		else
+			drm_gpuvm_bo_list_del(vm_bo, evict);
+	}
+}
+EXPORT_SYMBOL_GPL(drm_gpuvm_bo_evict);
+
 static int
 __drm_gpuva_insert(struct drm_gpuvm *gpuvm,
 		   struct drm_gpuva *va)
diff --git a/include/drm/drm_gpuvm.h b/include/drm/drm_gpuvm.h
index afa50b9059a2..834bb6d6617e 100644
--- a/include/drm/drm_gpuvm.h
+++ b/include/drm/drm_gpuvm.h
@@ -26,10 +26,12 @@ 
  */
 
 #include <linux/list.h>
+#include <linux/dma-resv.h>
 #include <linux/rbtree.h>
 #include <linux/types.h>
 
 #include <drm/drm_gem.h>
+#include <drm/drm_exec.h>
 
 struct drm_gpuvm;
 struct drm_gpuvm_bo;
@@ -259,6 +261,38 @@  struct drm_gpuvm {
 	 * space
 	 */
 	struct dma_resv *resv;
+
+	/**
+	 * @extobj: structure holding the extobj list
+	 */
+	struct {
+		/**
+		 * @list: &list_head storing &drm_gpuvm_bos serving as
+		 * external objects
+		 */
+		struct list_head list;
+
+		/**
+		 * @lock: spinlock to protect the extobj list
+		 */
+		spinlock_t lock;
+	} extobj;
+
+	/**
+	 * @evict: structure holding the evict list and evict list lock
+	 */
+	struct {
+		/**
+		 * @list: &list_head storing &drm_gpuvm_bos currently being
+		 * evicted
+		 */
+		struct list_head list;
+
+		/**
+		 * @lock: spinlock to protect the evict list
+		 */
+		spinlock_t lock;
+	} evict;
 };
 
 void drm_gpuvm_init(struct drm_gpuvm *gpuvm, struct drm_device *drm,
@@ -268,6 +302,21 @@  void drm_gpuvm_init(struct drm_gpuvm *gpuvm, struct drm_device *drm,
 		    const struct drm_gpuvm_ops *ops);
 void drm_gpuvm_destroy(struct drm_gpuvm *gpuvm);
 
+/**
+ * drm_gpuvm_is_extobj() - indicates whether the given &drm_gem_object is an
+ * external object
+ * @gpuvm: the &drm_gpuvm to check
+ * @obj: the &drm_gem_object to check
+ *
+ * Returns: true if the &drm_gem_object's &dma_resv differs from the
+ * &drm_gpuvm's &dma_resv, false otherwise
+ */
+static inline bool drm_gpuvm_is_extobj(struct drm_gpuvm *gpuvm,
+				       struct drm_gem_object *obj)
+{
+	return obj && obj->resv != gpuvm->resv;
+}
+
 static inline struct drm_gpuva *
 __drm_gpuva_next(struct drm_gpuva *va)
 {
@@ -346,6 +395,128 @@  __drm_gpuva_next(struct drm_gpuva *va)
 #define drm_gpuvm_for_each_va_safe(va__, next__, gpuvm__) \
 	list_for_each_entry_safe(va__, next__, &(gpuvm__)->rb.list, rb.entry)
 
+/**
+ * struct drm_gpuvm_exec - &drm_gpuvm abstraction of &drm_exec
+ *
+ * This structure should be created on the stack as &drm_exec should be.
+ *
+ * Optionally, @extra can be set in order to lock additional &drm_gem_objects.
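+ *
+ * A minimal on-stack initialization sketch; "gpuvm" stands in for the
+ * driver's &drm_gpuvm instance:
+ *
+ *	struct drm_gpuvm_exec vm_exec = {
+ *		.vm = gpuvm,
+ *	};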
+ */
+struct drm_gpuvm_exec {
+	/**
+	 * @exec: the &drm_exec structure
+	 */
+	struct drm_exec exec;
+
+	/**
+	 * @vm: the &drm_gpuvm whose DMA reservations are to be locked
+	 */
+	struct drm_gpuvm *vm;
+
+	/**
+	 * @extra: Callback and corresponding private data for the driver to
+	 * lock arbitrary additional &drm_gem_objects.
+	 */
+	struct {
+		/**
+		 * @fn: The driver callback to lock additional &drm_gem_objects.
+		 */
+		int (*fn)(struct drm_gpuvm_exec *vm_exec,
+			  unsigned int num_fences);
+
+		/**
+		 * @priv: driver private data for the @fn callback
+		 */
+		void *priv;
+	} extra;
+};
+
+/**
+ * drm_gpuvm_prepare_vm() - prepare the GPUVMs common dma-resv
+ * @gpuvm: the &drm_gpuvm
+ * @exec: the &drm_exec context
+ * @num_fences: the amount of &dma_fences to reserve
+ *
+ * Calls drm_exec_prepare_obj() for the GPUVM's dummy &drm_gem_object.
+ *
+ * Using this function directly, it is the driver's responsibility to call
+ * drm_exec_init() and drm_exec_fini() accordingly.
+ *
+ * Returns: 0 on success, negative error code on failure.
+ */
+static inline int
+drm_gpuvm_prepare_vm(struct drm_gpuvm *gpuvm,
+		     struct drm_exec *exec,
+		     unsigned int num_fences)
+{
+	return drm_exec_prepare_obj(exec, &gpuvm->d_obj, num_fences);
+}
+
+int drm_gpuvm_prepare_objects(struct drm_gpuvm *gpuvm,
+			      struct drm_exec *exec,
+			      unsigned int num_fences);
+
+int drm_gpuvm_prepare_range(struct drm_gpuvm *gpuvm,
+			    struct drm_exec *exec,
+			    u64 addr, u64 range,
+			    unsigned int num_fences);
+
+int drm_gpuvm_exec_lock(struct drm_gpuvm_exec *vm_exec,
+			unsigned int num_fences,
+			bool interruptible);
+
+int drm_gpuvm_exec_lock_array(struct drm_gpuvm_exec *vm_exec,
+			      struct drm_gem_object **objs,
+			      unsigned int num_objs,
+			      unsigned int num_fences,
+			      bool interruptible);
+
+int drm_gpuvm_exec_lock_range(struct drm_gpuvm_exec *vm_exec,
+			      u64 addr, u64 range,
+			      unsigned int num_fences,
+			      bool interruptible);
+
+/**
+ * drm_gpuvm_exec_unlock() - unlock all dma-resv of all associated BOs
+ * @vm_exec: the &drm_gpuvm_exec abstraction
+ *
+ * Releases all dma-resv locks of all &drm_gem_objects previously acquired
+ * through drm_gpuvm_exec_lock() or its variants.
+ */
+static inline void
+drm_gpuvm_exec_unlock(struct drm_gpuvm_exec *vm_exec)
+{
+	drm_exec_fini(&vm_exec->exec);
+}
+
+int drm_gpuvm_validate(struct drm_gpuvm *gpuvm);
+void drm_gpuvm_resv_add_fence(struct drm_gpuvm *gpuvm,
+			      struct drm_exec *exec,
+			      struct dma_fence *fence,
+			      enum dma_resv_usage private_usage,
+			      enum dma_resv_usage extobj_usage);
+
+/**
+ * drm_gpuvm_exec_resv_add_fence() - add fence to private and all extobj dma-resv
+ * @vm_exec: the &drm_gpuvm_exec abstraction
+ * @fence: fence to add
+ * @private_usage: private dma-resv usage
+ * @extobj_usage: extobj dma-resv usage
+ *
+ * See drm_gpuvm_resv_add_fence().
+ */
+static inline void
+drm_gpuvm_exec_resv_add_fence(struct drm_gpuvm_exec *vm_exec,
+			      struct dma_fence *fence,
+			      enum dma_resv_usage private_usage,
+			      enum dma_resv_usage extobj_usage)
+{
+	drm_gpuvm_resv_add_fence(vm_exec->vm, &vm_exec->exec, fence,
+				 private_usage, extobj_usage);
+}
+
 /**
  * struct drm_gpuvm_bo - structure representing a &drm_gpuvm and
  * &drm_gem_object combination
@@ -398,6 +569,18 @@  struct drm_gpuvm_bo {
 			 * gpuva list.
 			 */
 			struct list_head gem;
+
+			/**
+			 * @extobj: List entry to attach to the &drm_gpuvm's
+			 * extobj list.
+			 */
+			struct list_head extobj;
+
+			/**
+			 * @evict: List entry to attach to the &drm_gpuvm's evict
+			 * list.
+			 */
+			struct list_head evict;
 		} entry;
 	} list;
 };
@@ -432,6 +615,9 @@  struct drm_gpuvm_bo *
 drm_gpuvm_bo_find(struct drm_gpuvm *gpuvm,
 		  struct drm_gem_object *obj);
 
+void drm_gpuvm_bo_evict(struct drm_gem_object *obj, bool evict);
+void drm_gpuvm_bo_extobj_add(struct drm_gpuvm_bo *vm_bo);
+
 /**
  * drm_gpuvm_bo_for_each_va() - iterator to walk over a list of &drm_gpuva
  * @va__: &drm_gpuva structure to assign to in each iteration step
@@ -837,6 +1023,17 @@  struct drm_gpuvm_ops {
 	 * used.
 	 */
 	int (*sm_step_unmap)(struct drm_gpuva_op *op, void *priv);
+
+	/**
+	 * @bo_validate: called from drm_gpuvm_validate()
+	 *
+	 * Drivers receive this callback for every evicted &drm_gem_object being
+	 * mapped in the corresponding &drm_gpuvm.
+	 *
+	 * Typically, drivers would call their driver-specific variant of
+	 * ttm_bo_validate() from within this callback.
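+	 *
+	 * A hedged TTM-based sketch; to_ttm_bo() and the placement are
+	 * driver-side assumptions:
+	 *
+	 *	static int my_bo_validate(struct drm_gem_object *obj)
+	 *	{
+	 *		struct ttm_operation_ctx ctx = { false, false };
+	 *
+	 *		return ttm_bo_validate(to_ttm_bo(obj), &placement, &ctx);
+	 *	}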
+	 */
+	int (*bo_validate)(struct drm_gem_object *obj);
 };
 
 int drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm, void *priv,