Message ID | 20230818150845.96679-5-thomas.hellstrom@linux.intel.com
---|---
State | New, archived
Series | drm/xe: Support optional pinning of userptr pages
On Fri, Aug 18, 2023 at 05:08:45PM +0200, Thomas Hellström wrote: > Support pinning of vmas using XE_VM_BIND_FLAG_PIN, initially for userptr > only. Pinned memory becomes accounted against RLIMIT_MEMLOCK and processes > with CAP_IPC_LOCK will not apply the limit. This is pretty similar to > mlock()'ing userptr memory with the added benefit that the driver is > aware and can ignore some actions in the MMU invalidation notifier. > > This will initially become useful for compute VMs on hardware without > mid-thread-preemption capability since with pinned pages, the MMU > invalidation notifier never tries to preempt a running compute kernel. > > If that were the only usage we could restrict this to a flag that always > pins userptr VMAs on compute VMs on such hardware, but there are > indications that this may become needed in other situations as well. > > From a more general point of view, the usage pattern of a system may be > such that in most cases it only ever runs a single workload per system > and then the sysadmin would want to configure the system to allow > extensive pinning for performance reasons. > > Hence we might want to extend the pinning capability to bo-backed VMAs > as well. How that pinning will be accounted remains an open but to build > on the current drm CGROUP work would be an option. > > Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Patch LGTM but a few comments that are currently out of scope but want to get out there for future work. > --- > drivers/gpu/drm/xe/xe_vm.c | 33 +++++++++++++++++++++++++------- > drivers/gpu/drm/xe/xe_vm_types.h | 2 ++ > include/uapi/drm/xe_drm.h | 18 +++++++++++++++++ > 3 files changed, 46 insertions(+), 7 deletions(-) > > diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c > index d9c000689002..3832f1f21def 100644 > --- a/drivers/gpu/drm/xe/xe_vm.c > +++ b/drivers/gpu/drm/xe/xe_vm.c > @@ -936,6 +936,7 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm, > u64 start, u64 end, > bool read_only, > bool is_null, > + bool pin, > u8 tile_mask) > { > struct xe_vma *vma; > @@ -967,6 +968,8 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm, > vma->gpuva.flags |= XE_VMA_READ_ONLY; > if (is_null) > vma->gpuva.flags |= DRM_GPUVA_SPARSE; > + if (pin) > + vma->gpuva.flags |= XE_VMA_PINNED; > > if (tile_mask) { > vma->tile_mask = tile_mask; > @@ -2367,6 +2370,7 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_bo *bo, > op->map.read_only = > operation & XE_VM_BIND_FLAG_READONLY; > op->map.is_null = operation & XE_VM_BIND_FLAG_NULL; > + op->map.pin = operation & XE_VM_BIND_FLAG_PIN; > } > break; > case XE_VM_BIND_OP_UNMAP: > @@ -2431,7 +2435,8 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_bo *bo, > } > > static struct xe_vma *new_vma(struct xe_vm *vm, struct drm_gpuva_op_map *op, > - u8 tile_mask, bool read_only, bool is_null) > + u8 tile_mask, bool read_only, bool is_null, > + bool pin) > { > struct xe_bo *bo = op->gem.obj ? 
gem_to_xe_bo(op->gem.obj) : NULL; > struct xe_vma *vma; > @@ -2447,7 +2452,7 @@ static struct xe_vma *new_vma(struct xe_vm *vm, struct drm_gpuva_op_map *op, > } > vma = xe_vma_create(vm, bo, op->gem.offset, > op->va.addr, op->va.addr + > - op->va.range - 1, read_only, is_null, > + op->va.range - 1, read_only, is_null, pin, > tile_mask); > if (bo) > xe_bo_unlock(bo, &ww); > @@ -2562,7 +2567,7 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_exec_queue *q, > > vma = new_vma(vm, &op->base.map, > op->tile_mask, op->map.read_only, > - op->map.is_null); > + op->map.is_null, op->map.pin); > if (IS_ERR(vma)) { > err = PTR_ERR(vma); > goto free_fence; > @@ -2587,10 +2592,13 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_exec_queue *q, > bool is_null = > op->base.remap.unmap->va->flags & > DRM_GPUVA_SPARSE; > + bool pin = > + op->base.remap.unmap->va->flags & > + XE_VMA_PINNED; We probably should move the read_only, is_null, and pin check out of the next / prev if statements to just below the DRM_GPUVA_OP_REMAP case statement. > > vma = new_vma(vm, op->base.remap.prev, > op->tile_mask, read_only, > - is_null); > + is_null, pin); > if (IS_ERR(vma)) { > err = PTR_ERR(vma); > goto free_fence; > @@ -2623,10 +2631,13 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_exec_queue *q, > bool is_null = > op->base.remap.unmap->va->flags & > DRM_GPUVA_SPARSE; > + bool pin = > + op->base.remap.unmap->va->flags & > + XE_VMA_PINNED; > > vma = new_vma(vm, op->base.remap.next, > op->tile_mask, read_only, > - is_null); > + is_null, pin); > if (IS_ERR(vma)) { > err = PTR_ERR(vma); > goto free_fence; > @@ -3131,11 +3142,12 @@ static void vm_bind_ioctl_ops_unwind(struct xe_vm *vm, > #define SUPPORTED_FLAGS \ > (FORCE_ASYNC_OP_ERROR | XE_VM_BIND_FLAG_ASYNC | \ > XE_VM_BIND_FLAG_READONLY | XE_VM_BIND_FLAG_IMMEDIATE | \ > - XE_VM_BIND_FLAG_NULL | 0xffff) > + XE_VM_BIND_FLAG_NULL | XE_VM_BIND_FLAG_PIN | 0xffff) > #else > #define SUPPORTED_FLAGS \ > (XE_VM_BIND_FLAG_ASYNC | XE_VM_BIND_FLAG_READONLY | \ > - XE_VM_BIND_FLAG_IMMEDIATE | XE_VM_BIND_FLAG_NULL | 0xffff) > + XE_VM_BIND_FLAG_IMMEDIATE | XE_VM_BIND_FLAG_NULL | \ > + XE_VM_BIND_FLAG_PIN | 0xffff) > #endif > #define XE_64K_PAGE_MASK 0xffffull > > @@ -3205,6 +3217,13 @@ static int vm_bind_ioctl_check_args(struct xe_device *xe, > goto free_bind_ops; > } > > + /* TODO: Support OP_PREFETCH, OP_MAP */ > + if (XE_IOCTL_DBG(xe, (op & XE_VM_BIND_FLAG_PIN) && > + VM_BIND_OP(op) != XE_VM_BIND_OP_MAP_USERPTR)) { > + err = -EINVAL; > + goto free_bind_ops; > + } > + > if (XE_IOCTL_DBG(xe, VM_BIND_OP(op) > > XE_VM_BIND_OP_PREFETCH) || > XE_IOCTL_DBG(xe, op & ~SUPPORTED_FLAGS) || > diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h > index 9b90e649cd69..024ccabadd12 100644 > --- a/drivers/gpu/drm/xe/xe_vm_types.h > +++ b/drivers/gpu/drm/xe/xe_vm_types.h > @@ -360,6 +360,8 @@ struct xe_vma_op_map { > bool read_only; > /** @is_null: is NULL binding */ > bool is_null; > + /** @pin: pin underlying memory */ > + bool pin; > }; > > /** struct xe_vma_op_remap - VMA remap operation */ > diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h > index 86f16d50e9cc..fc3d9cd4f8d0 100644 > --- a/include/uapi/drm/xe_drm.h > +++ b/include/uapi/drm/xe_drm.h > @@ -631,6 +631,24 @@ struct drm_xe_vm_bind_op { > * intended to implement VK sparse bindings. 
> */ > #define XE_VM_BIND_FLAG_NULL (0x1 << 19) > + /* > + * When the PIN flag is set, the user requests the underlying > + * backing store of the vma to be pinned, that is, it will be > + * resident while bound and the underlying physical memory > + * will not change. For userptr VMAs this means that if the > + * user performs an operation that changes the underlying > + * pages of the CPU virtual space, the corresponding pinned > + * GPU virtual space will not pick up the new memory unless > + * an OP_UNMAP followed by a OP_MAP_USERPTR is performed. > + * Pinned userptr memory is accounted in the same way as > + * mlock(2), and if pinning fails the following error codes > + * may be returned: > + * -EINVAL: The memory region does not support pinning. > + * -EPERM: The process is not permitted to pin. > + * -ENOMEM: The pinning limit does not allow pinning. > + * For userptr memory, CAP_IPC_LOCK will bypass the limit checking. > + */ > +#define XE_VM_BIND_FLAG_PIN (0x1 << 20) We are quickly using a lot of the upper bits, maybe we change the op field to a __u64 soon? We have to break the VM bind api when removing the async worker + updating sync mode to align with VM bind doc, maybe we change this then too? Anyways this patch LGTM: Reviewed-by: Matthew Brost <matthew.brost@intel.com> > /** @op: Operation to perform (lower 16 bits) and flags (upper 16 bits) */ > __u32 op; > > -- > 2.41.0 >
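To make the inline suggestion above concrete: a rough sketch (not part of this patch) of how the flag derivation could be hoisted out of the prev/next branches to sit directly under the DRM_GPUVA_OP_REMAP case in vm_bind_ioctl_ops_parse(). The names are taken from the quoted hunks; the surrounding switch, the prev/next bookkeeping and the rest of the error handling are elided.

```c
	case DRM_GPUVA_OP_REMAP:
	{
		/* Derive the inherited VMA flags once for both prev and next. */
		struct drm_gpuva *old = op->base.remap.unmap->va;
		bool read_only = old->flags & XE_VMA_READ_ONLY;
		bool is_null = old->flags & DRM_GPUVA_SPARSE;
		bool pin = old->flags & XE_VMA_PINNED;

		if (op->base.remap.prev) {
			vma = new_vma(vm, op->base.remap.prev,
				      op->tile_mask, read_only, is_null, pin);
			if (IS_ERR(vma)) {
				err = PTR_ERR(vma);
				goto free_fence;
			}
		}

		if (op->base.remap.next) {
			vma = new_vma(vm, op->base.remap.next,
				      op->tile_mask, read_only, is_null, pin);
			if (IS_ERR(vma)) {
				err = PTR_ERR(vma);
				goto free_fence;
			}
		}
		break;
	}
```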
On 8/20/23 05:54, Matthew Brost wrote: > On Fri, Aug 18, 2023 at 05:08:45PM +0200, Thomas Hellström wrote: >> Support pinning of vmas using XE_VM_BIND_FLAG_PIN, initially for userptr >> only. Pinned memory becomes accounted against RLIMIT_MEMLOCK and processes >> with CAP_IPC_LOCK will not apply the limit. This is pretty similar to >> mlock()'ing userptr memory with the added benefit that the driver is >> aware and can ignore some actions in the MMU invalidation notifier. >> >> This will initially become useful for compute VMs on hardware without >> mid-thread-preemption capability since with pinned pages, the MMU >> invalidation notifier never tries to preempt a running compute kernel. >> >> If that were the only usage we could restrict this to a flag that always >> pins userptr VMAs on compute VMs on such hardware, but there are >> indications that this may become needed in other situations as well. >> >> From a more general point of view, the usage pattern of a system may be >> such that in most cases it only ever runs a single workload per system >> and then the sysadmin would want to configure the system to allow >> extensive pinning for performance reasons. >> >> Hence we might want to extend the pinning capability to bo-backed VMAs >> as well. How that pinning will be accounted remains an open but to build >> on the current drm CGROUP work would be an option. >> >> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> > Patch LGTM but a few comments that are currently out of scope but want > to get out there for future work. > >> --- >> drivers/gpu/drm/xe/xe_vm.c | 33 +++++++++++++++++++++++++------- >> drivers/gpu/drm/xe/xe_vm_types.h | 2 ++ >> include/uapi/drm/xe_drm.h | 18 +++++++++++++++++ >> 3 files changed, 46 insertions(+), 7 deletions(-) >> >> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c >> index d9c000689002..3832f1f21def 100644 >> --- a/drivers/gpu/drm/xe/xe_vm.c >> +++ b/drivers/gpu/drm/xe/xe_vm.c >> @@ -936,6 +936,7 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm, >> u64 start, u64 end, >> bool read_only, >> bool is_null, >> + bool pin, >> u8 tile_mask) >> { >> struct xe_vma *vma; >> @@ -967,6 +968,8 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm, >> vma->gpuva.flags |= XE_VMA_READ_ONLY; >> if (is_null) >> vma->gpuva.flags |= DRM_GPUVA_SPARSE; >> + if (pin) >> + vma->gpuva.flags |= XE_VMA_PINNED; >> >> if (tile_mask) { >> vma->tile_mask = tile_mask; >> @@ -2367,6 +2370,7 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_bo *bo, >> op->map.read_only = >> operation & XE_VM_BIND_FLAG_READONLY; >> op->map.is_null = operation & XE_VM_BIND_FLAG_NULL; >> + op->map.pin = operation & XE_VM_BIND_FLAG_PIN; >> } >> break; >> case XE_VM_BIND_OP_UNMAP: >> @@ -2431,7 +2435,8 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_bo *bo, >> } >> >> static struct xe_vma *new_vma(struct xe_vm *vm, struct drm_gpuva_op_map *op, >> - u8 tile_mask, bool read_only, bool is_null) >> + u8 tile_mask, bool read_only, bool is_null, >> + bool pin) >> { >> struct xe_bo *bo = op->gem.obj ? 
gem_to_xe_bo(op->gem.obj) : NULL; >> struct xe_vma *vma; >> @@ -2447,7 +2452,7 @@ static struct xe_vma *new_vma(struct xe_vm *vm, struct drm_gpuva_op_map *op, >> } >> vma = xe_vma_create(vm, bo, op->gem.offset, >> op->va.addr, op->va.addr + >> - op->va.range - 1, read_only, is_null, >> + op->va.range - 1, read_only, is_null, pin, >> tile_mask); >> if (bo) >> xe_bo_unlock(bo, &ww); >> @@ -2562,7 +2567,7 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_exec_queue *q, >> >> vma = new_vma(vm, &op->base.map, >> op->tile_mask, op->map.read_only, >> - op->map.is_null); >> + op->map.is_null, op->map.pin); >> if (IS_ERR(vma)) { >> err = PTR_ERR(vma); >> goto free_fence; >> @@ -2587,10 +2592,13 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_exec_queue *q, >> bool is_null = >> op->base.remap.unmap->va->flags & >> DRM_GPUVA_SPARSE; >> + bool pin = >> + op->base.remap.unmap->va->flags & >> + XE_VMA_PINNED; > We probably should move the read_only, is_null, and pin check out of the > next / prev if statements to just below the DRM_GPUVA_OP_REMAP case > statement. > >> >> vma = new_vma(vm, op->base.remap.prev, >> op->tile_mask, read_only, >> - is_null); >> + is_null, pin); >> if (IS_ERR(vma)) { >> err = PTR_ERR(vma); >> goto free_fence; >> @@ -2623,10 +2631,13 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_exec_queue *q, >> bool is_null = >> op->base.remap.unmap->va->flags & >> DRM_GPUVA_SPARSE; >> + bool pin = >> + op->base.remap.unmap->va->flags & >> + XE_VMA_PINNED; >> >> vma = new_vma(vm, op->base.remap.next, >> op->tile_mask, read_only, >> - is_null); >> + is_null, pin); >> if (IS_ERR(vma)) { >> err = PTR_ERR(vma); >> goto free_fence; >> @@ -3131,11 +3142,12 @@ static void vm_bind_ioctl_ops_unwind(struct xe_vm *vm, >> #define SUPPORTED_FLAGS \ >> (FORCE_ASYNC_OP_ERROR | XE_VM_BIND_FLAG_ASYNC | \ >> XE_VM_BIND_FLAG_READONLY | XE_VM_BIND_FLAG_IMMEDIATE | \ >> - XE_VM_BIND_FLAG_NULL | 0xffff) >> + XE_VM_BIND_FLAG_NULL | XE_VM_BIND_FLAG_PIN | 0xffff) >> #else >> #define SUPPORTED_FLAGS \ >> (XE_VM_BIND_FLAG_ASYNC | XE_VM_BIND_FLAG_READONLY | \ >> - XE_VM_BIND_FLAG_IMMEDIATE | XE_VM_BIND_FLAG_NULL | 0xffff) >> + XE_VM_BIND_FLAG_IMMEDIATE | XE_VM_BIND_FLAG_NULL | \ >> + XE_VM_BIND_FLAG_PIN | 0xffff) >> #endif >> #define XE_64K_PAGE_MASK 0xffffull >> >> @@ -3205,6 +3217,13 @@ static int vm_bind_ioctl_check_args(struct xe_device *xe, >> goto free_bind_ops; >> } >> >> + /* TODO: Support OP_PREFETCH, OP_MAP */ >> + if (XE_IOCTL_DBG(xe, (op & XE_VM_BIND_FLAG_PIN) && >> + VM_BIND_OP(op) != XE_VM_BIND_OP_MAP_USERPTR)) { >> + err = -EINVAL; >> + goto free_bind_ops; >> + } >> + >> if (XE_IOCTL_DBG(xe, VM_BIND_OP(op) > >> XE_VM_BIND_OP_PREFETCH) || >> XE_IOCTL_DBG(xe, op & ~SUPPORTED_FLAGS) || >> diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h >> index 9b90e649cd69..024ccabadd12 100644 >> --- a/drivers/gpu/drm/xe/xe_vm_types.h >> +++ b/drivers/gpu/drm/xe/xe_vm_types.h >> @@ -360,6 +360,8 @@ struct xe_vma_op_map { >> bool read_only; >> /** @is_null: is NULL binding */ >> bool is_null; >> + /** @pin: pin underlying memory */ >> + bool pin; >> }; >> >> /** struct xe_vma_op_remap - VMA remap operation */ >> diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h >> index 86f16d50e9cc..fc3d9cd4f8d0 100644 >> --- a/include/uapi/drm/xe_drm.h >> +++ b/include/uapi/drm/xe_drm.h >> @@ -631,6 +631,24 @@ struct drm_xe_vm_bind_op { >> * intended to implement VK sparse bindings. 
>> */ >> #define XE_VM_BIND_FLAG_NULL (0x1 << 19) >> + /* >> + * When the PIN flag is set, the user requests the underlying >> + * backing store of the vma to be pinned, that is, it will be >> + * resident while bound and the underlying physical memory >> + * will not change. For userptr VMAs this means that if the >> + * user performs an operation that changes the underlying >> + * pages of the CPU virtual space, the corresponding pinned >> + * GPU virtual space will not pick up the new memory unless >> + * an OP_UNMAP followed by a OP_MAP_USERPTR is performed. >> + * Pinned userptr memory is accounted in the same way as >> + * mlock(2), and if pinning fails the following error codes >> + * may be returned: >> + * -EINVAL: The memory region does not support pinning. >> + * -EPERM: The process is not permitted to pin. >> + * -ENOMEM: The pinning limit does not allow pinning. >> + * For userptr memory, CAP_IPC_LOCK will bypass the limit checking. >> + */ >> +#define XE_VM_BIND_FLAG_PIN (0x1 << 20) > We are quickly using a lot of the upper bits, maybe we change the op > field to a __u64 soon? We have to break the VM bind api when removing > the async worker + updating sync mode to align with VM bind doc, maybe > we change this then too? What about a separate flags field? /Thomas > > Anyways this patch LGTM: > Reviewed-by: Matthew Brost <matthew.brost@intel.com> > >> /** @op: Operation to perform (lower 16 bits) and flags (upper 16 bits) */ >> __u32 op; >> >> -- >> 2.41.0 >>
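One possible shape for the separate flags field floated above, sketched for illustration only: the operation keeps a plain __u32 and a dedicated 64-bit flags word takes over the modifier bits. The bit positions and the pad member are invented for this example and do not match the current include/uapi/drm/xe_drm.h; the remaining members of the struct are elided.

```c
struct drm_xe_vm_bind_op {
	/* ... existing members (obj, obj_offset, range, addr, ...) ... */

	/** @op: Operation to perform, with no flag bits packed into it */
	__u32 op;
	/** @pad: MBZ */
	__u32 pad;
	/** @flags: Bind modifier flags, with room to grow past 16 bits */
	__u64 flags;
#define XE_VM_BIND_FLAG_READONLY	(1ull << 0)
#define XE_VM_BIND_FLAG_ASYNC		(1ull << 1)
#define XE_VM_BIND_FLAG_IMMEDIATE	(1ull << 2)
#define XE_VM_BIND_FLAG_NULL		(1ull << 3)
#define XE_VM_BIND_FLAG_PIN		(1ull << 4)
};
```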
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index d9c000689002..3832f1f21def 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -936,6 +936,7 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
 				    u64 start, u64 end,
 				    bool read_only,
 				    bool is_null,
+				    bool pin,
 				    u8 tile_mask)
 {
 	struct xe_vma *vma;
@@ -967,6 +968,8 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
 		vma->gpuva.flags |= XE_VMA_READ_ONLY;
 	if (is_null)
 		vma->gpuva.flags |= DRM_GPUVA_SPARSE;
+	if (pin)
+		vma->gpuva.flags |= XE_VMA_PINNED;
 
 	if (tile_mask) {
 		vma->tile_mask = tile_mask;
@@ -2367,6 +2370,7 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_bo *bo,
 			op->map.read_only =
 				operation & XE_VM_BIND_FLAG_READONLY;
 			op->map.is_null = operation & XE_VM_BIND_FLAG_NULL;
+			op->map.pin = operation & XE_VM_BIND_FLAG_PIN;
 		}
 		break;
 	case XE_VM_BIND_OP_UNMAP:
@@ -2431,7 +2435,8 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_bo *bo,
 }
 
 static struct xe_vma *new_vma(struct xe_vm *vm, struct drm_gpuva_op_map *op,
-			      u8 tile_mask, bool read_only, bool is_null)
+			      u8 tile_mask, bool read_only, bool is_null,
+			      bool pin)
 {
 	struct xe_bo *bo = op->gem.obj ? gem_to_xe_bo(op->gem.obj) : NULL;
 	struct xe_vma *vma;
@@ -2447,7 +2452,7 @@ static struct xe_vma *new_vma(struct xe_vm *vm, struct drm_gpuva_op_map *op,
 	}
 	vma = xe_vma_create(vm, bo, op->gem.offset,
 			    op->va.addr, op->va.addr +
-			    op->va.range - 1, read_only, is_null,
+			    op->va.range - 1, read_only, is_null, pin,
 			    tile_mask);
 	if (bo)
 		xe_bo_unlock(bo, &ww);
@@ -2562,7 +2567,7 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_exec_queue *q,
 
 			vma = new_vma(vm, &op->base.map,
 				      op->tile_mask, op->map.read_only,
-				      op->map.is_null);
+				      op->map.is_null, op->map.pin);
 			if (IS_ERR(vma)) {
 				err = PTR_ERR(vma);
 				goto free_fence;
@@ -2587,10 +2592,13 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_exec_queue *q,
 				bool is_null =
 					op->base.remap.unmap->va->flags &
 					DRM_GPUVA_SPARSE;
+				bool pin =
+					op->base.remap.unmap->va->flags &
+					XE_VMA_PINNED;
 
 				vma = new_vma(vm, op->base.remap.prev,
 					      op->tile_mask, read_only,
-					      is_null);
+					      is_null, pin);
 				if (IS_ERR(vma)) {
 					err = PTR_ERR(vma);
 					goto free_fence;
@@ -2623,10 +2631,13 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_exec_queue *q,
 				bool is_null =
 					op->base.remap.unmap->va->flags &
 					DRM_GPUVA_SPARSE;
+				bool pin =
+					op->base.remap.unmap->va->flags &
+					XE_VMA_PINNED;
 
 				vma = new_vma(vm, op->base.remap.next,
 					      op->tile_mask, read_only,
-					      is_null);
+					      is_null, pin);
 				if (IS_ERR(vma)) {
 					err = PTR_ERR(vma);
 					goto free_fence;
@@ -3131,11 +3142,12 @@ static void vm_bind_ioctl_ops_unwind(struct xe_vm *vm,
 #define SUPPORTED_FLAGS	\
 	(FORCE_ASYNC_OP_ERROR | XE_VM_BIND_FLAG_ASYNC | \
 	 XE_VM_BIND_FLAG_READONLY | XE_VM_BIND_FLAG_IMMEDIATE | \
-	 XE_VM_BIND_FLAG_NULL | 0xffff)
+	 XE_VM_BIND_FLAG_NULL | XE_VM_BIND_FLAG_PIN | 0xffff)
 #else
 #define SUPPORTED_FLAGS	\
 	(XE_VM_BIND_FLAG_ASYNC | XE_VM_BIND_FLAG_READONLY | \
-	 XE_VM_BIND_FLAG_IMMEDIATE | XE_VM_BIND_FLAG_NULL | 0xffff)
+	 XE_VM_BIND_FLAG_IMMEDIATE | XE_VM_BIND_FLAG_NULL | \
+	 XE_VM_BIND_FLAG_PIN | 0xffff)
 #endif
 #define XE_64K_PAGE_MASK 0xffffull
 
@@ -3205,6 +3217,13 @@ static int vm_bind_ioctl_check_args(struct xe_device *xe,
 			goto free_bind_ops;
 		}
 
+		/* TODO: Support OP_PREFETCH, OP_MAP */
+		if (XE_IOCTL_DBG(xe, (op & XE_VM_BIND_FLAG_PIN) &&
+				 VM_BIND_OP(op) != XE_VM_BIND_OP_MAP_USERPTR)) {
+			err = -EINVAL;
+			goto free_bind_ops;
+		}
+
 		if (XE_IOCTL_DBG(xe, VM_BIND_OP(op) >
 				 XE_VM_BIND_OP_PREFETCH) ||
 		    XE_IOCTL_DBG(xe, op & ~SUPPORTED_FLAGS) ||
diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
index 9b90e649cd69..024ccabadd12 100644
--- a/drivers/gpu/drm/xe/xe_vm_types.h
+++ b/drivers/gpu/drm/xe/xe_vm_types.h
@@ -360,6 +360,8 @@ struct xe_vma_op_map {
 	bool read_only;
 	/** @is_null: is NULL binding */
 	bool is_null;
+	/** @pin: pin underlying memory */
+	bool pin;
 };
 
 /** struct xe_vma_op_remap - VMA remap operation */
diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
index 86f16d50e9cc..fc3d9cd4f8d0 100644
--- a/include/uapi/drm/xe_drm.h
+++ b/include/uapi/drm/xe_drm.h
@@ -631,6 +631,24 @@ struct drm_xe_vm_bind_op {
 	 * intended to implement VK sparse bindings.
 	 */
 #define XE_VM_BIND_FLAG_NULL		(0x1 << 19)
+	/*
+	 * When the PIN flag is set, the user requests the underlying
+	 * backing store of the vma to be pinned, that is, it will be
+	 * resident while bound and the underlying physical memory
+	 * will not change. For userptr VMAs this means that if the
+	 * user performs an operation that changes the underlying
+	 * pages of the CPU virtual space, the corresponding pinned
+	 * GPU virtual space will not pick up the new memory unless
+	 * an OP_UNMAP followed by a OP_MAP_USERPTR is performed.
+	 * Pinned userptr memory is accounted in the same way as
+	 * mlock(2), and if pinning fails the following error codes
+	 * may be returned:
+	 * -EINVAL: The memory region does not support pinning.
+	 * -EPERM: The process is not permitted to pin.
+	 * -ENOMEM: The pinning limit does not allow pinning.
+	 * For userptr memory, CAP_IPC_LOCK will bypass the limit checking.
+	 */
+#define XE_VM_BIND_FLAG_PIN		(0x1 << 20)
 	/** @op: Operation to perform (lower 16 bits) and flags (upper 16 bits) */
 	__u32 op;
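For reference, a minimal userspace sketch of binding a pinned userptr range with the new flag. Only DRM_IOCTL_XE_VM_BIND, XE_VM_BIND_OP_MAP_USERPTR, XE_VM_BIND_FLAG_PIN and the op packing (opcode in the low 16 bits, flags above) come from this series and its headers; the drm_xe_vm_bind field names and the userptr-through-obj_offset convention are assumptions about the uAPI of this era, and VM creation, syncs and error handling are left out.

```c
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <drm/xe_drm.h>

static int bind_pinned_userptr(int fd, uint32_t vm_id, uint64_t gpu_addr,
			       size_t size)
{
	void *ptr = mmap(NULL, size, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (ptr == MAP_FAILED)
		return -1;

	struct drm_xe_vm_bind bind = {
		.vm_id = vm_id,
		.num_binds = 1,
		.bind = {
			.obj = 0,			/* userptr bind, no GEM object */
			.obj_offset = (uintptr_t)ptr,	/* CPU address of the range */
			.range = size,
			.addr = gpu_addr,
			/* Opcode in the low 16 bits, flags above. */
			.op = XE_VM_BIND_OP_MAP_USERPTR | XE_VM_BIND_FLAG_PIN,
		},
	};

	/* May fail with -EPERM or -ENOMEM if the pinning limit is exceeded
	 * and the process lacks CAP_IPC_LOCK. */
	return ioctl(fd, DRM_IOCTL_XE_VM_BIND, &bind);
}
```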
Support pinning of vmas using XE_VM_BIND_FLAG_PIN, initially for userptr
only. Pinned memory becomes accounted against RLIMIT_MEMLOCK and processes
with CAP_IPC_LOCK will not apply the limit. This is pretty similar to
mlock()'ing userptr memory with the added benefit that the driver is
aware and can ignore some actions in the MMU invalidation notifier.

This will initially become useful for compute VMs on hardware without
mid-thread-preemption capability since with pinned pages, the MMU
invalidation notifier never tries to preempt a running compute kernel.

If that were the only usage we could restrict this to a flag that always
pins userptr VMAs on compute VMs on such hardware, but there are
indications that this may become needed in other situations as well.

From a more general point of view, the usage pattern of a system may be
such that in most cases it only ever runs a single workload per system
and then the sysadmin would want to configure the system to allow
extensive pinning for performance reasons.

Hence we might want to extend the pinning capability to bo-backed VMAs
as well. How that pinning will be accounted remains an open but to build
on the current drm CGROUP work would be an option.

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/xe/xe_vm.c       | 33 +++++++++++++++++++++++++-------
 drivers/gpu/drm/xe/xe_vm_types.h |  2 ++
 include/uapi/drm/xe_drm.h        | 18 +++++++++++++++++
 3 files changed, 46 insertions(+), 7 deletions(-)
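The accounting described above is not implemented in this particular patch, but for readers unfamiliar with it, the conventional kernel helper behaves the way the cover text demands: account_locked_vm() charges pages against the mm's RLIMIT_MEMLOCK and only enforces the limit when the caller lacks CAP_IPC_LOCK, failing with -ENOMEM otherwise. A minimal sketch with made-up wrapper names, not code from the series:

```c
#include <linux/mm.h>

/* Hypothetical wrappers around the stock helper, for illustration only. */
static int example_account_pinned(struct mm_struct *mm, unsigned long npages)
{
	/* Charge against RLIMIT_MEMLOCK; CAP_IPC_LOCK bypasses the limit. */
	return account_locked_vm(mm, npages, true);
}

static void example_unaccount_pinned(struct mm_struct *mm, unsigned long npages)
{
	/* Return the pages when the pinned mapping is torn down. */
	account_locked_vm(mm, npages, false);
}
```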