diff mbox series

[RFC] drm/ttm: Forward -ENOSPC to drivers requesting it

Message ID 20240903133849.17119-1-thomas.hellstrom@linux.intel.com (mailing list archive)
State New, archived
Headers show
Series [RFC] drm/ttm: Forward -ENOSPC to drivers requesting it | expand

Commit Message

Thomas Hellstrom Sept. 3, 2024, 1:38 p.m. UTC
Some user-space APIs distinguison between graphics memory OOMs and
system (host) memory OOMs. To aid UMDs in determining the type of
OOM, allow forwarding the ENOSPC from resource managers to drivers
on calls to ttm_bo_validate().

Cc: Christian König <christian.koenig@amd.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/ttm/ttm_bo.c | 2 +-
 include/drm/ttm/ttm_bo.h     | 3 +++
 2 files changed, 4 insertions(+), 1 deletion(-)

Comments

Christian König Sept. 3, 2024, 3:14 p.m. UTC | #1
Am 03.09.24 um 15:38 schrieb Thomas Hellström:
> Some user-space APIs distinguison between graphics memory OOMs and
> system (host) memory OOMs. To aid UMDs in determining the type of
> OOM, allow forwarding the ENOSPC from resource managers to drivers
> on calls to ttm_bo_validate().
>
> Cc: Christian König <christian.koenig@amd.com>
> Cc: Matthew Brost <matthew.brost@intel.com>
> Cc: dri-devel@lists.freedesktop.org
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>

Ah yes that was on my TODO list as well.

> ---
>   drivers/gpu/drm/ttm/ttm_bo.c | 2 +-
>   include/drm/ttm/ttm_bo.h     | 3 +++
>   2 files changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> index dd867b5e744c..d9a320dc8130 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -836,7 +836,7 @@ int ttm_bo_validate(struct ttm_buffer_object *bo,
>   	} while (ret && force_space);
>   
>   	/* For backward compatibility with userspace */
> -	if (ret == -ENOSPC)
> +	if (ret == -ENOSPC && !ctx->forward_enospc)

Mhm, couldn't we put that into the bdev? I would rather like to keep the 
UAPI consistent at least per driver.

Christian.

>   		return -ENOMEM;
>   
>   	/*
> diff --git a/include/drm/ttm/ttm_bo.h b/include/drm/ttm/ttm_bo.h
> index 5804408815be..d3e12318d336 100644
> --- a/include/drm/ttm/ttm_bo.h
> +++ b/include/drm/ttm/ttm_bo.h
> @@ -174,6 +174,8 @@ struct ttm_bo_kmap_obj {
>    * BOs share the same reservation object.
>    * @force_alloc: Don't check the memory account during suspend or CPU page
>    * faults. Should only be used by TTM internally.
> + * @forward_enospc: Don't translate -ENOSPC errors from resource managers to
> + * -ENOMEM, but forward them to the driver.
>    * @resv: Reservation object to allow reserved evictions with.
>    * @bytes_moved: Statistics on how many bytes have been moved.
>    *
> @@ -185,6 +187,7 @@ struct ttm_operation_ctx {
>   	bool no_wait_gpu;
>   	bool gfp_retry_mayfail;
>   	bool allow_res_evict;
> +	bool forward_enospc;
>   	bool force_alloc;
>   	struct dma_resv *resv;
>   	uint64_t bytes_moved;
Thomas Hellstrom Sept. 3, 2024, 3:22 p.m. UTC | #2
On Tue, 2024-09-03 at 17:14 +0200, Christian König wrote:
> Am 03.09.24 um 15:38 schrieb Thomas Hellström:
> > Some user-space APIs distinguison between graphics memory OOMs and
> > system (host) memory OOMs. To aid UMDs in determining the type of
> > OOM, allow forwarding the ENOSPC from resource managers to drivers
> > on calls to ttm_bo_validate().
> > 
> > Cc: Christian König <christian.koenig@amd.com>
> > Cc: Matthew Brost <matthew.brost@intel.com>
> > Cc: dri-devel@lists.freedesktop.org
> > Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> 
> Ah yes that was on my TODO list as well.
> 
> > ---
> >   drivers/gpu/drm/ttm/ttm_bo.c | 2 +-
> >   include/drm/ttm/ttm_bo.h     | 3 +++
> >   2 files changed, 4 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/ttm/ttm_bo.c
> > b/drivers/gpu/drm/ttm/ttm_bo.c
> > index dd867b5e744c..d9a320dc8130 100644
> > --- a/drivers/gpu/drm/ttm/ttm_bo.c
> > +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> > @@ -836,7 +836,7 @@ int ttm_bo_validate(struct ttm_buffer_object
> > *bo,
> >   	} while (ret && force_space);
> >   
> >   	/* For backward compatibility with userspace */
> > -	if (ret == -ENOSPC)
> > +	if (ret == -ENOSPC && !ctx->forward_enospc)
> 
> Mhm, couldn't we put that into the bdev? I would rather like to keep
> the 
> UAPI consistent at least per driver.

Yes, we could probably do that, although that means changing
ttm_device_init() in all drivers.

And if doing that, then I think we should coalesce all bool arguments
to a flags argument to make the callers more readable.

What do you think?

/Thomas



> 
> Christian.
> 
> >   		return -ENOMEM;
> >   
> >   	/*
> > diff --git a/include/drm/ttm/ttm_bo.h b/include/drm/ttm/ttm_bo.h
> > index 5804408815be..d3e12318d336 100644
> > --- a/include/drm/ttm/ttm_bo.h
> > +++ b/include/drm/ttm/ttm_bo.h
> > @@ -174,6 +174,8 @@ struct ttm_bo_kmap_obj {
> >    * BOs share the same reservation object.
> >    * @force_alloc: Don't check the memory account during suspend or
> > CPU page
> >    * faults. Should only be used by TTM internally.
> > + * @forward_enospc: Don't translate -ENOSPC errors from resource
> > managers to
> > + * -ENOMEM, but forward them to the driver.
> >    * @resv: Reservation object to allow reserved evictions with.
> >    * @bytes_moved: Statistics on how many bytes have been moved.
> >    *
> > @@ -185,6 +187,7 @@ struct ttm_operation_ctx {
> >   	bool no_wait_gpu;
> >   	bool gfp_retry_mayfail;
> >   	bool allow_res_evict;
> > +	bool forward_enospc;
> >   	bool force_alloc;
> >   	struct dma_resv *resv;
> >   	uint64_t bytes_moved;
>
Christian König Sept. 3, 2024, 3:24 p.m. UTC | #3
Am 03.09.24 um 17:22 schrieb Thomas Hellström:
> On Tue, 2024-09-03 at 17:14 +0200, Christian König wrote:
>> Am 03.09.24 um 15:38 schrieb Thomas Hellström:
>>> Some user-space APIs distinguison between graphics memory OOMs and
>>> system (host) memory OOMs. To aid UMDs in determining the type of
>>> OOM, allow forwarding the ENOSPC from resource managers to drivers
>>> on calls to ttm_bo_validate().
>>>
>>> Cc: Christian König <christian.koenig@amd.com>
>>> Cc: Matthew Brost <matthew.brost@intel.com>
>>> Cc: dri-devel@lists.freedesktop.org
>>> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>> Ah yes that was on my TODO list as well.
>>
>>> ---
>>>    drivers/gpu/drm/ttm/ttm_bo.c | 2 +-
>>>    include/drm/ttm/ttm_bo.h     | 3 +++
>>>    2 files changed, 4 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c
>>> b/drivers/gpu/drm/ttm/ttm_bo.c
>>> index dd867b5e744c..d9a320dc8130 100644
>>> --- a/drivers/gpu/drm/ttm/ttm_bo.c
>>> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
>>> @@ -836,7 +836,7 @@ int ttm_bo_validate(struct ttm_buffer_object
>>> *bo,
>>>    	} while (ret && force_space);
>>>    
>>>    	/* For backward compatibility with userspace */
>>> -	if (ret == -ENOSPC)
>>> +	if (ret == -ENOSPC && !ctx->forward_enospc)
>> Mhm, couldn't we put that into the bdev? I would rather like to keep
>> the
>> UAPI consistent at least per driver.
> Yes, we could probably do that, although that means changing
> ttm_device_init() in all drivers.
>
> And if doing that, then I think we should coalesce all bool arguments
> to a flags argument to make the callers more readable.
>
> What do you think?

Works for me.

Christian.

>
> /Thomas
>
>
>
>> Christian.
>>
>>>    		return -ENOMEM;
>>>    
>>>    	/*
>>> diff --git a/include/drm/ttm/ttm_bo.h b/include/drm/ttm/ttm_bo.h
>>> index 5804408815be..d3e12318d336 100644
>>> --- a/include/drm/ttm/ttm_bo.h
>>> +++ b/include/drm/ttm/ttm_bo.h
>>> @@ -174,6 +174,8 @@ struct ttm_bo_kmap_obj {
>>>     * BOs share the same reservation object.
>>>     * @force_alloc: Don't check the memory account during suspend or
>>> CPU page
>>>     * faults. Should only be used by TTM internally.
>>> + * @forward_enospc: Don't translate -ENOSPC errors from resource
>>> managers to
>>> + * -ENOMEM, but forward them to the driver.
>>>     * @resv: Reservation object to allow reserved evictions with.
>>>     * @bytes_moved: Statistics on how many bytes have been moved.
>>>     *
>>> @@ -185,6 +187,7 @@ struct ttm_operation_ctx {
>>>    	bool no_wait_gpu;
>>>    	bool gfp_retry_mayfail;
>>>    	bool allow_res_evict;
>>> +	bool forward_enospc;
>>>    	bool force_alloc;
>>>    	struct dma_resv *resv;
>>>    	uint64_t bytes_moved;
diff mbox series

Patch

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index dd867b5e744c..d9a320dc8130 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -836,7 +836,7 @@  int ttm_bo_validate(struct ttm_buffer_object *bo,
 	} while (ret && force_space);
 
 	/* For backward compatibility with userspace */
-	if (ret == -ENOSPC)
+	if (ret == -ENOSPC && !ctx->forward_enospc)
 		return -ENOMEM;
 
 	/*
diff --git a/include/drm/ttm/ttm_bo.h b/include/drm/ttm/ttm_bo.h
index 5804408815be..d3e12318d336 100644
--- a/include/drm/ttm/ttm_bo.h
+++ b/include/drm/ttm/ttm_bo.h
@@ -174,6 +174,8 @@  struct ttm_bo_kmap_obj {
  * BOs share the same reservation object.
  * @force_alloc: Don't check the memory account during suspend or CPU page
  * faults. Should only be used by TTM internally.
+ * @forward_enospc: Don't translate -ENOSPC errors from resource managers to
+ * -ENOMEM, but forward them to the driver.
  * @resv: Reservation object to allow reserved evictions with.
  * @bytes_moved: Statistics on how many bytes have been moved.
  *
@@ -185,6 +187,7 @@  struct ttm_operation_ctx {
 	bool no_wait_gpu;
 	bool gfp_retry_mayfail;
 	bool allow_res_evict;
+	bool forward_enospc;
 	bool force_alloc;
 	struct dma_resv *resv;
 	uint64_t bytes_moved;