diff mbox series

[8/8] drm/i915: Allow user to set cache at BO creation

Message ID 20230419230058.2659455-9-fei.yang@intel.com (mailing list archive)
State New, archived
Headers show
Series drm/i915/mtl: Define MOCS and PAT tables for MTL | expand

Commit Message

Yang, Fei April 19, 2023, 11 p.m. UTC
From: Fei Yang <fei.yang@intel.com>

To comply with the design that buffer objects shall have immutable
cache setting through out their life cycle, {set, get}_caching ioctl's
are no longer supported from MTL onward. With that change caching
policy can only be set at object creation time. The current code
applies a default (platform dependent) cache setting for all objects.
However this is not optimal for performance tuning. The patch extends
the existing gem_create uAPI to let user set PAT index for the object
at creation time.
The new extension is platform independent, so UMD's can switch to using
this extension for older platforms as well, while {set, get}_caching are
still supported on these legacy paltforms for compatibility reason.

Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Fei Yang <fei.yang@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_create.c | 36 ++++++++++++++++++++++
 drivers/gpu/drm/i915/gem/i915_gem_object.c |  6 ++++
 include/uapi/drm/i915_drm.h                | 36 ++++++++++++++++++++++
 tools/include/uapi/drm/i915_drm.h          | 36 ++++++++++++++++++++++
 4 files changed, 114 insertions(+)

Comments

Andi Shyti April 20, 2023, 11:39 a.m. UTC | #1
Hi Fei,

> To comply with the design that buffer objects shall have immutable
> cache setting through out their life cycle, {set, get}_caching ioctl's
> are no longer supported from MTL onward. With that change caching
> policy can only be set at object creation time. The current code
> applies a default (platform dependent) cache setting for all objects.
> However this is not optimal for performance tuning. The patch extends
> the existing gem_create uAPI to let user set PAT index for the object
> at creation time.
> The new extension is platform independent, so UMD's can switch to using
> this extension for older platforms as well, while {set, get}_caching are
> still supported on these legacy paltforms for compatibility reason.
> 
> Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
> Cc: Matt Roper <matthew.d.roper@intel.com>
> Cc: Andi Shyti <andi.shyti@linux.intel.com>
> Signed-off-by: Fei Yang <fei.yang@intel.com>
> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>

because this is an API change, we need some more information
here.

First of all you need to CC the userspace guys that have been
working on top of your series and get their ack's.

I also believe that this series has also been tested on a
separate repository, would you link it in the commit message?

Thanks,
Andi
Tvrtko Ursulin April 20, 2023, 1:06 p.m. UTC | #2
On 20/04/2023 12:39, Andi Shyti wrote:
> Hi Fei,
> 
>> To comply with the design that buffer objects shall have immutable
>> cache setting through out their life cycle, {set, get}_caching ioctl's
>> are no longer supported from MTL onward. With that change caching
>> policy can only be set at object creation time. The current code
>> applies a default (platform dependent) cache setting for all objects.
>> However this is not optimal for performance tuning. The patch extends
>> the existing gem_create uAPI to let user set PAT index for the object
>> at creation time.
>> The new extension is platform independent, so UMD's can switch to using
>> this extension for older platforms as well, while {set, get}_caching are
>> still supported on these legacy paltforms for compatibility reason.
>>
>> Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
>> Cc: Matt Roper <matthew.d.roper@intel.com>
>> Cc: Andi Shyti <andi.shyti@linux.intel.com>
>> Signed-off-by: Fei Yang <fei.yang@intel.com>
>> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
> 
> because this is an API change, we need some more information
> here.
> 
> First of all you need to CC the userspace guys that have been
> working on top of your series and get their ack's.

Yes, and a link to a Mesa merge request which uses the uapi should be 
included.

IGTs should be ready to before we can merge. I glanced over igt-dev but 
did not spot anything.

Regards,

Tvrtko

> 
> I also believe that this series has also been tested on a
> separate repository, would you link it in the commit message?
> 
> Thanks,
> Andi
Yang, Fei April 20, 2023, 4:11 p.m. UTC | #3
> On 20/04/2023 12:39, Andi Shyti wrote:
>> Hi Fei,
>>
>>> To comply with the design that buffer objects shall have immutable
>>> cache setting through out their life cycle, {set, get}_caching ioctl's
>>> are no longer supported from MTL onward. With that change caching
>>> policy can only be set at object creation time. The current code
>>> applies a default (platform dependent) cache setting for all objects.
>>> However this is not optimal for performance tuning. The patch extends
>>> the existing gem_create uAPI to let user set PAT index for the object
>>> at creation time.
>>> The new extension is platform independent, so UMD's can switch to using
>>> this extension for older platforms as well, while {set, get}_caching are
>>> still supported on these legacy paltforms for compatibility reason.
>>>
>>> Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
>>> Cc: Matt Roper <matthew.d.roper@intel.com>
>>> Cc: Andi Shyti <andi.shyti@linux.intel.com>
>>> Signed-off-by: Fei Yang <fei.yang@intel.com>
>>> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
>>
>> because this is an API change, we need some more information
>> here.
>>
>> First of all you need to CC the userspace guys that have been
>> working on top of your series and get their ack's.
>
> Yes, and a link to a Mesa merge request which uses the uapi should be
> included.

Working with Mesa team on this, stay tuned.

> IGTs should be ready to before we can merge. I glanced over igt-dev but
> did not spot anything.

There is a IGT patch posted to gfx-internal-devel, titled "test/gem_create: exercise gem_create_ext_set_pat"

> Regards,
>
> Tvrtko
>
>>
>> I also believe that this series has also been tested on a
>> separate repository, would you link it in the commit message?
>>
>> Thanks,
>> Andi
Andi Shyti April 20, 2023, 4:29 p.m. UTC | #4
Hi Fei,

> >>> To comply with the design that buffer objects shall have immutable
> >>> cache setting through out their life cycle, {set, get}_caching ioctl's
> >>> are no longer supported from MTL onward. With that change caching
> >>> policy can only be set at object creation time. The current code
> >>> applies a default (platform dependent) cache setting for all objects.
> >>> However this is not optimal for performance tuning. The patch extends
> >>> the existing gem_create uAPI to let user set PAT index for the object
> >>> at creation time.
> >>> The new extension is platform independent, so UMD's can switch to using
> >>> this extension for older platforms as well, while {set, get}_caching are
> >>> still supported on these legacy paltforms for compatibility reason.
> >>>
> >>> Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
> >>> Cc: Matt Roper <matthew.d.roper@intel.com>
> >>> Cc: Andi Shyti <andi.shyti@linux.intel.com>
> >>> Signed-off-by: Fei Yang <fei.yang@intel.com>
> >>> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
> >>
> >> because this is an API change, we need some more information
> >> here.
> >>
> >> First of all you need to CC the userspace guys that have been
> >> working on top of your series and get their ack's.
> >
> > Yes, and a link to a Mesa merge request which uses the uapi should be
> > included.
> 
> Working with Mesa team on this, stay tuned.
> 
> > IGTs should be ready to before we can merge. I glanced over igt-dev but
> > did not spot anything.
> 
> There is a IGT patch posted to gfx-internal-devel, titled "test/gem_create:
> exercise gem_create_ext_set_pat"

Any chance to have it ported upstream? It's OK even if it's not
merged (at least on my side) but some public interface testing is
needed.

If you do post it upstream you could add in the cover letter:

Test-with: <mail-id>

where the mail-id is referred to the upstream patch of the test.

Andi
Jordan Justen April 21, 2023, 8:48 p.m. UTC | #5
On 2023-04-20 09:11:18, Yang, Fei wrote:
> > On 20/04/2023 12:39, Andi Shyti wrote:
> >> Hi Fei,
> >>
> >>> To comply with the design that buffer objects shall have immutable
> >>> cache setting through out their life cycle, {set, get}_caching ioctl's
> >>> are no longer supported from MTL onward. With that change caching
> >>> policy can only be set at object creation time. The current code
> >>> applies a default (platform dependent) cache setting for all objects.
> >>> However this is not optimal for performance tuning. The patch extends
> >>> the existing gem_create uAPI to let user set PAT index for the object
> >>> at creation time.
> >>> The new extension is platform independent, so UMD's can switch to using
> >>> this extension for older platforms as well, while {set, get}_caching are
> >>> still supported on these legacy paltforms for compatibility reason.
> >>>
> >>> Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
> >>> Cc: Matt Roper <matthew.d.roper@intel.com>
> >>> Cc: Andi Shyti <andi.shyti@linux.intel.com>
> >>> Signed-off-by: Fei Yang <fei.yang@intel.com>
> >>> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
> >>
> >> because this is an API change, we need some more information
> >> here.
> >>
> >> First of all you need to CC the userspace guys that have been
> >> working on top of your series and get their ack's.
> >
> > Yes, and a link to a Mesa merge request which uses the uapi should be
> > included.
> 
> Working with Mesa team on this, stay tuned.
> 

I would like to see the extension detection issue is handled before
ack'ing this.

How about a new DRM_I915_QUERY_GEM_CREATE_EXTENSIONS item, that
returns a u64 array of usable extension names for
DRM_IOCTL_I915_GEM_CREATE_EXT?

A similar DRM_I915_QUERY_GEM_CONTEXT_CREATE_EXTENSIONS could also
provide an alternative to Alan's "drm/i915/uapi/pxp: Add a GET_PARAM
for PXP", and more easily allow advertising future new extensions for
context/buffer creation.

-Jordan
Tvrtko Ursulin April 24, 2023, 9:08 a.m. UTC | #6
[fixed mailing lists addresses]

On 24/04/2023 09:36, Jordan Justen wrote:
> On 2023-04-23 00:05:06, Yang, Fei wrote:
>>> On 2023-04-20 09:11:18, Yang, Fei wrote:
>>>>> On 20/04/2023 12:39, Andi Shyti wrote:
>>>>>> Hi Fei,
>>>>>>
>>>>>> because this is an API change, we need some more information here.
>>>>>>
>>>>>> First of all you need to CC the userspace guys that have been
>>>>>> working on top of your series and get their ack's.
>>>>>
>>>>> Yes, and a link to a Mesa merge request which uses the uapi should
>>>>> be included.
>>>>
>>>> Working with Mesa team on this, stay tuned.
>>>>
>>>
>>> I would like to see the extension detection issue is handled
>>> before ack'ing this.
>>>
>>> How about a new DRM_I915_QUERY_GEM_CREATE_EXTENSIONS item, that returns
>>> a u64 array of usable extension names for DRM_IOCTL_I915_GEM_CREATE_EXT?
>>
>> I agree a query mechanism is necessary, but that should be generic for all
>> uAPI's, not just for GEM_CREATE.
>>
>>> A similar DRM_I915_QUERY_GEM_CONTEXT_CREATE_EXTENSIONS could also provide
>>> an alternative to Alan's "drm/i915/uapi/pxp: Add a GET_PARAM for PXP",
>>> and more easily allow advertising future new extensions for context/buffer
>>> creation.
>>
>> I think we should have a discussion and come up with a sustainable design for
>> the query uAPI, rather than just put in a quick hack for this.
> 
> I think you are being a bit too quick to dismiss my idea as a quick
> hack... Nevetheless, I would love to hear an alternate suggestion.
> Just as long as it's not, "let's figure this out later, because I need
> to add this feature now".
> 
> I don't think "just try to use it and if it fails, I guess it isn't
> supported" is reasonable. So, if we can't do better, at least add a
> GET_PARAM. Yeah, it's a quick hack, but it's better than nothing.

Being able to "list" supported extensions sounds like a reasonable principle, albeit a departure from the design direction to date. Which means there are probably no quick solutions. Also, AFAIU, only PXP context create is the problematic one, right? Everything else is pretty much instant or delayed allocation so super cheap to probe by attempting to use.

If I got that right and given this series is about drm_i915_gem_create_ext I don't think this side discussion should be blocking it.

Furthermore the PXP context create story is even more complicated, in a way that it is not just about querying whether the extension is supported, but the expensive check is something more complicated.

Going back to implementation details for this proposed new feature, one alternative to query could be something like:

   drm_i915_gem_create_ext.flags |= I915_GEM_CREATE_EXT_FLAG_PROBE_EXTENSIONS;

That would be somewhat more light weight to implement that the i915_query route. And it appears it would work for all ioctls which support extensions apart for i915_context_param_engines.

Regards,

Tvrtko
Jordan Justen April 24, 2023, 5:13 p.m. UTC | #7
On 2023-04-24 02:08:43, Tvrtko Ursulin wrote:
> 
> Being able to "list" supported extensions sounds like a reasonable
> principle, albeit a departure from the design direction to date.
> Which means there are probably no quick solutions. Also, AFAIU, only
> PXP context create is the problematic one, right? Everything else is
> pretty much instant or delayed allocation so super cheap to probe by
> attempting to use.
> 
> If I got that right and given this series is about
> drm_i915_gem_create_ext I don't think this side discussion should be
> blocking it.

This still leaves the issue of no reasonable detection mechanism for
the extension. If the discussion gets too complicated, then can we add
a GET_PARAM for the SET_PAT extension? I'm hoping we could either come
up with something better reasonably quickly, or i915/Xe can add a new
param for each new extensions until a better approach is available.

> Furthermore the PXP context create story is even more complicated,
> in a way that it is not just about querying whether the extension is
> supported, but the expensive check is something more complicated.
> 
> Going back to implementation details for this proposed new feature,
> one alternative to query could be something like:
> 
>    drm_i915_gem_create_ext.flags |= I915_GEM_CREATE_EXT_FLAG_PROBE_EXTENSIONS;
> 
> That would be somewhat more light weight to implement that the
> i915_query route. And it appears it would work for all ioctls which
> support extensions apart for i915_context_param_engines.

This seems little better than the "try it, and if it works then it's
supported".

I'm not suggesting that userspace should be able to check that
scenario x+y+z will work, but more a list of extensions that
conceivably could work. Normally this should just a matter of the
kernel unconditionally adding the newly implemented extension to the
list returned in the query call.

If a GET_PARAM can be made for the PXP case, then it seems like a
query item returning CONTEXT_CREATE extensions could conditionally
omit that extension just as easily as implementing the proposed new
GET_PARAM.

-Jordan
Joonas Lahtinen April 25, 2023, 1:41 p.m. UTC | #8
(+ Faith and Daniel as they have been involved in previous discussions)

Quoting Jordan Justen (2023-04-24 20:13:00)
> On 2023-04-24 02:08:43, Tvrtko Ursulin wrote:
> > 
> > Being able to "list" supported extensions sounds like a reasonable
> > principle, albeit a departure from the design direction to date.
> > Which means there are probably no quick solutions. Also, AFAIU, only
> > PXP context create is the problematic one, right? Everything else is
> > pretty much instant or delayed allocation so super cheap to probe by
> > attempting to use.
> > 
> > If I got that right and given this series is about
> > drm_i915_gem_create_ext I don't think this side discussion should be
> > blocking it.
> 
> This still leaves the issue of no reasonable detection mechanism for
> the extension.

I remember this exact discussion being repeated at least a few times in
the past 5 years. Previously the conclusion was always that detection by
trying to use the extension leads to least amount of uAPI surface. There
is also no concern of having mismatch in backporting of the query and the
functionality patches.

Why exactly do you think it is more reasonable to do the following?

check_if_extension_query_is_supported();
<check retval>
check_if_extension_xyz_is_supported();
<check retval>

VS

create_[context,buffer,whatever]_with_extension();
<check errno>
destroy_[context,buffer,whatever]();

For contexts and buffers there's no overhead, and we're keeping the uAPI
surface smaller (less bugs, less validation effort). Additionally we
support checking combinations of extensions A, B and C without extra
effort.

If we're not returning enough clear errnos, then that is something to
make sure we do.

Regards, Joonas

> If the discussion gets too complicated, then can we add
> a GET_PARAM for the SET_PAT extension? I'm hoping we could either come
> up with something better reasonably quickly, or i915/Xe can add a new
> param for each new extensions until a better approach is available.
> 
> > Furthermore the PXP context create story is even more complicated,
> > in a way that it is not just about querying whether the extension is
> > supported, but the expensive check is something more complicated.
> > 
> > Going back to implementation details for this proposed new feature,
> > one alternative to query could be something like:
> > 
> >    drm_i915_gem_create_ext.flags |= I915_GEM_CREATE_EXT_FLAG_PROBE_EXTENSIONS;
> > 
> > That would be somewhat more light weight to implement that the
> > i915_query route. And it appears it would work for all ioctls which
> > support extensions apart for i915_context_param_engines.
> 
> This seems little better than the "try it, and if it works then it's
> supported".
> 
> I'm not suggesting that userspace should be able to check that
> scenario x+y+z will work, but more a list of extensions that
> conceivably could work. Normally this should just a matter of the
> kernel unconditionally adding the newly implemented extension to the
> list returned in the query call.
> 
> If a GET_PARAM can be made for the PXP case, then it seems like a
> query item returning CONTEXT_CREATE extensions could conditionally
> omit that extension just as easily as implementing the proposed new
> GET_PARAM.
> 
> -Jordan
Alan Previn April 25, 2023, 5:21 p.m. UTC | #9
On Tue, 2023-04-25 at 16:41 +0300, Joonas Lahtinen wrote:
> (+ Faith and Daniel as they have been involved in previous discussions)
An orthogonal (but losely related) question: Is PXP the only subsystem that has
the unique problem of: Uses a delayed worker to complete all dependencies for init..
but they take so long that by the time compositors-mesa-init comes up, creation
of PXP context blocks too long and may still likely failing and incorrectly
assuming PXP is not supported. (when we don't have a GET_PARAM for it).
I believe HuC has a similiar issue - but doesnt reflect in the UAPI but rather the cmd buffers.
We don't have other subsystems that have this problem? (dependency on other startups?)
Jordan Justen April 25, 2023, 6:19 p.m. UTC | #10
On 2023-04-25 06:41:54, Joonas Lahtinen wrote:
> (+ Faith and Daniel as they have been involved in previous discussions)
> 
> Quoting Jordan Justen (2023-04-24 20:13:00)
> > On 2023-04-24 02:08:43, Tvrtko Ursulin wrote:
> > > 
> > > Being able to "list" supported extensions sounds like a reasonable
> > > principle, albeit a departure from the design direction to date.
> > > Which means there are probably no quick solutions. Also, AFAIU, only
> > > PXP context create is the problematic one, right? Everything else is
> > > pretty much instant or delayed allocation so super cheap to probe by
> > > attempting to use.
> > > 
> > > If I got that right and given this series is about
> > > drm_i915_gem_create_ext I don't think this side discussion should be
> > > blocking it.
> > 
> > This still leaves the issue of no reasonable detection mechanism for
> > the extension.
> 
> I remember this exact discussion being repeated at least a few times in
> the past 5 years. Previously the conclusion was always that detection by
> trying to use the extension leads to least amount of uAPI surface. There
> is also no concern of having mismatch in backporting of the query and the
> functionality patches.
> 
> Why exactly do you think it is more reasonable to do the following?
> 
> check_if_extension_query_is_supported();
> <check retval>
> check_if_extension_xyz_is_supported();
> <check retval>

As I've mentioned a couple times, I'm asking for query item that
returns it all the extensions that conceivably could work.

In theory it could be made a single query item which somehow then
enumerates if something is a context extension or bo extension. But,
it seems simpler for all if we just have a couple query items
returning a simple u64 array for the few classes of extensions.

> VS
> 
> create_[context,buffer,whatever]_with_extension();
> <check errno>
> destroy_[context,buffer,whatever]();
> 
> For contexts and buffers there's no overhead,

There's no-overhead to creating and destroying things? I think the 8s
timeout when trying create a protected content context shows it's not
always quite that simple.

Over time userspace will have to continue growing this set of
create/destroy tests as new extensions are added. Simply so we can
discover what extensions are supported.

This doesn't seem like a very robust way to advertise extensions for
an api.

Another point ... say you need to debug why some simple app is failing
to start the driver. One tool might be strace. But now you have a
bunch of noise of calls from the driver creating and destroying things
just to discover what extensions the kernel supports. It would be nice
if all this was handled with a query item vs a bunch of
create/destroys.

> and we're keeping the uAPI surface smaller (less bugs, less
> validation effort).

I understand wanting to keep the kernel uapi and implementation
simple.

Is it too complicated to return a u64 array for a query item
indicating the extensions supported for the various classes of
extensions? I think in most cases it'll just be essentially shuffling
across an array of u64 items. (Since most known extensions will always
be supported by the kernel.)

> Additionally we support checking combinations of extensions A, B and
> C without extra effort.

Regarding combinations of extensions, is that really something
userspace needs to probe? I would think if userspace knows about the
possible extensions, then it's on userspace to use combinations
appropriately.

But, in detecting extensions, it is possible that, say extension B
relies on extension A. Now userspace may have to probe to see if
extension A is supported, and then probe for extension B while using
extension A. Essentially, probing for an extension could become a bit
complicated. Vs the kernel just giving us a u64 array of known
extensions...

-Jordan
Daniel Vetter April 26, 2023, 11:52 a.m. UTC | #11
On Tue, Apr 25, 2023 at 04:41:54PM +0300, Joonas Lahtinen wrote:
> (+ Faith and Daniel as they have been involved in previous discussions)
> 
> Quoting Jordan Justen (2023-04-24 20:13:00)
> > On 2023-04-24 02:08:43, Tvrtko Ursulin wrote:
> > > 
> > > Being able to "list" supported extensions sounds like a reasonable
> > > principle, albeit a departure from the design direction to date.
> > > Which means there are probably no quick solutions. Also, AFAIU, only
> > > PXP context create is the problematic one, right? Everything else is
> > > pretty much instant or delayed allocation so super cheap to probe by
> > > attempting to use.
> > > 
> > > If I got that right and given this series is about
> > > drm_i915_gem_create_ext I don't think this side discussion should be
> > > blocking it.
> > 
> > This still leaves the issue of no reasonable detection mechanism for
> > the extension.
> 
> I remember this exact discussion being repeated at least a few times in
> the past 5 years. Previously the conclusion was always that detection by
> trying to use the extension leads to least amount of uAPI surface. There
> is also no concern of having mismatch in backporting of the query and the
> functionality patches.
> 
> Why exactly do you think it is more reasonable to do the following?
> 
> check_if_extension_query_is_supported();
> <check retval>
> check_if_extension_xyz_is_supported();
> <check retval>
> 
> VS
> 
> create_[context,buffer,whatever]_with_extension();
> <check errno>
> destroy_[context,buffer,whatever]();
> 
> For contexts and buffers there's no overhead, and we're keeping the uAPI
> surface smaller (less bugs, less validation effort). Additionally we
> support checking combinations of extensions A, B and C without extra
> effort.
> 
> If we're not returning enough clear errnos, then that is something to
> make sure we do.

Joonas asked me to put my thoughts here:

- in general the "feature discovery by trying it out" approach is most
  robust and hence preferred, but it's also not something that's required
  when there's good reasons against it

- the more a feature spans drivers/modules, the more it should be
  discovered by trying it out, e.g. dma-buf fence import/export was a huge
  discussion, luckily mesa devs figured out how to transparantly fall back
  at runtime so we didn't end up merging the separate feature flag (I
  think at least, can't find it). pxp being split across i915/me/fw/who
  knows what else is kinda similar so I'd heavily lean towards discovery
  by creating a context

- pxp taking 8s to init a ctx sounds very broken, irrespective of anything
  else

- in practice there's not really a combinatorial explosion, for a lot of
  reasons:
  - kernel and userspace tend to assume/require implied features where it
    makes sense, e.g. in kms atomic implies planes and universal planes.
    mesa has been doing the same
  - you cold go all the way to what radeon/amdgpu have done for years with
    a single incrementing version. Probably not flexible enough for intel.
  - every pciid is it's own uapi, so a feature only needs to be tested on
    platforms where it didn't ship from the start. Also cuts down on
    runtime discovery a lot
  - mesa tends to only support down to current lts kernels and not older,
    which again cuts the combinations a lot

- I did look through upstream kernel docs for general (driver) uapi
  recommendations, but there isn't really anything about good taste stuff,
  just a lot about not screwing up compatibility across kernels/platforms.

tldr; prefer discovery, don't sweat it if not, definitely don't
overengineer this with some magic "give me all extensions" thing because
that results in guaranteed cross-component backporting pains sooner or
later. Or inconsistency, which defeats the point.

Cheers, Daniel
 
> Regards, Joonas
> 
> > If the discussion gets too complicated, then can we add
> > a GET_PARAM for the SET_PAT extension? I'm hoping we could either come
> > up with something better reasonably quickly, or i915/Xe can add a new
> > param for each new extensions until a better approach is available.
> > 
> > > Furthermore the PXP context create story is even more complicated,
> > > in a way that it is not just about querying whether the extension is
> > > supported, but the expensive check is something more complicated.
> > > 
> > > Going back to implementation details for this proposed new feature,
> > > one alternative to query could be something like:
> > > 
> > >    drm_i915_gem_create_ext.flags |= I915_GEM_CREATE_EXT_FLAG_PROBE_EXTENSIONS;
> > > 
> > > That would be somewhat more light weight to implement that the
> > > i915_query route. And it appears it would work for all ioctls which
> > > support extensions apart for i915_context_param_engines.
> > 
> > This seems little better than the "try it, and if it works then it's
> > supported".
> > 
> > I'm not suggesting that userspace should be able to check that
> > scenario x+y+z will work, but more a list of extensions that
> > conceivably could work. Normally this should just a matter of the
> > kernel unconditionally adding the newly implemented extension to the
> > list returned in the query call.
> > 
> > If a GET_PARAM can be made for the PXP case, then it seems like a
> > query item returning CONTEXT_CREATE extensions could conditionally
> > omit that extension just as easily as implementing the proposed new
> > GET_PARAM.
> > 
> > -Jordan
Alan Previn April 26, 2023, 4:48 p.m. UTC | #12
On Wed, 2023-04-26 at 13:52 +0200, Daniel Vetter wrote:
> On Tue, Apr 25, 2023 at 04:41:54PM +0300, Joonas Lahtinen wrote:
> > (+ Faith and Daniel as they have been involved in previous discussions)
> > Quoting Jordan Justen (2023-04-24 20:13:00)
> > > On 2023-04-24 02:08:43, Tvrtko Ursulin wrote:
> > > > 
> > > > 
> 
alan:snip

> - the more a feature spans drivers/modules, the more it should be
>   discovered by trying it out, e.g. dma-buf fence import/export was a huge
>   discussion, luckily mesa devs figured out how to transparantly fall back
>   at runtime so we didn't end up merging the separate feature flag (I
>   think at least, can't find it). pxp being split across i915/me/fw/who
>   knows what else is kinda similar so I'd heavily lean towards discovery
>   by creating a context
> 
> - pxp taking 8s to init a ctx sounds very broken, irrespective of anything
>   else
> 

Alan: Please be aware that:
1. the wait-timeout was changed to 1 second sometime back.
2. the I'm not deciding the time-out. I initially wanted to keep it at the same
timeout as ADL (250 milisec) - and ask the UMD to retry if user needs it. (as per
same ADL behavior). Daniele requested to move it to 8 seconds - but thru review
process, we reduced it to 1 second.
3. In anycase, thats just the wait-timeout - and we know it wont succeed until
~6 seconds after i915 (~9 secs after boot). The issue isnt our hardware or i915
- its the component driver load <-- this is what's broken. 

Details: PXP context is dependent on gsc-fw load, huc-firmware load, mei-gsc-proxy
component driver load + bind, huc-authentication and gsc-proxy-init-handshake.
Most of above steps begin rather quickly during i915 driver load - the delay
seems to come from a very late mei-gsc-proxy component driver load. In fact the
parent mei-me driver is only getting ~6 seconds after i915 init is done. That
blocks the gsc-proxy-init-handshake and huc-authentication and lastly PXP.

That said, what is broken is why it takes so long to get the component drivers
to come up. NOTE: PXP isnt really doing anything differently in the context
creation flow (in terms of time-consuming-steps compared to ADL) besides the
extra dependency waits these.

We can actually go back to the original timeout of 250 milisecs like we have in ADL
but will fail if MESA calls in too early (but will succeed later) ... or...
we can create the GET_PARAMs.

A better idea would be to figure out how to control the driver load order and
force mei driver + components to get called right after i915. I was informed
there is no way to control this and changes here will likely not be accepted
upstream.

++ Daniele - can you chime in?

Take note that ADL has the same issue but for whatever reason, the dependant
mei component on ADL loaded much sooner - so it was never an issue that was
caught but still existed on ADL time merge (if users customize the kernel + 
compositor for fastboot it will happen).(i realize I havent tested ADL with the
new kernel configs that we use to also boot PXP on MTL - wonder if the new
mei configs are causing the delay - i.e. ADL customer could suddenly see this
6 sec delay too. - something i have to check now)
Daniele Ceraolo Spurio April 26, 2023, 6:10 p.m. UTC | #13
On 4/26/2023 9:48 AM, Teres Alexis, Alan Previn wrote:
> On Wed, 2023-04-26 at 13:52 +0200, Daniel Vetter wrote:
>> On Tue, Apr 25, 2023 at 04:41:54PM +0300, Joonas Lahtinen wrote:
>>> (+ Faith and Daniel as they have been involved in previous discussions)
>>> Quoting Jordan Justen (2023-04-24 20:13:00)
>>>> On 2023-04-24 02:08:43, Tvrtko Ursulin wrote:
>>>>>
> alan:snip
>
>> - the more a feature spans drivers/modules, the more it should be
>>    discovered by trying it out, e.g. dma-buf fence import/export was a huge
>>    discussion, luckily mesa devs figured out how to transparantly fall back
>>    at runtime so we didn't end up merging the separate feature flag (I
>>    think at least, can't find it). pxp being split across i915/me/fw/who
>>    knows what else is kinda similar so I'd heavily lean towards discovery
>>    by creating a context
>>
>> - pxp taking 8s to init a ctx sounds very broken, irrespective of anything
>>    else

I think there has been a bit of confusion in regards to this timeout and 
to where it applies, so let me try to clarify to make sure we're all on 
the same page (Alan has already explained most of it below, but I'm 
going to go in a bit more detail and I want to make sure it's all in one 
place for reference).
Before we can do any PXP operation, dependencies need to be satisfied, 
some of which are outside of i915. For MTL, these are:

GSC FW needs to be loaded (~250 ms)
HuC FW needs to be authenticated for PXP ops (~20 ms)
MEI modules need to be bound (depends on the probe ordering, but usually 
a few secs)
GSC SW proxy via MEI needs to be established (~500 ms normally, but can 
take a few seconds on the first boot after a firmware update)

Due to the fact that these can take several seconds in total to 
complete, to avoid stalling driver load/resume for that long we moved 
the i915-side operations to a separate worker and we register i915 
before they've completed. This means that we can get a PXP context 
creation call before all the dependencies are in place, in which case we 
do need to wait and that's where the 8s come from. After all the pieces 
are in place, a PXP context creation call is much faster (up to ~150 ms, 
which is the time required to start the PXP session if it is not already 
running).

The reason why we suggested a dedicated getparam was to avoid requiring 
early users to wait for all of that to happen just to check the 
capability. By the time an user actually wants to use PXP, we're likely 
done with the prep steps (or at least we're far along with them) and 
therefore the wait will be short.

> Alan: Please be aware that:
> 1. the wait-timeout was changed to 1 second sometime back.
> 2. the I'm not deciding the time-out. I initially wanted to keep it at the same
> timeout as ADL (250 milisec) - and ask the UMD to retry if user needs it. (as per
> same ADL behavior). Daniele requested to move it to 8 seconds - but thru review
> process, we reduced it to 1 second.
> 3. In anycase, thats just the wait-timeout - and we know it wont succeed until
> ~6 seconds after i915 (~9 secs after boot). The issue isnt our hardware or i915
> - its the component driver load <-- this is what's broken.

I think the question here is whether the mei driver is taking a long 
time to probe or if it is just being probed late. In the latter case, I 
wouldn't call it broken.

>
> Details: PXP context is dependent on gsc-fw load, huc-firmware load, mei-gsc-proxy
> component driver load + bind, huc-authentication and gsc-proxy-init-handshake.
> Most of above steps begin rather quickly during i915 driver load - the delay
> seems to come from a very late mei-gsc-proxy component driver load. In fact the
> parent mei-me driver is only getting ~6 seconds after i915 init is done. That
> blocks the gsc-proxy-init-handshake and huc-authentication and lastly PXP.
>
> That said, what is broken is why it takes so long to get the component drivers
> to come up. NOTE: PXP isnt really doing anything differently in the context
> creation flow (in terms of time-consuming-steps compared to ADL) besides the
> extra dependency waits these.
>
> We can actually go back to the original timeout of 250 milisecs like we have in ADL
> but will fail if MESA calls in too early (but will succeed later) ... or...
> we can create the GET_PARAMs.
>
> A better idea would be to figure out how to control the driver load order and
> force mei driver + components to get called right after i915. I was informed
> there is no way to control this and changes here will likely not be accepted
> upstream.

we could add a device link to mark i915 as a consumer of mei, but I 
believe that wouldn't work for 2 reasons

1 - on discrete, mei binds to a child device of i915, so the dependency 
is reversed
2 - the link might just delay the i915 load to after the mei load, which 
I'm not sure it is something we want (and at that point we could also 
just wait for mei to bind from within the i915 load).

Daniele

>
> ++ Daniele - can you chime in?
>
> Take note that ADL has the same issue but for whatever reason, the dependant
> mei component on ADL loaded much sooner - so it was never an issue that was
> caught but still existed on ADL time merge (if users customize the kernel +
> compositor for fastboot it will happen).(i realize I havent tested ADL with the
> new kernel configs that we use to also boot PXP on MTL - wonder if the new
> mei configs are causing the delay - i.e. ADL customer could suddenly see this
> 6 sec delay too. - something i have to check now)
Jordan Justen April 26, 2023, 8:04 p.m. UTC | #14
On 2023-04-26 04:52:43, Daniel Vetter wrote:
> 
> Joonas asked me to put my thoughts here:
> 
> - in general the "feature discovery by trying it out" approach is most
>   robust and hence preferred, but it's also not something that's required
>   when there's good reasons against it

More robust in what sense?

Userspace has to set up some scenario to use the interface, which
hopefully is not too complex. It's difficult to predict what it may
look like in the future since we are talking about undefined
extensions.

Userspace has to rely on the kernel making creating and destroying
this thing essentially free. For
I915_GEM_CREATE_EXT_PROTECTED_CONTENT, that did not work out. For
I915_GEM_CREATE_EXT_SET_PAT, just wondering, since the PAT indices are
platform specific, what might happen if we tried to choose some common
index value to detect the extension in a generic manner? Could it
potentially lead to unexpected behavior, or maybe just an error. I
guess it's really extension specific what kind of issues potentially
could arise.

> tldr; prefer discovery, don't sweat it if not, definitely don't
> overengineer this with some magic "give me all extensions" thing because
> that results in guaranteed cross-component backporting pains sooner or
> later. Or inconsistency, which defeats the point.

I guess I don't know the full context of your concerns here.

For returning the gem-create extensions, isn't this just a matter of
returning the valid indices to the create_extensions array in
i915_gem_create.c? Is that an over-engineered query?

It seems weird that there's a reasonably well defined "extension"
mechanism here, but no way for the kernel to just tell us what
extensions it knows about.

-Jordan
diff mbox series

Patch

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_create.c b/drivers/gpu/drm/i915/gem/i915_gem_create.c
index bfe1dbda4cb7..723c3ddd6c74 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_create.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_create.c
@@ -245,6 +245,7 @@  struct create_ext {
 	unsigned int n_placements;
 	unsigned int placement_mask;
 	unsigned long flags;
+	unsigned int pat_index;
 };
 
 static void repr_placements(char *buf, size_t size,
@@ -394,11 +395,39 @@  static int ext_set_protected(struct i915_user_extension __user *base, void *data
 	return 0;
 }
 
+static int ext_set_pat(struct i915_user_extension __user *base, void *data)
+{
+	struct create_ext *ext_data = data;
+	struct drm_i915_private *i915 = ext_data->i915;
+	struct drm_i915_gem_create_ext_set_pat ext;
+	unsigned int max_pat_index;
+
+	BUILD_BUG_ON(sizeof(struct drm_i915_gem_create_ext_set_pat) !=
+		     offsetofend(struct drm_i915_gem_create_ext_set_pat, rsvd));
+
+	if (copy_from_user(&ext, base, sizeof(ext)))
+		return -EFAULT;
+
+	max_pat_index = INTEL_INFO(i915)->max_pat_index;
+
+	if (ext.pat_index > max_pat_index) {
+		drm_dbg(&i915->drm, "PAT index is invalid: %u\n",
+			ext.pat_index);
+		return -EINVAL;
+	}
+
+	ext_data->pat_index = ext.pat_index;
+
+	return 0;
+}
+
 static const i915_user_extension_fn create_extensions[] = {
 	[I915_GEM_CREATE_EXT_MEMORY_REGIONS] = ext_set_placements,
 	[I915_GEM_CREATE_EXT_PROTECTED_CONTENT] = ext_set_protected,
+	[I915_GEM_CREATE_EXT_SET_PAT] = ext_set_pat,
 };
 
+#define PAT_INDEX_NOT_SET	0xffff
 /**
  * i915_gem_create_ext_ioctl - Creates a new mm object and returns a handle to it.
  * @dev: drm device pointer
@@ -418,6 +447,7 @@  i915_gem_create_ext_ioctl(struct drm_device *dev, void *data,
 	if (args->flags & ~I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS)
 		return -EINVAL;
 
+	ext_data.pat_index = PAT_INDEX_NOT_SET;
 	ret = i915_user_extensions(u64_to_user_ptr(args->extensions),
 				   create_extensions,
 				   ARRAY_SIZE(create_extensions),
@@ -454,5 +484,11 @@  i915_gem_create_ext_ioctl(struct drm_device *dev, void *data,
 	if (IS_ERR(obj))
 		return PTR_ERR(obj);
 
+	if (ext_data.pat_index != PAT_INDEX_NOT_SET) {
+		i915_gem_object_set_pat_index(obj, ext_data.pat_index);
+		/* Mark pat_index is set by UMD */
+		obj->cache_level = I915_CACHE_INVAL;
+	}
+
 	return i915_gem_publish(obj, file, &args->size, &args->handle);
 }
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c
index 27c948350b5b..61651f7e5806 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
@@ -209,6 +209,12 @@  bool i915_gem_object_can_bypass_llc(struct drm_i915_gem_object *obj)
 	if (!(obj->flags & I915_BO_ALLOC_USER))
 		return false;
 
+	/*
+	 * Always flush cache for UMD objects at creation time.
+	 */
+	if (obj->cache_level == I915_CACHE_INVAL)
+		return true;
+
 	/*
 	 * EHL and JSL add the 'Bypass LLC' MOCS entry, which should make it
 	 * possible for userspace to bypass the GTT caching bits set by the
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index dba7c5a5b25e..03c5c314846e 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -3630,9 +3630,13 @@  struct drm_i915_gem_create_ext {
 	 *
 	 * For I915_GEM_CREATE_EXT_PROTECTED_CONTENT usage see
 	 * struct drm_i915_gem_create_ext_protected_content.
+	 *
+	 * For I915_GEM_CREATE_EXT_SET_PAT usage see
+	 * struct drm_i915_gem_create_ext_set_pat.
 	 */
 #define I915_GEM_CREATE_EXT_MEMORY_REGIONS 0
 #define I915_GEM_CREATE_EXT_PROTECTED_CONTENT 1
+#define I915_GEM_CREATE_EXT_SET_PAT 2
 	__u64 extensions;
 };
 
@@ -3747,6 +3751,38 @@  struct drm_i915_gem_create_ext_protected_content {
 	__u32 flags;
 };
 
+/**
+ * struct drm_i915_gem_create_ext_set_pat - The
+ * I915_GEM_CREATE_EXT_SET_PAT extension.
+ *
+ * If this extension is provided, the specified caching policy (PAT index) is
+ * applied to the buffer object.
+ *
+ * Below is an example on how to create an object with specific caching policy:
+ *
+ * .. code-block:: C
+ *
+ *      struct drm_i915_gem_create_ext_set_pat set_pat_ext = {
+ *              .base = { .name = I915_GEM_CREATE_EXT_SET_PAT },
+ *              .pat_index = 0,
+ *      };
+ *      struct drm_i915_gem_create_ext create_ext = {
+ *              .size = PAGE_SIZE,
+ *              .extensions = (uintptr_t)&set_pat_ext,
+ *      };
+ *
+ *      int err = ioctl(fd, DRM_IOCTL_I915_GEM_CREATE_EXT, &create_ext);
+ *      if (err) ...
+ */
+struct drm_i915_gem_create_ext_set_pat {
+	/** @base: Extension link. See struct i915_user_extension. */
+	struct i915_user_extension base;
+	/** @pat_index: PAT index to be set */
+	__u32 pat_index;
+	/** @rsvd: reserved for future use */
+	__u32 rsvd;
+};
+
 /* ID of the protected content session managed by i915 when PXP is active */
 #define I915_PROTECTED_CONTENT_DEFAULT_SESSION 0xf
 
diff --git a/tools/include/uapi/drm/i915_drm.h b/tools/include/uapi/drm/i915_drm.h
index 8df261c5ab9b..8cdcdb5fac26 100644
--- a/tools/include/uapi/drm/i915_drm.h
+++ b/tools/include/uapi/drm/i915_drm.h
@@ -3607,9 +3607,13 @@  struct drm_i915_gem_create_ext {
 	 *
 	 * For I915_GEM_CREATE_EXT_PROTECTED_CONTENT usage see
 	 * struct drm_i915_gem_create_ext_protected_content.
+	 *
+	 * For I915_GEM_CREATE_EXT_SET_PAT usage see
+	 * struct drm_i915_gem_create_ext_set_pat.
 	 */
 #define I915_GEM_CREATE_EXT_MEMORY_REGIONS 0
 #define I915_GEM_CREATE_EXT_PROTECTED_CONTENT 1
+#define I915_GEM_CREATE_EXT_SET_PAT 2
 	__u64 extensions;
 };
 
@@ -3724,6 +3728,38 @@  struct drm_i915_gem_create_ext_protected_content {
 	__u32 flags;
 };
 
+/**
+ * struct drm_i915_gem_create_ext_set_pat - The
+ * I915_GEM_CREATE_EXT_SET_PAT extension.
+ *
+ * If this extension is provided, the specified caching policy (PAT index) is
+ * applied to the buffer object.
+ *
+ * Below is an example on how to create an object with specific caching policy:
+ *
+ * .. code-block:: C
+ *
+ *      struct drm_i915_gem_create_ext_set_pat set_pat_ext = {
+ *              .base = { .name = I915_GEM_CREATE_EXT_SET_PAT },
+ *              .pat_index = 0,
+ *      };
+ *      struct drm_i915_gem_create_ext create_ext = {
+ *              .size = PAGE_SIZE,
+ *              .extensions = (uintptr_t)&set_pat_ext,
+ *      };
+ *
+ *      int err = ioctl(fd, DRM_IOCTL_I915_GEM_CREATE_EXT, &create_ext);
+ *      if (err) ...
+ */
+struct drm_i915_gem_create_ext_set_pat {
+	/** @base: Extension link. See struct i915_user_extension. */
+	struct i915_user_extension base;
+	/** @pat_index: PAT index to be set */
+	__u32 pat_index;
+	/** @rsvd: reserved for future use */
+	__u32 rsvd;
+};
+
 /* ID of the protected content session managed by i915 when PXP is active */
 #define I915_PROTECTED_CONTENT_DEFAULT_SESSION 0xf