Message ID | 20230419230058.2659455-9-fei.yang@intel.com (mailing list archive)
---|---
State | New, archived
Series | drm/i915/mtl: Define MOCS and PAT tables for MTL
Hi Fei,

> To comply with the design that buffer objects shall have immutable
> cache setting throughout their life cycle, {set, get}_caching ioctls
> are no longer supported from MTL onward. With that change caching
> policy can only be set at object creation time. The current code
> applies a default (platform dependent) cache setting for all objects.
> However this is not optimal for performance tuning. The patch extends
> the existing gem_create uAPI to let user set PAT index for the object
> at creation time.
> The new extension is platform independent, so UMDs can switch to using
> this extension for older platforms as well, while {set, get}_caching are
> still supported on these legacy platforms for compatibility reasons.
>
> Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
> Cc: Matt Roper <matthew.d.roper@intel.com>
> Cc: Andi Shyti <andi.shyti@linux.intel.com>
> Signed-off-by: Fei Yang <fei.yang@intel.com>
> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>

because this is an API change, we need some more information here.

First of all you need to CC the userspace guys that have been working
on top of your series and get their acks.

I also believe that this series has been tested on a separate
repository; would you link it in the commit message?

Thanks,
Andi
On 20/04/2023 12:39, Andi Shyti wrote:
> Hi Fei,
>
>> [commit message snipped]
>
> because this is an API change, we need some more information
> here.
>
> First of all you need to CC the userspace guys that have been
> working on top of your series and get their acks.

Yes, and a link to a Mesa merge request which uses the uapi should be
included.

IGTs should be ready too before we can merge. I glanced over igt-dev
but did not spot anything.

Regards,

Tvrtko

> I also believe that this series has been tested on a separate
> repository; would you link it in the commit message?
>
> Thanks,
> Andi
> On 20/04/2023 12:39, Andi Shyti wrote:
>> Hi Fei,
>>
>>> [commit message snipped]
>>
>> because this is an API change, we need some more information
>> here.
>>
>> First of all you need to CC the userspace guys that have been
>> working on top of your series and get their acks.
>
> Yes, and a link to a Mesa merge request which uses the uapi should be
> included.

Working with Mesa team on this, stay tuned.

> IGTs should be ready too before we can merge. I glanced over igt-dev
> but did not spot anything.

There is an IGT patch posted to gfx-internal-devel, titled
"test/gem_create: exercise gem_create_ext_set_pat".

> Regards,
>
> Tvrtko
Hi Fei,

>>>> [commit message snipped]
>>>
>>> because this is an API change, we need some more information
>>> here.
>>>
>>> First of all you need to CC the userspace guys that have been
>>> working on top of your series and get their acks.
>>
>> Yes, and a link to a Mesa merge request which uses the uapi should be
>> included.
>
> Working with Mesa team on this, stay tuned.
>
>> IGTs should be ready too before we can merge. I glanced over igt-dev
>> but did not spot anything.
>
> There is an IGT patch posted to gfx-internal-devel, titled
> "test/gem_create: exercise gem_create_ext_set_pat".

Any chance to have it ported upstream? It's OK even if it's not merged
(at least on my side) but some public interface testing is needed.

If you do post it upstream you could add in the cover letter:

Test-with: <mail-id>

where the mail-id refers to the upstream patch of the test.

Andi
On 2023-04-20 09:11:18, Yang, Fei wrote:
>> On 20/04/2023 12:39, Andi Shyti wrote:
>>> Hi Fei,
>>>
>>>> [commit message snipped]
>>>
>>> because this is an API change, we need some more information
>>> here.
>>>
>>> First of all you need to CC the userspace guys that have been
>>> working on top of your series and get their acks.
>>
>> Yes, and a link to a Mesa merge request which uses the uapi should be
>> included.
>
> Working with Mesa team on this, stay tuned.

I would like to see the extension detection issue handled before
ack'ing this.

How about a new DRM_I915_QUERY_GEM_CREATE_EXTENSIONS item, that
returns a u64 array of usable extension names for
DRM_IOCTL_I915_GEM_CREATE_EXT?

A similar DRM_I915_QUERY_GEM_CONTEXT_CREATE_EXTENSIONS could also
provide an alternative to Alan's "drm/i915/uapi/pxp: Add a GET_PARAM
for PXP", and more easily allow advertising future new extensions for
context/buffer creation.

-Jordan
[fixed mailing lists addresses]

On 24/04/2023 09:36, Jordan Justen wrote:
> On 2023-04-23 00:05:06, Yang, Fei wrote:
>>> On 2023-04-20 09:11:18, Yang, Fei wrote:
>>>>> On 20/04/2023 12:39, Andi Shyti wrote:
>>>>>> Hi Fei,
>>>>>>
>>>>>> because this is an API change, we need some more information here.
>>>>>>
>>>>>> First of all you need to CC the userspace guys that have been
>>>>>> working on top of your series and get their acks.
>>>>>
>>>>> Yes, and a link to a Mesa merge request which uses the uapi should
>>>>> be included.
>>>>
>>>> Working with Mesa team on this, stay tuned.
>>>
>>> I would like to see the extension detection issue handled
>>> before ack'ing this.
>>>
>>> How about a new DRM_I915_QUERY_GEM_CREATE_EXTENSIONS item, that returns
>>> a u64 array of usable extension names for DRM_IOCTL_I915_GEM_CREATE_EXT?
>>
>> I agree a query mechanism is necessary, but that should be generic for
>> all uAPIs, not just for GEM_CREATE.
>>
>>> A similar DRM_I915_QUERY_GEM_CONTEXT_CREATE_EXTENSIONS could also provide
>>> an alternative to Alan's "drm/i915/uapi/pxp: Add a GET_PARAM for PXP",
>>> and more easily allow advertising future new extensions for
>>> context/buffer creation.
>>
>> I think we should have a discussion and come up with a sustainable
>> design for the query uAPI, rather than just put in a quick hack for this.
>
> I think you are being a bit too quick to dismiss my idea as a quick
> hack... Nevertheless, I would love to hear an alternate suggestion.
> Just as long as it's not, "let's figure this out later, because I need
> to add this feature now".
>
> I don't think "just try to use it and if it fails, I guess it isn't
> supported" is reasonable. So, if we can't do better, at least add a
> GET_PARAM. Yeah, it's a quick hack, but it's better than nothing.

Being able to "list" supported extensions sounds like a reasonable
principle, albeit a departure from the design direction to date. Which
means there are probably no quick solutions.
Also, AFAIU, only PXP context create is the problematic one, right?
Everything else is pretty much instant or delayed allocation, so super
cheap to probe by attempting to use.

If I got that right, and given this series is about
drm_i915_gem_create_ext, I don't think this side discussion should be
blocking it.

Furthermore the PXP context create story is even more complicated, in
the sense that it is not just about querying whether the extension is
supported; the expensive check is something more complicated.

Going back to implementation details for this proposed new feature,
one alternative to a query could be something like:

  drm_i915_gem_create_ext.flags |= I915_GEM_CREATE_EXT_FLAG_PROBE_EXTENSIONS;

That would be somewhat more lightweight to implement than the
i915_query route. And it appears it would work for all ioctls which
support extensions apart from i915_context_param_engines.

Regards,

Tvrtko
On 2023-04-24 02:08:43, Tvrtko Ursulin wrote:
>
> Being able to "list" supported extensions sounds like a reasonable
> principle, albeit a departure from the design direction to date.
> Which means there are probably no quick solutions. Also, AFAIU, only
> PXP context create is the problematic one, right? Everything else is
> pretty much instant or delayed allocation so super cheap to probe by
> attempting to use.
>
> If I got that right and given this series is about
> drm_i915_gem_create_ext I don't think this side discussion should be
> blocking it.

This still leaves the issue of no reasonable detection mechanism for
the extension. If the discussion gets too complicated, then can we add
a GET_PARAM for the SET_PAT extension? I'm hoping we could either come
up with something better reasonably quickly, or i915/Xe can add a new
param for each new extension until a better approach is available.

> Furthermore the PXP context create story is even more complicated,
> in the sense that it is not just about querying whether the extension
> is supported; the expensive check is something more complicated.
>
> Going back to implementation details for this proposed new feature,
> one alternative to a query could be something like:
>
>   drm_i915_gem_create_ext.flags |= I915_GEM_CREATE_EXT_FLAG_PROBE_EXTENSIONS;
>
> That would be somewhat more lightweight to implement than the
> i915_query route. And it appears it would work for all ioctls which
> support extensions apart from i915_context_param_engines.

This seems little better than "try it, and if it works then it's
supported".

I'm not suggesting that userspace should be able to check that
scenario x+y+z will work, but more a list of extensions that
conceivably could work. Normally this should just be a matter of the
kernel unconditionally adding the newly implemented extension to the
list returned in the query call.

If a GET_PARAM can be made for the PXP case, then it seems like a
query item returning CONTEXT_CREATE extensions could conditionally
omit that extension just as easily as implementing the proposed new
GET_PARAM.

-Jordan
(+ Faith and Daniel as they have been involved in previous discussions)

Quoting Jordan Justen (2023-04-24 20:13:00)
> On 2023-04-24 02:08:43, Tvrtko Ursulin wrote:
>>
>> Being able to "list" supported extensions sounds like a reasonable
>> principle, albeit a departure from the design direction to date.
>> Which means there are probably no quick solutions. Also, AFAIU, only
>> PXP context create is the problematic one, right? Everything else is
>> pretty much instant or delayed allocation so super cheap to probe by
>> attempting to use.
>>
>> If I got that right and given this series is about
>> drm_i915_gem_create_ext I don't think this side discussion should be
>> blocking it.
>
> This still leaves the issue of no reasonable detection mechanism for
> the extension.

I remember this exact discussion being repeated at least a few times in
the past 5 years. Previously the conclusion was always that detection
by trying to use the extension leads to the least amount of uAPI
surface. There is also no concern of having a mismatch in backporting
of the query and the functionality patches.

Why exactly do you think it is more reasonable to do the following?

  check_if_extension_query_is_supported();
  <check retval>
  check_if_extension_xyz_is_supported();
  <check retval>

VS

  create_[context,buffer,whatever]_with_extension();
  <check errno>
  destroy_[context,buffer,whatever]();

For contexts and buffers there's no overhead, and we're keeping the
uAPI surface smaller (less bugs, less validation effort). Additionally
we support checking combinations of extensions A, B and C without extra
effort.

If we're not returning enough clear errnos, then that is something to
make sure we do.

Regards, Joonas

> If the discussion gets too complicated, then can we add
> a GET_PARAM for the SET_PAT extension? I'm hoping we could either come
> up with something better reasonably quickly, or i915/Xe can add a new
> param for each new extension until a better approach is available.
>
>> Furthermore the PXP context create story is even more complicated,
>> [snip]
>
> This seems little better than "try it, and if it works then it's
> supported".
> [snip]
>
> -Jordan
On Tue, 2023-04-25 at 16:41 +0300, Joonas Lahtinen wrote:
> (+ Faith and Daniel as they have been involved in previous discussions)
An orthogonal (but loosely related) question: Is PXP the only subsystem
that has this unique problem: it uses a delayed worker to complete all
dependencies for init, but they take so long that by the time
compositor/mesa init comes up, creation of a PXP context blocks too
long and may still fail, with userspace incorrectly assuming PXP is not
supported (when we don't have a GET_PARAM for it)?

I believe HuC has a similar issue - but it doesn't show up in the uAPI,
rather in the cmd buffers.

Do we have other subsystems with this problem? (dependency on other
startups?)
On 2023-04-25 06:41:54, Joonas Lahtinen wrote:
> (+ Faith and Daniel as they have been involved in previous discussions)
>
> Quoting Jordan Justen (2023-04-24 20:13:00)
>> On 2023-04-24 02:08:43, Tvrtko Ursulin wrote:
>>> [snip]
>>
>> This still leaves the issue of no reasonable detection mechanism for
>> the extension.
>
> I remember this exact discussion being repeated at least a few times in
> the past 5 years. Previously the conclusion was always that detection
> by trying to use the extension leads to the least amount of uAPI
> surface. There is also no concern of having a mismatch in backporting
> of the query and the functionality patches.
>
> Why exactly do you think it is more reasonable to do the following?
>
>   check_if_extension_query_is_supported();
>   <check retval>
>   check_if_extension_xyz_is_supported();
>   <check retval>

As I've mentioned a couple times, I'm asking for a query item that
returns all the extensions that conceivably could work. In theory it
could be made a single query item which somehow then enumerates whether
something is a context extension or a bo extension. But it seems
simpler for all if we just have a couple of query items returning a
simple u64 array for the few classes of extensions.

> VS
>
>   create_[context,buffer,whatever]_with_extension();
>   <check errno>
>   destroy_[context,buffer,whatever]();
>
> For contexts and buffers there's no overhead,

There's no overhead to creating and destroying things?
I think the 8s timeout when trying to create a protected content
context shows it's not always quite that simple.

Over time userspace will have to continue growing this set of
create/destroy tests as new extensions are added, simply so we can
discover what extensions are supported. This doesn't seem like a very
robust way to advertise extensions for an API.

Another point ... say you need to debug why some simple app is failing
to start the driver. One tool might be strace. But now you have a bunch
of noise of calls from the driver creating and destroying things just
to discover what extensions the kernel supports. It would be nice if
all this was handled with a query item vs a bunch of create/destroys.

> and we're keeping the uAPI surface smaller (less bugs, less
> validation effort).

I understand wanting to keep the kernel uAPI and implementation simple.
Is it too complicated to return a u64 array for a query item indicating
the extensions supported for the various classes of extensions? I think
in most cases it'll just be essentially shuffling across an array of
u64 items. (Since most known extensions will always be supported by the
kernel.)

> Additionally we support checking combinations of extensions A, B and
> C without extra effort.

Regarding combinations of extensions, is that really something
userspace needs to probe? I would think if userspace knows about the
possible extensions, then it's on userspace to use combinations
appropriately.

But, in detecting extensions, it is possible that, say, extension B
relies on extension A. Now userspace may have to probe to see if
extension A is supported, and then probe for extension B while using
extension A. Essentially, probing for an extension could become a bit
complicated. Vs the kernel just giving us a u64 array of known
extensions...

-Jordan
On Tue, Apr 25, 2023 at 04:41:54PM +0300, Joonas Lahtinen wrote:
> (+ Faith and Daniel as they have been involved in previous discussions)
>
> [snip]
>
> Why exactly do you think it is more reasonable to do the following?
>
>   check_if_extension_query_is_supported();
>   <check retval>
>   check_if_extension_xyz_is_supported();
>   <check retval>
>
> VS
>
>   create_[context,buffer,whatever]_with_extension();
>   <check errno>
>   destroy_[context,buffer,whatever]();
>
> For contexts and buffers there's no overhead, and we're keeping the
> uAPI surface smaller (less bugs, less validation effort). Additionally
> we support checking combinations of extensions A, B and C without
> extra effort.
>
> If we're not returning enough clear errnos, then that is something to
> make sure we do.
Joonas asked me to put my thoughts here:

- in general the "feature discovery by trying it out" approach is most
  robust and hence preferred, but it's also not something that's
  required when there's good reasons against it

- the more a feature spans drivers/modules, the more it should be
  discovered by trying it out, e.g. dma-buf fence import/export was a
  huge discussion; luckily mesa devs figured out how to transparently
  fall back at runtime so we didn't end up merging the separate feature
  flag (I think at least, can't find it). pxp being split across
  i915/me/fw/who knows what else is kinda similar, so I'd heavily lean
  towards discovery by creating a context

- pxp taking 8s to init a ctx sounds very broken, irrespective of
  anything else

- in practice there's not really a combinatorial explosion, for a lot
  of reasons:

  - kernel and userspace tend to assume/require implied features where
    it makes sense, e.g. in kms atomic implies planes and universal
    planes. mesa has been doing the same

  - you could go all the way to what radeon/amdgpu have done for years
    with a single incrementing version. Probably not flexible enough
    for intel.

  - every pciid is its own uapi, so a feature only needs to be tested
    on platforms where it didn't ship from the start. Also cuts down on
    runtime discovery a lot

  - mesa tends to only support down to current lts kernels and not
    older, which again cuts the combinations a lot

- I did look through upstream kernel docs for general (driver) uapi
  recommendations, but there isn't really anything about good taste
  stuff, just a lot about not screwing up compatibility across
  kernels/platforms

tldr; prefer discovery, don't sweat it if not, definitely don't
overengineer this with some magic "give me all extensions" thing
because that results in guaranteed cross-component backporting pains
sooner or later. Or inconsistency, which defeats the point.
Cheers, Daniel

> Regards, Joonas
>
>> If the discussion gets too complicated, then can we add
>> a GET_PARAM for the SET_PAT extension?
>> [snip]
>>
>> -Jordan
On Wed, 2023-04-26 at 13:52 +0200, Daniel Vetter wrote:
> On Tue, Apr 25, 2023 at 04:41:54PM +0300, Joonas Lahtinen wrote:
>> (+ Faith and Daniel as they have been involved in previous discussions)
>> Quoting Jordan Justen (2023-04-24 20:13:00)
>>> On 2023-04-24 02:08:43, Tvrtko Ursulin wrote:

alan:snip

> - the more a feature spans drivers/modules, the more it should be
>   discovered by trying it out, e.g. dma-buf fence import/export was a
>   huge discussion; luckily mesa devs figured out how to transparently
>   fall back at runtime so we didn't end up merging the separate
>   feature flag (I think at least, can't find it). pxp being split
>   across i915/me/fw/who knows what else is kinda similar, so I'd
>   heavily lean towards discovery by creating a context
>
> - pxp taking 8s to init a ctx sounds very broken, irrespective of
>   anything else

Alan: Please be aware that:

1. The wait-timeout was changed to 1 second some time back.

2. I'm not deciding the time-out. I initially wanted to keep it at the
   same timeout as ADL (250 ms) and ask the UMD to retry if the user
   needs it (as per the same ADL behavior). Daniele requested to move
   it to 8 seconds, but through the review process we reduced it to 1
   second.

3. In any case, that's just the wait-timeout - and we know it won't
   succeed until ~6 seconds after i915 init (~9 secs after boot). The
   issue isn't our hardware or i915 - it's the component driver load
   <-- this is what's broken.

Details: PXP context creation is dependent on gsc-fw load, huc-firmware
load, mei-gsc-proxy component driver load + bind, huc-authentication
and the gsc-proxy-init handshake. Most of the above steps begin rather
quickly during i915 driver load - the delay seems to come from a very
late mei-gsc-proxy component driver load. In fact the parent mei-me
driver is only getting loaded ~6 seconds after i915 init is done. That
blocks the gsc-proxy-init handshake and huc-authentication and lastly
PXP.
That said, what is broken is why it takes so long to get the component
drivers to come up. NOTE: PXP isn't really doing anything differently
in the context creation flow (in terms of time-consuming steps compared
to ADL) besides these extra dependency waits.

We can actually go back to the original timeout of 250 ms like we have
on ADL, but it will fail if Mesa calls in too early (and succeed later)
... or we can create the GET_PARAMs.

A better idea would be to figure out how to control the driver load
order and force the mei driver + components to get called right after
i915. I was informed there is no way to control this and changes here
will likely not be accepted upstream.

++ Daniele - can you chime in?

Take note that ADL has the same issue, but for whatever reason the
dependent mei component on ADL loaded much sooner - so it was never an
issue that was caught, but it still existed on ADL at merge time (if
users customize the kernel + compositor for fastboot it will happen).
(I realize I haven't tested ADL with the new kernel configs that we use
to also boot PXP on MTL - I wonder if the new mei configs are causing
the delay, i.e. an ADL customer could suddenly see this 6 sec delay too
- something I have to check now.)
On 4/26/2023 9:48 AM, Teres Alexis, Alan Previn wrote:
> On Wed, 2023-04-26 at 13:52 +0200, Daniel Vetter wrote:
>> [snip]
>>
>> - pxp taking 8s to init a ctx sounds very broken, irrespective of
>>   anything else

I think there has been a bit of confusion in regards to this timeout
and to where it applies, so let me try to clarify to make sure we're
all on the same page (Alan has already explained most of it below, but
I'm going to go in a bit more detail and I want to make sure it's all
in one place for reference).

Before we can do any PXP operation, dependencies need to be satisfied,
some of which are outside of i915. For MTL, these are:

- GSC FW needs to be loaded (~250 ms)
- HuC FW needs to be authenticated for PXP ops (~20 ms)
- MEI modules need to be bound (depends on the probe ordering, but
  usually a few secs)
- GSC SW proxy via MEI needs to be established (~500 ms normally, but
  can take a few seconds on the first boot after a firmware update)

Due to the fact that these can take several seconds in total to
complete, to avoid stalling driver load/resume for that long we moved
the i915-side operations to a separate worker and we register i915
before they've completed.
This means that we can get a PXP context creation call before all the dependencies are in place, in which case we do need to wait and that's where the 8s come from. After all the pieces are in place, a PXP context creation call is much faster (up to ~150 ms, which is the time required to start the PXP session if it is not already running). The reason why we suggested a dedicated getparam was to avoid requiring early users to wait for all of that to happen just to check the capability. By the time an user actually wants to use PXP, we're likely done with the prep steps (or at least we're far along with them) and therefore the wait will be short. > Alan: Please be aware that: > 1. the wait-timeout was changed to 1 second sometime back. > 2. the I'm not deciding the time-out. I initially wanted to keep it at the same > timeout as ADL (250 milisec) - and ask the UMD to retry if user needs it. (as per > same ADL behavior). Daniele requested to move it to 8 seconds - but thru review > process, we reduced it to 1 second. > 3. In anycase, thats just the wait-timeout - and we know it wont succeed until > ~6 seconds after i915 (~9 secs after boot). The issue isnt our hardware or i915 > - its the component driver load <-- this is what's broken. I think the question here is whether the mei driver is taking a long time to probe or if it is just being probed late. In the latter case, I wouldn't call it broken. > > Details: PXP context is dependent on gsc-fw load, huc-firmware load, mei-gsc-proxy > component driver load + bind, huc-authentication and gsc-proxy-init-handshake. > Most of above steps begin rather quickly during i915 driver load - the delay > seems to come from a very late mei-gsc-proxy component driver load. In fact the > parent mei-me driver is only getting ~6 seconds after i915 init is done. That > blocks the gsc-proxy-init-handshake and huc-authentication and lastly PXP. 
>
> That said, what is broken is why it takes so long to get the component
> drivers to come up. NOTE: PXP isn't really doing anything differently
> in the context creation flow (in terms of time-consuming steps compared
> to ADL) besides the extra waits on these dependencies.
>
> We can actually go back to the original timeout of 250 milliseconds
> like we have on ADL, but then creation will fail if Mesa calls in too
> early (and succeed later) ... or ... we can create the GET_PARAMs.
>
> A better idea would be to figure out how to control the driver load
> order and force the mei driver + components to get loaded right after
> i915. I was informed there is no way to control this and changes here
> will likely not be accepted upstream.

We could add a device link to mark i915 as a consumer of mei, but I
believe that wouldn't work for two reasons:

1 - on discrete, mei binds to a child device of i915, so the dependency
    is reversed
2 - the link might just delay the i915 load to after the mei load, which
    I'm not sure is something we want (and at that point we could also
    just wait for mei to bind from within the i915 load)

Daniele

>
> ++ Daniele - can you chime in?
>
> Take note that ADL has the same issue, but for whatever reason the
> dependent mei component on ADL loaded much sooner, so it was never an
> issue that was caught, but it still existed on ADL at merge time (if
> users customize the kernel + compositor for fastboot it will happen).
> (I realize I haven't tested ADL with the new kernel configs that we use
> to also boot PXP on MTL - wonder if the new mei configs are causing the
> delay - i.e. an ADL customer could suddenly see this 6 sec delay too -
> something I have to check now.)
On 2023-04-26 04:52:43, Daniel Vetter wrote:
>
> Joonas asked me to put my thoughts here:
>
> - in general the "feature discovery by trying it out" approach is most
> robust and hence preferred, but it's also not something that's required
> when there's good reasons against it

More robust in what sense? Userspace has to set up some scenario to use
the interface, which hopefully is not too complex. It's difficult to
predict what that may look like in the future, since we are talking
about undefined extensions.

Userspace also has to rely on the kernel making creating and destroying
this thing essentially free. For I915_GEM_CREATE_EXT_PROTECTED_CONTENT,
that did not work out.

For I915_GEM_CREATE_EXT_SET_PAT, just wondering: since the PAT indices
are platform specific, what might happen if we tried to choose some
common index value to detect the extension in a generic manner? Could it
potentially lead to unexpected behavior, or maybe just an error? I guess
it's really extension specific what kind of issues could potentially
arise.

> tldr; prefer discovery, don't sweat it if not, definitely don't
> overengineer this with some magic "give me all extensions" thing because
> that results in guaranteed cross-component backporting pains sooner or
> later. Or inconsistency, which defeats the point.

I guess I don't know the full context of your concerns here. For
returning the gem-create extensions, isn't this just a matter of
returning the valid indices into the create_extensions array in
i915_gem_create.c? Is that an over-engineered query?

It seems weird that there's a reasonably well defined "extension"
mechanism here, but no way for the kernel to just tell us what
extensions it knows about.

-Jordan
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_create.c b/drivers/gpu/drm/i915/gem/i915_gem_create.c
index bfe1dbda4cb7..723c3ddd6c74 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_create.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_create.c
@@ -245,6 +245,7 @@ struct create_ext {
 	unsigned int n_placements;
 	unsigned int placement_mask;
 	unsigned long flags;
+	unsigned int pat_index;
 };
 
 static void repr_placements(char *buf, size_t size,
@@ -394,11 +395,39 @@ static int ext_set_protected(struct i915_user_extension __user *base, void *data
 	return 0;
 }
 
+static int ext_set_pat(struct i915_user_extension __user *base, void *data)
+{
+	struct create_ext *ext_data = data;
+	struct drm_i915_private *i915 = ext_data->i915;
+	struct drm_i915_gem_create_ext_set_pat ext;
+	unsigned int max_pat_index;
+
+	BUILD_BUG_ON(sizeof(struct drm_i915_gem_create_ext_set_pat) !=
+		     offsetofend(struct drm_i915_gem_create_ext_set_pat, rsvd));
+
+	if (copy_from_user(&ext, base, sizeof(ext)))
+		return -EFAULT;
+
+	max_pat_index = INTEL_INFO(i915)->max_pat_index;
+
+	if (ext.pat_index > max_pat_index) {
+		drm_dbg(&i915->drm, "PAT index is invalid: %u\n",
+			ext.pat_index);
+		return -EINVAL;
+	}
+
+	ext_data->pat_index = ext.pat_index;
+
+	return 0;
+}
+
 static const i915_user_extension_fn create_extensions[] = {
 	[I915_GEM_CREATE_EXT_MEMORY_REGIONS] = ext_set_placements,
 	[I915_GEM_CREATE_EXT_PROTECTED_CONTENT] = ext_set_protected,
+	[I915_GEM_CREATE_EXT_SET_PAT] = ext_set_pat,
 };
 
+#define PAT_INDEX_NOT_SET	0xffff
+
 /**
  * i915_gem_create_ext_ioctl - Creates a new mm object and returns a handle to it.
  * @dev: drm device pointer
@@ -418,6 +447,7 @@ i915_gem_create_ext_ioctl(struct drm_device *dev, void *data,
 	if (args->flags & ~I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS)
 		return -EINVAL;
 
+	ext_data.pat_index = PAT_INDEX_NOT_SET;
 	ret = i915_user_extensions(u64_to_user_ptr(args->extensions),
 				   create_extensions,
 				   ARRAY_SIZE(create_extensions),
@@ -454,5 +484,11 @@ i915_gem_create_ext_ioctl(struct drm_device *dev, void *data,
 	if (IS_ERR(obj))
 		return PTR_ERR(obj);
 
+	if (ext_data.pat_index != PAT_INDEX_NOT_SET) {
+		i915_gem_object_set_pat_index(obj, ext_data.pat_index);
+		/* Mark pat_index is set by UMD */
+		obj->cache_level = I915_CACHE_INVAL;
+	}
+
 	return i915_gem_publish(obj, file, &args->size, &args->handle);
 }
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c
index 27c948350b5b..61651f7e5806 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
@@ -209,6 +209,12 @@ bool i915_gem_object_can_bypass_llc(struct drm_i915_gem_object *obj)
 	if (!(obj->flags & I915_BO_ALLOC_USER))
 		return false;
 
+	/*
+	 * Always flush cache for UMD objects at creation time.
+	 */
+	if (obj->cache_level == I915_CACHE_INVAL)
+		return true;
+
 	/*
 	 * EHL and JSL add the 'Bypass LLC' MOCS entry, which should make it
 	 * possible for userspace to bypass the GTT caching bits set by the
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index dba7c5a5b25e..03c5c314846e 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -3630,9 +3630,13 @@ struct drm_i915_gem_create_ext {
 	 *
 	 * For I915_GEM_CREATE_EXT_PROTECTED_CONTENT usage see
 	 * struct drm_i915_gem_create_ext_protected_content.
+	 *
+	 * For I915_GEM_CREATE_EXT_SET_PAT usage see
+	 * struct drm_i915_gem_create_ext_set_pat.
 	 */
 #define I915_GEM_CREATE_EXT_MEMORY_REGIONS 0
 #define I915_GEM_CREATE_EXT_PROTECTED_CONTENT 1
+#define I915_GEM_CREATE_EXT_SET_PAT 2
 	__u64 extensions;
 };
 
@@ -3747,6 +3751,38 @@ struct drm_i915_gem_create_ext_protected_content {
 	__u32 flags;
 };
 
+/**
+ * struct drm_i915_gem_create_ext_set_pat - The
+ * I915_GEM_CREATE_EXT_SET_PAT extension.
+ *
+ * If this extension is provided, the specified caching policy (PAT index) is
+ * applied to the buffer object.
+ *
+ * Below is an example on how to create an object with specific caching policy:
+ *
+ * .. code-block:: C
+ *
+ *      struct drm_i915_gem_create_ext_set_pat set_pat_ext = {
+ *              .base = { .name = I915_GEM_CREATE_EXT_SET_PAT },
+ *              .pat_index = 0,
+ *      };
+ *      struct drm_i915_gem_create_ext create_ext = {
+ *              .size = PAGE_SIZE,
+ *              .extensions = (uintptr_t)&set_pat_ext,
+ *      };
+ *
+ *      int err = ioctl(fd, DRM_IOCTL_I915_GEM_CREATE_EXT, &create_ext);
+ *      if (err) ...
+ */
+struct drm_i915_gem_create_ext_set_pat {
+	/** @base: Extension link. See struct i915_user_extension. */
+	struct i915_user_extension base;
+	/** @pat_index: PAT index to be set */
+	__u32 pat_index;
+	/** @rsvd: reserved for future use */
+	__u32 rsvd;
+};
+
 /* ID of the protected content session managed by i915 when PXP is active */
 #define I915_PROTECTED_CONTENT_DEFAULT_SESSION 0xf
 
diff --git a/tools/include/uapi/drm/i915_drm.h b/tools/include/uapi/drm/i915_drm.h
index 8df261c5ab9b..8cdcdb5fac26 100644
--- a/tools/include/uapi/drm/i915_drm.h
+++ b/tools/include/uapi/drm/i915_drm.h
@@ -3607,9 +3607,13 @@ struct drm_i915_gem_create_ext {
 	 *
 	 * For I915_GEM_CREATE_EXT_PROTECTED_CONTENT usage see
 	 * struct drm_i915_gem_create_ext_protected_content.
+	 *
+	 * For I915_GEM_CREATE_EXT_SET_PAT usage see
+	 * struct drm_i915_gem_create_ext_set_pat.
 	 */
 #define I915_GEM_CREATE_EXT_MEMORY_REGIONS 0
 #define I915_GEM_CREATE_EXT_PROTECTED_CONTENT 1
+#define I915_GEM_CREATE_EXT_SET_PAT 2
 	__u64 extensions;
 };
 
@@ -3724,6 +3728,38 @@ struct drm_i915_gem_create_ext_protected_content {
 	__u32 flags;
 };
 
+/**
+ * struct drm_i915_gem_create_ext_set_pat - The
+ * I915_GEM_CREATE_EXT_SET_PAT extension.
+ *
+ * If this extension is provided, the specified caching policy (PAT index) is
+ * applied to the buffer object.
+ *
+ * Below is an example on how to create an object with specific caching policy:
+ *
+ * .. code-block:: C
+ *
+ *      struct drm_i915_gem_create_ext_set_pat set_pat_ext = {
+ *              .base = { .name = I915_GEM_CREATE_EXT_SET_PAT },
+ *              .pat_index = 0,
+ *      };
+ *      struct drm_i915_gem_create_ext create_ext = {
+ *              .size = PAGE_SIZE,
+ *              .extensions = (uintptr_t)&set_pat_ext,
+ *      };
+ *
+ *      int err = ioctl(fd, DRM_IOCTL_I915_GEM_CREATE_EXT, &create_ext);
+ *      if (err) ...
+ */
+struct drm_i915_gem_create_ext_set_pat {
+	/** @base: Extension link. See struct i915_user_extension. */
+	struct i915_user_extension base;
+	/** @pat_index: PAT index to be set */
+	__u32 pat_index;
+	/** @rsvd: reserved for future use */
+	__u32 rsvd;
+};
+
 /* ID of the protected content session managed by i915 when PXP is active */
 #define I915_PROTECTED_CONTENT_DEFAULT_SESSION 0xf