diff mbox series

drm/i915/gt: Continue creating engine sysfs files even after a failure

Message ID 20240819113140.325235-1-andi.shyti@linux.intel.com (mailing list archive)
State New, archived
Headers show
Series drm/i915/gt: Continue creating engine sysfs files even after a failure | expand

Commit Message

Andi Shyti Aug. 19, 2024, 11:31 a.m. UTC
The i915 driver generates sysfs entries for each engine of the
GPU in /sys/class/drm/cardX/engines/.

The process is straightforward: we loop over the UABI engines and
for each one, we:

 - Create the object.
 - Create basic files.
 - If the engine supports timeslicing, create timeslice duration files.
 - If the engine supports preemption, create preemption-related files.
 - Create default value files.

Currently, if any of these steps fail, the process stops, and no
further sysfs files are created.

However, it's not necessary to stop the process on failure.
Instead, we can continue creating the remaining sysfs files for
the other engines. Even if some files fail to be created, the
list of engines can still be retrieved by querying i915.

Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
---
Hi,

It might make sense to create an "inv-<engine_name>" if something
goes wrong, so that the user is aware that the engine exists, but
the sysfs file is not present.

One further improvement would be to provide more information
about thei failure reason the dev_warn() message.

Andi

 drivers/gpu/drm/i915/gt/sysfs_engines.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

Comments

Rodrigo Vivi Aug. 20, 2024, 9:22 p.m. UTC | #1
On Mon, Aug 19, 2024 at 01:31:40PM +0200, Andi Shyti wrote:
> The i915 driver generates sysfs entries for each engine of the
> GPU in /sys/class/drm/cardX/engines/.
> 
> The process is straightforward: we loop over the UABI engines and
> for each one, we:
> 
>  - Create the object.
>  - Create basic files.
>  - If the engine supports timeslicing, create timeslice duration files.
>  - If the engine supports preemption, create preemption-related files.
>  - Create default value files.
> 
> Currently, if any of these steps fail, the process stops, and no
> further sysfs files are created.
> 
> However, it's not necessary to stop the process on failure.
> Instead, we can continue creating the remaining sysfs files for
> the other engines. Even if some files fail to be created, the
> list of engines can still be retrieved by querying i915.
> 
> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
> ---
> Hi,
> 
> It might make sense to create an "inv-<engine_name>" if something
> goes wrong, so that the user is aware that the engine exists, but
> the sysfs file is not present.

well, if the sysfs dir/files creation is failing, then it will
probably be unreliable anyway right?

> 
> One further improvement would be to provide more information
> about thei failure reason the dev_warn() message.

So, perhaps this patch should already go there and remove
the dev_err and add individual dev_warn for each failing path?

Also it looks something is off with the goto paths...

That if (0) is also ugly... probably better to use a
kobject_put with continue on every failing point as well...

> 
> Andi
> 
>  drivers/gpu/drm/i915/gt/sysfs_engines.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/sysfs_engines.c b/drivers/gpu/drm/i915/gt/sysfs_engines.c
> index 021f51d9b456..aab2759067d2 100644
> --- a/drivers/gpu/drm/i915/gt/sysfs_engines.c
> +++ b/drivers/gpu/drm/i915/gt/sysfs_engines.c
> @@ -530,9 +530,8 @@ void intel_engines_add_sysfs(struct drm_i915_private *i915)
>  err_object:
>  			kobject_put(kobj);
>  err_engine:
> -			dev_err(kdev, "Failed to add sysfs engine '%s'\n",
> -				engine->name);
> -			break;
> +			dev_warn(kdev, "Failed to add sysfs engine '%s'\n",
> +				 engine->name);
>  		}
>  	}
>  }
> -- 
> 2.45.2
>
Andi Shyti Aug. 21, 2024, 7:32 a.m. UTC | #2
Hi Rodrigo,

On Tue, Aug 20, 2024 at 05:22:40PM -0400, Rodrigo Vivi wrote:
> On Mon, Aug 19, 2024 at 01:31:40PM +0200, Andi Shyti wrote:
> > The i915 driver generates sysfs entries for each engine of the
> > GPU in /sys/class/drm/cardX/engines/.
> > 
> > The process is straightforward: we loop over the UABI engines and
> > for each one, we:
> > 
> >  - Create the object.
> >  - Create basic files.
> >  - If the engine supports timeslicing, create timeslice duration files.
> >  - If the engine supports preemption, create preemption-related files.
> >  - Create default value files.
> > 
> > Currently, if any of these steps fail, the process stops, and no
> > further sysfs files are created.
> > 
> > However, it's not necessary to stop the process on failure.
> > Instead, we can continue creating the remaining sysfs files for
> > the other engines. Even if some files fail to be created, the
> > list of engines can still be retrieved by querying i915.
> > 
> > Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
> > ---
> > Hi,
> > 
> > It might make sense to create an "inv-<engine_name>" if something
> > goes wrong, so that the user is aware that the engine exists, but
> > the sysfs file is not present.
> 
> well, if the sysfs dir/files creation is failing, then it will
> probably be unreliable anyway right?

Are you suggesting that "inv-<engine_name>" is OK?

> > One further improvement would be to provide more information
> > about thei failure reason the dev_warn() message.
> 
> So, perhaps this patch should already go there and remove
> the dev_err and add individual dev_warn for each failing path?

That's a suggestion, but it doesn't mean that it necessarily
improves things as it might add some innecessary information.
Just thinking.

> Also it looks something is off with the goto paths...
> 
> That if (0) is also ugly... probably better to use a
> kobject_put with continue on every failing point as well...

ehehe... I came to like it, to be honest. Besides I like single
exit paths instead of distributed returns. In this particular
case we would replcate the same "kobject_put() ... dev_warn()" in
several places, so that I'm not sure it's better.

If you like more we could do:

	for (...) {
		...
		...
		/* everything goes fine */
		continue

err_engine:
		kobject_put(...);
		dev_warn(...);
	}

And we avoid using the "if (0)" that you don't like.

Thanks,
Andi
Andi Shyti Aug. 21, 2024, 10:34 a.m. UTC | #3
Hi Again, Rodrigo,

...

> > Also it looks something is off with the goto paths...
> > 
> > That if (0) is also ugly... probably better to use a
> > kobject_put with continue on every failing point as well...
> 
> ehehe... I came to like it, to be honest. Besides I like single
> exit paths instead of distributed returns. In this particular
> case we would replcate the same "kobject_put() ... dev_warn()" in
> several places, so that I'm not sure it's better.
> 
> If you like more we could do:
> 
> 	for (...) {
> 		...
> 		...
> 		/* everything goes fine */
> 		continue
> 
> err_engine:
> 		kobject_put(...);
> 		dev_warn(...);
> 	}
> 
> And we avoid using the "if (0)" that you don't like.

BTW, the purpose of the patch is to remove the break and, as I
was at it, I chhanged dev_err/dev_warn.

Refactoring the "if (0)" is a bit out of scope. Right?

Thanks,
Andi
Rodrigo Vivi Aug. 23, 2024, 1:41 p.m. UTC | #4
On Wed, Aug 21, 2024 at 09:32:48AM +0200, Andi Shyti wrote:
> Hi Rodrigo,
> 
> On Tue, Aug 20, 2024 at 05:22:40PM -0400, Rodrigo Vivi wrote:
> > On Mon, Aug 19, 2024 at 01:31:40PM +0200, Andi Shyti wrote:
> > > The i915 driver generates sysfs entries for each engine of the
> > > GPU in /sys/class/drm/cardX/engines/.
> > > 
> > > The process is straightforward: we loop over the UABI engines and
> > > for each one, we:
> > > 
> > >  - Create the object.
> > >  - Create basic files.
> > >  - If the engine supports timeslicing, create timeslice duration files.
> > >  - If the engine supports preemption, create preemption-related files.
> > >  - Create default value files.
> > > 
> > > Currently, if any of these steps fail, the process stops, and no
> > > further sysfs files are created.
> > > 
> > > However, it's not necessary to stop the process on failure.
> > > Instead, we can continue creating the remaining sysfs files for
> > > the other engines. Even if some files fail to be created, the
> > > list of engines can still be retrieved by querying i915.
> > > 
> > > Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
> > > ---
> > > Hi,
> > > 
> > > It might make sense to create an "inv-<engine_name>" if something
> > > goes wrong, so that the user is aware that the engine exists, but
> > > the sysfs file is not present.
> > 
> > well, if the sysfs dir/files creation is failing, then it will
> > probably be unreliable anyway right?
> 
> Are you suggesting that "inv-<engine_name>" is OK?

it is okay I guess.
But my point is more on, how are we going to create this if
the creation mechanism is what is likely failing here.

> 
> > > One further improvement would be to provide more information
> > > about thei failure reason the dev_warn() message.
> > 
> > So, perhaps this patch should already go there and remove
> > the dev_err and add individual dev_warn for each failing path?
> 
> That's a suggestion, but it doesn't mean that it necessarily
> improves things as it might add some innecessary information.
> Just thinking.

okay then.

> 
> > Also it looks something is off with the goto paths...
> > 
> > That if (0) is also ugly... probably better to use a
> > kobject_put with continue on every failing point as well...
> 
> ehehe... I came to like it, to be honest. Besides I like single
> exit paths instead of distributed returns. In this particular
> case we would replcate the same "kobject_put() ... dev_warn()" in
> several places, so that I'm not sure it's better.
> 
> If you like more we could do:
> 
> 	for (...) {
> 		...
> 		...
> 		/* everything goes fine */
> 		continue
> 
> err_engine:
> 		kobject_put(...);
> 		dev_warn(...);
> 	}
> 
> And we avoid using the "if (0)" that you don't like.

nah, no strong feeling from my side. It is there, let's
avoid unnecessary refactors.

Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

on this patch as is. And sorry for the delay.

> 
> Thanks,
> Andi
Andi Shyti Aug. 23, 2024, 10:26 p.m. UTC | #5
Hi Rodrigo,

On Fri, Aug 23, 2024 at 09:41:31AM -0400, Rodrigo Vivi wrote:
> On Wed, Aug 21, 2024 at 09:32:48AM +0200, Andi Shyti wrote:
> > On Tue, Aug 20, 2024 at 05:22:40PM -0400, Rodrigo Vivi wrote:
> > > On Mon, Aug 19, 2024 at 01:31:40PM +0200, Andi Shyti wrote:
...
> > > > It might make sense to create an "inv-<engine_name>" if something
> > > > goes wrong, so that the user is aware that the engine exists, but
> > > > the sysfs file is not present.
> > > 
> > > well, if the sysfs dir/files creation is failing, then it will
> > > probably be unreliable anyway right?
> > 
> > Are you suggesting that "inv-<engine_name>" is OK?
> 
> it is okay I guess.
> But my point is more on, how are we going to create this if
> the creation mechanism is what is likely failing here.

We can fail for different reasons... but yeah you are right, it
doesn't make much sense, as also the creation of "inv-<...>"
interfaces might be unreliable.

> > > Also it looks something is off with the goto paths...
> > > 
> > > That if (0) is also ugly... probably better to use a
> > > kobject_put with continue on every failing point as well...
> > 
> > ehehe... I came to like it, to be honest. Besides I like single
> > exit paths instead of distributed returns. In this particular
> > case we would replcate the same "kobject_put() ... dev_warn()" in
> > several places, so that I'm not sure it's better.
> > 
> > If you like more we could do:
> > 
> > 	for (...) {
> > 		...
> > 		...
> > 		/* everything goes fine */
> > 		continue
> > 
> > err_engine:
> > 		kobject_put(...);
> > 		dev_warn(...);
> > 	}
> > 
> > And we avoid using the "if (0)" that you don't like.
> 
> nah, no strong feeling from my side. It is there, let's
> avoid unnecessary refactors.
> 
> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
> 
> on this patch as is. And sorry for the delay.

Thanks a lot for your review :-)

Andi
Daniel Vetter Aug. 27, 2024, 5:05 p.m. UTC | #6
On Mon, Aug 19, 2024 at 01:31:40PM +0200, Andi Shyti wrote:
> The i915 driver generates sysfs entries for each engine of the
> GPU in /sys/class/drm/cardX/engines/.
> 
> The process is straightforward: we loop over the UABI engines and
> for each one, we:
> 
>  - Create the object.
>  - Create basic files.
>  - If the engine supports timeslicing, create timeslice duration files.
>  - If the engine supports preemption, create preemption-related files.
>  - Create default value files.
> 
> Currently, if any of these steps fail, the process stops, and no
> further sysfs files are created.
> 
> However, it's not necessary to stop the process on failure.
> Instead, we can continue creating the remaining sysfs files for
> the other engines. Even if some files fail to be created, the
> list of engines can still be retrieved by querying i915.
> 
> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>

Uh, sysfs is uapi. Either we need it, and it _must_ be there, or it's not
needed, and we should delete those files probably.

This is different from debugfs, where failures are consistently ignored
because that's the conscious design choice Greg made and wants supported.
Because debugfs is optional.

So please make sure we correctly fail driver load if these don't register.
Even better would be if sysfs files are registered atomically as attribute
blocks, but that's an entire different can of worms. But that would really
clean up this code and essentially put any failure handling onto core
driver model and sysfs code.
-Sima

> ---
> Hi,
> 
> It might make sense to create an "inv-<engine_name>" if something
> goes wrong, so that the user is aware that the engine exists, but
> the sysfs file is not present.
> 
> One further improvement would be to provide more information
> about thei failure reason the dev_warn() message.
> 
> Andi
> 
>  drivers/gpu/drm/i915/gt/sysfs_engines.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/sysfs_engines.c b/drivers/gpu/drm/i915/gt/sysfs_engines.c
> index 021f51d9b456..aab2759067d2 100644
> --- a/drivers/gpu/drm/i915/gt/sysfs_engines.c
> +++ b/drivers/gpu/drm/i915/gt/sysfs_engines.c
> @@ -530,9 +530,8 @@ void intel_engines_add_sysfs(struct drm_i915_private *i915)
>  err_object:
>  			kobject_put(kobj);
>  err_engine:
> -			dev_err(kdev, "Failed to add sysfs engine '%s'\n",
> -				engine->name);
> -			break;
> +			dev_warn(kdev, "Failed to add sysfs engine '%s'\n",
> +				 engine->name);
>  		}
>  	}
>  }
> -- 
> 2.45.2
>
Rodrigo Vivi Aug. 28, 2024, 8:17 p.m. UTC | #7
On Tue, Aug 27, 2024 at 07:05:05PM +0200, Daniel Vetter wrote:
> On Mon, Aug 19, 2024 at 01:31:40PM +0200, Andi Shyti wrote:
> > The i915 driver generates sysfs entries for each engine of the
> > GPU in /sys/class/drm/cardX/engines/.
> > 
> > The process is straightforward: we loop over the UABI engines and
> > for each one, we:
> > 
> >  - Create the object.
> >  - Create basic files.
> >  - If the engine supports timeslicing, create timeslice duration files.
> >  - If the engine supports preemption, create preemption-related files.
> >  - Create default value files.
> > 
> > Currently, if any of these steps fail, the process stops, and no
> > further sysfs files are created.
> > 
> > However, it's not necessary to stop the process on failure.
> > Instead, we can continue creating the remaining sysfs files for
> > the other engines. Even if some files fail to be created, the
> > list of engines can still be retrieved by querying i915.
> > 
> > Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
> 
> Uh, sysfs is uapi. Either we need it, and it _must_ be there, or it's not
> needed, and we should delete those files probably.
> 
> This is different from debugfs, where failures are consistently ignored
> because that's the conscious design choice Greg made and wants supported.
> Because debugfs is optional.
> 
> So please make sure we correctly fail driver load if these don't register.
> Even better would be if sysfs files are registered atomically as attribute
> blocks, but that's an entire different can of worms. But that would really
> clean up this code and essentially put any failure handling onto core
> driver model and sysfs code.

Indeed very good point. Sorry for having missed this perspective.

> -Sima
> 
> > ---
> > Hi,
> > 
> > It might make sense to create an "inv-<engine_name>" if something
> > goes wrong, so that the user is aware that the engine exists, but
> > the sysfs file is not present.
> > 
> > One further improvement would be to provide more information
> > about thei failure reason the dev_warn() message.
> > 
> > Andi
> > 
> >  drivers/gpu/drm/i915/gt/sysfs_engines.c | 5 ++---
> >  1 file changed, 2 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/sysfs_engines.c b/drivers/gpu/drm/i915/gt/sysfs_engines.c
> > index 021f51d9b456..aab2759067d2 100644
> > --- a/drivers/gpu/drm/i915/gt/sysfs_engines.c
> > +++ b/drivers/gpu/drm/i915/gt/sysfs_engines.c
> > @@ -530,9 +530,8 @@ void intel_engines_add_sysfs(struct drm_i915_private *i915)
> >  err_object:
> >  			kobject_put(kobj);
> >  err_engine:
> > -			dev_err(kdev, "Failed to add sysfs engine '%s'\n",
> > -				engine->name);
> > -			break;
> > +			dev_warn(kdev, "Failed to add sysfs engine '%s'\n",
> > +				 engine->name);
> >  		}
> >  	}
> >  }
> > -- 
> > 2.45.2
> > 
> 
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
Andi Shyti Sept. 4, 2024, 1:20 p.m. UTC | #8
Hi Sima,

On Tue, Aug 27, 2024 at 07:05:05PM +0200, Daniel Vetter wrote:
> On Mon, Aug 19, 2024 at 01:31:40PM +0200, Andi Shyti wrote:
> > The i915 driver generates sysfs entries for each engine of the
> > GPU in /sys/class/drm/cardX/engines/.
> > 
> > The process is straightforward: we loop over the UABI engines and
> > for each one, we:
> > 
> >  - Create the object.
> >  - Create basic files.
> >  - If the engine supports timeslicing, create timeslice duration files.
> >  - If the engine supports preemption, create preemption-related files.
> >  - Create default value files.
> > 
> > Currently, if any of these steps fail, the process stops, and no
> > further sysfs files are created.
> > 
> > However, it's not necessary to stop the process on failure.
> > Instead, we can continue creating the remaining sysfs files for
> > the other engines. Even if some files fail to be created, the
> > list of engines can still be retrieved by querying i915.
> > 
> > Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
> 
> Uh, sysfs is uapi. Either we need it, and it _must_ be there, or it's not
> needed, and we should delete those files probably.
> 
> This is different from debugfs, where failures are consistently ignored
> because that's the conscious design choice Greg made and wants supported.
> Because debugfs is optional.
> 
> So please make sure we correctly fail driver load if these don't register.
> Even better would be if sysfs files are registered atomically as attribute
> blocks, but that's an entire different can of worms. But that would really
> clean up this code and essentially put any failure handling onto core
> driver model and sysfs code.

This comment came after I merged the patch. So far, we have been
keeping the driver going even if sysfs fails to create, with the
idea of "if there is something wrong let it go as far as it can
and fail on its own".

This change is just setting the behavior to what the rest of the
interfaces are doing, so that either we change them all to fail
the driver's probe or we have them behaving consistently as they
are.

Tvrtko, Chris, Rodrigo any opinion from your side? Shall we bail
out as Sima is suggesting?

Thanks,
Andi
Jani Nikula Sept. 4, 2024, 2:34 p.m. UTC | #9
On Wed, 04 Sep 2024, Andi Shyti <andi.shyti@linux.intel.com> wrote:
> Hi Sima,
>
> On Tue, Aug 27, 2024 at 07:05:05PM +0200, Daniel Vetter wrote:
>> On Mon, Aug 19, 2024 at 01:31:40PM +0200, Andi Shyti wrote:
>> > The i915 driver generates sysfs entries for each engine of the
>> > GPU in /sys/class/drm/cardX/engines/.
>> > 
>> > The process is straightforward: we loop over the UABI engines and
>> > for each one, we:
>> > 
>> >  - Create the object.
>> >  - Create basic files.
>> >  - If the engine supports timeslicing, create timeslice duration files.
>> >  - If the engine supports preemption, create preemption-related files.
>> >  - Create default value files.
>> > 
>> > Currently, if any of these steps fail, the process stops, and no
>> > further sysfs files are created.
>> > 
>> > However, it's not necessary to stop the process on failure.
>> > Instead, we can continue creating the remaining sysfs files for
>> > the other engines. Even if some files fail to be created, the
>> > list of engines can still be retrieved by querying i915.
>> > 
>> > Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
>> 
>> Uh, sysfs is uapi. Either we need it, and it _must_ be there, or it's not
>> needed, and we should delete those files probably.
>> 
>> This is different from debugfs, where failures are consistently ignored
>> because that's the conscious design choice Greg made and wants supported.
>> Because debugfs is optional.
>> 
>> So please make sure we correctly fail driver load if these don't register.
>> Even better would be if sysfs files are registered atomically as attribute
>> blocks, but that's an entire different can of worms. But that would really
>> clean up this code and essentially put any failure handling onto core
>> driver model and sysfs code.
>
> This comment came after I merged the patch. So far, we have been
> keeping the driver going even if sysfs fails to create, with the
> idea of "if there is something wrong let it go as far as it can
> and fail on its own".
>
> This change is just setting the behavior to what the rest of the
> interfaces are doing, so that either we change them all to fail
> the driver's probe or we have them behaving consistently as they
> are.
>
> Tvrtko, Chris, Rodrigo any opinion from your side? Shall we bail
> out as Sima is suggesting?

Are there any causes for sysfs creation errors that would be acceptable
to ignore? I didn't see any examples. Or is this just speculative?

IMO fail fast and loud. We get enough bug reports where there's some big
backtrace splash copy-pasted on the bug, but the root cause happened
much earlier and was ignored.

BR,
Jani.
Tvrtko Ursulin Sept. 4, 2024, 3:08 p.m. UTC | #10
On 04/09/2024 15:34, Jani Nikula wrote:
> On Wed, 04 Sep 2024, Andi Shyti <andi.shyti@linux.intel.com> wrote:
>> Hi Sima,
>>
>> On Tue, Aug 27, 2024 at 07:05:05PM +0200, Daniel Vetter wrote:
>>> On Mon, Aug 19, 2024 at 01:31:40PM +0200, Andi Shyti wrote:
>>>> The i915 driver generates sysfs entries for each engine of the
>>>> GPU in /sys/class/drm/cardX/engines/.
>>>>
>>>> The process is straightforward: we loop over the UABI engines and
>>>> for each one, we:
>>>>
>>>>   - Create the object.
>>>>   - Create basic files.
>>>>   - If the engine supports timeslicing, create timeslice duration files.
>>>>   - If the engine supports preemption, create preemption-related files.
>>>>   - Create default value files.
>>>>
>>>> Currently, if any of these steps fail, the process stops, and no
>>>> further sysfs files are created.
>>>>
>>>> However, it's not necessary to stop the process on failure.
>>>> Instead, we can continue creating the remaining sysfs files for
>>>> the other engines. Even if some files fail to be created, the
>>>> list of engines can still be retrieved by querying i915.
>>>>
>>>> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
>>>
>>> Uh, sysfs is uapi. Either we need it, and it _must_ be there, or it's not
>>> needed, and we should delete those files probably.
>>>
>>> This is different from debugfs, where failures are consistently ignored
>>> because that's the conscious design choice Greg made and wants supported.
>>> Because debugfs is optional.
>>>
>>> So please make sure we correctly fail driver load if these don't register.
>>> Even better would be if sysfs files are registered atomically as attribute
>>> blocks, but that's an entire different can of worms. But that would really
>>> clean up this code and essentially put any failure handling onto core
>>> driver model and sysfs code.
>>
>> This comment came after I merged the patch. So far, we have been
>> keeping the driver going even if sysfs fails to create, with the
>> idea of "if there is something wrong let it go as far as it can
>> and fail on its own".
>>
>> This change is just setting the behavior to what the rest of the
>> interfaces are doing, so that either we change them all to fail
>> the driver's probe or we have them behaving consistently as they
>> are.
>>
>> Tvrtko, Chris, Rodrigo any opinion from your side? Shall we bail
>> out as Sima is suggesting?
> 
> Are there any causes for sysfs creation errors that would be acceptable
> to ignore? I didn't see any examples. Or is this just speculative?

I think it is speculative and that the reason for "carry on on failure" 
was probably simply because there aren't any real world reasons any 
would ever fail. Either a programming error or kernel out of memory on 
driver load, and neither of those sounds interesting.

I suspect historically it was probably deemed simpler not to bother with 
any unwind or such, and that is the only reason i915_setup_sysfs() 
returns void.

In this context I don't see a big ROI in making someone work on 
implementing a driver load abort here, but also don't think it would harm.

IMO it would be fine to tie the decision with the fate of dynamic CCS 
engines. If that will go in then it definitely more than makes sense to 
propagate all errors to the entity doing the sysfs write.

Regards,

Tvrtko

> IMO fail fast and loud. We get enough bug reports where there's some big
> backtrace splash copy-pasted on the bug, but the root cause happened
> much earlier and was ignored.
> 
> BR,
> Jani.
> 
>
Rodrigo Vivi Sept. 4, 2024, 3:13 p.m. UTC | #11
On Wed, Sep 04, 2024 at 05:34:42PM +0300, Jani Nikula wrote:
> On Wed, 04 Sep 2024, Andi Shyti <andi.shyti@linux.intel.com> wrote:
> > Hi Sima,
> >
> > On Tue, Aug 27, 2024 at 07:05:05PM +0200, Daniel Vetter wrote:
> >> On Mon, Aug 19, 2024 at 01:31:40PM +0200, Andi Shyti wrote:
> >> > The i915 driver generates sysfs entries for each engine of the
> >> > GPU in /sys/class/drm/cardX/engines/.
> >> > 
> >> > The process is straightforward: we loop over the UABI engines and
> >> > for each one, we:
> >> > 
> >> >  - Create the object.
> >> >  - Create basic files.
> >> >  - If the engine supports timeslicing, create timeslice duration files.
> >> >  - If the engine supports preemption, create preemption-related files.
> >> >  - Create default value files.
> >> > 
> >> > Currently, if any of these steps fail, the process stops, and no
> >> > further sysfs files are created.
> >> > 
> >> > However, it's not necessary to stop the process on failure.
> >> > Instead, we can continue creating the remaining sysfs files for
> >> > the other engines. Even if some files fail to be created, the
> >> > list of engines can still be retrieved by querying i915.
> >> > 
> >> > Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
> >> 
> >> Uh, sysfs is uapi. Either we need it, and it _must_ be there, or it's not
> >> needed, and we should delete those files probably.
> >> 
> >> This is different from debugfs, where failures are consistently ignored
> >> because that's the conscious design choice Greg made and wants supported.
> >> Because debugfs is optional.
> >> 
> >> So please make sure we correctly fail driver load if these don't register.
> >> Even better would be if sysfs files are registered atomically as attribute
> >> blocks, but that's an entire different can of worms. But that would really
> >> clean up this code and essentially put any failure handling onto core
> >> driver model and sysfs code.
> >
> > This comment came after I merged the patch. So far, we have been
> > keeping the driver going even if sysfs fails to create, with the
> > idea of "if there is something wrong let it go as far as it can
> > and fail on its own".
> >
> > This change is just setting the behavior to what the rest of the
> > interfaces are doing, so that either we change them all to fail
> > the driver's probe or we have them behaving consistently as they
> > are.
> >
> > Tvrtko, Chris, Rodrigo any opinion from your side? Shall we bail
> > out as Sima is suggesting?

I agree with Sima. I'm sorry for not having thought about that
perspective sooner.

> 
> Are there any causes for sysfs creation errors that would be acceptable
> to ignore? I didn't see any examples. Or is this just speculative?

Sima's point is, if the driver can live on this speculative scenario,
without the sysfs uapi. It is because it is likely a bogus unneded
uapi. So, why should we?

I know, if we take all the frequency control in the sysfs. The driver
can be fully functional and we would opperate in our variable frequency
normally and admin doesn't necessarily need to know or to tweak the
frequency. From this perspective, I accepted Andi's patch.

And this isn't a bogus interface. However, if the sysfs failed to
be created and suddently the admin depends on that to inspect
for a performance case or something and the sysfs is not there.

When he runs his tools everything will fail and this is a bad regression
for that case which we shouldn't had caused/accepted to begin with.

So, in this speculative case it is better to stop probing the driver
if we are not able to create the sysfs.

> 
> IMO fail fast and loud. We get enough bug reports where there's some big
> backtrace splash copy-pasted on the bug, but the root cause happened
> much earlier and was ignored.
> 
> BR,
> Jani.
> 
> 
> -- 
> Jani Nikula, Intel
diff mbox series

Patch

diff --git a/drivers/gpu/drm/i915/gt/sysfs_engines.c b/drivers/gpu/drm/i915/gt/sysfs_engines.c
index 021f51d9b456..aab2759067d2 100644
--- a/drivers/gpu/drm/i915/gt/sysfs_engines.c
+++ b/drivers/gpu/drm/i915/gt/sysfs_engines.c
@@ -530,9 +530,8 @@  void intel_engines_add_sysfs(struct drm_i915_private *i915)
 err_object:
 			kobject_put(kobj);
 err_engine:
-			dev_err(kdev, "Failed to add sysfs engine '%s'\n",
-				engine->name);
-			break;
+			dev_warn(kdev, "Failed to add sysfs engine '%s'\n",
+				 engine->name);
 		}
 	}
 }