[v2,0/6] Add HWMON support for DGFX

Message ID 20230627183043.2024530-1-badal.nilawar@intel.com (mailing list archive)

Message

Nilawar, Badal June 27, 2023, 6:30 p.m. UTC
This series adds hwmon support to the xe driver for DGFX.

Badal Nilawar (6):
  drm/xe/hwmon: Add HWMON infrastructure
  drm/xe/hwmon: Expose power attributes
  drm/xe/hwmon: Expose card reactive critical power
  drm/xe/hwmon: Expose input voltage attribute
  drm/xe/hwmon: Expose hwmon energy attribute
  drm/xe/hwmon: Expose power1_max_interval

 .../ABI/testing/sysfs-driver-intel-xe-hwmon   |  77 ++
 drivers/gpu/drm/xe/Makefile                   |   3 +
 drivers/gpu/drm/xe/regs/xe_gt_regs.h          |   9 +
 drivers/gpu/drm/xe/regs/xe_mchbar_regs.h      |  45 +
 drivers/gpu/drm/xe/xe_device.c                |   5 +
 drivers/gpu/drm/xe/xe_device_types.h          |   2 +
 drivers/gpu/drm/xe/xe_hwmon.c                 | 989 ++++++++++++++++++
 drivers/gpu/drm/xe/xe_hwmon.h                 |  26 +
 drivers/gpu/drm/xe/xe_pcode.h                 |   5 +
 drivers/gpu/drm/xe/xe_pcode_api.h             |   7 +
 drivers/gpu/drm/xe/xe_uc.c                    |   6 +
 11 files changed, 1174 insertions(+)
 create mode 100644 Documentation/ABI/testing/sysfs-driver-intel-xe-hwmon
 create mode 100644 drivers/gpu/drm/xe/regs/xe_mchbar_regs.h
 create mode 100644 drivers/gpu/drm/xe/xe_hwmon.c
 create mode 100644 drivers/gpu/drm/xe/xe_hwmon.h
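
Editorial illustration (not part of the posted series): attributes registered through hwmon appear as small text files in a hwmon sysfs directory, so userspace can read them as plain integers. The sketch below assumes the standard hwmon unit convention (power values in microwatts). The helper names hwmon_read_attr() and hwmon_uw_to_watts() are hypothetical, and the cover letter itself only names power1_max_interval, so any other attribute name used with it is an assumption.

```c
#include <stdio.h>

/*
 * Read one integer hwmon attribute, e.g. "<dir>/power1_max".
 * Returns 0 on success, -1 if the file is missing or cannot be parsed.
 */
static int hwmon_read_attr(const char *dir, const char *attr, long long *val)
{
	char path[512];
	FILE *f;
	int n, ok;

	n = snprintf(path, sizeof(path), "%s/%s", dir, attr);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;	/* path would have been truncated */

	f = fopen(path, "r");
	if (!f)
		return -1;

	ok = (fscanf(f, "%lld", val) == 1);
	fclose(f);
	return ok ? 0 : -1;
}

/* hwmon sysfs reports power in microwatts; convert to watts. */
static double hwmon_uw_to_watts(long long microwatts)
{
	return (double)microwatts / 1e6;
}
```

A caller would pass a directory such as /sys/class/hwmon/hwmon0 (the index varies per system, so tools usually check the directory's "name" file first) and then convert the result with hwmon_uw_to_watts().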

Comments

Dixit, Ashutosh July 2, 2023, 1:31 a.m. UTC | #1
On Tue, 27 Jun 2023 11:30:37 -0700, Badal Nilawar wrote:
>

Hi Badal,

> This series adds the hwmon support on xe driver for DGFX

This needs some discussion, but I have a general comment on this series
first. The implementation here follows what was done for i915. But for how
hwmon attributes are defined, I think we should look at how this was done
in other drm drivers, namely amdgpu and radeon. Look here (search for
"hwmon_attributes"):

drivers/gpu/drm/amd/pm/amdgpu_pm.c, and
drivers/gpu/drm/radeon/radeon_pm.c

Here the hwmon attribute definition is very similar to how general sysfs
attributes are defined (they will just appear in hwmon directories) and
does not carry baggage of the hwmon infrastructure (what i915 has). So my
preference is to shift to this amd/radeon way for xe.

There is also a separate discussion on whether to use hwmon sysfs for other
custom attributes, as has been done in these other drm drivers, and using
this light-weight method should help if we went this route too.

Thanks.
--
Ashutosh

Guenter Roeck July 2, 2023, 3:02 a.m. UTC | #2
On 7/1/23 18:31, Dixit, Ashutosh wrote:
> On Tue, 27 Jun 2023 11:30:37 -0700, Badal Nilawar wrote:
>>
> 
> Hi Badal,
> 
>> This series adds the hwmon support on xe driver for DGFX
> 
> Needs some discussion but I have a general comment on this series
> first. The implementation here follow what was done for i915. But how
> "hwmon attributes are defined" I think we should look at how this was done
> in other drm drivers, namely amdgpu and radeon. Look here (search for
> "hwmon_attributes"):
> 
> drivers/gpu/drm/amd/pm/amdgpu_pm.c, and
> drivers/gpu/drm/radeon/radeon_pm.c
> 
> Here the hwmon attribute definition is very similar to how general sysfs
> attributes are defined (they will just appear in hwmon directories) and
> does not carry baggage of the hwmon infrastructure (what i915 has). So my
> preference is to shift to this amd/radeon way for xe.
> 

You mean your preference is to use a deprecated hardware monitoring
registration function and to explicitly violate the following statement
from Documentation/hwmon/hwmon-kernel-api.rst ?

   All other hardware monitoring device registration functions are deprecated
   and must not be used in new drivers.

That is quite interesting. Please elaborate and explain your rationale.

Thanks,
Guenter

Dixit, Ashutosh July 2, 2023, 3:57 p.m. UTC | #3
On Sat, 01 Jul 2023 20:02:51 -0700, Guenter Roeck wrote:
>
> On 7/1/23 18:31, Dixit, Ashutosh wrote:
> > On Tue, 27 Jun 2023 11:30:37 -0700, Badal Nilawar wrote:
> >>
> >
> > Hi Badal,
> >
> >> This series adds the hwmon support on xe driver for DGFX
> >
> > Needs some discussion but I have a general comment on this series
> > first. The implementation here follow what was done for i915. But how
> > "hwmon attributes are defined" I think we should look at how this was done
> > in other drm drivers, namely amdgpu and radeon. Look here (search for
> > "hwmon_attributes"):
> >
> > drivers/gpu/drm/amd/pm/amdgpu_pm.c, and
> > drivers/gpu/drm/radeon/radeon_pm.c
> >
> > Here the hwmon attribute definition is very similar to how general sysfs
> > attributes are defined (they will just appear in hwmon directories) and
> > does not carry baggage of the hwmon infrastructure (what i915 has). So my
> > preference is to shift to this amd/radeon way for xe.
> >
>
> You mean your preference is to use a deprecated hardware monitoring
> registration function and to explicitly violate the following statement
> from Documentation/hwmon/hwmon-kernel-api.rst ?
>
>   All other hardware monitoring device registration functions are deprecated
>   and must not be used in new drivers.

I missed that, but since we also have this in ddaefa209c4a ("hwmon: Make
chip parameter for with_info API mandatory"), yes that is what it would
boil down to.

> That is quite interesting. Please elaborate and explain your rationale.

Basically, like those other drm drivers, the chip parameter is of no use to
us (or at least we'd be totally fine not using it), hence the desire to
skip it.

But we are still required to use what we don't need? Do you care about
drivers outside drivers/hwmon?

Thanks.
--
Ashutosh

Guenter Roeck July 2, 2023, 5:01 p.m. UTC | #4
On 7/2/23 08:57, Dixit, Ashutosh wrote:
> On Sat, 01 Jul 2023 20:02:51 -0700, Guenter Roeck wrote:
>>
>> On 7/1/23 18:31, Dixit, Ashutosh wrote:
>>> On Tue, 27 Jun 2023 11:30:37 -0700, Badal Nilawar wrote:
>>>>
>>>
>>> Hi Badal,
>>>
>>>> This series adds the hwmon support on xe driver for DGFX
>>>
>>> Needs some discussion but I have a general comment on this series
>>> first. The implementation here follow what was done for i915. But how
>>> "hwmon attributes are defined" I think we should look at how this was done
>>> in other drm drivers, namely amdgpu and radeon. Look here (search for
>>> "hwmon_attributes"):
>>>
>>> drivers/gpu/drm/amd/pm/amdgpu_pm.c, and
>>> drivers/gpu/drm/radeon/radeon_pm.c
>>>
>>> Here the hwmon attribute definition is very similar to how general sysfs
>>> attributes are defined (they will just appear in hwmon directories) and
>>> does not carry baggage of the hwmon infrastructure (what i915 has). So my
>>> preference is to shift to this amd/radeon way for xe.
>>>
>>
>> You mean your preference is to use a deprecated hardware monitoring
>> registration function and to explicitly violate the following statement
>> from Documentation/hwmon/hwmon-kernel-api.rst ?
>>
>>    All other hardware monitoring device registration functions are deprecated
>>    and must not be used in new drivers.
> 
> I missed that, but since we also have this in ddaefa209c4a ("hwmon: Make
> chip parameter for with_info API mandatory"), yes that is what it would
> boil down to.
> 

The chip parameter covers all standard hwmon sysfs attributes. A hwmon driver
without standard sysfs attributes is not a hwmon driver. It abuses the hwmon
subsystem and its API/ABI. If I catch such a driver, I'll NACK it. If I find
one in the kernel, I will do my best to get it removed.

>> That is quite interesting. Please elaborate and explain your rationale.
> 
> Basically, like those other drm drivers, the chip parameter is of no use to
> us (or at least we'd be totally fine not using it), hence the desire to
> skip it.
> 
> But we are still required to use what we don't need? Do you care about
> drivers outside drivers/hwmon?
> 

I would suggest reading the hwmon API more closely to understand it. Your claim
that "the chip parameter is of no use to us" is simply wrong, as should be obvious
when you read this submission. Actually, if you would convert the other
drm drivers to use it, it would reduce the size of the hwmon specific code
in those drivers, typically by 20-40%. Given that, I must admit that I am quite
baffled by your claim. Maybe you could explain that in more detail.

Of course, I care about use of the hardware monitoring subsystem
outside drivers/hwmon. Unlike other maintainers, I let people register drivers
from outside that directory, but that doesn't mean that I don't care.

FWIW, you are close to convincing me to add a warning message to the kernel
to tell users of deprecated hwmon APIs that the API is deprecated.
Alternatively, I may stop permitting new hwmon drivers outside drivers/hwmon.

Guenter

> Thanks.
> --
> Ashutosh

Dixit, Ashutosh July 2, 2023, 8:29 p.m. UTC | #5
On Sun, 02 Jul 2023 10:01:00 -0700, Guenter Roeck wrote:
>
> On 7/2/23 08:57, Dixit, Ashutosh wrote:
> > On Sat, 01 Jul 2023 20:02:51 -0700, Guenter Roeck wrote:
> >>
> >> On 7/1/23 18:31, Dixit, Ashutosh wrote:
> >>> On Tue, 27 Jun 2023 11:30:37 -0700, Badal Nilawar wrote:
> >>>>
> >>>
> >>> Hi Badal,
> >>>
> >>>> This series adds the hwmon support on xe driver for DGFX
> >>>
> >>> Needs some discussion but I have a general comment on this series
> >>> first. The implementation here follow what was done for i915. But how
> >>> "hwmon attributes are defined" I think we should look at how this was done
> >>> in other drm drivers, namely amdgpu and radeon. Look here (search for
> >>> "hwmon_attributes"):
> >>>
> >>> drivers/gpu/drm/amd/pm/amdgpu_pm.c, and
> >>> drivers/gpu/drm/radeon/radeon_pm.c
> >>>
> >>> Here the hwmon attribute definition is very similar to how general sysfs
> >>> attributes are defined (they will just appear in hwmon directories) and
> >>> does not carry baggage of the hwmon infrastructure (what i915 has). So my
> >>> preference is to shift to this amd/radeon way for xe.
> >>>
> >>
> >> You mean your preference is to use a deprecated hardware monitoring
> >> registration function and to explicitly violate the following statement
> >> from Documentation/hwmon/hwmon-kernel-api.rst ?
> >>
> >>    All other hardware monitoring device registration functions are deprecated
> >>    and must not be used in new drivers.
> >
> > I missed that, but since we also have this in ddaefa209c4a ("hwmon: Make
> > chip parameter for with_info API mandatory"), yes that is what it would
> > boil down to.
> >
>
> The chip parameter covers all standard hwmon sysfs attributes. A hwmon driver
> without standard sysfs attributes is not a hwmon driver. It abuses the hwmon
> subsystem and its API/ABI.

To me, hwmon is a means to expose some standard attributes to standard
userspace apps so that those apps can find those attributes. Which kernel
APIs are used internally is an internal matter of the kernel. As subsystem
maintainer you may have reasons for allowing only certain APIs.

> If I catch such a driver, I'll NACK it. If I find one in the kernel, I
> will do my best to get it removed.
>
> >> That is quite interesting. Please elaborate and explain your rationale.
> >
> > Basically, like those other drm drivers, the chip parameter is of no use to
> > us (or at least we'd be totally fine not using it), hence the desire to
> > skip it.
> >
> > But we are still required to use what we don't need? Do you care about
> > drivers outside drivers/hwmon?
> >
>
> I would suggest to read the hwmon API more closely to understand it. Your claim
> that "the chip parameter is of no use to us" is simply wrong, as should be obvious
> when you read this submission. Actually, if you would convert the other
> drm drivers to use it, it would reduce the size of the hwmon specific code
> in those drivers, typically by 20-40%. Given that, I must admit that I am quite
> baffled by your claim. Maybe you could explain that in more detail.

Of course when the chip parameter helps it likely reduces code. But when it
is not needed it adds unnecessary code. Those drm drivers
(amdgpu/radeon/i915) I mentioned above are available in the kernel, anyone
can see and judge for themselves.

Of course people might have been abusing the deprecated APIs (or a NULL chip
parameter), but to me it seems there is also some legitimate use for them.

> Of course, I care about use of the hardware monitoring subsystem
> outside drivers/hwmon. Unlike other maintainers, I let people register drivers
> from outside that directory, but that doesn't mean that I don't care.
>
> FWIW, you are close to convincing me to add a warning message to the kernel
> to tell users of deprecated hwmon APIs that the API is deprecated.
> Alternatively, I may stop permitting new hwmon drivers outside drivers/hwmon.

OK, thanks for clarifying. Since you disagree, we will not use the deprecated
APIs and will continue with the approach taken in this series.

Ashutosh

Guenter Roeck July 2, 2023, 8:51 p.m. UTC | #6
On 7/2/23 13:29, Dixit, Ashutosh wrote:

> Of course people might have been abusing the deprecated API's (or NULL chip
> parameter) but to me it seems there is also some legitimate use for them.
> 

You still neglect to explain what you think that legitimate use would be.

Guenter

Dixit, Ashutosh July 3, 2023, 1:48 a.m. UTC | #7
On Sun, 02 Jul 2023 13:51:40 -0700, Guenter Roeck wrote:
>
> On 7/2/23 13:29, Dixit, Ashutosh wrote:
>
> > Of course people might have been abusing the deprecated API's (or NULL chip
> > parameter) but to me it seems there is also some legitimate use for them.
> >
>
> You still neglect to explain what you think that legitimate use would be.

To me "drivers/gpu/drm/amd/pm/amdgpu_pm.c" is a legitimate use case which
doesn't use chip_info (both standard and custom hwmon attributes are
defined without using chip_info). "drivers/gpu/drm/i915/i915_hwmon.c" has
all this extra code related to chip_info/channel_info which is not
needed. i915 could well move to the amdgpu model and that would reduce i915
code. That is what I was originally proposing for this new patch series.

Ashutosh

Guenter Roeck July 3, 2023, 2:37 a.m. UTC | #8
On 7/2/23 18:48, Dixit, Ashutosh wrote:
> On Sun, 02 Jul 2023 13:51:40 -0700, Guenter Roeck wrote:
>>
>> On 7/2/23 13:29, Dixit, Ashutosh wrote:
>>
>>> Of course people might have been abusing the deprecated API's (or NULL chip
>>> parameter) but to me it seems there is also some legitimate use for them.
>>>
>>
>> You still neglect to explain what you think that legitimate use would be.
> 
> To me "drivers/gpu/drm/amd/pm/amdgpu_pm.c" is a legitimate use case which
> doesn't use chip_info (both standard and custom hwmon attributes are
> defined without using chip_info). "drivers/gpu/drm/i915/i915_hwmon.c" has

In new code, standard hwmon attributes MUST be defined using chip_info.
Declaring the use of a deprecated API a "legitimate use case" and using it
as an example for new code is never appropriate.

> all this extra code related to chip_info/channel_info which is not
> needed. i915 could well move to the amdgpu model and that would reduce i915

Yes, and the proposed i915 code _doesn't_ have all the extra code that would
otherwise be needed to generate and read/write sysfs attributes directly.

> code. That is what I was originally proposing for this new patch series.
> 

This is wrong. Using chip_info _always_ reduces code size for standard
hwmon attributes, because the code can concentrate on reading and
writing values from/to the chip and doesn't have to bother with sysfs
attribute handling. Convert drivers/gpu/drm/amd/pm/amdgpu_pm.c to use
the with_info API and you'll see.

Guenter
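
Editorial illustration (not part of the thread): the point above is that with the with_info API a driver supplies a visibility callback and a read callback, while the hwmon core owns attribute naming and text formatting. The sketch below is a userspace model of that split, not kernel code. Every demo_* name is hypothetical; in the kernel the corresponding pieces are struct hwmon_chip_info, struct hwmon_ops and hwmon_device_register_with_info(). The attribute names are standard hwmon names chosen for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical userspace model; these are NOT the kernel's hwmon types. */
enum demo_attr { DEMO_POWER_MAX, DEMO_ENERGY_INPUT, DEMO_IN_INPUT, DEMO_NATTR };

struct demo_ops {
	int (*is_visible)(const void *drvdata, enum demo_attr attr);
	int (*read)(const void *drvdata, enum demo_attr attr, long long *val);
};

/* "Core" side: name lookup and text formatting live here, once. */
static const char *const demo_names[DEMO_NATTR] = {
	[DEMO_POWER_MAX]    = "power1_max",
	[DEMO_ENERGY_INPUT] = "energy1_input",
	[DEMO_IN_INPUT]     = "in0_input",
};

/* Format attribute `name` into buf; returns the length written or -1. */
static int demo_core_show(const struct demo_ops *ops, const void *drvdata,
			  const char *name, char *buf, size_t len)
{
	long long val;
	int i, n;

	for (i = 0; i < DEMO_NATTR; i++) {
		if (strcmp(name, demo_names[i]) != 0)
			continue;
		if (!ops->is_visible(drvdata, (enum demo_attr)i))
			return -1;	/* attribute hidden on this device */
		if (ops->read(drvdata, (enum demo_attr)i, &val) != 0)
			return -1;
		n = snprintf(buf, len, "%lld\n", val);
		return (n < 0 || (size_t)n >= len) ? -1 : n;
	}
	return -1;			/* unknown attribute name */
}

/* "Driver" side: only knows how to fetch raw values. */
struct demo_gpu {
	long long power_limit_uw;	/* stand-in for a hardware register */
	long long energy_uj;
	int has_voltage;		/* not every device exposes voltage */
	long long voltage_mv;
};

static int demo_gpu_is_visible(const void *drvdata, enum demo_attr attr)
{
	const struct demo_gpu *gpu = drvdata;

	return attr != DEMO_IN_INPUT || gpu->has_voltage;
}

static int demo_gpu_read(const void *drvdata, enum demo_attr attr,
			 long long *val)
{
	const struct demo_gpu *gpu = drvdata;

	switch (attr) {
	case DEMO_POWER_MAX:    *val = gpu->power_limit_uw; return 0;
	case DEMO_ENERGY_INPUT: *val = gpu->energy_uj;      return 0;
	case DEMO_IN_INPUT:     *val = gpu->voltage_mv;     return 0;
	default:                return -1;
	}
}

static const struct demo_ops demo_gpu_ops = {
	.is_visible = demo_gpu_is_visible,
	.read = demo_gpu_read,
};
```

In this model the driver side contains no string handling at all; adding another attribute means one more enum value, one name-table entry and one switch case.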
Andi Shyti July 3, 2023, 8:55 a.m. UTC | #9
Hi,

On Sat, Jul 01, 2023 at 08:02:51PM -0700, Guenter Roeck wrote:
> On 7/1/23 18:31, Dixit, Ashutosh wrote:
> > On Tue, 27 Jun 2023 11:30:37 -0700, Badal Nilawar wrote:
> > > 
> > 
> > Hi Badal,
> > 
> > > This series adds the hwmon support on xe driver for DGFX
> > 
> > Needs some discussion but I have a general comment on this series
> > first. The implementation here follow what was done for i915. But how
> > "hwmon attributes are defined" I think we should look at how this was done
> > in other drm drivers, namely amdgpu and radeon. Look here (search for
> > "hwmon_attributes"):
> > 
> > drivers/gpu/drm/amd/pm/amdgpu_pm.c, and
> > drivers/gpu/drm/radeon/radeon_pm.c
> > 
> > Here the hwmon attribute definition is very similar to how general sysfs
> > attributes are defined (they will just appear in hwmon directories) and
> > does not carry baggage of the hwmon infrastructure (what i915 has). So my
> > preference is to shift to this amd/radeon way for xe.
> > 
> 
> You mean your preference is to use a deprecated hardware monitoring
> registration function and to explicitly violate the following statement
> from Documentation/hwmon/hwmon-kernel-api.rst ?
> 
>   All other hardware monitoring device registration functions are deprecated
>   and must not be used in new drivers.
> 
> That is quite interesting. Please elaborate and explain your rationale.

How about using iio instead of hwmon?

Andi

Rodrigo Vivi July 14, 2023, 8:21 p.m. UTC | #10
On Sun, Jul 02, 2023 at 10:01:00AM -0700, Guenter Roeck wrote:
> On 7/2/23 08:57, Dixit, Ashutosh wrote:
> > On Sat, 01 Jul 2023 20:02:51 -0700, Guenter Roeck wrote:
> > > 
> > > On 7/1/23 18:31, Dixit, Ashutosh wrote:
> > > > On Tue, 27 Jun 2023 11:30:37 -0700, Badal Nilawar wrote:
> > > > > 
> > > > 
> > > > Hi Badal,
> > > > 
> > > > > This series adds the hwmon support on xe driver for DGFX
> > > > 
> > > > Needs some discussion but I have a general comment on this series
> > > > first. The implementation here follow what was done for i915. But how
> > > > "hwmon attributes are defined" I think we should look at how this was done
> > > > in other drm drivers, namely amdgpu and radeon. Look here (search for
> > > > "hwmon_attributes"):
> > > > 
> > > > drivers/gpu/drm/amd/pm/amdgpu_pm.c, and
> > > > drivers/gpu/drm/radeon/radeon_pm.c
> > > > 
> > > > Here the hwmon attribute definition is very similar to how general sysfs
> > > > attributes are defined (they will just appear in hwmon directories) and
> > > > does not carry baggage of the hwmon infrastructure (what i915 has). So my
> > > > preference is to shift to this amd/radeon way for xe.
> > > > 
> > > 
> > > You mean your preference is to use a deprecated hardware monitoring
> > > registration function and to explicitly violate the following statement
> > > from Documentation/hwmon/hwmon-kernel-api.rst ?
> > > 
> > >    All other hardware monitoring device registration functions are deprecated
> > >    and must not be used in new drivers.
> > 
> > I missed that, but since we also have this in ddaefa209c4a ("hwmon: Make
> > chip parameter for with_info API mandatory"), yes that is what it would
> > boil down to.
> > 
> 
> The chip parameter covers all standard hwmon sysfs attributes. A hwmon driver
> without standard sysfs attributes is not a hwmon driver. It abuses the hwmon
> subsystem and its API/ABI. If I catch such a driver, I'll NACK it. If I find
> one in the kernel, I will do my best to get it removed.
> 
> > > That is quite interesting. Please elaborate and explain your rationale.
> > 
> > Basically, like those other drm drivers, the chip parameter is of no use to
> > us (or at least we'd be totally fine not using it), hence the desire to
> > skip it.
> > 
> > But we are still required to use what we don't need? Do you care about
> > drivers outside drivers/hwmon?
> > 
> 
> I would suggest to read the hwmon API more closely to understand it. Your claim
> that "the chip parameter is of no use to us" is simply wrong, as should be obvious
> when you read this submission. Actually, if you would convert the other
> drm drivers to use it, it would reduce the size of the hwmon specific code
> in those drivers, typically by 20-40%. Given that, I must admit that I am quite
> baffled by your claim. Maybe you could explain that in more detail.
> 
> Of course, I care about use of the hardware monitoring subsystem
> outside drivers/hwmon. Unlike other maintainers, I let people register drivers
> from outside that directory, but that doesn't mean that I don't care.

Hi Guenter,

First of all, sorry for jumping in late here. I'm totally with you here and we
should definitely only use the new API. For standard entries that will
definitely reduce the code size.

So, since we are talking about reducing code here, and looking at other DRM
drivers, and thinking about the needs of this new Xe driver, I'm wondering
if you would consider accepting 'frequency' as a standard hwmon attribute.

We would need it to be RW so we could use it to place freq requests as well,
and possibly different types/domains and even throttle reasons on top.

So we could then try to unify all the drm drivers in a common drm-hwmon
layer, putting an end to all abuses and deprecated users.

But before moving forward with any proposal I'd like to hear your thoughts on
this 'frequency' block as a standard attribute.

Thanks,
Rodrigo.

> 
> FWIW, you are close to convincing me to add a warning message to the kernel
> to tell users of deprecated hwmon APIs that the API is deprecated.
> Alternatively, I may stop permitting new hwmon drivers outside drivers/hwmon.
> 
> Guenter
> 
> > Thanks.
> > --
> > Ashutosh
>

Guenter Roeck July 14, 2023, 10:26 p.m. UTC | #11
On 7/14/23 13:21, Rodrigo Vivi wrote:
[ ... ]

> Hi Guenter,
> 
> First of all sorry for jumping late here. I'm totally with you here and we should
> definitely only use the new API. For standard entries that will definitely
> reduce the code size.
> 
> So, since we are talking about reducing code here, and looking to other DRM
> drivers, and thinking about the needs on this new Xe driver, I'm wondering
> if you would consider accepting 'frequency' as a standard hwmon attribute.
> 
> We would need it to be RW so we could use to put freq requests as well,
> and possibly different types/domains and even throttle reasons on top.
> 
> So we could then try to unify all the drm drivers in a common drm-hwmon
> layer putting an end in all abuses and deprecated users.
> 
> But before moving fwd with any proposal I'd like to hear your thoughts on
> this 'frequency' block as standard attribute.
> 

I really don't see how this would fit under "hardware monitoring".
Making it writable would be even worse - this is most definitely not a limit but
an actual value. The notion of limit actually shows that it is not a good fit as
a monitoring attribute: I can not conceive the notion of a "maximum" or "minimum"
frequency limit, or an "under" or "over" frequency.

If this is about thermal control/management, you might want to consider registering
with devfreq and the thermal subsystem (see devfreq_cooling_register() and
friends for reference).

Thanks,
Guenter

Rodrigo Vivi July 19, 2023, 5:01 p.m. UTC | #12
On Fri, Jul 14, 2023 at 03:26:49PM -0700, Guenter Roeck wrote:
> On 7/14/23 13:21, Rodrigo Vivi wrote:
> [ ... ]
> 
> > Hi Guenter,
> > 
> > First of all sorry for jumping late here. I'm totally with you here and we should
> > definitely only use the new API. For standard entries that will definitely
> > reduce the code size.
> > 
> > So, since we are talking about reducing code here, and looking to other DRM
> > drivers, and thinking about the needs on this new Xe driver, I'm wondering
> > if you would consider accepting 'frequency' as a standard hwmon attribute.
> > 
> > We would need it to be RW so we could use to put freq requests as well,
> > and possibly different types/domains and even throttle reasons on top.
> > 
> > So we could then try to unify all the drm drivers in a common drm-hwmon
> > layer putting an end in all abuses and deprecated users.
> > 
> > But before moving fwd with any proposal I'd like to hear your thoughts on
> > this 'frequency' block as standard attribute.
> > 
> 
> I really don't see how this would fit under "hardware monitoring".
> Making it writable would be even worse - this is most definitely not a limit but
> an actual value. The notion of limit actually shows that it is not a good fit as
> a monitoring attribute: I can not conceive the notion of a "maximum" or "minimum"
> frequency limit, or an "under" or "over" frequency.

How is that different from the existing voltage/pwm/current/etc. min, max,
and critical RW limits?

> 
> If this is about thermal control/management, you might want to consider registering
> with devfreq and the thermal subsystem (see devfreq_cooling_register() and
> friends for reference).

Yep, it looks like devfreq is a good candidate for the unification. It is just
sad that it is not as robust and flexible as the hwmon infrastructure.

> 
> Thanks,
> Guenter
>