Message ID | 20230113022752.3151066-1-vinay.belgaumkar@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | drm/i915/mtl: Connect root sysfs entries to GT0 | expand |
On Thu, 12 Jan 2023 18:27:52 -0800, Vinay Belgaumkar wrote: > > Reading current root sysfs entries gives a min/max of all > GTs. Updating this so we return default (GT0) values when root > level sysfs entries are accessed, instead of min/max for the card. > Tests that are not multi GT capable will read incorrect sysfs > values without this change on multi-GT platforms like MTL. > > Fixes: a8a4f0467d70 ("drm/i915: Fix CFI violations in gt_sysfs") We seem to be proposing to change the previous sysfs ABI with this patch? But even then it doesn't seem correct to use gt0 values for device level sysfs. Actually I received the following comment about using max freq across gt's for device level freq's (gt_act_freq_mhz etc.) from one of our users: ----- On Sun, 06 Nov 2022 08:54:04 -0800, Lawson, Lowren H wrote: Why show maximum? Wouldn’t average be more accurate to the user experience? As a user, I expect the ‘card’ frequency to be relatively accurate to the entire card. If I see 1.6GHz, but the card is behaving as if it’s running a 1.0 & 1.6GHz on the different compute tiles, I’m going to see a massive decrease in compute workload performance while at ‘maximum’ frequency. ----- So I am not sure why max/min were previously chosen. Why not the average? Thanks. -- Ashutosh
On 1/12/2023 7:15 PM, Dixit, Ashutosh wrote: > On Thu, 12 Jan 2023 18:27:52 -0800, Vinay Belgaumkar wrote: >> Reading current root sysfs entries gives a min/max of all >> GTs. Updating this so we return default (GT0) values when root >> level sysfs entries are accessed, instead of min/max for the card. >> Tests that are not multi GT capable will read incorrect sysfs >> values without this change on multi-GT platforms like MTL. >> >> Fixes: a8a4f0467d70 ("drm/i915: Fix CFI violations in gt_sysfs") > We seem to be proposing to change the previous sysfs ABI with this patch? > But even then it doesn't seem correct to use gt0 values for device level > sysfs. Actually I received the following comment about using max freq > across gt's for device level freq's (gt_act_freq_mhz etc.) from one of our > users: I think the ABI was changed by the patch mentioned in the commit (a8a4f0467d70). If I am not mistaken, original behavior was to return the GT0 values (I will double check this). IMO, if that patch changed the behavior, it should have been accompanied with patches that update all the tests to use the proper per GT sysfs as well. Thanks, Vinay. > > ----- > On Sun, 06 Nov 2022 08:54:04 -0800, Lawson, Lowren H wrote: > > Why show maximum? Wouldn’t average be more accurate to the user experience? > > As a user, I expect the ‘card’ frequency to be relatively accurate to the > entire card. If I see 1.6GHz, but the card is behaving as if it’s running a > 1.0 & 1.6GHz on the different compute tiles, I’m going to see a massive > decrease in compute workload performance while at ‘maximum’ frequency. > ----- > > So I am not sure why max/min were previously chosen. Why not the average? > > Thanks. > -- > Ashutosh
On Thu, 12 Jan 2023 20:26:34 -0800, Belgaumkar, Vinay wrote: > > I think the ABI was changed by the patch mentioned in the commit > (a8a4f0467d70). The ABI was originally changed in 80cf8af17af04 and 56a709cf77468.
On 1/12/2023 8:37 PM, Dixit, Ashutosh wrote: > On Thu, 12 Jan 2023 20:26:34 -0800, Belgaumkar, Vinay wrote: >> I think the ABI was changed by the patch mentioned in the commit >> (a8a4f0467d70). > The ABI was originally changed in 80cf8af17af04 and 56a709cf77468. Yes, you are right. @Andi, did we have a plan to update the IGT tests that use these interfaces to properly refer to the per GT entries as well? They now receive average values instead of absolute, hence will fail on a multi-GT device. Thanks, Vinay.
On 13/01/2023 03:15, Dixit, Ashutosh wrote: > On Thu, 12 Jan 2023 18:27:52 -0800, Vinay Belgaumkar wrote: >> >> Reading current root sysfs entries gives a min/max of all >> GTs. Updating this so we return default (GT0) values when root >> level sysfs entries are accessed, instead of min/max for the card. >> Tests that are not multi GT capable will read incorrect sysfs >> values without this change on multi-GT platforms like MTL. >> >> Fixes: a8a4f0467d70 ("drm/i915: Fix CFI violations in gt_sysfs") > > We seem to be proposing to change the previous sysfs ABI with this patch? > But even then it doesn't seem correct to use gt0 values for device level > sysfs. Actually I received the following comment about using max freq > across gt's for device level freq's (gt_act_freq_mhz etc.) from one of our > users: > > ----- > On Sun, 06 Nov 2022 08:54:04 -0800, Lawson, Lowren H wrote: > > Why show maximum? Wouldn’t average be more accurate to the user experience? > > As a user, I expect the ‘card’ frequency to be relatively accurate to the > entire card. If I see 1.6GHz, but the card is behaving as if it’s running a > 1.0 & 1.6GHz on the different compute tiles, I’m going to see a massive > decrease in compute workload performance while at ‘maximum’ frequency. > ----- > > So I am not sure why max/min were previously chosen. Why not the average? I think we still have time to just either stop exposing the global files on multi-tile platforms (all are under force probe), or return some error from them. IMO it's not kernel's job to provide any representation here - be in min, max, sum or average (different "blending" methods were discussed for different files) - all of them have some potential to be misleading from different angles. Regards, Tvrtko
Hi, On Thu, Jan 12, 2023 at 08:48:11PM -0800, Belgaumkar, Vinay wrote: > > On 1/12/2023 8:37 PM, Dixit, Ashutosh wrote: > > On Thu, 12 Jan 2023 20:26:34 -0800, Belgaumkar, Vinay wrote: > > > I think the ABI was changed by the patch mentioned in the commit > > > (a8a4f0467d70). > > The ABI was originally changed in 80cf8af17af04 and 56a709cf77468. In theory the ABI has never changed, we just needed to agree once and for all what to do when reading the upper level interface. There has never been a previous multitile specification before this change. There have been long and exhaustive discussions on what to do and the decision is that in some cases we show the average, in others the maximum. Never the GT0, though. > Yes, you are right. @Andi, did we have a plan to update the IGT tests that > use these interfaces to properly refer to the per GT entries as well? They > now receive average values instead of absolute, hence will fail on a > multi-GT device. I don't know what's the plan for igt's. Which tests are failing? I think we shouldn't be using the upper level interfaces at all in IGT's. Previously there has been an error printed on dmesg when this was happening. The error has been removed in order to set the ABI as agreed above. Andi
On 1/16/2023 10:58 AM, Andi Shyti wrote: > Hi, > > On Thu, Jan 12, 2023 at 08:48:11PM -0800, Belgaumkar, Vinay wrote: >> On 1/12/2023 8:37 PM, Dixit, Ashutosh wrote: >>> On Thu, 12 Jan 2023 20:26:34 -0800, Belgaumkar, Vinay wrote: >>>> I think the ABI was changed by the patch mentioned in the commit >>>> (a8a4f0467d70). >>> The ABI was originally changed in 80cf8af17af04 and 56a709cf77468. > In theory the ABI has never changed, we just needed to agree once > and for all what to do when reading the upper level interface. > There has never been a previous multitile specification before > this change. > > There have been long and exhaustive discussions on what to do and > the decision is that in some cases we show the average, in others > the maximum. Never the GT0, though. > >> Yes, you are right. @Andi, did we have a plan to update the IGT tests that >> use these interfaces to properly refer to the per GT entries as well? They >> now receive average values instead of absolute, hence will fail on a >> multi-GT device. > I don't know what's the plan for igt's. > > Which tests are failing? I think we shouldn't be using the upper > level interfaces at all in IGT's. Previously there has been an > error printed on dmesg when this was happening. The error has > been removed in order to set the ABI as agreed above. Tests like perf_mu and gem_ctx_freq will fail as they read upper level sysfs entries and expect them to change as per the test. I think this includes all of the tests that read RC6 or Trubo related sysfs entries for that matter. Thanks, Vinay. > > Andi
Hi Vinay, On Mon, Jan 16, 2023 at 11:35:41AM -0800, Belgaumkar, Vinay wrote: > > On 1/16/2023 10:58 AM, Andi Shyti wrote: > > Hi, > > > > On Thu, Jan 12, 2023 at 08:48:11PM -0800, Belgaumkar, Vinay wrote: > > > On 1/12/2023 8:37 PM, Dixit, Ashutosh wrote: > > > > On Thu, 12 Jan 2023 20:26:34 -0800, Belgaumkar, Vinay wrote: > > > > > I think the ABI was changed by the patch mentioned in the commit > > > > > (a8a4f0467d70). > > > > The ABI was originally changed in 80cf8af17af04 and 56a709cf77468. > > In theory the ABI has never changed, we just needed to agree once > > and for all what to do when reading the upper level interface. > > There has never been a previous multitile specification before > > this change. > > > > There have been long and exhaustive discussions on what to do and > > the decision is that in some cases we show the average, in others > > the maximum. Never the GT0, though. > > > > > Yes, you are right. @Andi, did we have a plan to update the IGT tests that > > > use these interfaces to properly refer to the per GT entries as well? They > > > now receive average values instead of absolute, hence will fail on a > > > multi-GT device. > > I don't know what's the plan for igt's. > > > > Which tests are failing? I think we shouldn't be using the upper > > level interfaces at all in IGT's. Previously there has been an > > error printed on dmesg when this was happening. The error has > > been removed in order to set the ABI as agreed above. > > Tests like perf_mu and gem_ctx_freq will fail as they read upper level sysfs > entries and expect them to change as per the test. I think this includes all > of the tests that read RC6 or Trubo related sysfs entries for that matter. First of all I hope we are talking about multitile machines. In a single tile we shouldn't see any difference. If there is, then there is a bug. I think the tests need to be updated. In a multitile card we have decided that reading from the upper interfaces would provide a somehow meaningful value for all the GTs. It's like asking "what is the RC6 for all the GTs in one number?" or "what is the frequency for all the GTs in one single frequency?". Providing the value of GT0 is misleading and with Joonas we agreed that RC6 would provide the average, while the frequencies would provide the maximum. This when reading. When writing, instead, the command sent to the upper layers applies to all the GTs. The original approach was to mark the upper interfaces as deprecated, for this reason I added initially an error message in order to avoid confusion and force IGTs to update. But the error message was removed and meanwhile we agreed to give some meaning to the upper interfaces. Do you need help with updating the IGTs? Andi > Thanks, > > Vinay. > > > > > Andi
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_sysfs_pm.c b/drivers/gpu/drm/i915/gt/intel_gt_sysfs_pm.c index cf71305ad586..395ae47483a7 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt_sysfs_pm.c +++ b/drivers/gpu/drm/i915/gt/intel_gt_sysfs_pm.c @@ -28,77 +28,42 @@ sysfs_gt_attribute_w_func(struct kobject *kobj, struct attribute *attr, int (func)(struct intel_gt *gt, u32 val), u32 val) { struct intel_gt *gt; - int ret; if (!is_object_gt(kobj)) { - int i; struct device *dev = kobj_to_dev(kobj); struct drm_i915_private *i915 = kdev_minor_to_i915(dev); - for_each_gt(gt, i915, i) { - ret = func(gt, val); - if (ret) - break; - } + gt = to_gt(i915); } else { gt = intel_gt_sysfs_get_drvdata(kobj, attr->name); - ret = func(gt, val); } - return ret; + return func(gt, val); } static u32 sysfs_gt_attribute_r_func(struct kobject *kobj, struct attribute *attr, - u32 (func)(struct intel_gt *gt), - enum intel_gt_sysfs_op op) + u32 (func)(struct intel_gt *gt)) { struct intel_gt *gt; - u32 ret; - - ret = (op == INTEL_GT_SYSFS_MAX) ? 0 : (u32) -1; if (!is_object_gt(kobj)) { - int i; struct device *dev = kobj_to_dev(kobj); struct drm_i915_private *i915 = kdev_minor_to_i915(dev); - for_each_gt(gt, i915, i) { - u32 val = func(gt); - - switch (op) { - case INTEL_GT_SYSFS_MIN: - if (val < ret) - ret = val; - break; - - case INTEL_GT_SYSFS_MAX: - if (val > ret) - ret = val; - break; - } - } + gt = to_gt(i915); } else { gt = intel_gt_sysfs_get_drvdata(kobj, attr->name); - ret = func(gt); } - return ret; + return func(gt); } -/* RC6 interfaces will show the minimum RC6 residency value */ -#define sysfs_gt_attribute_r_min_func(d, a, f) \ - sysfs_gt_attribute_r_func(d, a, f, INTEL_GT_SYSFS_MIN) - -/* Frequency interfaces will show the maximum frequency value */ -#define sysfs_gt_attribute_r_max_func(d, a, f) \ - sysfs_gt_attribute_r_func(d, a, f, INTEL_GT_SYSFS_MAX) - #define INTEL_GT_SYSFS_SHOW(_name, _attr_type) \ static ssize_t _name##_show_common(struct kobject *kobj, \ struct attribute *attr, char *buff) \ { \ - u32 val = sysfs_gt_attribute_r_##_attr_type##_func(kobj, attr, \ + u32 val = sysfs_gt_attribute_r_func(kobj, attr, \ __##_name##_show); \ \ return sysfs_emit(buff, "%u\n", val); \
Reading current root sysfs entries gives a min/max of all GTs. Updating this so we return default (GT0) values when root level sysfs entries are accessed, instead of min/max for the card. Tests that are not multi GT capable will read incorrect sysfs values without this change on multi-GT platforms like MTL. Fixes: a8a4f0467d70 ("drm/i915: Fix CFI violations in gt_sysfs") Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> --- drivers/gpu/drm/i915/gt/intel_gt_sysfs_pm.c | 47 +++------------------ 1 file changed, 6 insertions(+), 41 deletions(-)