diff mbox series

[1/2] drm/i915/gt: Add GT oriented dmesg output

Message ID 20221104172525.569913-2-John.C.Harrison@Intel.com (mailing list archive)
State New, archived
Headers show
Series Add GT oriented dmesg output | expand

Commit Message

John Harrison Nov. 4, 2022, 5:25 p.m. UTC
From: John Harrison <John.C.Harrison@Intel.com>

When trying to analyse bug reports from CI, customers, etc. it can be
difficult to work out exactly what is happening on which GT in a
multi-GT system. So add GT oriented debug/error message wrappers. If
used instead of the drm_ equivalents, you get the same output but with
a GT# prefix on it.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/gt/intel_gt.h | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

Comments

Daniele Ceraolo Spurio Nov. 5, 2022, 1:03 a.m. UTC | #1
On 11/4/2022 10:25 AM, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
>
> When trying to analyse bug reports from CI, customers, etc. it can be
> difficult to work out exactly what is happening on which GT in a
> multi-GT system. So add GT oriented debug/error message wrappers. If
> used instead of the drm_ equivalents, you get the same output but with
> a GT# prefix on it.
>
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>

The only downside to this is that we'll print "GT0: " even on single-GT 
devices. We could introduce a gt->info.name and print that, so we could 
have it different per-platform, but IMO it's not worth the effort.

Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>

I think it might be worth getting an ack from one of the maintainers to 
make sure we're all aligned on transitioning to these new logging macro 
for gt code.

Daniele

> ---
>   drivers/gpu/drm/i915/gt/intel_gt.h | 15 +++++++++++++++
>   1 file changed, 15 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt.h b/drivers/gpu/drm/i915/gt/intel_gt.h
> index e0365d5562484..1e016fb0117a4 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gt.h
> @@ -13,6 +13,21 @@
>   struct drm_i915_private;
>   struct drm_printer;
>   
> +#define GT_ERR(_gt, _fmt, ...) \
> +	drm_err(&(_gt)->i915->drm, "GT%u: " _fmt, (_gt)->info.id, ##__VA_ARGS__)
> +
> +#define GT_WARN(_gt, _fmt, ...) \
> +	drm_warn(&(_gt)->i915->drm, "GT%u: " _fmt, (_gt)->info.id, ##__VA_ARGS__)
> +
> +#define GT_NOTICE(_gt, _fmt, ...) \
> +	drm_notice(&(_gt)->i915->drm, "GT%u: " _fmt, (_gt)->info.id, ##__VA_ARGS__)
> +
> +#define GT_INFO(_gt, _fmt, ...) \
> +	drm_info(&(_gt)->i915->drm, "GT%u: " _fmt, (_gt)->info.id, ##__VA_ARGS__)
> +
> +#define GT_DBG(_gt, _fmt, ...) \
> +	drm_dbg(&(_gt)->i915->drm, "GT%u: " _fmt, (_gt)->info.id, ##__VA_ARGS__)
> +
>   #define GT_TRACE(gt, fmt, ...) do {					\
>   	const struct intel_gt *gt__ __maybe_unused = (gt);		\
>   	GEM_TRACE("%s " fmt, dev_name(gt__->i915->drm.dev),		\
Tvrtko Ursulin Nov. 7, 2022, 9:33 a.m. UTC | #2
On 05/11/2022 01:03, Ceraolo Spurio, Daniele wrote:
> 
> 
> On 11/4/2022 10:25 AM, John.C.Harrison@Intel.com wrote:
>> From: John Harrison <John.C.Harrison@Intel.com>
>>
>> When trying to analyse bug reports from CI, customers, etc. it can be
>> difficult to work out exactly what is happening on which GT in a
>> multi-GT system. So add GT oriented debug/error message wrappers. If
>> used instead of the drm_ equivalents, you get the same output but with
>> a GT# prefix on it.
>>
>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> 
> The only downside to this is that we'll print "GT0: " even on single-GT 
> devices. We could introduce a gt->info.name and print that, so we could 
> have it different per-platform, but IMO it's not worth the effort.
> 
> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> 
> I think it might be worth getting an ack from one of the maintainers to 
> make sure we're all aligned on transitioning to these new logging macro 
> for gt code.

Idea is I think a very good one. First I would suggest standardising to lowercase GT in logs because:

$ grep "GT%" i915/ -r
$ grep "gt%" i915/ -r
i915/gt/intel_gt_sysfs.c:                                gt->i915->sysfs_gt, "gt%d", gt->info.id))
i915/gt/intel_gt_sysfs.c:                "failed to initialize gt%d sysfs root\n", gt->info.id);
i915/gt/intel_gt_sysfs_pm.c:                     "failed to create gt%u RC6 sysfs files (%pe)\n",
i915/gt/intel_gt_sysfs_pm.c:                             "failed to create gt%u RC6p sysfs files (%pe)\n",
i915/gt/intel_gt_sysfs_pm.c:                     "failed to create gt%u RPS sysfs files (%pe)",
i915/gt/intel_gt_sysfs_pm.c:                     "failed to create gt%u punit_req_freq_mhz sysfs (%pe)",
i915/gt/intel_gt_sysfs_pm.c:                             "failed to create gt%u throttle sysfs files (%pe)",
i915/gt/intel_gt_sysfs_pm.c:                             "failed to create gt%u media_perf_power_attrs sysfs (%pe)\n",
i915/gt/intel_gt_sysfs_pm.c:                     "failed to add gt%u rps defaults (%pe)\n",
i915/i915_driver.c:                     drm_err(&gt->i915->drm, "gt%d: intel_pcode_init failed %d\n", id, ret);
i915/i915_hwmon.c:              snprintf(ddat_gt->name, sizeof(ddat_gt->name), "i915_gt%u", i);

Then there is a question of naming. Are we okay with GT_XXX or, do we want intel_gt_, or something completely different. I don't have a strong opinion at the moment so I'll add some more folks to Cc.

What I'd would like to see tried is to converting all of i915/gt within one kernel release so we don't have a mish-mash of log formats.

Regards,

Tvrtko
  
>> ---
>>   drivers/gpu/drm/i915/gt/intel_gt.h | 15 +++++++++++++++
>>   1 file changed, 15 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/gt/intel_gt.h 
>> b/drivers/gpu/drm/i915/gt/intel_gt.h
>> index e0365d5562484..1e016fb0117a4 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_gt.h
>> +++ b/drivers/gpu/drm/i915/gt/intel_gt.h
>> @@ -13,6 +13,21 @@
>>   struct drm_i915_private;
>>   struct drm_printer;
>> +#define GT_ERR(_gt, _fmt, ...) \
>> +    drm_err(&(_gt)->i915->drm, "GT%u: " _fmt, (_gt)->info.id, 
>> ##__VA_ARGS__)
>> +
>> +#define GT_WARN(_gt, _fmt, ...) \
>> +    drm_warn(&(_gt)->i915->drm, "GT%u: " _fmt, (_gt)->info.id, 
>> ##__VA_ARGS__)
>> +
>> +#define GT_NOTICE(_gt, _fmt, ...) \
>> +    drm_notice(&(_gt)->i915->drm, "GT%u: " _fmt, (_gt)->info.id, 
>> ##__VA_ARGS__)
>> +
>> +#define GT_INFO(_gt, _fmt, ...) \
>> +    drm_info(&(_gt)->i915->drm, "GT%u: " _fmt, (_gt)->info.id, 
>> ##__VA_ARGS__)
>> +
>> +#define GT_DBG(_gt, _fmt, ...) \
>> +    drm_dbg(&(_gt)->i915->drm, "GT%u: " _fmt, (_gt)->info.id, 
>> ##__VA_ARGS__)
>> +
>>   #define GT_TRACE(gt, fmt, ...) do {                    \
>>       const struct intel_gt *gt__ __maybe_unused = (gt);        \
>>       GEM_TRACE("%s " fmt, dev_name(gt__->i915->drm.dev),        \
>
Tvrtko Ursulin Nov. 7, 2022, 4:17 p.m. UTC | #3
On 07/11/2022 09:33, Tvrtko Ursulin wrote:
> 
> On 05/11/2022 01:03, Ceraolo Spurio, Daniele wrote:
>>
>>
>> On 11/4/2022 10:25 AM, John.C.Harrison@Intel.com wrote:
>>> From: John Harrison <John.C.Harrison@Intel.com>
>>>
>>> When trying to analyse bug reports from CI, customers, etc. it can be
>>> difficult to work out exactly what is happening on which GT in a
>>> multi-GT system. So add GT oriented debug/error message wrappers. If
>>> used instead of the drm_ equivalents, you get the same output but with
>>> a GT# prefix on it.
>>>
>>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>>
>> The only downside to this is that we'll print "GT0: " even on 
>> single-GT devices. We could introduce a gt->info.name and print that, 
>> so we could have it different per-platform, but IMO it's not worth the 
>> effort.
>>
>> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
>>
>> I think it might be worth getting an ack from one of the maintainers 
>> to make sure we're all aligned on transitioning to these new logging 
>> macro for gt code.
> 
> Idea is I think a very good one. First I would suggest standardising to 
> lowercase GT in logs because:
> 
> $ grep "GT%" i915/ -r
> $ grep "gt%" i915/ -r
> i915/gt/intel_gt_sysfs.c:                                
> gt->i915->sysfs_gt, "gt%d", gt->info.id))
> i915/gt/intel_gt_sysfs.c:                "failed to initialize gt%d 
> sysfs root\n", gt->info.id);
> i915/gt/intel_gt_sysfs_pm.c:                     "failed to create gt%u 
> RC6 sysfs files (%pe)\n",
> i915/gt/intel_gt_sysfs_pm.c:                             "failed to 
> create gt%u RC6p sysfs files (%pe)\n",
> i915/gt/intel_gt_sysfs_pm.c:                     "failed to create gt%u 
> RPS sysfs files (%pe)",
> i915/gt/intel_gt_sysfs_pm.c:                     "failed to create gt%u 
> punit_req_freq_mhz sysfs (%pe)",
> i915/gt/intel_gt_sysfs_pm.c:                             "failed to 
> create gt%u throttle sysfs files (%pe)",
> i915/gt/intel_gt_sysfs_pm.c:                             "failed to 
> create gt%u media_perf_power_attrs sysfs (%pe)\n",
> i915/gt/intel_gt_sysfs_pm.c:                     "failed to add gt%u rps 
> defaults (%pe)\n",
> i915/i915_driver.c:                     drm_err(&gt->i915->drm, "gt%d: 
> intel_pcode_init failed %d\n", id, ret);
> i915/i915_hwmon.c:              snprintf(ddat_gt->name, 
> sizeof(ddat_gt->name), "i915_gt%u", i);
> 
> Then there is a question of naming. Are we okay with GT_XXX or, do we 
> want intel_gt_, or something completely different. I don't have a strong 
> opinion at the moment so I'll add some more folks to Cc.

There was a maintainer level mini-discussion on this topic which I will 
try to summarise.

Main contention point was the maintenance cost and generally an 
undesirable pattern of needing to add many subsystem/component/directory 
specific macros. Which then typically need extra flavours and so on. But 
over verbosity of the code is obviously also bad, so one compromise idea 
was to add a macro which builds the GT string and use drm logging 
helpers directly. This would be something like:

  drm_err(GT_LOG("something went wrong ret=%d\n", gt), ret);
  drm_info(GT_LOG(...same...));

Whether or not to put the gt as parameter to the helper macro or outside 
wasn't really decided upon. Anyway the macro would be adding the magic 
"gt%u: " prefix, drm device and all.

Also the name GT_LOG (or case) is just for illustration, that part 
wasn't really discussed.

If agreeable this pattern could then be used to consolidate some other 
macros that we have. Although apart from CT_DEBUG/ERROR I don't know if 
we have any others.

I hope I have transferred the idea correctly. Please shout if I have not.

Regards,

Tvrtko


> What I'd would like to see tried is to converting all of i915/gt within 
> one kernel release so we don't have a mish-mash of log formats.
> 
> Regards,
> 
> Tvrtko
> 
>>> ---
>>>   drivers/gpu/drm/i915/gt/intel_gt.h | 15 +++++++++++++++
>>>   1 file changed, 15 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_gt.h 
>>> b/drivers/gpu/drm/i915/gt/intel_gt.h
>>> index e0365d5562484..1e016fb0117a4 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_gt.h
>>> +++ b/drivers/gpu/drm/i915/gt/intel_gt.h
>>> @@ -13,6 +13,21 @@
>>>   struct drm_i915_private;
>>>   struct drm_printer;
>>> +#define GT_ERR(_gt, _fmt, ...) \
>>> +    drm_err(&(_gt)->i915->drm, "GT%u: " _fmt, (_gt)->info.id, 
>>> ##__VA_ARGS__)
>>> +
>>> +#define GT_WARN(_gt, _fmt, ...) \
>>> +    drm_warn(&(_gt)->i915->drm, "GT%u: " _fmt, (_gt)->info.id, 
>>> ##__VA_ARGS__)
>>> +
>>> +#define GT_NOTICE(_gt, _fmt, ...) \
>>> +    drm_notice(&(_gt)->i915->drm, "GT%u: " _fmt, (_gt)->info.id, 
>>> ##__VA_ARGS__)
>>> +
>>> +#define GT_INFO(_gt, _fmt, ...) \
>>> +    drm_info(&(_gt)->i915->drm, "GT%u: " _fmt, (_gt)->info.id, 
>>> ##__VA_ARGS__)
>>> +
>>> +#define GT_DBG(_gt, _fmt, ...) \
>>> +    drm_dbg(&(_gt)->i915->drm, "GT%u: " _fmt, (_gt)->info.id, 
>>> ##__VA_ARGS__)
>>> +
>>>   #define GT_TRACE(gt, fmt, ...) do {                    \
>>>       const struct intel_gt *gt__ __maybe_unused = (gt);        \
>>>       GEM_TRACE("%s " fmt, dev_name(gt__->i915->drm.dev),        \
>>
John Harrison Nov. 7, 2022, 7:14 p.m. UTC | #4
On 11/7/2022 08:17, Tvrtko Ursulin wrote:
> On 07/11/2022 09:33, Tvrtko Ursulin wrote:
>> On 05/11/2022 01:03, Ceraolo Spurio, Daniele wrote:
>>> On 11/4/2022 10:25 AM, John.C.Harrison@Intel.com wrote:
>>>> From: John Harrison <John.C.Harrison@Intel.com>
>>>>
>>>> When trying to analyse bug reports from CI, customers, etc. it can be
>>>> difficult to work out exactly what is happening on which GT in a
>>>> multi-GT system. So add GT oriented debug/error message wrappers. If
>>>> used instead of the drm_ equivalents, you get the same output but with
>>>> a GT# prefix on it.
>>>>
>>>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>>>
>>> The only downside to this is that we'll print "GT0: " even on 
>>> single-GT devices. We could introduce a gt->info.name and print 
>>> that, so we could have it different per-platform, but IMO it's not 
>>> worth the effort.
>>>
>>> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
>>>
>>> I think it might be worth getting an ack from one of the maintainers 
>>> to make sure we're all aligned on transitioning to these new logging 
>>> macro for gt code.
>>
>> Idea is I think a very good one. First I would suggest standardising 
>> to lowercase GT in logs because:
>>
>> $ grep "GT%" i915/ -r
>> $ grep "gt%" i915/ -r
>> i915/gt/intel_gt_sysfs.c: gt->i915->sysfs_gt, "gt%d", gt->info.id))
>> i915/gt/intel_gt_sysfs.c:                "failed to initialize gt%d 
>> sysfs root\n", gt->info.id);
>> i915/gt/intel_gt_sysfs_pm.c:                     "failed to create 
>> gt%u RC6 sysfs files (%pe)\n",
>> i915/gt/intel_gt_sysfs_pm.c:                             "failed to 
>> create gt%u RC6p sysfs files (%pe)\n",
>> i915/gt/intel_gt_sysfs_pm.c:                     "failed to create 
>> gt%u RPS sysfs files (%pe)",
>> i915/gt/intel_gt_sysfs_pm.c:                     "failed to create 
>> gt%u punit_req_freq_mhz sysfs (%pe)",
>> i915/gt/intel_gt_sysfs_pm.c:                             "failed to 
>> create gt%u throttle sysfs files (%pe)",
>> i915/gt/intel_gt_sysfs_pm.c:                             "failed to 
>> create gt%u media_perf_power_attrs sysfs (%pe)\n",
>> i915/gt/intel_gt_sysfs_pm.c:                     "failed to add gt%u 
>> rps defaults (%pe)\n",
>> i915/i915_driver.c: drm_err(&gt->i915->drm, "gt%d: intel_pcode_init 
>> failed %d\n", id, ret);
>> i915/i915_hwmon.c:              snprintf(ddat_gt->name, 
>> sizeof(ddat_gt->name), "i915_gt%u", i);
>>

Just because there are 11 existing instances of one form doesn't mean 
that the 275 instances that are waiting to be converted should be done 
incorrectly. GT is an acronym and should be capitalised.

Besides:
grep -r "GT " i915 | grep '"'
i915/vlv_suspend.c:             drm_err(&i915->drm, "timeout disabling 
GT waking\n");
i915/vlv_suspend.c:                     "timeout waiting for GT wells to 
go %s\n",
i915/vlv_suspend.c:     drm_dbg(&i915->drm, "GT register access while GT 
waking disabled\n");
i915/i915_gpu_error.c:  err_printf(m, "GT awake: %s\n", 
str_yes_no(gt->awake));
i915/i915_debugfs.c:    seq_printf(m, "GT awake? %s [%d], %llums\n",
i915/selftests/i915_gem_evict.c: pr_err("Failed to idle GT (on %s)", 
engine->name);
i915/intel_uncore.c:                  "GT thread status wait timed out\n");
i915/gt/uc/selftest_guc_multi_lrc.c: drm_err(&gt->i915->drm, "GT failed 
to idle: %d\n", ret);
i915/gt/uc/selftest_guc.c: drm_err(&gt->i915->drm, "GT failed to idle: 
%d\n", ret);
i915/gt/uc/selftest_guc.c: drm_err(&gt->i915->drm, "GT failed to idle: 
%d\n", ret);
i915/gt/intel_gt_mcr.c: * Some GT registers are designed as "multicast" 
or "replicated" registers:
i915/gt/selftest_rps.c:                 pr_info("%s: rps counted %d C0 
cycles [%lldns] in %lldns [%d cycles], using GT clock frequency of %uKHz\n",
i915/gt/selftest_hangcheck.c:                   pr_err("[%s] GT is 
wedged!\n", engine->name);
i915/gt/selftest_hangcheck.c:           pr_err("GT is wedged!\n");
i915/gt/intel_gt_clock_utils.c:                 "GT clock frequency 
changed, was %uHz, now %uHz!\n",
i915/gt/selftest_engine_pm.c:           pr_err("Unable to flush GT pm 
before test\n");
i915/gt/selftest_engine_pm.c:                           pr_err("GT 
failed to idle\n");
i915/i915_sysfs.c:                       "failed to register GT sysfs 
directory\n");
i915/intel_uncore.h:     * of the basic non-engine GT registers 
(referred to as "GSI" on
i915/intel_uncore.h:     * newer platforms, or "GT block" on older 
platforms)?  If so, we'll



>> Then there is a question of naming. Are we okay with GT_XXX or, do we 
>> want intel_gt_, or something completely different. I don't have a 
>> strong opinion at the moment so I'll add some more folks to Cc.
>
You mean GT_ERR("msg") vs intel_gt_err("msg")? Personally, I would 
prefer just gt_err("msg") to keep it as close to the official drm_* 
versions as possible. Print lines tend to be excessively long already. 
Taking a 'gt' parameter instead of a '&gt->i915->drm' parameter does 
help with that but it seems like calling the wrapper intel_gt_* is 
shooting ourselves in the foot on that one. And GT_ERR vs gt_err just 
comes down to the fact that it is a macro wrapper and therefore is 
required to be in upper case.

> There was a maintainer level mini-discussion on this topic which I 
> will try to summarise.
>
> Main contention point was the maintenance cost and generally an 
> undesirable pattern of needing to add many 
> subsystem/component/directory specific macros. Which then typically 
> need extra flavours and so on. But over verbosity of the 
How many versions are you expecting to add? Beyond the tile instance, 
what further addressing requirements are there? The card instance is 
already printed as part of the PCI address. The only other reason to add 
per component wrappers would be to wrap the mechanism for getting from 
some random per component object back to the intel_gt structure. But 
that is hardware a new issue being added by this wrapper. It is also not 
a requirement. Much of the code has a gt pointer already. For the parts 
that don't, some of it would be a trivial engine->gt type dereference, 
some of it is a more complex container_of type construction. But for 
those, the given file will already have multiple instances of that 
already (usually as the first or second line of the function - 'intel_gt 
*gt = fancy_access_method(my_obj)' so adding one or two more of those as 
necessary is not making the code harder to read.

> code is obviously also bad, so one compromise idea was to add a macro 
> which builds the GT string and use drm logging helpers directly. This 
> would be something like:
>
>  drm_err(GT_LOG("something went wrong ret=%d\n", gt), ret);
>  drm_info(GT_LOG(...same...));
Seriously? As above, some of these lines are already way too long, this 
version makes them even longer with no obvious benefit. Worse, it makes 
it harder to read what is going on. It is much less intuitive to read 
than just replacing the drm_err itself. And having two sets of 
parenthesis with some parameters inside the first and some only inside 
the second is really horrid! Also, putting the 'gt' parameter in the 
middle just confuses it with the rest of the printf arguments even 
though there is no %d in the string for it. So now a quick glances tells 
you that your code is wrong because you have three format specifiers but 
four parameters.

Whereas, just replacing drm_err with gt_err (or GT_ERR or intel_gt_err) 
keeps everything else consistent. The first parameter changes from 'drm' 
to 'gt' but is still the master object parameter and it matches the 
function/macro prefix so inherently looks correct. Then you have your 
message plus parameters. No confusing orders, no confusing parenthesis, 
no excessive macro levels, no confusion at all. Just nice simple, easy 
to read, easy to maintain code.

>
> Whether or not to put the gt as parameter to the helper macro or 
> outside wasn't really decided upon. Anyway the macro would be adding 
> the magic "gt%u: " prefix, drm device and all.
>
> Also the name GT_LOG (or case) is just for illustration, that part 
> wasn't really discussed.
>
> If agreeable this pattern could then be used to consolidate some other 
> macros that we have. Although apart from CT_DEBUG/ERROR I don't know 
> if we have any others.
By consolidating as what? I hope you aren't meaning something like 
'drm_err(CT_DEBUG(GT_LOG("msg")))?'

John.

>
> I hope I have transferred the idea correctly. Please shout if I have not.
>
> Regards,
>
> Tvrtko
>
>
>> What I'd would like to see tried is to converting all of i915/gt 
>> within one kernel release so we don't have a mish-mash of log formats.
If each subsystem's team would like to convert their own section...

John.

>>
>> Regards,
>>
>> Tvrtko
>>
>>>> ---
>>>>   drivers/gpu/drm/i915/gt/intel_gt.h | 15 +++++++++++++++
>>>>   1 file changed, 15 insertions(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/gt/intel_gt.h 
>>>> b/drivers/gpu/drm/i915/gt/intel_gt.h
>>>> index e0365d5562484..1e016fb0117a4 100644
>>>> --- a/drivers/gpu/drm/i915/gt/intel_gt.h
>>>> +++ b/drivers/gpu/drm/i915/gt/intel_gt.h
>>>> @@ -13,6 +13,21 @@
>>>>   struct drm_i915_private;
>>>>   struct drm_printer;
>>>> +#define GT_ERR(_gt, _fmt, ...) \
>>>> +    drm_err(&(_gt)->i915->drm, "GT%u: " _fmt, (_gt)->info.id, 
>>>> ##__VA_ARGS__)
>>>> +
>>>> +#define GT_WARN(_gt, _fmt, ...) \
>>>> +    drm_warn(&(_gt)->i915->drm, "GT%u: " _fmt, (_gt)->info.id, 
>>>> ##__VA_ARGS__)
>>>> +
>>>> +#define GT_NOTICE(_gt, _fmt, ...) \
>>>> +    drm_notice(&(_gt)->i915->drm, "GT%u: " _fmt, (_gt)->info.id, 
>>>> ##__VA_ARGS__)
>>>> +
>>>> +#define GT_INFO(_gt, _fmt, ...) \
>>>> +    drm_info(&(_gt)->i915->drm, "GT%u: " _fmt, (_gt)->info.id, 
>>>> ##__VA_ARGS__)
>>>> +
>>>> +#define GT_DBG(_gt, _fmt, ...) \
>>>> +    drm_dbg(&(_gt)->i915->drm, "GT%u: " _fmt, (_gt)->info.id, 
>>>> ##__VA_ARGS__)
>>>> +
>>>>   #define GT_TRACE(gt, fmt, ...) do {                    \
>>>>       const struct intel_gt *gt__ __maybe_unused = (gt);        \
>>>>       GEM_TRACE("%s " fmt, dev_name(gt__->i915->drm.dev),        \
>>>
Tvrtko Ursulin Nov. 8, 2022, 9:01 a.m. UTC | #5
On 07/11/2022 19:14, John Harrison wrote:
> On 11/7/2022 08:17, Tvrtko Ursulin wrote:
>> On 07/11/2022 09:33, Tvrtko Ursulin wrote:
>>> On 05/11/2022 01:03, Ceraolo Spurio, Daniele wrote:
>>>> On 11/4/2022 10:25 AM, John.C.Harrison@Intel.com wrote:
>>>>> From: John Harrison <John.C.Harrison@Intel.com>
>>>>>
>>>>> When trying to analyse bug reports from CI, customers, etc. it can be
>>>>> difficult to work out exactly what is happening on which GT in a
>>>>> multi-GT system. So add GT oriented debug/error message wrappers. If
>>>>> used instead of the drm_ equivalents, you get the same output but with
>>>>> a GT# prefix on it.
>>>>>
>>>>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>>>>
>>>> The only downside to this is that we'll print "GT0: " even on 
>>>> single-GT devices. We could introduce a gt->info.name and print 
>>>> that, so we could have it different per-platform, but IMO it's not 
>>>> worth the effort.
>>>>
>>>> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
>>>>
>>>> I think it might be worth getting an ack from one of the maintainers 
>>>> to make sure we're all aligned on transitioning to these new logging 
>>>> macro for gt code.
>>>
>>> Idea is I think a very good one. First I would suggest standardising 
>>> to lowercase GT in logs because:
>>>
>>> $ grep "GT%" i915/ -r
>>> $ grep "gt%" i915/ -r
>>> i915/gt/intel_gt_sysfs.c: gt->i915->sysfs_gt, "gt%d", gt->info.id))
>>> i915/gt/intel_gt_sysfs.c:                "failed to initialize gt%d 
>>> sysfs root\n", gt->info.id);
>>> i915/gt/intel_gt_sysfs_pm.c:                     "failed to create 
>>> gt%u RC6 sysfs files (%pe)\n",
>>> i915/gt/intel_gt_sysfs_pm.c:                             "failed to 
>>> create gt%u RC6p sysfs files (%pe)\n",
>>> i915/gt/intel_gt_sysfs_pm.c:                     "failed to create 
>>> gt%u RPS sysfs files (%pe)",
>>> i915/gt/intel_gt_sysfs_pm.c:                     "failed to create 
>>> gt%u punit_req_freq_mhz sysfs (%pe)",
>>> i915/gt/intel_gt_sysfs_pm.c:                             "failed to 
>>> create gt%u throttle sysfs files (%pe)",
>>> i915/gt/intel_gt_sysfs_pm.c:                             "failed to 
>>> create gt%u media_perf_power_attrs sysfs (%pe)\n",
>>> i915/gt/intel_gt_sysfs_pm.c:                     "failed to add gt%u 
>>> rps defaults (%pe)\n",
>>> i915/i915_driver.c: drm_err(&gt->i915->drm, "gt%d: intel_pcode_init 
>>> failed %d\n", id, ret);
>>> i915/i915_hwmon.c:              snprintf(ddat_gt->name, 
>>> sizeof(ddat_gt->name), "i915_gt%u", i);
>>>
> 
> Just because there are 11 existing instances of one form doesn't mean 
> that the 275 instances that are waiting to be converted should be done 
> incorrectly. GT is an acronym and should be capitalised.

Okay just make it consistent then.

> Besides:
> grep -r "GT " i915 | grep '"'
> i915/vlv_suspend.c:             drm_err(&i915->drm, "timeout disabling 
> GT waking\n");
> i915/vlv_suspend.c:                     "timeout waiting for GT wells to 
> go %s\n",
> i915/vlv_suspend.c:     drm_dbg(&i915->drm, "GT register access while GT 
> waking disabled\n");
> i915/i915_gpu_error.c:  err_printf(m, "GT awake: %s\n", 
> str_yes_no(gt->awake));
> i915/i915_debugfs.c:    seq_printf(m, "GT awake? %s [%d], %llums\n",
> i915/selftests/i915_gem_evict.c: pr_err("Failed to idle GT (on %s)", 
> engine->name);
> i915/intel_uncore.c:                  "GT thread status wait timed out\n");
> i915/gt/uc/selftest_guc_multi_lrc.c: drm_err(&gt->i915->drm, "GT failed 
> to idle: %d\n", ret);
> i915/gt/uc/selftest_guc.c: drm_err(&gt->i915->drm, "GT failed to idle: 
> %d\n", ret);
> i915/gt/uc/selftest_guc.c: drm_err(&gt->i915->drm, "GT failed to idle: 
> %d\n", ret);
> i915/gt/intel_gt_mcr.c: * Some GT registers are designed as "multicast" 
> or "replicated" registers:
> i915/gt/selftest_rps.c:                 pr_info("%s: rps counted %d C0 
> cycles [%lldns] in %lldns [%d cycles], using GT clock frequency of 
> %uKHz\n",
> i915/gt/selftest_hangcheck.c:                   pr_err("[%s] GT is 
> wedged!\n", engine->name);
> i915/gt/selftest_hangcheck.c:           pr_err("GT is wedged!\n");
> i915/gt/intel_gt_clock_utils.c:                 "GT clock frequency 
> changed, was %uHz, now %uHz!\n",
> i915/gt/selftest_engine_pm.c:           pr_err("Unable to flush GT pm 
> before test\n");
> i915/gt/selftest_engine_pm.c:                           pr_err("GT 
> failed to idle\n");
> i915/i915_sysfs.c:                       "failed to register GT sysfs 
> directory\n");
> i915/intel_uncore.h:     * of the basic non-engine GT registers 
> (referred to as "GSI" on
> i915/intel_uncore.h:     * newer platforms, or "GT block" on older 
> platforms)?  If so, we'll
> 
> 
> 
>>> Then there is a question of naming. Are we okay with GT_XXX or, do we 
>>> want intel_gt_, or something completely different. I don't have a 
>>> strong opinion at the moment so I'll add some more folks to Cc.
>>
> You mean GT_ERR("msg") vs intel_gt_err("msg")? Personally, I would 
> prefer just gt_err("msg") to keep it as close to the official drm_* 
> versions as possible. Print lines tend to be excessively long already. 
> Taking a 'gt' parameter instead of a '&gt->i915->drm' parameter does 
> help with that but it seems like calling the wrapper intel_gt_* is 
> shooting ourselves in the foot on that one. And GT_ERR vs gt_err just 
> comes down to the fact that it is a macro wrapper and therefore is 
> required to be in upper case.
> 
>> There was a maintainer level mini-discussion on this topic which I 
>> will try to summarise.
>>
>> Main contention point was the maintenance cost and generally an 
>> undesirable pattern of needing to add many 
>> subsystem/component/directory specific macros. Which then typically 
>> need extra flavours and so on. But over verbosity of the 
> How many versions are you expecting to add? Beyond the tile instance, 
> what further addressing requirements are there? The card instance is 
> already printed as part of the PCI address. The only other reason to add 
> per component wrappers would be to wrap the mechanism for getting from 
> some random per component object back to the intel_gt structure. But 
> that is hardware a new issue being added by this wrapper. It is also not 
> a requirement. Much of the code has a gt pointer already. For the parts 
> that don't, some of it would be a trivial engine->gt type dereference, 
> some of it is a more complex container_of type construction. But for 
> those, the given file will already have multiple instances of that 
> already (usually as the first or second line of the function - 'intel_gt 
> *gt = fancy_access_method(my_obj)' so adding one or two more of those as 
> necessary is not making the code harder to read.
> 
>> code is obviously also bad, so one compromise idea was to add a macro 
>> which builds the GT string and use drm logging helpers directly. This 
>> would be something like:
>>
>>  drm_err(GT_LOG("something went wrong ret=%d\n", gt), ret);
>>  drm_info(GT_LOG(...same...));
> Seriously? As above, some of these lines are already way too long, this 
> version makes them even longer with no obvious benefit. Worse, it makes 
> it harder to read what is going on. It is much less intuitive to read 
> than just replacing the drm_err itself. And having two sets of 
> parenthesis with some parameters inside the first and some only inside 
> the second is really horrid! Also, putting the 'gt' parameter in the 
> middle just confuses it with the rest of the printf arguments even 
> though there is no %d in the string for it. So now a quick glances tells 
> you that your code is wrong because you have three format specifiers but 
> four parameters.
> 
> Whereas, just replacing drm_err with gt_err (or GT_ERR or intel_gt_err) 
> keeps everything else consistent. The first parameter changes from 'drm' 
> to 'gt' but is still the master object parameter and it matches the 
> function/macro prefix so inherently looks correct. Then you have your 
> message plus parameters. No confusing orders, no confusing parenthesis, 
> no excessive macro levels, no confusion at all. Just nice simple, easy 
> to read, easy to maintain code.

I am personally okay with gt_err/GT_ERR some other folks might object 
though. And I can also understand the argument why it is better to not 
have to define gt_err, gt_warn, gt_info, gt_notice, gt_debug, 
gt_err_ratelimited, gt_warn_once.. and instead have only one macro.

Because of that I was passing on to you the compromise option.

It maybe still has net space savings since we wouldn't have to be 
repeating the gt->i915->drm whatever and gt->info.id on every line.

You are free to try the most compact one and see how hard those 
objections will be.

>> Whether or not to put the gt as parameter to the helper macro or 
>> outside wasn't really decided upon. Anyway the macro would be adding 
>> the magic "gt%u: " prefix, drm device and all.
>>
>> Also the name GT_LOG (or case) is just for illustration, that part 
>> wasn't really discussed.
>>
>> If agreeable this pattern could then be used to consolidate some other 
>> macros that we have. Although apart from CT_DEBUG/ERROR I don't know 
>> if we have any others.
> By consolidating as what? I hope you aren't meaning something like 
> 'drm_err(CT_DEBUG(GT_LOG("msg")))?'

No, consolidate as in use the same pattern of directly calling drm 
logging helpers with our own formatting macro which changes per 
subsystem/component/directory/whatever.

Regards,

Tvrtko
John Harrison Nov. 8, 2022, 8:15 p.m. UTC | #6
On 11/8/2022 01:01, Tvrtko Ursulin wrote:
> On 07/11/2022 19:14, John Harrison wrote:
>> On 11/7/2022 08:17, Tvrtko Ursulin wrote:
>>> On 07/11/2022 09:33, Tvrtko Ursulin wrote:
>>>> On 05/11/2022 01:03, Ceraolo Spurio, Daniele wrote:
>>>>> On 11/4/2022 10:25 AM, John.C.Harrison@Intel.com wrote:
>>>>>> From: John Harrison <John.C.Harrison@Intel.com>
>>>>>>
>>>>>> When trying to analyse bug reports from CI, customers, etc. it 
>>>>>> can be
>>>>>> difficult to work out exactly what is happening on which GT in a
>>>>>> multi-GT system. So add GT oriented debug/error message wrappers. If
>>>>>> used instead of the drm_ equivalents, you get the same output but 
>>>>>> with
>>>>>> a GT# prefix on it.
>>>>>>
>>>>>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>>>>>
>>>>> The only downside to this is that we'll print "GT0: " even on 
>>>>> single-GT devices. We could introduce a gt->info.name and print 
>>>>> that, so we could have it different per-platform, but IMO it's not 
>>>>> worth the effort.
>>>>>
>>>>> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
>>>>>
>>>>> I think it might be worth getting an ack from one of the 
>>>>> maintainers to make sure we're all aligned on transitioning to 
>>>>> these new logging macro for gt code.
>>>>
>>>> Idea is I think a very good one. First I would suggest 
>>>> standardising to lowercase GT in logs because:
>>>>
>>>> $ grep "GT%" i915/ -r
>>>> $ grep "gt%" i915/ -r
>>>> i915/gt/intel_gt_sysfs.c: gt->i915->sysfs_gt, "gt%d", gt->info.id))
>>>> i915/gt/intel_gt_sysfs.c:                "failed to initialize gt%d 
>>>> sysfs root\n", gt->info.id);
>>>> i915/gt/intel_gt_sysfs_pm.c:                     "failed to create 
>>>> gt%u RC6 sysfs files (%pe)\n",
>>>> i915/gt/intel_gt_sysfs_pm.c: "failed to create gt%u RC6p sysfs 
>>>> files (%pe)\n",
>>>> i915/gt/intel_gt_sysfs_pm.c:                     "failed to create 
>>>> gt%u RPS sysfs files (%pe)",
>>>> i915/gt/intel_gt_sysfs_pm.c:                     "failed to create 
>>>> gt%u punit_req_freq_mhz sysfs (%pe)",
>>>> i915/gt/intel_gt_sysfs_pm.c: "failed to create gt%u throttle sysfs 
>>>> files (%pe)",
>>>> i915/gt/intel_gt_sysfs_pm.c: "failed to create gt%u 
>>>> media_perf_power_attrs sysfs (%pe)\n",
>>>> i915/gt/intel_gt_sysfs_pm.c:                     "failed to add 
>>>> gt%u rps defaults (%pe)\n",
>>>> i915/i915_driver.c: drm_err(&gt->i915->drm, "gt%d: intel_pcode_init 
>>>> failed %d\n", id, ret);
>>>> i915/i915_hwmon.c:              snprintf(ddat_gt->name, 
>>>> sizeof(ddat_gt->name), "i915_gt%u", i);
>>>>
>>
>> Just because there are 11 existing instances of one form doesn't mean 
>> that the 275 instances that are waiting to be converted should be 
>> done incorrectly. GT is an acronym and should be capitalised.
>
> Okay just make it consistent then.
>
>> Besides:
>> grep -r "GT " i915 | grep '"'
>> i915/vlv_suspend.c:             drm_err(&i915->drm, "timeout 
>> disabling GT waking\n");
>> i915/vlv_suspend.c:                     "timeout waiting for GT wells 
>> to go %s\n",
>> i915/vlv_suspend.c:     drm_dbg(&i915->drm, "GT register access while 
>> GT waking disabled\n");
>> i915/i915_gpu_error.c:  err_printf(m, "GT awake: %s\n", 
>> str_yes_no(gt->awake));
>> i915/i915_debugfs.c:    seq_printf(m, "GT awake? %s [%d], %llums\n",
>> i915/selftests/i915_gem_evict.c: pr_err("Failed to idle GT (on %s)", 
>> engine->name);
>> i915/intel_uncore.c:                  "GT thread status wait timed 
>> out\n");
>> i915/gt/uc/selftest_guc_multi_lrc.c: drm_err(&gt->i915->drm, "GT 
>> failed to idle: %d\n", ret);
>> i915/gt/uc/selftest_guc.c: drm_err(&gt->i915->drm, "GT failed to 
>> idle: %d\n", ret);
>> i915/gt/uc/selftest_guc.c: drm_err(&gt->i915->drm, "GT failed to 
>> idle: %d\n", ret);
>> i915/gt/intel_gt_mcr.c: * Some GT registers are designed as 
>> "multicast" or "replicated" registers:
>> i915/gt/selftest_rps.c:                 pr_info("%s: rps counted %d 
>> C0 cycles [%lldns] in %lldns [%d cycles], using GT clock frequency of 
>> %uKHz\n",
>> i915/gt/selftest_hangcheck.c:                   pr_err("[%s] GT is 
>> wedged!\n", engine->name);
>> i915/gt/selftest_hangcheck.c:           pr_err("GT is wedged!\n");
>> i915/gt/intel_gt_clock_utils.c:                 "GT clock frequency 
>> changed, was %uHz, now %uHz!\n",
>> i915/gt/selftest_engine_pm.c:           pr_err("Unable to flush GT pm 
>> before test\n");
>> i915/gt/selftest_engine_pm.c: pr_err("GT failed to idle\n");
>> i915/i915_sysfs.c:                       "failed to register GT sysfs 
>> directory\n");
>> i915/intel_uncore.h:     * of the basic non-engine GT registers 
>> (referred to as "GSI" on
>> i915/intel_uncore.h:     * newer platforms, or "GT block" on older 
>> platforms)?  If so, we'll
>>
>>
>>
>>>> Then there is a question of naming. Are we okay with GT_XXX or, do 
>>>> we want intel_gt_, or something completely different. I don't have 
>>>> a strong opinion at the moment so I'll add some more folks to Cc.
>>>
>> You mean GT_ERR("msg") vs intel_gt_err("msg")? Personally, I would 
>> prefer just gt_err("msg") to keep it as close to the official drm_* 
>> versions as possible. Print lines tend to be excessively long 
>> already. Taking a 'gt' parameter instead of a '&gt->i915->drm' 
>> parameter does help with that but it seems like calling the wrapper 
>> intel_gt_* is shooting ourselves in the foot on that one. And GT_ERR 
>> vs gt_err just comes down to the fact that it is a macro wrapper and 
>> therefore is required to be in upper case.
>>
>>> There was a maintainer level mini-discussion on this topic which I 
>>> will try to summarise.
>>>
>>> Main contention point was the maintenance cost and generally an 
>>> undesirable pattern of needing to add many 
>>> subsystem/component/directory specific macros. Which then typically 
>>> need extra flavours and so on. But over verbosity of the 
>> How many versions are you expecting to add? Beyond the tile instance, 
>> what further addressing requirements are there? The card instance is 
>> already printed as part of the PCI address. The only other reason to 
>> add per component wrappers would be to wrap the mechanism for getting 
>> from some random per component object back to the intel_gt structure. 
>> But that is hardware a new issue being added by this wrapper. It is 
>> also not a requirement. Much of the code has a gt pointer already. 
>> For the parts that don't, some of it would be a trivial engine->gt 
>> type dereference, some of it is a more complex container_of type 
>> construction. But for those, the given file will already have 
>> multiple instances of that already (usually as the first or second 
>> line of the function - 'intel_gt *gt = fancy_access_method(my_obj)' 
>> so adding one or two more of those as necessary is not making the 
>> code harder to read.
>>
>>> code is obviously also bad, so one compromise idea was to add a 
>>> macro which builds the GT string and use drm logging helpers 
>>> directly. This would be something like:
>>>
>>>  drm_err(GT_LOG("something went wrong ret=%d\n", gt), ret);
>>>  drm_info(GT_LOG(...same...));
>> Seriously? As above, some of these lines are already way too long, 
>> this version makes them even longer with no obvious benefit. Worse, 
>> it makes it harder to read what is going on. It is much less 
>> intuitive to read than just replacing the drm_err itself. And having 
>> two sets of parenthesis with some parameters inside the first and 
>> some only inside the second is really horrid! Also, putting the 'gt' 
>> parameter in the middle just confuses it with the rest of the printf 
>> arguments even though there is no %d in the string for it. So now a 
>> quick glances tells you that your code is wrong because you have 
>> three format specifiers but four parameters.
>>
>> Whereas, just replacing drm_err with gt_err (or GT_ERR or 
>> intel_gt_err) keeps everything else consistent. The first parameter 
>> changes from 'drm' to 'gt' but is still the master object parameter 
>> and it matches the function/macro prefix so inherently looks correct. 
>> Then you have your message plus parameters. No confusing orders, no 
>> confusing parenthesis, no excessive macro levels, no confusion at 
>> all. Just nice simple, easy to read, easy to maintain code.
>
> I am personally okay with gt_err/GT_ERR some other folks might object 
> though. And I can also understand the argument why it is better to not 
> have to define gt_err, gt_warn, gt_info, gt_notice, gt_debug, 
> gt_err_ratelimited, gt_warn_once.. and instead have only one macro.
A small set of trivial macro definitions vs a complicated and unreadable 
construct on every single print? Erm, isn't that the very definition of 
abstracting to helpers as generally required by every code review ever?

And what 'other folks might object'? People already CC'd? People outside 
of i915?


>
> Because of that I was passing on to you the compromise option.
>
> It maybe still has net space savings since we wouldn't have to be 
> repeating the gt->i915->drm whatever and gt->info.id on every line.
>
> You are free to try the most compact one and see how hard those 
> objections will be.
Um. I already did. This patch. And you are the only person to have 
objected in any manner at all.

John.

>
>>> Whether or not to put the gt as parameter to the helper macro or 
>>> outside wasn't really decided upon. Anyway the macro would be adding 
>>> the magic "gt%u: " prefix, drm device and all.
>>>
>>> Also the name GT_LOG (or case) is just for illustration, that part 
>>> wasn't really discussed.
>>>
>>> If agreeable this pattern could then be used to consolidate some 
>>> other macros that we have. Although apart from CT_DEBUG/ERROR I 
>>> don't know if we have any others.
>> By consolidating as what? I hope you aren't meaning something like 
>> 'drm_err(CT_DEBUG(GT_LOG("msg")))?'
>
> No, consolidate as in use the same pattern of directly calling drm 
> logging helpers with our own formatting macro which changes per 
> subsystem/component/directory/whatever.
>
> Regards,
>
> Tvrtko
Tvrtko Ursulin Nov. 9, 2022, 11:05 a.m. UTC | #7
On 08/11/2022 20:15, John Harrison wrote:
> On 11/8/2022 01:01, Tvrtko Ursulin wrote:
>> On 07/11/2022 19:14, John Harrison wrote:
>>> On 11/7/2022 08:17, Tvrtko Ursulin wrote:
>>>> On 07/11/2022 09:33, Tvrtko Ursulin wrote:
>>>>> On 05/11/2022 01:03, Ceraolo Spurio, Daniele wrote:
>>>>>> On 11/4/2022 10:25 AM, John.C.Harrison@Intel.com wrote:
>>>>>>> From: John Harrison <John.C.Harrison@Intel.com>
>>>>>>>
>>>>>>> When trying to analyse bug reports from CI, customers, etc. it 
>>>>>>> can be
>>>>>>> difficult to work out exactly what is happening on which GT in a
>>>>>>> multi-GT system. So add GT oriented debug/error message wrappers. If
>>>>>>> used instead of the drm_ equivalents, you get the same output but 
>>>>>>> with
>>>>>>> a GT# prefix on it.
>>>>>>>
>>>>>>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>>>>>>
>>>>>> The only downside to this is that we'll print "GT0: " even on 
>>>>>> single-GT devices. We could introduce a gt->info.name and print 
>>>>>> that, so we could have it different per-platform, but IMO it's not 
>>>>>> worth the effort.
>>>>>>
>>>>>> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
>>>>>>
>>>>>> I think it might be worth getting an ack from one of the 
>>>>>> maintainers to make sure we're all aligned on transitioning to 
>>>>>> these new logging macro for gt code.
>>>>>
>>>>> Idea is I think a very good one. First I would suggest 
>>>>> standardising to lowercase GT in logs because:
>>>>>
>>>>> $ grep "GT%" i915/ -r
>>>>> $ grep "gt%" i915/ -r
>>>>> i915/gt/intel_gt_sysfs.c: gt->i915->sysfs_gt, "gt%d", gt->info.id))
>>>>> i915/gt/intel_gt_sysfs.c:                "failed to initialize gt%d 
>>>>> sysfs root\n", gt->info.id);
>>>>> i915/gt/intel_gt_sysfs_pm.c:                     "failed to create 
>>>>> gt%u RC6 sysfs files (%pe)\n",
>>>>> i915/gt/intel_gt_sysfs_pm.c: "failed to create gt%u RC6p sysfs 
>>>>> files (%pe)\n",
>>>>> i915/gt/intel_gt_sysfs_pm.c:                     "failed to create 
>>>>> gt%u RPS sysfs files (%pe)",
>>>>> i915/gt/intel_gt_sysfs_pm.c:                     "failed to create 
>>>>> gt%u punit_req_freq_mhz sysfs (%pe)",
>>>>> i915/gt/intel_gt_sysfs_pm.c: "failed to create gt%u throttle sysfs 
>>>>> files (%pe)",
>>>>> i915/gt/intel_gt_sysfs_pm.c: "failed to create gt%u 
>>>>> media_perf_power_attrs sysfs (%pe)\n",
>>>>> i915/gt/intel_gt_sysfs_pm.c:                     "failed to add 
>>>>> gt%u rps defaults (%pe)\n",
>>>>> i915/i915_driver.c: drm_err(&gt->i915->drm, "gt%d: intel_pcode_init 
>>>>> failed %d\n", id, ret);
>>>>> i915/i915_hwmon.c:              snprintf(ddat_gt->name, 
>>>>> sizeof(ddat_gt->name), "i915_gt%u", i);
>>>>>
>>>
>>> Just because there are 11 existing instances of one form doesn't mean 
>>> that the 275 instances that are waiting to be converted should be 
>>> done incorrectly. GT is an acronym and should be capitalised.
>>
>> Okay just make it consistent then.
>>
>>> Besides:
>>> grep -r "GT " i915 | grep '"'
>>> i915/vlv_suspend.c:             drm_err(&i915->drm, "timeout 
>>> disabling GT waking\n");
>>> i915/vlv_suspend.c:                     "timeout waiting for GT wells 
>>> to go %s\n",
>>> i915/vlv_suspend.c:     drm_dbg(&i915->drm, "GT register access while 
>>> GT waking disabled\n");
>>> i915/i915_gpu_error.c:  err_printf(m, "GT awake: %s\n", 
>>> str_yes_no(gt->awake));
>>> i915/i915_debugfs.c:    seq_printf(m, "GT awake? %s [%d], %llums\n",
>>> i915/selftests/i915_gem_evict.c: pr_err("Failed to idle GT (on %s)", 
>>> engine->name);
>>> i915/intel_uncore.c:                  "GT thread status wait timed 
>>> out\n");
>>> i915/gt/uc/selftest_guc_multi_lrc.c: drm_err(&gt->i915->drm, "GT 
>>> failed to idle: %d\n", ret);
>>> i915/gt/uc/selftest_guc.c: drm_err(&gt->i915->drm, "GT failed to 
>>> idle: %d\n", ret);
>>> i915/gt/uc/selftest_guc.c: drm_err(&gt->i915->drm, "GT failed to 
>>> idle: %d\n", ret);
>>> i915/gt/intel_gt_mcr.c: * Some GT registers are designed as 
>>> "multicast" or "replicated" registers:
>>> i915/gt/selftest_rps.c:                 pr_info("%s: rps counted %d 
>>> C0 cycles [%lldns] in %lldns [%d cycles], using GT clock frequency of 
>>> %uKHz\n",
>>> i915/gt/selftest_hangcheck.c:                   pr_err("[%s] GT is 
>>> wedged!\n", engine->name);
>>> i915/gt/selftest_hangcheck.c:           pr_err("GT is wedged!\n");
>>> i915/gt/intel_gt_clock_utils.c:                 "GT clock frequency 
>>> changed, was %uHz, now %uHz!\n",
>>> i915/gt/selftest_engine_pm.c:           pr_err("Unable to flush GT pm 
>>> before test\n");
>>> i915/gt/selftest_engine_pm.c: pr_err("GT failed to idle\n");
>>> i915/i915_sysfs.c:                       "failed to register GT sysfs 
>>> directory\n");
>>> i915/intel_uncore.h:     * of the basic non-engine GT registers 
>>> (referred to as "GSI" on
>>> i915/intel_uncore.h:     * newer platforms, or "GT block" on older 
>>> platforms)?  If so, we'll
>>>
>>>
>>>
>>>>> Then there is a question of naming. Are we okay with GT_XXX or, do 
>>>>> we want intel_gt_, or something completely different. I don't have 
>>>>> a strong opinion at the moment so I'll add some more folks to Cc.
>>>>
>>> You mean GT_ERR("msg") vs intel_gt_err("msg")? Personally, I would 
>>> prefer just gt_err("msg") to keep it as close to the official drm_* 
>>> versions as possible. Print lines tend to be excessively long 
>>> already. Taking a 'gt' parameter instead of a '&gt->i915->drm' 
>>> parameter does help with that but it seems like calling the wrapper 
>>> intel_gt_* is shooting ourselves in the foot on that one. And GT_ERR 
>>> vs gt_err just comes down to the fact that it is a macro wrapper and 
>>> therefore is required to be in upper case.
>>>
>>>> There was a maintainer level mini-discussion on this topic which I 
>>>> will try to summarise.
>>>>
>>>> Main contention point was the maintenance cost and generally an 
>>>> undesirable pattern of needing to add many 
>>>> subsystem/component/directory specific macros. Which then typically 
>>>> need extra flavours and so on. But over verbosity of the 
>>> How many versions are you expecting to add? Beyond the tile instance, 
>>> what further addressing requirements are there? The card instance is 
>>> already printed as part of the PCI address. The only other reason to 
>>> add per component wrappers would be to wrap the mechanism for getting 
>>> from some random per component object back to the intel_gt structure. 
>>> But that is hardware a new issue being added by this wrapper. It is 
>>> also not a requirement. Much of the code has a gt pointer already. 
>>> For the parts that don't, some of it would be a trivial engine->gt 
>>> type dereference, some of it is a more complex container_of type 
>>> construction. But for those, the given file will already have 
>>> multiple instances of that already (usually as the first or second 
>>> line of the function - 'intel_gt *gt = fancy_access_method(my_obj)' 
>>> so adding one or two more of those as necessary is not making the 
>>> code harder to read.
>>>
>>>> code is obviously also bad, so one compromise idea was to add a 
>>>> macro which builds the GT string and use drm logging helpers 
>>>> directly. This would be something like:
>>>>
>>>>  drm_err(GT_LOG("something went wrong ret=%d\n", gt), ret);
>>>>  drm_info(GT_LOG(...same...));
>>> Seriously? As above, some of these lines are already way too long, 
>>> this version makes them even longer with no obvious benefit. Worse, 
>>> it makes it harder to read what is going on. It is much less 
>>> intuitive to read than just replacing the drm_err itself. And having 
>>> two sets of parenthesis with some parameters inside the first and 
>>> some only inside the second is really horrid! Also, putting the 'gt' 
>>> parameter in the middle just confuses it with the rest of the printf 
>>> arguments even though there is no %d in the string for it. So now a 
>>> quick glances tells you that your code is wrong because you have 
>>> three format specifiers but four parameters.
>>>
>>> Whereas, just replacing drm_err with gt_err (or GT_ERR or 
>>> intel_gt_err) keeps everything else consistent. The first parameter 
>>> changes from 'drm' to 'gt' but is still the master object parameter 
>>> and it matches the function/macro prefix so inherently looks correct. 
>>> Then you have your message plus parameters. No confusing orders, no 
>>> confusing parenthesis, no excessive macro levels, no confusion at 
>>> all. Just nice simple, easy to read, easy to maintain code.
>>
>> I am personally okay with gt_err/GT_ERR some other folks might object 
>> though. And I can also understand the argument why it is better to not 
>> have to define gt_err, gt_warn, gt_info, gt_notice, gt_debug, 
>> gt_err_ratelimited, gt_warn_once.. and instead have only one macro.
> A small set of trivial macro definitions vs a complicated and unreadable 
> construct on every single print? Erm, isn't that the very definition of 
> abstracting to helpers as generally required by every code review ever?
> 
> And what 'other folks might object'? People already CC'd? People outside 
> of i915?
> 
> 
>>
>> Because of that I was passing on to you the compromise option.
>>
>> It maybe still has net space savings since we wouldn't have to be 
>> repeating the gt->i915->drm whatever and gt->info.id on every line.
>>
>> You are free to try the most compact one and see how hard those 
>> objections will be.
> Um. I already did. This patch. And you are the only person to have 
> objected in any manner at all.

Where I have objected?

I was a) asking to convert all gt/ within one kernel release, b) 
transferring the maintainer discussion from IRC to this email chain to 
outlay one alternative, for which I said I could see the pros and cons 
of both, and c) raised the naming question early since that can usually 
become a churn point later on when we have large scale code transformations.

As said, FWIW you have my ack for GT_XXX naming and approach, but please 
do convert the whole of gt/ so we don't ship with a mish-mash of log 
messages.

Regards,

Tvrtko
John Harrison Nov. 9, 2022, 5:46 p.m. UTC | #8
On 11/9/2022 03:05, Tvrtko Ursulin wrote:
> On 08/11/2022 20:15, John Harrison wrote:
>> On 11/8/2022 01:01, Tvrtko Ursulin wrote:
>>> On 07/11/2022 19:14, John Harrison wrote:
>>>> On 11/7/2022 08:17, Tvrtko Ursulin wrote:
>>>>> On 07/11/2022 09:33, Tvrtko Ursulin wrote:
>>>>>> On 05/11/2022 01:03, Ceraolo Spurio, Daniele wrote:
>>>>>>> On 11/4/2022 10:25 AM, John.C.Harrison@Intel.com wrote:
>>>>>>>> From: John Harrison <John.C.Harrison@Intel.com>
>>>>>>>>
>>>>>>>> When trying to analyse bug reports from CI, customers, etc. it 
>>>>>>>> can be
>>>>>>>> difficult to work out exactly what is happening on which GT in a
>>>>>>>> multi-GT system. So add GT oriented debug/error message 
>>>>>>>> wrappers. If
>>>>>>>> used instead of the drm_ equivalents, you get the same output 
>>>>>>>> but with
>>>>>>>> a GT# prefix on it.
>>>>>>>>
>>>>>>>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>>>>>>>
>>>>>>> The only downside to this is that we'll print "GT0: " even on 
>>>>>>> single-GT devices. We could introduce a gt->info.name and print 
>>>>>>> that, so we could have it different per-platform, but IMO it's 
>>>>>>> not worth the effort.
>>>>>>>
>>>>>>> Reviewed-by: Daniele Ceraolo Spurio 
>>>>>>> <daniele.ceraolospurio@intel.com>
>>>>>>>
>>>>>>> I think it might be worth getting an ack from one of the 
>>>>>>> maintainers to make sure we're all aligned on transitioning to 
>>>>>>> these new logging macro for gt code.
>>>>>>
>>>>>> Idea is I think a very good one. First I would suggest 
>>>>>> standardising to lowercase GT in logs because:
>>>>>>
>>>>>> $ grep "GT%" i915/ -r
>>>>>> $ grep "gt%" i915/ -r
>>>>>> i915/gt/intel_gt_sysfs.c: gt->i915->sysfs_gt, "gt%d", gt->info.id))
>>>>>> i915/gt/intel_gt_sysfs.c:                "failed to initialize 
>>>>>> gt%d sysfs root\n", gt->info.id);
>>>>>> i915/gt/intel_gt_sysfs_pm.c:                     "failed to 
>>>>>> create gt%u RC6 sysfs files (%pe)\n",
>>>>>> i915/gt/intel_gt_sysfs_pm.c: "failed to create gt%u RC6p sysfs 
>>>>>> files (%pe)\n",
>>>>>> i915/gt/intel_gt_sysfs_pm.c:                     "failed to 
>>>>>> create gt%u RPS sysfs files (%pe)",
>>>>>> i915/gt/intel_gt_sysfs_pm.c:                     "failed to 
>>>>>> create gt%u punit_req_freq_mhz sysfs (%pe)",
>>>>>> i915/gt/intel_gt_sysfs_pm.c: "failed to create gt%u throttle 
>>>>>> sysfs files (%pe)",
>>>>>> i915/gt/intel_gt_sysfs_pm.c: "failed to create gt%u 
>>>>>> media_perf_power_attrs sysfs (%pe)\n",
>>>>>> i915/gt/intel_gt_sysfs_pm.c:                     "failed to add 
>>>>>> gt%u rps defaults (%pe)\n",
>>>>>> i915/i915_driver.c: drm_err(&gt->i915->drm, "gt%d: 
>>>>>> intel_pcode_init failed %d\n", id, ret);
>>>>>> i915/i915_hwmon.c: snprintf(ddat_gt->name, sizeof(ddat_gt->name), 
>>>>>> "i915_gt%u", i);
>>>>>>
>>>>
>>>> Just because there are 11 existing instances of one form doesn't 
>>>> mean that the 275 instances that are waiting to be converted should 
>>>> be done incorrectly. GT is an acronym and should be capitalised.
>>>
>>> Okay just make it consistent then.
>>>
>>>> Besides:
>>>> grep -r "GT " i915 | grep '"'
>>>> i915/vlv_suspend.c:             drm_err(&i915->drm, "timeout 
>>>> disabling GT waking\n");
>>>> i915/vlv_suspend.c:                     "timeout waiting for GT 
>>>> wells to go %s\n",
>>>> i915/vlv_suspend.c:     drm_dbg(&i915->drm, "GT register access 
>>>> while GT waking disabled\n");
>>>> i915/i915_gpu_error.c:  err_printf(m, "GT awake: %s\n", 
>>>> str_yes_no(gt->awake));
>>>> i915/i915_debugfs.c:    seq_printf(m, "GT awake? %s [%d], %llums\n",
>>>> i915/selftests/i915_gem_evict.c: pr_err("Failed to idle GT (on 
>>>> %s)", engine->name);
>>>> i915/intel_uncore.c:                  "GT thread status wait timed 
>>>> out\n");
>>>> i915/gt/uc/selftest_guc_multi_lrc.c: drm_err(&gt->i915->drm, "GT 
>>>> failed to idle: %d\n", ret);
>>>> i915/gt/uc/selftest_guc.c: drm_err(&gt->i915->drm, "GT failed to 
>>>> idle: %d\n", ret);
>>>> i915/gt/uc/selftest_guc.c: drm_err(&gt->i915->drm, "GT failed to 
>>>> idle: %d\n", ret);
>>>> i915/gt/intel_gt_mcr.c: * Some GT registers are designed as 
>>>> "multicast" or "replicated" registers:
>>>> i915/gt/selftest_rps.c:                 pr_info("%s: rps counted %d 
>>>> C0 cycles [%lldns] in %lldns [%d cycles], using GT clock frequency 
>>>> of %uKHz\n",
>>>> i915/gt/selftest_hangcheck.c:                   pr_err("[%s] GT is 
>>>> wedged!\n", engine->name);
>>>> i915/gt/selftest_hangcheck.c:           pr_err("GT is wedged!\n");
>>>> i915/gt/intel_gt_clock_utils.c:                 "GT clock frequency 
>>>> changed, was %uHz, now %uHz!\n",
>>>> i915/gt/selftest_engine_pm.c:           pr_err("Unable to flush GT 
>>>> pm before test\n");
>>>> i915/gt/selftest_engine_pm.c: pr_err("GT failed to idle\n");
>>>> i915/i915_sysfs.c:                       "failed to register GT 
>>>> sysfs directory\n");
>>>> i915/intel_uncore.h:     * of the basic non-engine GT registers 
>>>> (referred to as "GSI" on
>>>> i915/intel_uncore.h:     * newer platforms, or "GT block" on older 
>>>> platforms)?  If so, we'll
>>>>
>>>>
>>>>
>>>>>> Then there is a question of naming. Are we okay with GT_XXX or, 
>>>>>> do we want intel_gt_, or something completely different. I don't 
>>>>>> have a strong opinion at the moment so I'll add some more folks 
>>>>>> to Cc.
>>>>>
>>>> You mean GT_ERR("msg") vs intel_gt_err("msg")? Personally, I would 
>>>> prefer just gt_err("msg") to keep it as close to the official drm_* 
>>>> versions as possible. Print lines tend to be excessively long 
>>>> already. Taking a 'gt' parameter instead of a '&gt->i915->drm' 
>>>> parameter does help with that but it seems like calling the wrapper 
>>>> intel_gt_* is shooting ourselves in the foot on that one. And 
>>>> GT_ERR vs gt_err just comes down to the fact that it is a macro 
>>>> wrapper and therefore is required to be in upper case.
>>>>
>>>>> There was a maintainer level mini-discussion on this topic which I 
>>>>> will try to summarise.
>>>>>
>>>>> Main contention point was the maintenance cost and generally an 
>>>>> undesirable pattern of needing to add many 
>>>>> subsystem/component/directory specific macros. Which then 
>>>>> typically need extra flavours and so on. But over verbosity of the 
>>>> How many versions are you expecting to add? Beyond the tile 
>>>> instance, what further addressing requirements are there? The card 
>>>> instance is already printed as part of the PCI address. The only 
>>>> other reason to add per component wrappers would be to wrap the 
>>>> mechanism for getting from some random per component object back to 
>>>> the intel_gt structure. But that is hardware a new issue being 
>>>> added by this wrapper. It is also not a requirement. Much of the 
>>>> code has a gt pointer already. For the parts that don't, some of it 
>>>> would be a trivial engine->gt type dereference, some of it is a 
>>>> more complex container_of type construction. But for those, the 
>>>> given file will already have multiple instances of that already 
>>>> (usually as the first or second line of the function - 'intel_gt 
>>>> *gt = fancy_access_method(my_obj)' so adding one or two more of 
>>>> those as necessary is not making the code harder to read.
>>>>
>>>>> code is obviously also bad, so one compromise idea was to add a 
>>>>> macro which builds the GT string and use drm logging helpers 
>>>>> directly. This would be something like:
>>>>>
>>>>>  drm_err(GT_LOG("something went wrong ret=%d\n", gt), ret);
>>>>>  drm_info(GT_LOG(...same...));
>>>> Seriously? As above, some of these lines are already way too long, 
>>>> this version makes them even longer with no obvious benefit. Worse, 
>>>> it makes it harder to read what is going on. It is much less 
>>>> intuitive to read than just replacing the drm_err itself. And 
>>>> having two sets of parenthesis with some parameters inside the 
>>>> first and some only inside the second is really horrid! Also, 
>>>> putting the 'gt' parameter in the middle just confuses it with the 
>>>> rest of the printf arguments even though there is no %d in the 
>>>> string for it. So now a quick glances tells you that your code is 
>>>> wrong because you have three format specifiers but four parameters.
>>>>
>>>> Whereas, just replacing drm_err with gt_err (or GT_ERR or 
>>>> intel_gt_err) keeps everything else consistent. The first parameter 
>>>> changes from 'drm' to 'gt' but is still the master object parameter 
>>>> and it matches the function/macro prefix so inherently looks 
>>>> correct. Then you have your message plus parameters. No confusing 
>>>> orders, no confusing parenthesis, no excessive macro levels, no 
>>>> confusion at all. Just nice simple, easy to read, easy to maintain 
>>>> code.
>>>
>>> I am personally okay with gt_err/GT_ERR some other folks might 
>>> object though. And I can also understand the argument why it is 
>>> better to not have to define gt_err, gt_warn, gt_info, gt_notice, 
>>> gt_debug, gt_err_ratelimited, gt_warn_once.. and instead have only 
>>> one macro.
>> A small set of trivial macro definitions vs a complicated and 
>> unreadable construct on every single print? Erm, isn't that the very 
>> definition of abstracting to helpers as generally required by every 
>> code review ever?
>>
>> And what 'other folks might object'? People already CC'd? People 
>> outside of i915?
>>
>>
>>>
>>> Because of that I was passing on to you the compromise option.
>>>
>>> It maybe still has net space savings since we wouldn't have to be 
>>> repeating the gt->i915->drm whatever and gt->info.id on every line.
>>>
>>> You are free to try the most compact one and see how hard those 
>>> objections will be.
>> Um. I already did. This patch. And you are the only person to have 
>> objected in any manner at all.
>
> Where I have objected?
Only in everything you have written. Or are you saying that actually I 
could have just taken the r-b from Daniele and merged it while 
completely ignoring everything you have said because you didn't say 
'NACK' at the top of your email?

>
> I was a) asking to convert all gt/ within one kernel release, b) 
> transferring the maintainer discussion from IRC to this email chain to 
> outlay one alternative, for which I said I could see the pros and cons 
> of both, and c) raised the naming question early since that can 
> usually become a churn point later on when we have large scale code 
> transformations.
>
> As said, FWIW you have my ack for GT_XXX naming and approach, but 
> please do convert the whole of gt/ so we don't ship with a mish-mash 
> of log messages.
>
That sounds like a quite a lot of objections to me - don't do it that 
way, do it this way; fine do it that way but expect lots of complaints 
when you do; oh all right, do it that way but re-write the entire driver 
all at once.

I really don't get the 'we must not ship with a mish-mash of log 
messages' argument. It's hardly like the current total mish-mash of 
DRM_DEBUG, DRM_DEBUG_DRIVER, pr_dbg, drm_dbg, ... is even remotely 
consistent. It's also not like the patch is massively changing the 
format of the output messages. So some prints get a useful extra three 
characters in advance of others. Does anyone care? Would anyone except 
us even notice?

Is it really a problem to merge this patch now to get the process 
started? And other sub-components get updated as and when people get the 
time to do them? You could maybe even help rather than posting 
completely conflicting patch sets that basically duplicate all the 
effort for no actual benefit.

John.


> Regards,
>
> Tvrtko
Michal Wajdeczko Nov. 9, 2022, 7:57 p.m. UTC | #9
On 09.11.2022 18:46, John Harrison wrote:
> On 11/9/2022 03:05, Tvrtko Ursulin wrote:
>> On 08/11/2022 20:15, John Harrison wrote:
>>> On 11/8/2022 01:01, Tvrtko Ursulin wrote:
>>>> On 07/11/2022 19:14, John Harrison wrote:
>>>>> On 11/7/2022 08:17, Tvrtko Ursulin wrote:
>>>>>> On 07/11/2022 09:33, Tvrtko Ursulin wrote:
>>>>>>> On 05/11/2022 01:03, Ceraolo Spurio, Daniele wrote:
>>>>>>>> On 11/4/2022 10:25 AM, John.C.Harrison@Intel.com wrote:
>>>>>>>>> From: John Harrison <John.C.Harrison@Intel.com>
>>>>>>>>>
>>>>>>>>> When trying to analyse bug reports from CI, customers, etc. it
>>>>>>>>> can be
>>>>>>>>> difficult to work out exactly what is happening on which GT in a
>>>>>>>>> multi-GT system. So add GT oriented debug/error message
>>>>>>>>> wrappers. If
>>>>>>>>> used instead of the drm_ equivalents, you get the same output
>>>>>>>>> but with
>>>>>>>>> a GT# prefix on it.
>>>>>>>>>
>>>>>>>>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>>>>>>>>
>>>>>>>> The only downside to this is that we'll print "GT0: " even on
>>>>>>>> single-GT devices. We could introduce a gt->info.name and print
>>>>>>>> that, so we could have it different per-platform, but IMO it's
>>>>>>>> not worth the effort.
>>>>>>>>
>>>>>>>> Reviewed-by: Daniele Ceraolo Spurio
>>>>>>>> <daniele.ceraolospurio@intel.com>
>>>>>>>>
>>>>>>>> I think it might be worth getting an ack from one of the
>>>>>>>> maintainers to make sure we're all aligned on transitioning to
>>>>>>>> these new logging macro for gt code.
>>>>>>>
>>>>>>> Idea is I think a very good one. First I would suggest
>>>>>>> standardising to lowercase GT in logs because:
>>>>>>>
>>>>>>> $ grep "GT%" i915/ -r
>>>>>>> $ grep "gt%" i915/ -r
>>>>>>> i915/gt/intel_gt_sysfs.c: gt->i915->sysfs_gt, "gt%d", gt->info.id))
>>>>>>> i915/gt/intel_gt_sysfs.c:                "failed to initialize
>>>>>>> gt%d sysfs root\n", gt->info.id);
>>>>>>> i915/gt/intel_gt_sysfs_pm.c:                     "failed to
>>>>>>> create gt%u RC6 sysfs files (%pe)\n",
>>>>>>> i915/gt/intel_gt_sysfs_pm.c: "failed to create gt%u RC6p sysfs
>>>>>>> files (%pe)\n",
>>>>>>> i915/gt/intel_gt_sysfs_pm.c:                     "failed to
>>>>>>> create gt%u RPS sysfs files (%pe)",
>>>>>>> i915/gt/intel_gt_sysfs_pm.c:                     "failed to
>>>>>>> create gt%u punit_req_freq_mhz sysfs (%pe)",
>>>>>>> i915/gt/intel_gt_sysfs_pm.c: "failed to create gt%u throttle
>>>>>>> sysfs files (%pe)",
>>>>>>> i915/gt/intel_gt_sysfs_pm.c: "failed to create gt%u
>>>>>>> media_perf_power_attrs sysfs (%pe)\n",
>>>>>>> i915/gt/intel_gt_sysfs_pm.c:                     "failed to add
>>>>>>> gt%u rps defaults (%pe)\n",
>>>>>>> i915/i915_driver.c: drm_err(&gt->i915->drm, "gt%d:
>>>>>>> intel_pcode_init failed %d\n", id, ret);
>>>>>>> i915/i915_hwmon.c: snprintf(ddat_gt->name, sizeof(ddat_gt->name),
>>>>>>> "i915_gt%u", i);
>>>>>>>
>>>>>
>>>>> Just because there are 11 existing instances of one form doesn't
>>>>> mean that the 275 instances that are waiting to be converted should
>>>>> be done incorrectly. GT is an acronym and should be capitalised.
>>>>
>>>> Okay just make it consistent then.
>>>>
>>>>> Besides:
>>>>> grep -r "GT " i915 | grep '"'
>>>>> i915/vlv_suspend.c:             drm_err(&i915->drm, "timeout
>>>>> disabling GT waking\n");
>>>>> i915/vlv_suspend.c:                     "timeout waiting for GT
>>>>> wells to go %s\n",
>>>>> i915/vlv_suspend.c:     drm_dbg(&i915->drm, "GT register access
>>>>> while GT waking disabled\n");
>>>>> i915/i915_gpu_error.c:  err_printf(m, "GT awake: %s\n",
>>>>> str_yes_no(gt->awake));
>>>>> i915/i915_debugfs.c:    seq_printf(m, "GT awake? %s [%d], %llums\n",
>>>>> i915/selftests/i915_gem_evict.c: pr_err("Failed to idle GT (on
>>>>> %s)", engine->name);
>>>>> i915/intel_uncore.c:                  "GT thread status wait timed
>>>>> out\n");
>>>>> i915/gt/uc/selftest_guc_multi_lrc.c: drm_err(&gt->i915->drm, "GT
>>>>> failed to idle: %d\n", ret);
>>>>> i915/gt/uc/selftest_guc.c: drm_err(&gt->i915->drm, "GT failed to
>>>>> idle: %d\n", ret);
>>>>> i915/gt/uc/selftest_guc.c: drm_err(&gt->i915->drm, "GT failed to
>>>>> idle: %d\n", ret);
>>>>> i915/gt/intel_gt_mcr.c: * Some GT registers are designed as
>>>>> "multicast" or "replicated" registers:
>>>>> i915/gt/selftest_rps.c:                 pr_info("%s: rps counted %d
>>>>> C0 cycles [%lldns] in %lldns [%d cycles], using GT clock frequency
>>>>> of %uKHz\n",
>>>>> i915/gt/selftest_hangcheck.c:                   pr_err("[%s] GT is
>>>>> wedged!\n", engine->name);
>>>>> i915/gt/selftest_hangcheck.c:           pr_err("GT is wedged!\n");
>>>>> i915/gt/intel_gt_clock_utils.c:                 "GT clock frequency
>>>>> changed, was %uHz, now %uHz!\n",
>>>>> i915/gt/selftest_engine_pm.c:           pr_err("Unable to flush GT
>>>>> pm before test\n");
>>>>> i915/gt/selftest_engine_pm.c: pr_err("GT failed to idle\n");
>>>>> i915/i915_sysfs.c:                       "failed to register GT
>>>>> sysfs directory\n");
>>>>> i915/intel_uncore.h:     * of the basic non-engine GT registers
>>>>> (referred to as "GSI" on
>>>>> i915/intel_uncore.h:     * newer platforms, or "GT block" on older
>>>>> platforms)?  If so, we'll
>>>>>
>>>>>
>>>>>
>>>>>>> Then there is a question of naming. Are we okay with GT_XXX or,
>>>>>>> do we want intel_gt_, or something completely different. I don't
>>>>>>> have a strong opinion at the moment so I'll add some more folks
>>>>>>> to Cc.
>>>>>>
>>>>> You mean GT_ERR("msg") vs intel_gt_err("msg")? Personally, I would
>>>>> prefer just gt_err("msg") to keep it as close to the official drm_*
>>>>> versions as possible. Print lines tend to be excessively long
>>>>> already. Taking a 'gt' parameter instead of a '&gt->i915->drm'
>>>>> parameter does help with that but it seems like calling the wrapper
>>>>> intel_gt_* is shooting ourselves in the foot on that one. And
>>>>> GT_ERR vs gt_err just comes down to the fact that it is a macro
>>>>> wrapper and therefore is required to be in upper case.
>>>>>
>>>>>> There was a maintainer level mini-discussion on this topic which I
>>>>>> will try to summarise.
>>>>>>
>>>>>> Main contention point was the maintenance cost and generally an
>>>>>> undesirable pattern of needing to add many
>>>>>> subsystem/component/directory specific macros. Which then
>>>>>> typically need extra flavours and so on. But over verbosity of the 
>>>>> How many versions are you expecting to add? Beyond the tile
>>>>> instance, what further addressing requirements are there? The card
>>>>> instance is already printed as part of the PCI address. The only
>>>>> other reason to add per component wrappers would be to wrap the
>>>>> mechanism for getting from some random per component object back to
>>>>> the intel_gt structure. But that is hardware a new issue being
>>>>> added by this wrapper. It is also not a requirement. Much of the
>>>>> code has a gt pointer already. For the parts that don't, some of it
>>>>> would be a trivial engine->gt type dereference, some of it is a
>>>>> more complex container_of type construction. But for those, the
>>>>> given file will already have multiple instances of that already
>>>>> (usually as the first or second line of the function - 'intel_gt
>>>>> *gt = fancy_access_method(my_obj)' so adding one or two more of
>>>>> those as necessary is not making the code harder to read.
>>>>>
>>>>>> code is obviously also bad, so one compromise idea was to add a
>>>>>> macro which builds the GT string and use drm logging helpers
>>>>>> directly. This would be something like:
>>>>>>
>>>>>>  drm_err(GT_LOG("something went wrong ret=%d\n", gt), ret);
>>>>>>  drm_info(GT_LOG(...same...));
>>>>> Seriously? As above, some of these lines are already way too long,
>>>>> this version makes them even longer with no obvious benefit. Worse,
>>>>> it makes it harder to read what is going on. It is much less
>>>>> intuitive to read than just replacing the drm_err itself. And
>>>>> having two sets of parenthesis with some parameters inside the
>>>>> first and some only inside the second is really horrid! Also,
>>>>> putting the 'gt' parameter in the middle just confuses it with the
>>>>> rest of the printf arguments even though there is no %d in the
>>>>> string for it. So now a quick glances tells you that your code is
>>>>> wrong because you have three format specifiers but four parameters.
>>>>>
>>>>> Whereas, just replacing drm_err with gt_err (or GT_ERR or
>>>>> intel_gt_err) keeps everything else consistent. The first parameter
>>>>> changes from 'drm' to 'gt' but is still the master object parameter
>>>>> and it matches the function/macro prefix so inherently looks
>>>>> correct. Then you have your message plus parameters. No confusing
>>>>> orders, no confusing parenthesis, no excessive macro levels, no
>>>>> confusion at all. Just nice simple, easy to read, easy to maintain
>>>>> code.
>>>>
>>>> I am personally okay with gt_err/GT_ERR some other folks might
>>>> object though. And I can also understand the argument why it is
>>>> better to not have to define gt_err, gt_warn, gt_info, gt_notice,
>>>> gt_debug, gt_err_ratelimited, gt_warn_once.. and instead have only
>>>> one macro.
>>> A small set of trivial macro definitions vs a complicated and
>>> unreadable construct on every single print? Erm, isn't that the very
>>> definition of abstracting to helpers as generally required by every
>>> code review ever?
>>>
>>> And what 'other folks might object'? People already CC'd? People
>>> outside of i915?
>>>
>>>
>>>>
>>>> Because of that I was passing on to you the compromise option.
>>>>
>>>> It maybe still has net space savings since we wouldn't have to be
>>>> repeating the gt->i915->drm whatever and gt->info.id on every line.
>>>>
>>>> You are free to try the most compact one and see how hard those
>>>> objections will be.
>>> Um. I already did. This patch. And you are the only person to have
>>> objected in any manner at all.
>>
>> Where I have objected?
> Only in everything you have written. Or are you saying that actually I
> could have just taken the r-b from Daniele and merged it while
> completely ignoring everything you have said because you didn't say
> 'NACK' at the top of your email?
> 
>>
>> I was a) asking to convert all gt/ within one kernel release, b)
>> transferring the maintainer discussion from IRC to this email chain to
>> outlay one alternative, for which I said I could see the pros and cons
>> of both, and c) raised the naming question early since that can
>> usually become a churn point later on when we have large scale code
>> transformations.
>>
>> As said, FWIW you have my ack for GT_XXX naming and approach, but
>> please do convert the whole of gt/ so we don't ship with a mish-mash
>> of log messages.
>>
> That sounds like a quite a lot of objections to me - don't do it that
> way, do it this way; fine do it that way but expect lots of complaints
> when you do; oh all right, do it that way but re-write the entire driver
> all at once.
> 
> I really don't get the 'we must not ship with a mish-mash of log
> messages' argument. It's hardly like the current total mish-mash of
> DRM_DEBUG, DRM_DEBUG_DRIVER, pr_dbg, drm_dbg, ... is even remotely
> consistent. It's also not like the patch is massively changing the
> format of the output messages. So some prints get a useful extra three
> characters in advance of others. Does anyone care? Would anyone except
> us even notice?
> 
> Is it really a problem to merge this patch now to get the process
> started? And other sub-components get updated as and when people get the
> time to do them? You could maybe even help rather than posting
> completely conflicting patch sets that basically duplicate all the
> effort for no actual benefit.

Instead of merging this patch now, oriented on GT only, I would rather
wait until we discuss and plan solution for the all sub-components.

Once that's done (with agreement on naming and output) we can start
converting exiting messages.

My proposal would be:
 - use wrappers per component
 - use lower case names
 - don't add colon

#define i915_xxx(_i915, _fmt, ...) \
	drm_xxx(&(_i915)->drm, _fmt, ##__VA_ARGS__)

#define gt_xxx(_gt, _fmt, ...) \
	i915_xxx((_gt)->i915, "GT%u " _fmt, (_gt)->info.id, ..

#define guc_xxx(_guc, _fmt, ...) \
	gt_xxx(guc_to_gt(_guc), "GuC " _fmt, ..

#define ct_xxx(_ct, _fmt, ...) \
	guc_xxx(ct_to_guc(_ct), "CTB " _fmt, ..

where
	xxx = { err, warn, notice, info, dbg }

and then for calls like:

	i915_err(i915, "Foo failed (%pe)\n", ERR_PTR(err));
	  gt_err(gt,   "Foo failed (%pe)\n", ERR_PTR(err));
	 guc_err(guc,  "Foo failed (%pe)\n", ERR_PTR(err));
	  ct_err(ct,   "Foo failed (%pe)\n", ERR_PTR(err));

the output would look like:

[ ] i915 0000:00:02.0: [drm] *ERROR* Foo failed (-EIO)
[ ] i915 0000:00:02.0: [drm] *ERROR* GT0 Foo failed (-EIO)
[ ] i915 0000:00:02.0: [drm] *ERROR* GT0 GuC Foo failed (-EIO)
[ ] i915 0000:00:02.0: [drm] *ERROR* GT0 GuC CTB Foo failed (-EIO)

Would that work for all of you?

Michal

> 
> John.
> 
> 
>> Regards,
>>
>> Tvrtko
>
Tvrtko Ursulin Nov. 10, 2022, 9:43 a.m. UTC | #10
On 09/11/2022 17:46, John Harrison wrote:
> On 11/9/2022 03:05, Tvrtko Ursulin wrote:
>> On 08/11/2022 20:15, John Harrison wrote:
>>> On 11/8/2022 01:01, Tvrtko Ursulin wrote:
>>>> On 07/11/2022 19:14, John Harrison wrote:
>>>>> On 11/7/2022 08:17, Tvrtko Ursulin wrote:
>>>>>> On 07/11/2022 09:33, Tvrtko Ursulin wrote:
>>>>>>> On 05/11/2022 01:03, Ceraolo Spurio, Daniele wrote:
>>>>>>>> On 11/4/2022 10:25 AM, John.C.Harrison@Intel.com wrote:
>>>>>>>>> From: John Harrison <John.C.Harrison@Intel.com>
>>>>>>>>>
>>>>>>>>> When trying to analyse bug reports from CI, customers, etc. it 
>>>>>>>>> can be
>>>>>>>>> difficult to work out exactly what is happening on which GT in a
>>>>>>>>> multi-GT system. So add GT oriented debug/error message 
>>>>>>>>> wrappers. If
>>>>>>>>> used instead of the drm_ equivalents, you get the same output 
>>>>>>>>> but with
>>>>>>>>> a GT# prefix on it.
>>>>>>>>>
>>>>>>>>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>>>>>>>>
>>>>>>>> The only downside to this is that we'll print "GT0: " even on 
>>>>>>>> single-GT devices. We could introduce a gt->info.name and print 
>>>>>>>> that, so we could have it different per-platform, but IMO it's 
>>>>>>>> not worth the effort.
>>>>>>>>
>>>>>>>> Reviewed-by: Daniele Ceraolo Spurio 
>>>>>>>> <daniele.ceraolospurio@intel.com>
>>>>>>>>
>>>>>>>> I think it might be worth getting an ack from one of the 
>>>>>>>> maintainers to make sure we're all aligned on transitioning to 
>>>>>>>> these new logging macro for gt code.
>>>>>>>
>>>>>>> Idea is I think a very good one. First I would suggest 
>>>>>>> standardising to lowercase GT in logs because:
>>>>>>>
>>>>>>> $ grep "GT%" i915/ -r
>>>>>>> $ grep "gt%" i915/ -r
>>>>>>> i915/gt/intel_gt_sysfs.c: gt->i915->sysfs_gt, "gt%d", gt->info.id))
>>>>>>> i915/gt/intel_gt_sysfs.c:                "failed to initialize 
>>>>>>> gt%d sysfs root\n", gt->info.id);
>>>>>>> i915/gt/intel_gt_sysfs_pm.c:                     "failed to 
>>>>>>> create gt%u RC6 sysfs files (%pe)\n",
>>>>>>> i915/gt/intel_gt_sysfs_pm.c: "failed to create gt%u RC6p sysfs 
>>>>>>> files (%pe)\n",
>>>>>>> i915/gt/intel_gt_sysfs_pm.c:                     "failed to 
>>>>>>> create gt%u RPS sysfs files (%pe)",
>>>>>>> i915/gt/intel_gt_sysfs_pm.c:                     "failed to 
>>>>>>> create gt%u punit_req_freq_mhz sysfs (%pe)",
>>>>>>> i915/gt/intel_gt_sysfs_pm.c: "failed to create gt%u throttle 
>>>>>>> sysfs files (%pe)",
>>>>>>> i915/gt/intel_gt_sysfs_pm.c: "failed to create gt%u 
>>>>>>> media_perf_power_attrs sysfs (%pe)\n",
>>>>>>> i915/gt/intel_gt_sysfs_pm.c:                     "failed to add 
>>>>>>> gt%u rps defaults (%pe)\n",
>>>>>>> i915/i915_driver.c: drm_err(&gt->i915->drm, "gt%d: 
>>>>>>> intel_pcode_init failed %d\n", id, ret);
>>>>>>> i915/i915_hwmon.c: snprintf(ddat_gt->name, sizeof(ddat_gt->name), 
>>>>>>> "i915_gt%u", i);
>>>>>>>
>>>>>
>>>>> Just because there are 11 existing instances of one form doesn't 
>>>>> mean that the 275 instances that are waiting to be converted should 
>>>>> be done incorrectly. GT is an acronym and should be capitalised.
>>>>
>>>> Okay just make it consistent then.
>>>>
>>>>> Besides:
>>>>> grep -r "GT " i915 | grep '"'
>>>>> i915/vlv_suspend.c:             drm_err(&i915->drm, "timeout 
>>>>> disabling GT waking\n");
>>>>> i915/vlv_suspend.c:                     "timeout waiting for GT 
>>>>> wells to go %s\n",
>>>>> i915/vlv_suspend.c:     drm_dbg(&i915->drm, "GT register access 
>>>>> while GT waking disabled\n");
>>>>> i915/i915_gpu_error.c:  err_printf(m, "GT awake: %s\n", 
>>>>> str_yes_no(gt->awake));
>>>>> i915/i915_debugfs.c:    seq_printf(m, "GT awake? %s [%d], %llums\n",
>>>>> i915/selftests/i915_gem_evict.c: pr_err("Failed to idle GT (on 
>>>>> %s)", engine->name);
>>>>> i915/intel_uncore.c:                  "GT thread status wait timed 
>>>>> out\n");
>>>>> i915/gt/uc/selftest_guc_multi_lrc.c: drm_err(&gt->i915->drm, "GT 
>>>>> failed to idle: %d\n", ret);
>>>>> i915/gt/uc/selftest_guc.c: drm_err(&gt->i915->drm, "GT failed to 
>>>>> idle: %d\n", ret);
>>>>> i915/gt/uc/selftest_guc.c: drm_err(&gt->i915->drm, "GT failed to 
>>>>> idle: %d\n", ret);
>>>>> i915/gt/intel_gt_mcr.c: * Some GT registers are designed as 
>>>>> "multicast" or "replicated" registers:
>>>>> i915/gt/selftest_rps.c:                 pr_info("%s: rps counted %d 
>>>>> C0 cycles [%lldns] in %lldns [%d cycles], using GT clock frequency 
>>>>> of %uKHz\n",
>>>>> i915/gt/selftest_hangcheck.c:                   pr_err("[%s] GT is 
>>>>> wedged!\n", engine->name);
>>>>> i915/gt/selftest_hangcheck.c:           pr_err("GT is wedged!\n");
>>>>> i915/gt/intel_gt_clock_utils.c:                 "GT clock frequency 
>>>>> changed, was %uHz, now %uHz!\n",
>>>>> i915/gt/selftest_engine_pm.c:           pr_err("Unable to flush GT 
>>>>> pm before test\n");
>>>>> i915/gt/selftest_engine_pm.c: pr_err("GT failed to idle\n");
>>>>> i915/i915_sysfs.c:                       "failed to register GT 
>>>>> sysfs directory\n");
>>>>> i915/intel_uncore.h:     * of the basic non-engine GT registers 
>>>>> (referred to as "GSI" on
>>>>> i915/intel_uncore.h:     * newer platforms, or "GT block" on older 
>>>>> platforms)?  If so, we'll
>>>>>
>>>>>
>>>>>
>>>>>>> Then there is a question of naming. Are we okay with GT_XXX or, 
>>>>>>> do we want intel_gt_, or something completely different. I don't 
>>>>>>> have a strong opinion at the moment so I'll add some more folks 
>>>>>>> to Cc.
>>>>>>
>>>>> You mean GT_ERR("msg") vs intel_gt_err("msg")? Personally, I would 
>>>>> prefer just gt_err("msg") to keep it as close to the official drm_* 
>>>>> versions as possible. Print lines tend to be excessively long 
>>>>> already. Taking a 'gt' parameter instead of a '&gt->i915->drm' 
>>>>> parameter does help with that but it seems like calling the wrapper 
>>>>> intel_gt_* is shooting ourselves in the foot on that one. And 
>>>>> GT_ERR vs gt_err just comes down to the fact that it is a macro 
>>>>> wrapper and therefore is required to be in upper case.
>>>>>
>>>>>> There was a maintainer level mini-discussion on this topic which I 
>>>>>> will try to summarise.
>>>>>>
>>>>>> Main contention point was the maintenance cost and generally an 
>>>>>> undesirable pattern of needing to add many 
>>>>>> subsystem/component/directory specific macros. Which then 
>>>>>> typically need extra flavours and so on. But over verbosity of the 
>>>>> How many versions are you expecting to add? Beyond the tile 
>>>>> instance, what further addressing requirements are there? The card 
>>>>> instance is already printed as part of the PCI address. The only 
>>>>> other reason to add per component wrappers would be to wrap the 
>>>>> mechanism for getting from some random per component object back to 
>>>>> the intel_gt structure. But that is hardware a new issue being 
>>>>> added by this wrapper. It is also not a requirement. Much of the 
>>>>> code has a gt pointer already. For the parts that don't, some of it 
>>>>> would be a trivial engine->gt type dereference, some of it is a 
>>>>> more complex container_of type construction. But for those, the 
>>>>> given file will already have multiple instances of that already 
>>>>> (usually as the first or second line of the function - 'intel_gt 
>>>>> *gt = fancy_access_method(my_obj)' so adding one or two more of 
>>>>> those as necessary is not making the code harder to read.
>>>>>
>>>>>> code is obviously also bad, so one compromise idea was to add a 
>>>>>> macro which builds the GT string and use drm logging helpers 
>>>>>> directly. This would be something like:
>>>>>>
>>>>>>  drm_err(GT_LOG("something went wrong ret=%d\n", gt), ret);
>>>>>>  drm_info(GT_LOG(...same...));
>>>>> Seriously? As above, some of these lines are already way too long, 
>>>>> this version makes them even longer with no obvious benefit. Worse, 
>>>>> it makes it harder to read what is going on. It is much less 
>>>>> intuitive to read than just replacing the drm_err itself. And 
>>>>> having two sets of parenthesis with some parameters inside the 
>>>>> first and some only inside the second is really horrid! Also, 
>>>>> putting the 'gt' parameter in the middle just confuses it with the 
>>>>> rest of the printf arguments even though there is no %d in the 
>>>>> string for it. So now a quick glances tells you that your code is 
>>>>> wrong because you have three format specifiers but four parameters.
>>>>>
>>>>> Whereas, just replacing drm_err with gt_err (or GT_ERR or 
>>>>> intel_gt_err) keeps everything else consistent. The first parameter 
>>>>> changes from 'drm' to 'gt' but is still the master object parameter 
>>>>> and it matches the function/macro prefix so inherently looks 
>>>>> correct. Then you have your message plus parameters. No confusing 
>>>>> orders, no confusing parenthesis, no excessive macro levels, no 
>>>>> confusion at all. Just nice simple, easy to read, easy to maintain 
>>>>> code.
>>>>
>>>> I am personally okay with gt_err/GT_ERR some other folks might 
>>>> object though. And I can also understand the argument why it is 
>>>> better to not have to define gt_err, gt_warn, gt_info, gt_notice, 
>>>> gt_debug, gt_err_ratelimited, gt_warn_once.. and instead have only 
>>>> one macro.
>>> A small set of trivial macro definitions vs a complicated and 
>>> unreadable construct on every single print? Erm, isn't that the very 
>>> definition of abstracting to helpers as generally required by every 
>>> code review ever?
>>>
>>> And what 'other folks might object'? People already CC'd? People 
>>> outside of i915?
>>>
>>>
>>>>
>>>> Because of that I was passing on to you the compromise option.
>>>>
>>>> It maybe still has net space savings since we wouldn't have to be 
>>>> repeating the gt->i915->drm whatever and gt->info.id on every line.
>>>>
>>>> You are free to try the most compact one and see how hard those 
>>>> objections will be.
>>> Um. I already did. This patch. And you are the only person to have 
>>> objected in any manner at all.
>>
>> Where I have objected?
> Only in everything you have written. Or are you saying that actually I 

???

> could have just taken the r-b from Daniele and merged it while 
> completely ignoring everything you have said because you didn't say 
> 'NACK' at the top of your email?

No, but because we don't do that while there is an ongoing discussion.

>> I was a) asking to convert all gt/ within one kernel release, b) 
>> transferring the maintainer discussion from IRC to this email chain to 
>> outlay one alternative, for which I said I could see the pros and cons 
>> of both, and c) raised the naming question early since that can 
>> usually become a churn point later on when we have large scale code 
>> transformations.
>>
>> As said, FWIW you have my ack for GT_XXX naming and approach, but 
>> please do convert the whole of gt/ so we don't ship with a mish-mash 
>> of log messages.
>>
> That sounds like a quite a lot of objections to me - don't do it that 
> way, do it this way; fine do it that way but expect lots of complaints 
> when you do; oh all right, do it that way but re-write the entire driver 
> all at once.

No, I'll repeat what I said one last time.

I said it's good idea, raised the topic of naming, since it is good to 
get consensus before starting a wide renaming exercise. I also chatted 
with other maintainers and passed on here one alternative proposed, with 
its pros and cons. Finally I said your initial approach has my ack at 
least, even though not everyone liked it. And I asked to convert all 
gt/, since I think it is a good idea, and frankly it's not a very big 
ask. It is just how it works.

> I really don't get the 'we must not ship with a mish-mash of log 
> messages' argument. It's hardly like the current total mish-mash of 
> DRM_DEBUG, DRM_DEBUG_DRIVER, pr_dbg, drm_dbg, ... is even remotely 
> consistent. It's also not like the patch is massively changing the 
> format of the output messages. So some prints get a useful extra three 
> characters in advance of others. Does anyone care? Would anyone except 
> us even notice?
> 
> Is it really a problem to merge this patch now to get the process 
> started? And other sub-components get updated as and when people get the 
> time to do them? You could maybe even help rather than posting 

It can be a problem. You opened up a topic which leads to an overall 
improvement and if we merge churn before we decide on the final 
direction, then we just get more churn later.

> completely conflicting patch sets that basically duplicate all the 
> effort for no actual benefit.

Completely conflicting?

Yours:

  drivers/gpu/drm/i915/gt/uc/intel_guc.c        | 25 +++--
  drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c    |  9 +-
  .../gpu/drm/i915/gt/uc/intel_guc_capture.c    | 50 ++++------
  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c     |  9 +-
  drivers/gpu/drm/i915/gt/uc/intel_guc_fw.c     | 17 ++--
  drivers/gpu/drm/i915/gt/uc/intel_guc_log.c    | 49 +++++-----
  drivers/gpu/drm/i915/gt/uc/intel_guc_rc.c     |  3 +-
  drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c   |  6 +-
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 56 ++++++------
  drivers/gpu/drm/i915/gt/uc/intel_huc.c        | 20 ++--
  drivers/gpu/drm/i915/gt/uc/intel_uc.c         | 84 ++++++++---------
  drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c      | 91 +++++++++----------
  drivers/gpu/drm/i915/gt/uc/selftest_guc.c     | 36 ++++----
  .../drm/i915/gt/uc/selftest_guc_hangcheck.c   | 22 ++---
  .../drm/i915/gt/uc/selftest_guc_multi_lrc.c   | 10 +-
  15 files changed, 228 insertions(+), 259 deletions(-)

Mine:

  drivers/gpu/drm/i915/gem/i915_gem_context.c   |  2 +-
  .../gpu/drm/i915/gem/i915_gem_execbuffer.c    | 23 ++++++----
  .../drm/i915/gt/intel_execlists_submission.c  | 13 +++---
  drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c  |  4 +-
  drivers/gpu/drm/i915/gt/intel_gt.c            |  4 +-
  drivers/gpu/drm/i915/gt/intel_gt_irq.c        |  8 ++--
  drivers/gpu/drm/i915/gt/intel_rps.c           |  6 ++-
  drivers/gpu/drm/i915/gt/intel_workarounds.c   | 43 +++++++++++--------
  .../gpu/drm/i915/gt/intel_workarounds_types.h |  4 ++
  .../gpu/drm/i915/gt/selftest_workarounds.c    |  4 +-
  drivers/gpu/drm/i915/i915_debugfs.c           |  4 +-
  drivers/gpu/drm/i915/i915_gem.c               |  2 +-
  drivers/gpu/drm/i915/i915_getparam.c          |  2 +-
  drivers/gpu/drm/i915/i915_irq.c               | 12 +++---
  drivers/gpu/drm/i915/i915_perf.c              | 14 +++---
  drivers/gpu/drm/i915/i915_query.c             | 12 +++---
  drivers/gpu/drm/i915/i915_sysfs.c             |  3 +-
  drivers/gpu/drm/i915/i915_vma.c               | 16 ++++---
  drivers/gpu/drm/i915/intel_uncore.c           | 21 +++++----
  19 files changed, 116 insertions(+), 81 deletions(-)

I deliberately did not touch gt/uc so they don't touch a single same file!

Unless you are referring to the fact gt logging will need to touch some 
of the same lines my patch touches. Well yes, some, not all, and I 
reckoned it will be much quicker since there aren't really any 
contentious points with DRM_XXX -> drm_xxx. Plus I added struct intel_gt 
or whatever or it wasn't before so it would make your patch easier/smaller.

Regards,

Tvrtko
Tvrtko Ursulin Nov. 10, 2022, 9:55 a.m. UTC | #11
On 09/11/2022 19:57, Michal Wajdeczko wrote:

[snip]

>> Is it really a problem to merge this patch now to get the process
>> started? And other sub-components get updated as and when people get the
>> time to do them? You could maybe even help rather than posting
>> completely conflicting patch sets that basically duplicate all the
>> effort for no actual benefit.
> 
> Instead of merging this patch now, oriented on GT only, I would rather
> wait until we discuss and plan solution for the all sub-components.

Yes, agreed.

> Once that's done (with agreement on naming and output) we can start
> converting exiting messages.
> 
> My proposal would be:
>   - use wrappers per component

This is passable to me but Jani has raised a concern on IRC that it 
leads to a lot of macro duplication. Which is I think a valid point, but 
which does not have a completely nice solution. Best I heard so far was 
a suggestion from Joonas to add just a single component formatter macro 
and use the existing drm_xxx helpers.

>   - use lower case names

I prefer this as well. Even though usual argument is for macros to be 
upper case, I find the improved readability of lower case trumps that.

>   - don't add colon

Not sure, when I look at it below it looks a bit not structured enough 
without the colon, but maybe it is just me.

> #define i915_xxx(_i915, _fmt, ...) \
> 	drm_xxx(&(_i915)->drm, _fmt, ##__VA_ARGS__)
> 
> #define gt_xxx(_gt, _fmt, ...) \
> 	i915_xxx((_gt)->i915, "GT%u " _fmt, (_gt)->info.id, ..
> 
> #define guc_xxx(_guc, _fmt, ...) \
> 	gt_xxx(guc_to_gt(_guc), "GuC " _fmt, ..
> 
> #define ct_xxx(_ct, _fmt, ...) \
> 	guc_xxx(ct_to_guc(_ct), "CTB " _fmt, ..
> 
> where
> 	xxx = { err, warn, notice, info, dbg }
> 
> and then for calls like:
> 
> 	i915_err(i915, "Foo failed (%pe)\n", ERR_PTR(err));
> 	  gt_err(gt,   "Foo failed (%pe)\n", ERR_PTR(err));
> 	 guc_err(guc,  "Foo failed (%pe)\n", ERR_PTR(err));
> 	  ct_err(ct,   "Foo failed (%pe)\n", ERR_PTR(err));

So the macro idea would be like this:

   drm_err(I915_LOG("Foo failed (%pe)\n", i915), ERR_PTR(err));
   drm_err(GT_LOG("Foo failed (%pe)\n", gt), ERR_PTR(err));
   drm_err(GUC_LOG("Foo failed (%pe)\n", guc), ERR_PTR(err));
   drm_err(CT_LOG("Foo failed (%pe)\n", ct), ERR_PTR(err));

Each component would just need to define a single macro and not have to 
duplicate all the err, info, warn, notice, ratelimited, once, whatever 
versions. Which is a benefit but it's a quite a bit uglier to read in 
the code.

Perhaps macro could be called something other than XX_LOG to make it 
more readable, don't know.

Regards,

Tvrtko
Jani Nikula Nov. 10, 2022, 10:33 a.m. UTC | #12
On Wed, 09 Nov 2022, Michal Wajdeczko <michal.wajdeczko@intel.com> wrote:
> Instead of merging this patch now, oriented on GT only, I would rather
> wait until we discuss and plan solution for the all sub-components.
>
> Once that's done (with agreement on naming and output) we can start
> converting exiting messages.
>
> My proposal would be:
>  - use wrappers per component
>  - use lower case names
>  - don't add colon
>
> #define i915_xxx(_i915, _fmt, ...) \
> 	drm_xxx(&(_i915)->drm, _fmt, ##__VA_ARGS__)
>
> #define gt_xxx(_gt, _fmt, ...) \
> 	i915_xxx((_gt)->i915, "GT%u " _fmt, (_gt)->info.id, ..
>
> #define guc_xxx(_guc, _fmt, ...) \
> 	gt_xxx(guc_to_gt(_guc), "GuC " _fmt, ..
>
> #define ct_xxx(_ct, _fmt, ...) \
> 	guc_xxx(ct_to_guc(_ct), "CTB " _fmt, ..
>
> where
> 	xxx = { err, warn, notice, info, dbg }
>
> and then for calls like:
>
> 	i915_err(i915, "Foo failed (%pe)\n", ERR_PTR(err));
> 	  gt_err(gt,   "Foo failed (%pe)\n", ERR_PTR(err));
> 	 guc_err(guc,  "Foo failed (%pe)\n", ERR_PTR(err));
> 	  ct_err(ct,   "Foo failed (%pe)\n", ERR_PTR(err));
>
> the output would look like:
>
> [ ] i915 0000:00:02.0: [drm] *ERROR* Foo failed (-EIO)
> [ ] i915 0000:00:02.0: [drm] *ERROR* GT0 Foo failed (-EIO)
> [ ] i915 0000:00:02.0: [drm] *ERROR* GT0 GuC Foo failed (-EIO)
> [ ] i915 0000:00:02.0: [drm] *ERROR* GT0 GuC CTB Foo failed (-EIO)
>
> Would that work for all of you?

I think we borderline have too many wrappers already. I've basically
blocked the i915_* variants early on in favour of just using the drm_*
calls directly - which, themselves, are also wrappers on dev_*.

In general, there's aversion in the kernel to adding wrappers that hide
what's actually going on. WYSIWYG.

So I wasn't thrilled to see the GT_* wrappers either.

And indeed part of it is that anything you do sets an example for the
next developer coming along. You add wrappers for this, and sure enough
someone's going to cargo cult and add wrappers for some other
component. We didn't even have to merge this to see that idea pop up.

The thing with logging is that it's not just one wrapper. It's info,
notice, warn, err, multiplied by the "once" variants, with some
duplicated by the "ratelimited" variants. (And yeah, C sucks at this.)

Arguably the display code could use logging wrappers for connectors,
encoders and crtcs, too. Not just i915, but across drm. See:

git grep "\[\(CONNECTOR\|ENCODER\|CRTC\)"

But we've opted not to.

This is just my educated guess, but what I think is more likely to
happen is adding a "debug name" member to the relevant structs that you
could use like:

	drm_dbg_kms(connector->dev, "%s foo\n", connector->dbg_id);

---

I'm not going to block this patchset, but I also really *really* don't
want to see a cargo culted copy of this for other components. Which kind
of makes me anxious.

So what if you looked again what you could do to make using the drm_*
logging variants easier?


BR,
Jani.
Michal Wajdeczko Nov. 10, 2022, 10:35 a.m. UTC | #13
On 10.11.2022 10:55, Tvrtko Ursulin wrote:
> 
> On 09/11/2022 19:57, Michal Wajdeczko wrote:
> 
> [snip]
> 
>>> Is it really a problem to merge this patch now to get the process
>>> started? And other sub-components get updated as and when people get the
>>> time to do them? You could maybe even help rather than posting
>>> completely conflicting patch sets that basically duplicate all the
>>> effort for no actual benefit.
>>
>> Instead of merging this patch now, oriented on GT only, I would rather
>> wait until we discuss and plan solution for the all sub-components.
> 
> Yes, agreed.
> 
>> Once that's done (with agreement on naming and output) we can start
>> converting exiting messages.
>>
>> My proposal would be:
>>   - use wrappers per component
> 
> This is passable to me but Jani has raised a concern on IRC that it
> leads to a lot of macro duplication. Which is I think a valid point, but
> which does not have a completely nice solution. Best I heard so far was
> a suggestion from Joonas to add just a single component formatter macro
> and use the existing drm_xxx helpers.
> 
>>   - use lower case names
> 
> I prefer this as well. Even though usual argument is for macros to be
> upper case, I find the improved readability of lower case trumps that.
> 
>>   - don't add colon
> 
> Not sure, when I look at it below it looks a bit not structured enough
> without the colon, but maybe it is just me.
> 
>> #define i915_xxx(_i915, _fmt, ...) \
>>     drm_xxx(&(_i915)->drm, _fmt, ##__VA_ARGS__)
>>
>> #define gt_xxx(_gt, _fmt, ...) \
>>     i915_xxx((_gt)->i915, "GT%u " _fmt, (_gt)->info.id, ..
>>
>> #define guc_xxx(_guc, _fmt, ...) \
>>     gt_xxx(guc_to_gt(_guc), "GuC " _fmt, ..
>>
>> #define ct_xxx(_ct, _fmt, ...) \
>>     guc_xxx(ct_to_guc(_ct), "CTB " _fmt, ..
>>
>> where
>>     xxx = { err, warn, notice, info, dbg }
>>
>> and then for calls like:
>>
>>     i915_err(i915, "Foo failed (%pe)\n", ERR_PTR(err));
>>       gt_err(gt,   "Foo failed (%pe)\n", ERR_PTR(err));
>>      guc_err(guc,  "Foo failed (%pe)\n", ERR_PTR(err));
>>       ct_err(ct,   "Foo failed (%pe)\n", ERR_PTR(err));
> 
> So the macro idea would be like this:
> 
>   drm_err(I915_LOG("Foo failed (%pe)\n", i915), ERR_PTR(err));
>   drm_err(GT_LOG("Foo failed (%pe)\n", gt), ERR_PTR(err));
>   drm_err(GUC_LOG("Foo failed (%pe)\n", guc), ERR_PTR(err));
>   drm_err(CT_LOG("Foo failed (%pe)\n", ct), ERR_PTR(err));
> 
> Each component would just need to define a single macro and not have to
> duplicate all the err, info, warn, notice, ratelimited, once, whatever
> versions. Which is a benefit but it's a quite a bit uglier to read in
> the code.

If there is a choice between having ugly code all over the place and few
more lines with helpers then without any doubts I would pick the latter.

And this seems to be option already used elsewhere, see:

#define dev_err(dev, fmt, ...) \
	dev_printk_index_wrap ...

#define pci_err(pdev, fmt, arg...) \
	dev_err(&(pdev)->dev, fmt, ##arg)

#define drm_err(drm, fmt, ...) \
	__drm_printk((drm), err,, "*ERROR* " fmt, ##__VA_ARGS__)

#define drbd_err(obj, fmt, args...) \
	drbd_printk(KERN_ERR, obj, fmt, ## args)

#define ch7006_err(client, format, ...) \
	dev_err(&client->dev, format, __VA_ARGS__)

#define mthca_err(mdev, format, arg...) \
	dev_err(&mdev->pdev->dev, format, ## arg)

#define ctx_err(ctx, fmt, arg...) \
	cal_err((ctx)->cal, "ctx%u: " fmt, (ctx)->dma_ctx, ##arg)

#define mlx4_err(mdev, format, ...) \
	dev_err(&(mdev)->persist->pdev->dev, format, ##__VA_ARGS__)

...

Michal


[1]
https://elixir.bootlin.com/linux/v6.1-rc4/source/include/linux/dev_printk.h#L143

[2]
https://elixir.bootlin.com/linux/v6.1-rc4/source/include/linux/pci.h#L2485

[3]
https://elixir.bootlin.com/linux/v6.1-rc4/source/include/drm/drm_print.h#L468

[4]
https://elixir.bootlin.com/linux/v6.1-rc4/source/drivers/block/drbd/drbd_int.h#L113

[5]
https://elixir.bootlin.com/linux/v6.1-rc4/source/drivers/gpu/drm/i2c/ch7006_priv.h#L139

[6]
https://elixir.bootlin.com/linux/v6.1-rc4/source/drivers/infiniband/hw/mthca/mthca_dev.h#L377

[7]
https://elixir.bootlin.com/linux/v6.1-rc4/source/drivers/media/platform/ti/cal/cal.h#L279

[8]
https://elixir.bootlin.com/linux/v6.1-rc4/source/drivers/net/ethernet/mellanox/mlx4/mlx4.h#L225

> 
> Perhaps macro could be called something other than XX_LOG to make it
> more readable, don't know.
> 
> Regards,
> 
> Tvrtko
John Harrison Nov. 14, 2022, 10:10 p.m. UTC | #14
On 11/10/2022 02:33, Jani Nikula wrote:
> On Wed, 09 Nov 2022, Michal Wajdeczko <michal.wajdeczko@intel.com> wrote:
>> Instead of merging this patch now, oriented on GT only, I would rather
>> wait until we discuss and plan solution for the all sub-components.
>>
>> Once that's done (with agreement on naming and output) we can start
>> converting exiting messages.
>>
>> My proposal would be:
>>   - use wrappers per component
>>   - use lower case names
>>   - don't add colon
>>
>> #define i915_xxx(_i915, _fmt, ...) \
>> 	drm_xxx(&(_i915)->drm, _fmt, ##__VA_ARGS__)
>>
>> #define gt_xxx(_gt, _fmt, ...) \
>> 	i915_xxx((_gt)->i915, "GT%u " _fmt, (_gt)->info.id, ..
>>
>> #define guc_xxx(_guc, _fmt, ...) \
>> 	gt_xxx(guc_to_gt(_guc), "GuC " _fmt, ..
>>
>> #define ct_xxx(_ct, _fmt, ...) \
>> 	guc_xxx(ct_to_guc(_ct), "CTB " _fmt, ..
>>
>> where
>> 	xxx = { err, warn, notice, info, dbg }
>>
>> and then for calls like:
>>
>> 	i915_err(i915, "Foo failed (%pe)\n", ERR_PTR(err));
>> 	  gt_err(gt,   "Foo failed (%pe)\n", ERR_PTR(err));
>> 	 guc_err(guc,  "Foo failed (%pe)\n", ERR_PTR(err));
>> 	  ct_err(ct,   "Foo failed (%pe)\n", ERR_PTR(err));
>>
>> the output would look like:
>>
>> [ ] i915 0000:00:02.0: [drm] *ERROR* Foo failed (-EIO)
>> [ ] i915 0000:00:02.0: [drm] *ERROR* GT0 Foo failed (-EIO)
>> [ ] i915 0000:00:02.0: [drm] *ERROR* GT0 GuC Foo failed (-EIO)
>> [ ] i915 0000:00:02.0: [drm] *ERROR* GT0 GuC CTB Foo failed (-EIO)
>>
>> Would that work for all of you?
> I think we borderline have too many wrappers already. I've basically
> blocked the i915_* variants early on in favour of just using the drm_*
> calls directly - which, themselves, are also wrappers on dev_*.
>
> In general, there's aversion in the kernel to adding wrappers that hide
> what's actually going on. WYSIWYG.
That is blatantly not the case. There is a very strong push in kernel 
code reviews to abstract common code into helper functions. How is this 
any different? If you removed all helper functions with the desire to 
have truly WYSIWYG code then each line of code would be a paragraph long.

>
> So I wasn't thrilled to see the GT_* wrappers either.
>
> And indeed part of it is that anything you do sets an example for the
> next developer coming along. You add wrappers for this, and sure enough
> someone's going to cargo cult and add wrappers for some other
> component. We didn't even have to merge this to see that idea pop up.
>
> The thing with logging is that it's not just one wrapper. It's info,
> notice, warn, err, multiplied by the "once" variants, with some
> duplicated by the "ratelimited" variants. (And yeah, C sucks at this.)
>
> Arguably the display code could use logging wrappers for connectors,
> encoders and crtcs, too. Not just i915, but across drm. See:
>
> git grep "\[\(CONNECTOR\|ENCODER\|CRTC\)"
>
> But we've opted not to.
>
> This is just my educated guess, but what I think is more likely to
> happen is adding a "debug name" member to the relevant structs that you
> could use like:
>
> 	drm_dbg_kms(connector->dev, "%s foo\n", connector->dbg_id);
Huh? You want the wrapper to be called drm_dbg_gt instead of gt_dbg but 
otherwise it is identical to what is currently in the patch? Other than 
being a longer name, how is that any different? It is still very much a 
wrapper of exactly the type that you don't seem to like? And as noted 
already, in that case shouldn't we be pushing a patch to rename drm_dbg 
to dev_dbg_drm first? In which case the above should be dev_dbg_drm_kms?


>
> ---
>
> I'm not going to block this patchset, but I also really *really* don't
> want to see a cargo culted copy of this for other components. Which kind
> of makes me anxious.
I really don't understand the problem with adding such wrappers. Sure, 
there are a bunch of varients - dbg, err, warn, etc. But the variant 
list is contained and unlikely to grow at any point. It is trivially 
easy to look up gt_dbg to find out what it is doing. It makes the code 
much easier to read. I mean, who cares that gt_dbg is implemented via 
drm_dbg which is implemented via dev_dbg? Only someone that is actively 
wanting to further change the implementation. Anyone who is using it 
just wants to know that they are calling the correct debug printer for 
their module and are giving it the correct parameters. Having a nice 
simple name makes that totally clear.


>
> So what if you looked again what you could do to make using the drm_*
> logging variants easier?
Not a lot. All other proposals so far have been adding complicated and 
unwieldy constructs to every single print.

Fundamentally, the drm_dbg wrappers as they stand do not allow for 
further customisation. So your choice is to replace, rewrite or wrap. 
Replacing it with an entirely parallel implementation is fundamentally 
the wrong direction. Re-writing the DRM layer to take a pointer to a 
custom printer object instead of a 'struct drm_device' (where each 
sub-module defines it's own private printer that adds it's preferred 
string and chains on to the next printer) seems like a huge change to 
the entire DRM code base for very little benefit. The option of wrapping 
the just arguments makes for messy code. The option of wrapping the 
function call itself means wrapping five functions. Pick your poison.

John.


>
>
> BR,
> Jani.
>
>
>
John Harrison Nov. 14, 2022, 10:14 p.m. UTC | #15
On 11/10/2022 01:43, Tvrtko Ursulin wrote:
> On 09/11/2022 17:46, John Harrison wrote:
>> On 11/9/2022 03:05, Tvrtko Ursulin wrote:
>>> On 08/11/2022 20:15, John Harrison wrote:
>>>> On 11/8/2022 01:01, Tvrtko Ursulin wrote:
>>>>> On 07/11/2022 19:14, John Harrison wrote:
>>>>>> On 11/7/2022 08:17, Tvrtko Ursulin wrote:
>>>>>>> On 07/11/2022 09:33, Tvrtko Ursulin wrote:
>>>>>>>> On 05/11/2022 01:03, Ceraolo Spurio, Daniele wrote:
>>>>>>>>> On 11/4/2022 10:25 AM, John.C.Harrison@Intel.com wrote:
>>>>>>>>>> From: John Harrison <John.C.Harrison@Intel.com>
>>>>>>>>>>
>>>>>>>>>> When trying to analyse bug reports from CI, customers, etc. 
>>>>>>>>>> it can be
>>>>>>>>>> difficult to work out exactly what is happening on which GT in a
>>>>>>>>>> multi-GT system. So add GT oriented debug/error message 
>>>>>>>>>> wrappers. If
>>>>>>>>>> used instead of the drm_ equivalents, you get the same output 
>>>>>>>>>> but with
>>>>>>>>>> a GT# prefix on it.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>>>>>>>>>
>>>>>>>>> The only downside to this is that we'll print "GT0: " even on 
>>>>>>>>> single-GT devices. We could introduce a gt->info.name and 
>>>>>>>>> print that, so we could have it different per-platform, but 
>>>>>>>>> IMO it's not worth the effort.
>>>>>>>>>
>>>>>>>>> Reviewed-by: Daniele Ceraolo Spurio 
>>>>>>>>> <daniele.ceraolospurio@intel.com>
>>>>>>>>>
>>>>>>>>> I think it might be worth getting an ack from one of the 
>>>>>>>>> maintainers to make sure we're all aligned on transitioning to 
>>>>>>>>> these new logging macro for gt code.
>>>>>>>>
>>>>>>>> Idea is I think a very good one. First I would suggest 
>>>>>>>> standardising to lowercase GT in logs because:
>>>>>>>>
>>>>>>>> $ grep "GT%" i915/ -r
>>>>>>>> $ grep "gt%" i915/ -r
>>>>>>>> i915/gt/intel_gt_sysfs.c: gt->i915->sysfs_gt, "gt%d", 
>>>>>>>> gt->info.id))
>>>>>>>> i915/gt/intel_gt_sysfs.c:                "failed to initialize 
>>>>>>>> gt%d sysfs root\n", gt->info.id);
>>>>>>>> i915/gt/intel_gt_sysfs_pm.c: "failed to create gt%u RC6 sysfs 
>>>>>>>> files (%pe)\n",
>>>>>>>> i915/gt/intel_gt_sysfs_pm.c: "failed to create gt%u RC6p sysfs 
>>>>>>>> files (%pe)\n",
>>>>>>>> i915/gt/intel_gt_sysfs_pm.c: "failed to create gt%u RPS sysfs 
>>>>>>>> files (%pe)",
>>>>>>>> i915/gt/intel_gt_sysfs_pm.c: "failed to create gt%u 
>>>>>>>> punit_req_freq_mhz sysfs (%pe)",
>>>>>>>> i915/gt/intel_gt_sysfs_pm.c: "failed to create gt%u throttle 
>>>>>>>> sysfs files (%pe)",
>>>>>>>> i915/gt/intel_gt_sysfs_pm.c: "failed to create gt%u 
>>>>>>>> media_perf_power_attrs sysfs (%pe)\n",
>>>>>>>> i915/gt/intel_gt_sysfs_pm.c: "failed to add gt%u rps defaults 
>>>>>>>> (%pe)\n",
>>>>>>>> i915/i915_driver.c: drm_err(&gt->i915->drm, "gt%d: 
>>>>>>>> intel_pcode_init failed %d\n", id, ret);
>>>>>>>> i915/i915_hwmon.c: snprintf(ddat_gt->name, 
>>>>>>>> sizeof(ddat_gt->name), "i915_gt%u", i);
>>>>>>>>
>>>>>>
>>>>>> Just because there are 11 existing instances of one form doesn't 
>>>>>> mean that the 275 instances that are waiting to be converted 
>>>>>> should be done incorrectly. GT is an acronym and should be 
>>>>>> capitalised.
>>>>>
>>>>> Okay just make it consistent then.
>>>>>
>>>>>> Besides:
>>>>>> grep -r "GT " i915 | grep '"'
>>>>>> i915/vlv_suspend.c: drm_err(&i915->drm, "timeout disabling GT 
>>>>>> waking\n");
>>>>>> i915/vlv_suspend.c:                     "timeout waiting for GT 
>>>>>> wells to go %s\n",
>>>>>> i915/vlv_suspend.c:     drm_dbg(&i915->drm, "GT register access 
>>>>>> while GT waking disabled\n");
>>>>>> i915/i915_gpu_error.c:  err_printf(m, "GT awake: %s\n", 
>>>>>> str_yes_no(gt->awake));
>>>>>> i915/i915_debugfs.c:    seq_printf(m, "GT awake? %s [%d], %llums\n",
>>>>>> i915/selftests/i915_gem_evict.c: pr_err("Failed to idle GT (on 
>>>>>> %s)", engine->name);
>>>>>> i915/intel_uncore.c:                  "GT thread status wait 
>>>>>> timed out\n");
>>>>>> i915/gt/uc/selftest_guc_multi_lrc.c: drm_err(&gt->i915->drm, "GT 
>>>>>> failed to idle: %d\n", ret);
>>>>>> i915/gt/uc/selftest_guc.c: drm_err(&gt->i915->drm, "GT failed to 
>>>>>> idle: %d\n", ret);
>>>>>> i915/gt/uc/selftest_guc.c: drm_err(&gt->i915->drm, "GT failed to 
>>>>>> idle: %d\n", ret);
>>>>>> i915/gt/intel_gt_mcr.c: * Some GT registers are designed as 
>>>>>> "multicast" or "replicated" registers:
>>>>>> i915/gt/selftest_rps.c:                 pr_info("%s: rps counted 
>>>>>> %d C0 cycles [%lldns] in %lldns [%d cycles], using GT clock 
>>>>>> frequency of %uKHz\n",
>>>>>> i915/gt/selftest_hangcheck.c: pr_err("[%s] GT is wedged!\n", 
>>>>>> engine->name);
>>>>>> i915/gt/selftest_hangcheck.c:           pr_err("GT is wedged!\n");
>>>>>> i915/gt/intel_gt_clock_utils.c:                 "GT clock 
>>>>>> frequency changed, was %uHz, now %uHz!\n",
>>>>>> i915/gt/selftest_engine_pm.c:           pr_err("Unable to flush 
>>>>>> GT pm before test\n");
>>>>>> i915/gt/selftest_engine_pm.c: pr_err("GT failed to idle\n");
>>>>>> i915/i915_sysfs.c:                       "failed to register GT 
>>>>>> sysfs directory\n");
>>>>>> i915/intel_uncore.h:     * of the basic non-engine GT registers 
>>>>>> (referred to as "GSI" on
>>>>>> i915/intel_uncore.h:     * newer platforms, or "GT block" on 
>>>>>> older platforms)?  If so, we'll
>>>>>>
>>>>>>
>>>>>>
>>>>>>>> Then there is a question of naming. Are we okay with GT_XXX or, 
>>>>>>>> do we want intel_gt_, or something completely different. I 
>>>>>>>> don't have a strong opinion at the moment so I'll add some more 
>>>>>>>> folks to Cc.
>>>>>>>
>>>>>> You mean GT_ERR("msg") vs intel_gt_err("msg")? Personally, I 
>>>>>> would prefer just gt_err("msg") to keep it as close to the 
>>>>>> official drm_* versions as possible. Print lines tend to be 
>>>>>> excessively long already. Taking a 'gt' parameter instead of a 
>>>>>> '&gt->i915->drm' parameter does help with that but it seems like 
>>>>>> calling the wrapper intel_gt_* is shooting ourselves in the foot 
>>>>>> on that one. And GT_ERR vs gt_err just comes down to the fact 
>>>>>> that it is a macro wrapper and therefore is required to be in 
>>>>>> upper case.
>>>>>>
>>>>>>> There was a maintainer level mini-discussion on this topic which 
>>>>>>> I will try to summarise.
>>>>>>>
>>>>>>> Main contention point was the maintenance cost and generally an 
>>>>>>> undesirable pattern of needing to add many 
>>>>>>> subsystem/component/directory specific macros. Which then 
>>>>>>> typically need extra flavours and so on. But over verbosity of the 
>>>>>> How many versions are you expecting to add? Beyond the tile 
>>>>>> instance, what further addressing requirements are there? The 
>>>>>> card instance is already printed as part of the PCI address. The 
>>>>>> only other reason to add per component wrappers would be to wrap 
>>>>>> the mechanism for getting from some random per component object 
>>>>>> back to the intel_gt structure. But that is hardware a new issue 
>>>>>> being added by this wrapper. It is also not a requirement. Much 
>>>>>> of the code has a gt pointer already. For the parts that don't, 
>>>>>> some of it would be a trivial engine->gt type dereference, some 
>>>>>> of it is a more complex container_of type construction. But for 
>>>>>> those, the given file will already have multiple instances of 
>>>>>> that already (usually as the first or second line of the function 
>>>>>> - 'intel_gt *gt = fancy_access_method(my_obj)' so adding one or 
>>>>>> two more of those as necessary is not making the code harder to 
>>>>>> read.
>>>>>>
>>>>>>> code is obviously also bad, so one compromise idea was to add a 
>>>>>>> macro which builds the GT string and use drm logging helpers 
>>>>>>> directly. This would be something like:
>>>>>>>
>>>>>>>  drm_err(GT_LOG("something went wrong ret=%d\n", gt), ret);
>>>>>>>  drm_info(GT_LOG(...same...));
>>>>>> Seriously? As above, some of these lines are already way too 
>>>>>> long, this version makes them even longer with no obvious 
>>>>>> benefit. Worse, it makes it harder to read what is going on. It 
>>>>>> is much less intuitive to read than just replacing the drm_err 
>>>>>> itself. And having two sets of parenthesis with some parameters 
>>>>>> inside the first and some only inside the second is really 
>>>>>> horrid! Also, putting the 'gt' parameter in the middle just 
>>>>>> confuses it with the rest of the printf arguments even though 
>>>>>> there is no %d in the string for it. So now a quick glances tells 
>>>>>> you that your code is wrong because you have three format 
>>>>>> specifiers but four parameters.
>>>>>>
>>>>>> Whereas, just replacing drm_err with gt_err (or GT_ERR or 
>>>>>> intel_gt_err) keeps everything else consistent. The first 
>>>>>> parameter changes from 'drm' to 'gt' but is still the master 
>>>>>> object parameter and it matches the function/macro prefix so 
>>>>>> inherently looks correct. Then you have your message plus 
>>>>>> parameters. No confusing orders, no confusing parenthesis, no 
>>>>>> excessive macro levels, no confusion at all. Just nice simple, 
>>>>>> easy to read, easy to maintain code.
>>>>>
>>>>> I am personally okay with gt_err/GT_ERR some other folks might 
>>>>> object though. And I can also understand the argument why it is 
>>>>> better to not have to define gt_err, gt_warn, gt_info, gt_notice, 
>>>>> gt_debug, gt_err_ratelimited, gt_warn_once.. and instead have only 
>>>>> one macro.
>>>> A small set of trivial macro definitions vs a complicated and 
>>>> unreadable construct on every single print? Erm, isn't that the 
>>>> very definition of abstracting to helpers as generally required by 
>>>> every code review ever?
>>>>
>>>> And what 'other folks might object'? People already CC'd? People 
>>>> outside of i915?
>>>>
>>>>
>>>>>
>>>>> Because of that I was passing on to you the compromise option.
>>>>>
>>>>> It maybe still has net space savings since we wouldn't have to be 
>>>>> repeating the gt->i915->drm whatever and gt->info.id on every line.
>>>>>
>>>>> You are free to try the most compact one and see how hard those 
>>>>> objections will be.
>>>> Um. I already did. This patch. And you are the only person to have 
>>>> objected in any manner at all.
>>>
>>> Where I have objected?
>> Only in everything you have written. Or are you saying that actually I 
>
> ???
>
>> could have just taken the r-b from Daniele and merged it while 
>> completely ignoring everything you have said because you didn't say 
>> 'NACK' at the top of your email?
>
> No, but because we don't do that while there is an ongoing discussion.

Which is exactly my point. If it is not an ack or r-b then by definition 
it is an objection.

>
>>> I was a) asking to convert all gt/ within one kernel release, b) 
>>> transferring the maintainer discussion from IRC to this email chain 
>>> to outlay one alternative, for which I said I could see the pros and 
>>> cons of both, and c) raised the naming question early since that can 
>>> usually become a churn point later on when we have large scale code 
>>> transformations.
>>>
>>> As said, FWIW you have my ack for GT_XXX naming and approach, but 
>>> please do convert the whole of gt/ so we don't ship with a mish-mash 
>>> of log messages.
>>>
>> That sounds like a quite a lot of objections to me - don't do it that 
>> way, do it this way; fine do it that way but expect lots of 
>> complaints when you do; oh all right, do it that way but re-write the 
>> entire driver all at once.
>
> No, I'll repeat what I said one last time.
>
> I said it's good idea, raised the topic of naming, since it is good to 
> get consensus before starting a wide renaming exercise. I also chatted 
> with other maintainers and passed on here one alternative proposed, 
> with its pros and cons. Finally I said your initial approach has my 
> ack at least, even though not everyone liked it. And I asked to 
> convert all gt/, since I think it is a good idea, and frankly it's not 
> a very big ask. It is just how it works.
>
>> I really don't get the 'we must not ship with a mish-mash of log 
>> messages' argument. It's hardly like the current total mish-mash of 
>> DRM_DEBUG, DRM_DEBUG_DRIVER, pr_dbg, drm_dbg, ... is even remotely 
>> consistent. It's also not like the patch is massively changing the 
>> format of the output messages. So some prints get a useful extra 
>> three characters in advance of others. Does anyone care? Would anyone 
>> except us even notice?
>>
>> Is it really a problem to merge this patch now to get the process 
>> started? And other sub-components get updated as and when people get 
>> the time to do them? You could maybe even help rather than posting 
>
> It can be a problem. You opened up a topic which leads to an overall 
> improvement and if we merge churn before we decide on the final 
> direction, then we just get more churn later.
>
>> completely conflicting patch sets that basically duplicate all the 
>> effort for no actual benefit.
>
> Completely conflicting?
>
> Yours:
>
>  drivers/gpu/drm/i915/gt/uc/intel_guc.c        | 25 +++--
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c    |  9 +-
>  .../gpu/drm/i915/gt/uc/intel_guc_capture.c    | 50 ++++------
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c     |  9 +-
>  drivers/gpu/drm/i915/gt/uc/intel_guc_fw.c     | 17 ++--
>  drivers/gpu/drm/i915/gt/uc/intel_guc_log.c    | 49 +++++-----
>  drivers/gpu/drm/i915/gt/uc/intel_guc_rc.c     |  3 +-
>  drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c   |  6 +-
>  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 56 ++++++------
>  drivers/gpu/drm/i915/gt/uc/intel_huc.c        | 20 ++--
>  drivers/gpu/drm/i915/gt/uc/intel_uc.c         | 84 ++++++++---------
>  drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c      | 91 +++++++++----------
>  drivers/gpu/drm/i915/gt/uc/selftest_guc.c     | 36 ++++----
>  .../drm/i915/gt/uc/selftest_guc_hangcheck.c   | 22 ++---
>  .../drm/i915/gt/uc/selftest_guc_multi_lrc.c   | 10 +-
>  15 files changed, 228 insertions(+), 259 deletions(-)
>
> Mine:
>
>  drivers/gpu/drm/i915/gem/i915_gem_context.c   |  2 +-
>  .../gpu/drm/i915/gem/i915_gem_execbuffer.c    | 23 ++++++----
>  .../drm/i915/gt/intel_execlists_submission.c  | 13 +++---
>  drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c  |  4 +-
>  drivers/gpu/drm/i915/gt/intel_gt.c            |  4 +-
>  drivers/gpu/drm/i915/gt/intel_gt_irq.c        |  8 ++--
>  drivers/gpu/drm/i915/gt/intel_rps.c           |  6 ++-
>  drivers/gpu/drm/i915/gt/intel_workarounds.c   | 43 +++++++++++--------
>  .../gpu/drm/i915/gt/intel_workarounds_types.h |  4 ++
>  .../gpu/drm/i915/gt/selftest_workarounds.c    |  4 +-
>  drivers/gpu/drm/i915/i915_debugfs.c           |  4 +-
>  drivers/gpu/drm/i915/i915_gem.c               |  2 +-
>  drivers/gpu/drm/i915/i915_getparam.c          |  2 +-
>  drivers/gpu/drm/i915/i915_irq.c               | 12 +++---
>  drivers/gpu/drm/i915/i915_perf.c              | 14 +++---
>  drivers/gpu/drm/i915/i915_query.c             | 12 +++---
>  drivers/gpu/drm/i915/i915_sysfs.c             |  3 +-
>  drivers/gpu/drm/i915/i915_vma.c               | 16 ++++---
>  drivers/gpu/drm/i915/intel_uncore.c           | 21 +++++----
>  19 files changed, 116 insertions(+), 81 deletions(-)
>
> I deliberately did not touch gt/uc so they don't touch a single same 
> file!
>
> Unless you are referring to the fact gt logging will need to touch 
> some of the same lines my patch touches. Well yes, some, not all, and 
> I reckoned it will be much quicker since there aren't really any 
> contentious points with DRM_XXX -> drm_xxx. Plus I added struct 
> intel_gt or whatever or it wasn't before so it would make your patch 
> easier/smaller.
Except that it is quite clearly a change that is 'still under 
discussion' and is quite clearly changing exactly the same lines of code 
in a number of places. Therefore, by your definition, your patch set 
cannot be merged because it is causing 'churn'. Alternatively, your set 
can be merged because we don't actually care that a few lines will get 
changed one way and then quickly changed another way in a follow up 
patch and the overall benefit is worth the 'churn'. In which case 
exactly the same rule should apply to this patch set. Merge something 
that adds immediate benefit and if necessary update it to a better 
version later.

Of course, there is also the view that you have just taken the task over 
by posting a more encompassing change that cleans up debug prints in the 
whole of i915. Therefore it is now up to you to update all 
sub-components to use appropriate prints - GT, KMS, GuC, Gem, etc.. 
because "frankly it's not a very big ask. It is just how it works".

John.

>
> Regards,
>
> Tvrtko
Tvrtko Ursulin Nov. 15, 2022, 10:23 a.m. UTC | #16
On 10/11/2022 10:35, Michal Wajdeczko wrote:
> On 10.11.2022 10:55, Tvrtko Ursulin wrote:
>>
>> On 09/11/2022 19:57, Michal Wajdeczko wrote:
>>
>> [snip]
>>
>>>> Is it really a problem to merge this patch now to get the process
>>>> started? And other sub-components get updated as and when people get the
>>>> time to do them? You could maybe even help rather than posting
>>>> completely conflicting patch sets that basically duplicate all the
>>>> effort for no actual benefit.
>>>
>>> Instead of merging this patch now, oriented on GT only, I would rather
>>> wait until we discuss and plan solution for the all sub-components.
>>
>> Yes, agreed.
>>
>>> Once that's done (with agreement on naming and output) we can start
>>> converting exiting messages.
>>>
>>> My proposal would be:
>>>    - use wrappers per component
>>
>> This is passable to me but Jani has raised a concern on IRC that it
>> leads to a lot of macro duplication. Which is I think a valid point, but
>> which does not have a completely nice solution. Best I heard so far was
>> a suggestion from Joonas to add just a single component formatter macro
>> and use the existing drm_xxx helpers.
>>
>>>    - use lower case names
>>
>> I prefer this as well. Even though usual argument is for macros to be
>> upper case, I find the improved readability of lower case trumps that.
>>
>>>    - don't add colon
>>
>> Not sure, when I look at it below it looks a bit not structured enough
>> without the colon, but maybe it is just me.
>>
>>> #define i915_xxx(_i915, _fmt, ...) \
>>>      drm_xxx(&(_i915)->drm, _fmt, ##__VA_ARGS__)
>>>
>>> #define gt_xxx(_gt, _fmt, ...) \
>>>      i915_xxx((_gt)->i915, "GT%u " _fmt, (_gt)->info.id, ..
>>>
>>> #define guc_xxx(_guc, _fmt, ...) \
>>>      gt_xxx(guc_to_gt(_guc), "GuC " _fmt, ..
>>>
>>> #define ct_xxx(_ct, _fmt, ...) \
>>>      guc_xxx(ct_to_guc(_ct), "CTB " _fmt, ..
>>>
>>> where
>>>      xxx = { err, warn, notice, info, dbg }
>>>
>>> and then for calls like:
>>>
>>>      i915_err(i915, "Foo failed (%pe)\n", ERR_PTR(err));
>>>        gt_err(gt,   "Foo failed (%pe)\n", ERR_PTR(err));
>>>       guc_err(guc,  "Foo failed (%pe)\n", ERR_PTR(err));
>>>        ct_err(ct,   "Foo failed (%pe)\n", ERR_PTR(err));

Okay lets go with this then since we have allowed some time for comments 
and no strict objections have been raised. Probably limit to to 
GT/GuC/CT space for not, ie. not adding i915_ loggers.

My preference is just to have a colon in the GT identifier, lowercase or 
uppercase I don't mind.

Regards,

Tvrtko

>>
>> So the macro idea would be like this:
>>
>>    drm_err(I915_LOG("Foo failed (%pe)\n", i915), ERR_PTR(err));
>>    drm_err(GT_LOG("Foo failed (%pe)\n", gt), ERR_PTR(err));
>>    drm_err(GUC_LOG("Foo failed (%pe)\n", guc), ERR_PTR(err));
>>    drm_err(CT_LOG("Foo failed (%pe)\n", ct), ERR_PTR(err));
>>
>> Each component would just need to define a single macro and not have to
>> duplicate all the err, info, warn, notice, ratelimited, once, whatever
>> versions. Which is a benefit but it's a quite a bit uglier to read in
>> the code.
> 
> If there is a choice between having ugly code all over the place and few
> more lines with helpers then without any doubts I would pick the latter.
> 
> And this seems to be option already used elsewhere, see:
> 
> #define dev_err(dev, fmt, ...) \
> 	dev_printk_index_wrap ...
> 
> #define pci_err(pdev, fmt, arg...) \
> 	dev_err(&(pdev)->dev, fmt, ##arg)
> 
> #define drm_err(drm, fmt, ...) \
> 	__drm_printk((drm), err,, "*ERROR* " fmt, ##__VA_ARGS__)
> 
> #define drbd_err(obj, fmt, args...) \
> 	drbd_printk(KERN_ERR, obj, fmt, ## args)
> 
> #define ch7006_err(client, format, ...) \
> 	dev_err(&client->dev, format, __VA_ARGS__)
> 
> #define mthca_err(mdev, format, arg...) \
> 	dev_err(&mdev->pdev->dev, format, ## arg)
> 
> #define ctx_err(ctx, fmt, arg...) \
> 	cal_err((ctx)->cal, "ctx%u: " fmt, (ctx)->dma_ctx, ##arg)
> 
> #define mlx4_err(mdev, format, ...) \
> 	dev_err(&(mdev)->persist->pdev->dev, format, ##__VA_ARGS__)
> 
> ...
> 
> Michal
> 
> 
> [1]
> https://elixir.bootlin.com/linux/v6.1-rc4/source/include/linux/dev_printk.h#L143
> 
> [2]
> https://elixir.bootlin.com/linux/v6.1-rc4/source/include/linux/pci.h#L2485
> 
> [3]
> https://elixir.bootlin.com/linux/v6.1-rc4/source/include/drm/drm_print.h#L468
> 
> [4]
> https://elixir.bootlin.com/linux/v6.1-rc4/source/drivers/block/drbd/drbd_int.h#L113
> 
> [5]
> https://elixir.bootlin.com/linux/v6.1-rc4/source/drivers/gpu/drm/i2c/ch7006_priv.h#L139
> 
> [6]
> https://elixir.bootlin.com/linux/v6.1-rc4/source/drivers/infiniband/hw/mthca/mthca_dev.h#L377
> 
> [7]
> https://elixir.bootlin.com/linux/v6.1-rc4/source/drivers/media/platform/ti/cal/cal.h#L279
> 
> [8]
> https://elixir.bootlin.com/linux/v6.1-rc4/source/drivers/net/ethernet/mellanox/mlx4/mlx4.h#L225
> 
>>
>> Perhaps macro could be called something other than XX_LOG to make it
>> more readable, don't know.
>>
>> Regards,
>>
>> Tvrtko
diff mbox series

Patch

diff --git a/drivers/gpu/drm/i915/gt/intel_gt.h b/drivers/gpu/drm/i915/gt/intel_gt.h
index e0365d5562484..1e016fb0117a4 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt.h
@@ -13,6 +13,21 @@ 
 struct drm_i915_private;
 struct drm_printer;
 
+#define GT_ERR(_gt, _fmt, ...) \
+	drm_err(&(_gt)->i915->drm, "GT%u: " _fmt, (_gt)->info.id, ##__VA_ARGS__)
+
+#define GT_WARN(_gt, _fmt, ...) \
+	drm_warn(&(_gt)->i915->drm, "GT%u: " _fmt, (_gt)->info.id, ##__VA_ARGS__)
+
+#define GT_NOTICE(_gt, _fmt, ...) \
+	drm_notice(&(_gt)->i915->drm, "GT%u: " _fmt, (_gt)->info.id, ##__VA_ARGS__)
+
+#define GT_INFO(_gt, _fmt, ...) \
+	drm_info(&(_gt)->i915->drm, "GT%u: " _fmt, (_gt)->info.id, ##__VA_ARGS__)
+
+#define GT_DBG(_gt, _fmt, ...) \
+	drm_dbg(&(_gt)->i915->drm, "GT%u: " _fmt, (_gt)->info.id, ##__VA_ARGS__)
+
 #define GT_TRACE(gt, fmt, ...) do {					\
 	const struct intel_gt *gt__ __maybe_unused = (gt);		\
 	GEM_TRACE("%s " fmt, dev_name(gt__->i915->drm.dev),		\