Message ID | 20240512153606.1996-5-michal.wajdeczko@intel.com (mailing list archive) |
---|---|
State | New, archived |
Series | Expose raw access to GuC log over debugfs |
On 5/12/2024 08:36, Michal Wajdeczko wrote:
> We already provide the content of the GuC log in debugfs, but it
> is in a text format where each log dword is printed as a hexadecimal
> number, which does not scale well with large GuC log buffers.
>
> To allow more efficient access to the GuC log, which could benefit
> our CI systems, expose raw binary log data. In addition to less
> overhead in preparing the text-based GuC log file, the new GuC log file
> in binary format is also almost 3x smaller.
>
> Any existing script that expects the GuC log buffer in text format
> can use a command like the one below to convert from the new binary format:
>
> hexdump -e '4/4 "0x%08x " "\n"'
>
> but this shouldn't be the case as most decoders expect GuC log data
> in binary format.

I strongly disagree with this.

Efficiency and file size are not an issue when accessing the GuC log via debugfs on actual hardware. It is an issue when dumping via dmesg, but you definitely should not be dumping binary data to dmesg. Whereas dumping binary data is much more dangerous and liable to corruption, because some tool along the way tries to convert it to ASCII, or truncates at the first zero, etc. We request that GuC logs be sent in by end users, in customer bug reports, etc., all doing things that we have no control over.

Converting the hexdump back to binary is trivial for those tools which require it. If you follow the acquisition and decoding instructions on the wiki page then it is all done for you automatically.

These patches are trying to solve a problem which does not exist and are going to make working with GuC logs harder and more error prone. On the other hand, there are many other issues with GuC logs that it would be useful to solve - including extra metadata, reliable output via dmesg, continuous streaming, pre-sizing the debugfs file so it does not have to be generated ~12 times for a single read, etc.

Hmm. Actually, is this interface allowing the filesystem layers to issue multiple read calls to read the buffer out in small chunks? That is also going to break things. If the GuC is still writing to the log as the user is reading from it, there is the opportunity for each chunk to not follow on from the previous chunk because the data has just been overwritten. This is already a problem at the moment that causes issues when decoding the logs, even with an almost atomic copy of the log into a temporary buffer before reading it out. Doing the read in separate chunks is only going to make that problem even worse.

John.
> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
> Cc: Lucas De Marchi <lucas.demarchi@intel.com>
> Cc: John Harrison <John.C.Harrison@Intel.com>
> ---
> Cc: linux-fsdevel@vger.kernel.org
> Cc: dri-devel@lists.freedesktop.org
> ---
>  drivers/gpu/drm/xe/xe_guc_debugfs.c | 26 ++++++++++++++++++++++++++
>  1 file changed, 26 insertions(+)
>
> diff --git a/drivers/gpu/drm/xe/xe_guc_debugfs.c b/drivers/gpu/drm/xe/xe_guc_debugfs.c
> index d3822cbea273..53fea952344d 100644
> --- a/drivers/gpu/drm/xe/xe_guc_debugfs.c
> +++ b/drivers/gpu/drm/xe/xe_guc_debugfs.c
> @@ -8,6 +8,7 @@
>  #include <drm/drm_debugfs.h>
>  #include <drm/drm_managed.h>
>
> +#include "xe_bo.h"
>  #include "xe_device.h"
>  #include "xe_gt.h"
>  #include "xe_guc.h"
> @@ -52,6 +53,29 @@ static const struct drm_info_list debugfs_list[] = {
>  	{"guc_log", guc_log, 0},
>  };
>
> +static ssize_t guc_log_read(struct file *file, char __user *buf, size_t count, loff_t *pos)
> +{
> +	struct dentry *dent = file_dentry(file);
> +	struct dentry *uc_dent = dent->d_parent;
> +	struct dentry *gt_dent = uc_dent->d_parent;
> +	struct xe_gt *gt = gt_dent->d_inode->i_private;
> +	struct xe_guc_log *log = &gt->uc.guc.log;
> +	struct xe_device *xe = gt_to_xe(gt);
> +	ssize_t ret;
> +
> +	xe_pm_runtime_get(xe);
> +	ret = xe_map_read_from(xe, buf, count, pos, &log->bo->vmap, log->bo->size);
> +	xe_pm_runtime_put(xe);
> +
> +	return ret;
> +}
> +
> +static const struct file_operations guc_log_ops = {
> +	.owner = THIS_MODULE,
> +	.read = guc_log_read,
> +	.llseek = default_llseek,
> +};
> +
>  void xe_guc_debugfs_register(struct xe_guc *guc, struct dentry *parent)
>  {
>  	struct drm_minor *minor = guc_to_xe(guc)->drm.primary;
> @@ -72,4 +96,6 @@ void xe_guc_debugfs_register(struct xe_guc *guc, struct dentry *parent)
>  	drm_debugfs_create_files(local,
>  				 ARRAY_SIZE(debugfs_list),
>  				 parent, minor);
> +
> +	debugfs_create_file("guc_log_raw", 0600, parent, NULL, &guc_log_ops);
>  }
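[The helper xe_map_read_from() called in the patch above is not part of this diff. As a point of reference for the chunk-size discussion that follows, below is a minimal sketch of what such a bounded, offset-based read from an iomem-backed log buffer might look like; the name bounded_iomem_read(), the 4K per-call cap, and the bounce-buffer handling are illustrative assumptions, not the actual Xe implementation.]

	/* sketch only; would need <linux/io.h>, <linux/slab.h>, <linux/sizes.h>, <linux/uaccess.h> */
	static ssize_t bounded_iomem_read(char __user *to, size_t count, loff_t *ppos,
					  const void __iomem *from, size_t available)
	{
		loff_t pos = *ppos;
		size_t chunk;
		void *bounce;
		ssize_t ret;

		if (pos < 0)
			return -EINVAL;
		if (pos >= available || !count)
			return 0;

		/* cap each round trip; this is the chunk size debated later in the thread */
		chunk = min_t(size_t, count, available - pos);
		chunk = min_t(size_t, chunk, SZ_4K);

		bounce = kmalloc(chunk, GFP_KERNEL);
		if (!bounce)
			return -ENOMEM;

		memcpy_fromio(bounce, from + pos, chunk);	/* pull the chunk out of the mapped BO */
		ret = copy_to_user(to, bounce, chunk) ? -EFAULT : chunk;
		kfree(bounce);

		if (ret > 0)
			*ppos = pos + ret;
		return ret;
	}

Each read() call returns at most one such chunk, so a large log is drained by repeated calls with an advancing file offset; this is the behaviour the replies below argue about.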
On 13.05.2024 18:53, John Harrison wrote: > On 5/12/2024 08:36, Michal Wajdeczko wrote: >> We already provide the content of the GuC log in debugsfs, but it >> is in a text format where each log dword is printed as hexadecimal >> number, which does not scale well with large GuC log buffers. >> >> To allow more efficient access to the GuC log, which could benefit >> our CI systems, expose raw binary log data. In addition to less >> overhead in preparing text based GuC log file, the new GuC log file >> in binary format is also almost 3x smaller. >> >> Any existing script that expects the GuC log buffer in text format >> can use command like below to convert from new binary format: >> >> hexdump -e '4/4 "0x%08x " "\n"' >> >> but this shouldn't be the case as most decoders expect GuC log data >> in binary format. > I strongly disagree with this. > > Efficiency and file size is not an issue when accessing the GuC log via > debugfs on actual hardware. to some extend it is as CI team used to refuse to collect GuC logs after each executed test just because of it's size > It is an issue when dumping via dmesg but > you definitely should not be dumping binary data to dmesg. Whereas, not following here - this is debugfs specific, not a dmesg printer > dumping in binary data is much more dangerous and liable to corruption > because some tool along the way tries to convert to ASCII, or truncates > at the first zero, etc. We request GuC logs be sent by end users, > customer bug reports, etc. all doing things that we have no control over. hmm, how "cp gt0/uc/guc_log_raw FILE" could end with a corrupted file ? > > Converting the hexdump back to binary is trivial for those tools which > require it. If you follow the acquisition and decoding instructions on > the wiki page then it is all done for you automatically. I'm afraid I don't know where this wiki page is, but I do know that hex conversion dance is not needed for me to get decoded GuC log the way I used to do > > These patches are trying to solve a problem which does not exist and are > going to make working with GuC logs harder and more error prone. it at least solves the problem of currently super inefficient way of generating the GuC log in text format. it also opens other opportunities to develop tools that could monitor or capture GuC log independently on top of what driver is able to offer today (on i915 there was guc-log-relay, but it was broken for long time, not sure what are the plans for Xe) also still not sure how it can be more error prone. > > On the other hand, there are many other issues with GuC logs that it > would be useful to solves - including extra meta data, reliable output > via dmesg, continuous streaming, pre-sizing the debugfs file to not have > to generate it ~12 times for a single read, etc. this series actually solves last issue but in a bit different way (we even don't need to generate full GuC log dump at all if we would like to capture only part of the log if we know where to look) for reliable output via dmesg - see my proposal at [1] [1] https://patchwork.freedesktop.org/series/133613/ > > Hmm. Actually, is this interface allowing the filesystem layers to issue > multiple read calls to read the buffer out in small chunks? That is also > going to break things. If the GuC is still writing to the log as the > user is reading from it, there is the opportunity for each chunk to not > follow on from the previous chunk because the data has just been > overwritten. 
This is already a problem at the moment that causes issues > when decoding the logs, even with an almost atomic copy of the log into > a temporary buffer before reading it out. Doing the read in separate > chunks is only going to make that problem even worse. current solution, that converts data into hex numbers, reads log buffer in chunks of 128 dwords, how proposed here solution that reads in 4K chunks could be "even worse" ? and in case of some smart tool, that would understands the layout of the GuC log buffer, we can even fully eliminate problem of reading stale data, so why not to choose a more scalable solution ? > > John. > >> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> >> Cc: Lucas De Marchi <lucas.demarchi@intel.com> >> Cc: John Harrison <John.C.Harrison@Intel.com> >> --- >> Cc: linux-fsdevel@vger.kernel.org >> Cc: dri-devel@lists.freedesktop.org >> --- >> drivers/gpu/drm/xe/xe_guc_debugfs.c | 26 ++++++++++++++++++++++++++ >> 1 file changed, 26 insertions(+) >> >> diff --git a/drivers/gpu/drm/xe/xe_guc_debugfs.c >> b/drivers/gpu/drm/xe/xe_guc_debugfs.c >> index d3822cbea273..53fea952344d 100644 >> --- a/drivers/gpu/drm/xe/xe_guc_debugfs.c >> +++ b/drivers/gpu/drm/xe/xe_guc_debugfs.c >> @@ -8,6 +8,7 @@ >> #include <drm/drm_debugfs.h> >> #include <drm/drm_managed.h> >> +#include "xe_bo.h" >> #include "xe_device.h" >> #include "xe_gt.h" >> #include "xe_guc.h" >> @@ -52,6 +53,29 @@ static const struct drm_info_list debugfs_list[] = { >> {"guc_log", guc_log, 0}, >> }; >> +static ssize_t guc_log_read(struct file *file, char __user *buf, >> size_t count, loff_t *pos) >> +{ >> + struct dentry *dent = file_dentry(file); >> + struct dentry *uc_dent = dent->d_parent; >> + struct dentry *gt_dent = uc_dent->d_parent; >> + struct xe_gt *gt = gt_dent->d_inode->i_private; >> + struct xe_guc_log *log = >->uc.guc.log; >> + struct xe_device *xe = gt_to_xe(gt); >> + ssize_t ret; >> + >> + xe_pm_runtime_get(xe); >> + ret = xe_map_read_from(xe, buf, count, pos, &log->bo->vmap, >> log->bo->size); >> + xe_pm_runtime_put(xe); >> + >> + return ret; >> +} >> + >> +static const struct file_operations guc_log_ops = { >> + .owner = THIS_MODULE, >> + .read = guc_log_read, >> + .llseek = default_llseek, >> +}; >> + >> void xe_guc_debugfs_register(struct xe_guc *guc, struct dentry *parent) >> { >> struct drm_minor *minor = guc_to_xe(guc)->drm.primary; >> @@ -72,4 +96,6 @@ void xe_guc_debugfs_register(struct xe_guc *guc, >> struct dentry *parent) >> drm_debugfs_create_files(local, >> ARRAY_SIZE(debugfs_list), >> parent, minor); >> + >> + debugfs_create_file("guc_log_raw", 0600, parent, NULL, >> &guc_log_ops); >> } >
On 5/14/2024 07:58, Michal Wajdeczko wrote: > On 13.05.2024 18:53, John Harrison wrote: >> On 5/12/2024 08:36, Michal Wajdeczko wrote: >>> We already provide the content of the GuC log in debugsfs, but it >>> is in a text format where each log dword is printed as hexadecimal >>> number, which does not scale well with large GuC log buffers. >>> >>> To allow more efficient access to the GuC log, which could benefit >>> our CI systems, expose raw binary log data. In addition to less >>> overhead in preparing text based GuC log file, the new GuC log file >>> in binary format is also almost 3x smaller. >>> >>> Any existing script that expects the GuC log buffer in text format >>> can use command like below to convert from new binary format: >>> >>> hexdump -e '4/4 "0x%08x " "\n"' >>> >>> but this shouldn't be the case as most decoders expect GuC log data >>> in binary format. >> I strongly disagree with this. >> >> Efficiency and file size is not an issue when accessing the GuC log via >> debugfs on actual hardware. > to some extend it is as CI team used to refuse to collect GuC logs after > each executed test just because of it's size I've never heard that argument. I've heard many different arguments but not one about file size. The default GuC log size is pretty tiny. So size really is not an issue. > >> It is an issue when dumping via dmesg but >> you definitely should not be dumping binary data to dmesg. Whereas, > not following here - this is debugfs specific, not a dmesg printer Except that it is preferable to have common code for both if at all possible. > >> dumping in binary data is much more dangerous and liable to corruption >> because some tool along the way tries to convert to ASCII, or truncates >> at the first zero, etc. We request GuC logs be sent by end users, >> customer bug reports, etc. all doing things that we have no control over. > hmm, how "cp gt0/uc/guc_log_raw FILE" could end with a corrupted file ? Because someone then tries to email it, or attach it or copy it via Windows or any number of other ways in which a file can get munged. > >> Converting the hexdump back to binary is trivial for those tools which >> require it. If you follow the acquisition and decoding instructions on >> the wiki page then it is all done for you automatically. > I'm afraid I don't know where this wiki page is, but I do know that hex > conversion dance is not needed for me to get decoded GuC log the way I > used to do Look for the 'GuC Debug Logs' page on the developer wiki. It's pretty easy to find. >> These patches are trying to solve a problem which does not exist and are >> going to make working with GuC logs harder and more error prone. > it at least solves the problem of currently super inefficient way of > generating the GuC log in text format. > > it also opens other opportunities to develop tools that could monitor or > capture GuC log independently on top of what driver is able to offer > today (on i915 there was guc-log-relay, but it was broken for long time, > not sure what are the plans for Xe) > > also still not sure how it can be more error prone. As already explained, the plan is move to LFD - an extensible, streamable, logging format. Any non-trivial effort that is not helping to move to LFD is not worth the effort. 
> >> On the other hand, there are many other issues with GuC logs that it >> would be useful to solves - including extra meta data, reliable output >> via dmesg, continuous streaming, pre-sizing the debugfs file to not have >> to generate it ~12 times for a single read, etc. > this series actually solves last issue but in a bit different way (we > even don't need to generate full GuC log dump at all if we would like to > capture only part of the log if we know where to look) No, it doesn't solve it. Your comment below suggests it will be read in 4KB chunks. Which means your 16MB buffer now requires 4096 separate reads! And you only doing partial reads of the section you think you need is never going to be reliable on live system. Not sure why you would want to anyway. It is just making things much more complex. You now need an intelligent user land program to read the log out and decode at least the header section to know what data section to read. You can't just dump the whole thing with 'cat' or 'dd'. > > for reliable output via dmesg - see my proposal at [1] > > [1] https://patchwork.freedesktop.org/series/133613/ > >> Hmm. Actually, is this interface allowing the filesystem layers to issue >> multiple read calls to read the buffer out in small chunks? That is also >> going to break things. If the GuC is still writing to the log as the >> user is reading from it, there is the opportunity for each chunk to not >> follow on from the previous chunk because the data has just been >> overwritten. This is already a problem at the moment that causes issues >> when decoding the logs, even with an almost atomic copy of the log into >> a temporary buffer before reading it out. Doing the read in separate >> chunks is only going to make that problem even worse. > current solution, that converts data into hex numbers, reads log buffer > in chunks of 128 dwords, how proposed here solution that reads in 4K > chunks could be "even worse" ? See above, 4KB chunks means 4096 separate reads for a 16M buffer. And each one of those reads is a full round trip to user land and back. If you want to get at all close to an atomic read of the log then it needs to be done as a single call that copies the log into a locally allocated kernel buffer and then allows user land to read out from that buffer rather than from the live log. Which can be trivially done with the current method (at the expense of a large memory allocation) but would be much more difficult with random access reader like this as you would need to say the copied buffer around until the reads have all been done. Which would presumably mean adding open/close handlers to allocate and free that memory. > > and in case of some smart tool, that would understands the layout of the > GuC log buffer, we can even fully eliminate problem of reading stale > data, so why not to choose a more scalable solution ? You cannot eliminate the problem of stale data. You read the header, you read the data it was pointing to, you re-read the header and find that the GuC has moved on. That is an infinite loop of continuously updating pointers. John. > >> John. 
>> >>> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> >>> Cc: Lucas De Marchi <lucas.demarchi@intel.com> >>> Cc: John Harrison <John.C.Harrison@Intel.com> >>> --- >>> Cc: linux-fsdevel@vger.kernel.org >>> Cc: dri-devel@lists.freedesktop.org >>> --- >>> drivers/gpu/drm/xe/xe_guc_debugfs.c | 26 ++++++++++++++++++++++++++ >>> 1 file changed, 26 insertions(+) >>> >>> diff --git a/drivers/gpu/drm/xe/xe_guc_debugfs.c >>> b/drivers/gpu/drm/xe/xe_guc_debugfs.c >>> index d3822cbea273..53fea952344d 100644 >>> --- a/drivers/gpu/drm/xe/xe_guc_debugfs.c >>> +++ b/drivers/gpu/drm/xe/xe_guc_debugfs.c >>> @@ -8,6 +8,7 @@ >>> #include <drm/drm_debugfs.h> >>> #include <drm/drm_managed.h> >>> +#include "xe_bo.h" >>> #include "xe_device.h" >>> #include "xe_gt.h" >>> #include "xe_guc.h" >>> @@ -52,6 +53,29 @@ static const struct drm_info_list debugfs_list[] = { >>> {"guc_log", guc_log, 0}, >>> }; >>> +static ssize_t guc_log_read(struct file *file, char __user *buf, >>> size_t count, loff_t *pos) >>> +{ >>> + struct dentry *dent = file_dentry(file); >>> + struct dentry *uc_dent = dent->d_parent; >>> + struct dentry *gt_dent = uc_dent->d_parent; >>> + struct xe_gt *gt = gt_dent->d_inode->i_private; >>> + struct xe_guc_log *log = >->uc.guc.log; >>> + struct xe_device *xe = gt_to_xe(gt); >>> + ssize_t ret; >>> + >>> + xe_pm_runtime_get(xe); >>> + ret = xe_map_read_from(xe, buf, count, pos, &log->bo->vmap, >>> log->bo->size); >>> + xe_pm_runtime_put(xe); >>> + >>> + return ret; >>> +} >>> + >>> +static const struct file_operations guc_log_ops = { >>> + .owner = THIS_MODULE, >>> + .read = guc_log_read, >>> + .llseek = default_llseek, >>> +}; >>> + >>> void xe_guc_debugfs_register(struct xe_guc *guc, struct dentry *parent) >>> { >>> struct drm_minor *minor = guc_to_xe(guc)->drm.primary; >>> @@ -72,4 +96,6 @@ void xe_guc_debugfs_register(struct xe_guc *guc, >>> struct dentry *parent) >>> drm_debugfs_create_files(local, >>> ARRAY_SIZE(debugfs_list), >>> parent, minor); >>> + >>> + debugfs_create_file("guc_log_raw", 0600, parent, NULL, >>> &guc_log_ops); >>> }
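[For clarity, a minimal sketch of the open/close snapshot approach John describes above: copy the whole log once on open, then serve every chunked read from that copy. The guc_log_snap struct, the log->size and log->vaddr fields, and the debugfs wiring are assumptions for illustration only, not existing Xe code.]

	/* sketch only; would need <linux/fs.h>, <linux/io.h>, <linux/module.h>, <linux/slab.h>, <linux/vmalloc.h> */
	struct guc_log_snap {
		void *data;
		size_t size;
	};

	static int guc_log_raw_open(struct inode *inode, struct file *file)
	{
		struct xe_guc_log *log = inode->i_private;	/* assumed: log passed as debugfs private data */
		struct guc_log_snap *snap;

		snap = kzalloc(sizeof(*snap), GFP_KERNEL);
		if (!snap)
			return -ENOMEM;

		snap->size = log->size;				/* assumed field */
		snap->data = vmalloc(snap->size);
		if (!snap->data) {
			kfree(snap);
			return -ENOMEM;
		}

		/* one near-atomic copy of the live log; later reads never touch the live buffer */
		memcpy_fromio(snap->data, log->vaddr, snap->size);	/* assumed iomem mapping */

		file->private_data = snap;
		return 0;
	}

	static ssize_t guc_log_raw_read(struct file *file, char __user *buf,
					size_t count, loff_t *pos)
	{
		struct guc_log_snap *snap = file->private_data;

		/* chunked reads stay self-consistent because they all see the same snapshot */
		return simple_read_from_buffer(buf, count, pos, snap->data, snap->size);
	}

	static int guc_log_raw_release(struct inode *inode, struct file *file)
	{
		struct guc_log_snap *snap = file->private_data;

		vfree(snap->data);
		kfree(snap);
		return 0;
	}

	static const struct file_operations guc_log_snapshot_ops = {
		.owner   = THIS_MODULE,
		.open    = guc_log_raw_open,
		.read    = guc_log_raw_read,
		.release = guc_log_raw_release,
		.llseek  = default_llseek,
	};

The trade-off John points out is visible here: the full-size allocation is held for the lifetime of the open file, in exchange for reads that cannot interleave with GuC writes.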
On Tue, May 14, 2024 at 11:13:14AM -0700, John Harrison wrote: > On 5/14/2024 07:58, Michal Wajdeczko wrote: > > On 13.05.2024 18:53, John Harrison wrote: > > > On 5/12/2024 08:36, Michal Wajdeczko wrote: > > > > We already provide the content of the GuC log in debugsfs, but it > > > > is in a text format where each log dword is printed as hexadecimal > > > > number, which does not scale well with large GuC log buffers. > > > > > > > > To allow more efficient access to the GuC log, which could benefit > > > > our CI systems, expose raw binary log data. In addition to less > > > > overhead in preparing text based GuC log file, the new GuC log file > > > > in binary format is also almost 3x smaller. > > > > > > > > Any existing script that expects the GuC log buffer in text format > > > > can use command like below to convert from new binary format: > > > > > > > > hexdump -e '4/4 "0x%08x " "\n"' > > > > > > > > but this shouldn't be the case as most decoders expect GuC log data > > > > in binary format. > > > I strongly disagree with this. > > > > > > Efficiency and file size is not an issue when accessing the GuC log via > > > debugfs on actual hardware. > > to some extend it is as CI team used to refuse to collect GuC logs after > > each executed test just because of it's size > I've never heard that argument. I've heard many different arguments but not > one about file size. The default GuC log size is pretty tiny. So size really > is not an issue. > > > > > > It is an issue when dumping via dmesg but > > > you definitely should not be dumping binary data to dmesg. Whereas, > > not following here - this is debugfs specific, not a dmesg printer > Except that it is preferable to have common code for both if at all > possible. > > > > > > dumping in binary data is much more dangerous and liable to corruption > > > because some tool along the way tries to convert to ASCII, or truncates > > > at the first zero, etc. We request GuC logs be sent by end users, > > > customer bug reports, etc. all doing things that we have no control over. > > hmm, how "cp gt0/uc/guc_log_raw FILE" could end with a corrupted file ? > Because someone then tries to email it, or attach it or copy it via Windows > or any number of other ways in which a file can get munged. > > > > > > Converting the hexdump back to binary is trivial for those tools which > > > require it. If you follow the acquisition and decoding instructions on > > > the wiki page then it is all done for you automatically. > > I'm afraid I don't know where this wiki page is, but I do know that hex > > conversion dance is not needed for me to get decoded GuC log the way I > > used to do > Look for the 'GuC Debug Logs' page on the developer wiki. It's pretty easy > to find. > > > > These patches are trying to solve a problem which does not exist and are > > > going to make working with GuC logs harder and more error prone. > > it at least solves the problem of currently super inefficient way of > > generating the GuC log in text format. > > > > it also opens other opportunities to develop tools that could monitor or > > capture GuC log independently on top of what driver is able to offer > > today (on i915 there was guc-log-relay, but it was broken for long time, > > not sure what are the plans for Xe) > > > > also still not sure how it can be more error prone. > As already explained, the plan is move to LFD - an extensible, streamable, > logging format. Any non-trivial effort that is not helping to move to LFD is > not worth the effort. 
> > > > > > On the other hand, there are many other issues with GuC logs that it > > > would be useful to solves - including extra meta data, reliable output > > > via dmesg, continuous streaming, pre-sizing the debugfs file to not have > > > to generate it ~12 times for a single read, etc. > > this series actually solves last issue but in a bit different way (we > > even don't need to generate full GuC log dump at all if we would like to > > capture only part of the log if we know where to look) > No, it doesn't solve it. Your comment below suggests it will be read in 4KB > chunks. Which means your 16MB buffer now requires 4096 separate reads! And > you only doing partial reads of the section you think you need is never > going to be reliable on live system. Not sure why you would want to anyway. > It is just making things much more complex. You now need an intelligent user > land program to read the log out and decode at least the header section to > know what data section to read. You can't just dump the whole thing with > 'cat' or 'dd'. > Briefly have read this thread but seconding John's opinion that anything which breaks GuC log collection via a simple command like cat is a no go. Also anything that can allow windows to mangle the output would be less than idle too. Lastly, GuC log collection is not a critical path operation so it likely does not need to optimized for speed. Matt > > > > for reliable output via dmesg - see my proposal at [1] > > > > [1] https://patchwork.freedesktop.org/series/133613/ > > > > > > Hmm. Actually, is this interface allowing the filesystem layers to issue > > > multiple read calls to read the buffer out in small chunks? That is also > > > going to break things. If the GuC is still writing to the log as the > > > user is reading from it, there is the opportunity for each chunk to not > > > follow on from the previous chunk because the data has just been > > > overwritten. This is already a problem at the moment that causes issues > > > when decoding the logs, even with an almost atomic copy of the log into > > > a temporary buffer before reading it out. Doing the read in separate > > > chunks is only going to make that problem even worse. > > current solution, that converts data into hex numbers, reads log buffer > > in chunks of 128 dwords, how proposed here solution that reads in 4K > > chunks could be "even worse" ? > See above, 4KB chunks means 4096 separate reads for a 16M buffer. And each > one of those reads is a full round trip to user land and back. If you want > to get at all close to an atomic read of the log then it needs to be done as > a single call that copies the log into a locally allocated kernel buffer and > then allows user land to read out from that buffer rather than from the live > log. Which can be trivially done with the current method (at the expense of > a large memory allocation) but would be much more difficult with random > access reader like this as you would need to say the copied buffer around > until the reads have all been done. Which would presumably mean adding > open/close handlers to allocate and free that memory. > > > > > and in case of some smart tool, that would understands the layout of the > > GuC log buffer, we can even fully eliminate problem of reading stale > > data, so why not to choose a more scalable solution ? > You cannot eliminate the problem of stale data. You read the header, you > read the data it was pointing to, you re-read the header and find that the > GuC has moved on. 
That is an infinite loop of continuously updating > pointers. > > John. > > > > > > John. > > > > > > > Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> > > > > Cc: Lucas De Marchi <lucas.demarchi@intel.com> > > > > Cc: John Harrison <John.C.Harrison@Intel.com> > > > > --- > > > > Cc: linux-fsdevel@vger.kernel.org > > > > Cc: dri-devel@lists.freedesktop.org > > > > --- > > > > drivers/gpu/drm/xe/xe_guc_debugfs.c | 26 ++++++++++++++++++++++++++ > > > > 1 file changed, 26 insertions(+) > > > > > > > > diff --git a/drivers/gpu/drm/xe/xe_guc_debugfs.c > > > > b/drivers/gpu/drm/xe/xe_guc_debugfs.c > > > > index d3822cbea273..53fea952344d 100644 > > > > --- a/drivers/gpu/drm/xe/xe_guc_debugfs.c > > > > +++ b/drivers/gpu/drm/xe/xe_guc_debugfs.c > > > > @@ -8,6 +8,7 @@ > > > > #include <drm/drm_debugfs.h> > > > > #include <drm/drm_managed.h> > > > > +#include "xe_bo.h" > > > > #include "xe_device.h" > > > > #include "xe_gt.h" > > > > #include "xe_guc.h" > > > > @@ -52,6 +53,29 @@ static const struct drm_info_list debugfs_list[] = { > > > > {"guc_log", guc_log, 0}, > > > > }; > > > > +static ssize_t guc_log_read(struct file *file, char __user *buf, > > > > size_t count, loff_t *pos) > > > > +{ > > > > + struct dentry *dent = file_dentry(file); > > > > + struct dentry *uc_dent = dent->d_parent; > > > > + struct dentry *gt_dent = uc_dent->d_parent; > > > > + struct xe_gt *gt = gt_dent->d_inode->i_private; > > > > + struct xe_guc_log *log = >->uc.guc.log; > > > > + struct xe_device *xe = gt_to_xe(gt); > > > > + ssize_t ret; > > > > + > > > > + xe_pm_runtime_get(xe); > > > > + ret = xe_map_read_from(xe, buf, count, pos, &log->bo->vmap, > > > > log->bo->size); > > > > + xe_pm_runtime_put(xe); > > > > + > > > > + return ret; > > > > +} > > > > + > > > > +static const struct file_operations guc_log_ops = { > > > > + .owner = THIS_MODULE, > > > > + .read = guc_log_read, > > > > + .llseek = default_llseek, > > > > +}; > > > > + > > > > void xe_guc_debugfs_register(struct xe_guc *guc, struct dentry *parent) > > > > { > > > > struct drm_minor *minor = guc_to_xe(guc)->drm.primary; > > > > @@ -72,4 +96,6 @@ void xe_guc_debugfs_register(struct xe_guc *guc, > > > > struct dentry *parent) > > > > drm_debugfs_create_files(local, > > > > ARRAY_SIZE(debugfs_list), > > > > parent, minor); > > > > + > > > > + debugfs_create_file("guc_log_raw", 0600, parent, NULL, > > > > &guc_log_ops); > > > > } >
On 14.05.2024 20:13, John Harrison wrote: > On 5/14/2024 07:58, Michal Wajdeczko wrote: >> On 13.05.2024 18:53, John Harrison wrote: >>> On 5/12/2024 08:36, Michal Wajdeczko wrote: >>>> We already provide the content of the GuC log in debugsfs, but it >>>> is in a text format where each log dword is printed as hexadecimal >>>> number, which does not scale well with large GuC log buffers. >>>> >>>> To allow more efficient access to the GuC log, which could benefit >>>> our CI systems, expose raw binary log data. In addition to less >>>> overhead in preparing text based GuC log file, the new GuC log file >>>> in binary format is also almost 3x smaller. >>>> >>>> Any existing script that expects the GuC log buffer in text format >>>> can use command like below to convert from new binary format: >>>> >>>> hexdump -e '4/4 "0x%08x " "\n"' >>>> >>>> but this shouldn't be the case as most decoders expect GuC log data >>>> in binary format. >>> I strongly disagree with this. >>> >>> Efficiency and file size is not an issue when accessing the GuC log via >>> debugfs on actual hardware. >> to some extend it is as CI team used to refuse to collect GuC logs after >> each executed test just because of it's size > I've never heard that argument. I've heard many different arguments but > not one about file size. The default GuC log size is pretty tiny. So > size really is not an issue. so it's tiny or 16MB as you mention below ? > >> >>> It is an issue when dumping via dmesg but >>> you definitely should not be dumping binary data to dmesg. Whereas, >> not following here - this is debugfs specific, not a dmesg printer > Except that it is preferable to have common code for both if at all > possible. but here, for debugfs, it's almost no code, it's 1-liner thanks to using generic helpers, so there is really nothing to share as common code note that with this separate raw access to guc log over debugfs, you can further customize xe_guc_log_dump() function for dmesg output [2] without worrying about impact in generating output to debugfs [2] https://patchwork.freedesktop.org/series/133349/ > >> >>> dumping in binary data is much more dangerous and liable to corruption >>> because some tool along the way tries to convert to ASCII, or truncates >>> at the first zero, etc. We request GuC logs be sent by end users, >>> customer bug reports, etc. all doing things that we have no control >>> over. >> hmm, how "cp gt0/uc/guc_log_raw FILE" could end with a corrupted file ? > Because someone then tries to email it, or attach it or copy it via > Windows or any number of other ways in which a file can get munged. no comment > >> >>> Converting the hexdump back to binary is trivial for those tools which >>> require it. If you follow the acquisition and decoding instructions on >>> the wiki page then it is all done for you automatically. >> I'm afraid I don't know where this wiki page is, but I do know that hex >> conversion dance is not needed for me to get decoded GuC log the way I >> used to do > Look for the 'GuC Debug Logs' page on the developer wiki. It's pretty > easy to find. ok, found it btw, it says "Actual log size will be significantly more (about 50MB) as there are multiple sections." > >>> These patches are trying to solve a problem which does not exist and are >>> going to make working with GuC logs harder and more error prone. >> it at least solves the problem of currently super inefficient way of >> generating the GuC log in text format. 
>> >> it also opens other opportunities to develop tools that could monitor or >> capture GuC log independently on top of what driver is able to offer >> today (on i915 there was guc-log-relay, but it was broken for long time, >> not sure what are the plans for Xe) >> >> also still not sure how it can be more error prone. > As already explained, the plan is move to LFD - an extensible, > streamable, logging format. Any non-trivial effort that is not helping > to move to LFD is not worth the effort. which part from my series was non-trivial ? > >> >>> On the other hand, there are many other issues with GuC logs that it >>> would be useful to solves - including extra meta data, reliable output >>> via dmesg, continuous streaming, pre-sizing the debugfs file to not have >>> to generate it ~12 times for a single read, etc. >> this series actually solves last issue but in a bit different way (we >> even don't need to generate full GuC log dump at all if we would like to >> capture only part of the log if we know where to look) > No, it doesn't solve it. Your comment below suggests it will be read in > 4KB chunks. chunks will be 4K if we stick to proposed here simple_read_from_iomem() that initially uses hardcoded 4K chunks, but we could either modify it to use larger chunks by default or extend it to take additional param, or promote more powerful copy_to_user_fromio() from SOUND > Which means your 16MB buffer now requires 4096 separate > reads! And you only doing partial reads of the section you think you > need is never going to be reliable on live system. Not sure why you > would want to anyway. It is just making things much more complex. You > now need an intelligent user land program to read the log out and decode I don't need it. We can add it later. And we can add it on top what we already expose without the need to recompile/rebuild the driver. > at least the header section to know what data section to read. You can't > just dump the whole thing with 'cat' or 'dd'. only 'cat' wont work as it's binary file > >> >> for reliable output via dmesg - see my proposal at [1] >> >> [1] https://patchwork.freedesktop.org/series/133613/ > >> >>> Hmm. Actually, is this interface allowing the filesystem layers to issue >>> multiple read calls to read the buffer out in small chunks? That is also >>> going to break things. If the GuC is still writing to the log as the >>> user is reading from it, there is the opportunity for each chunk to not >>> follow on from the previous chunk because the data has just been >>> overwritten. This is already a problem at the moment that causes issues >>> when decoding the logs, even with an almost atomic copy of the log into >>> a temporary buffer before reading it out. Doing the read in separate >>> chunks is only going to make that problem even worse. >> current solution, that converts data into hex numbers, reads log buffer >> in chunks of 128 dwords, how proposed here solution that reads in 4K >> chunks could be "even worse" ? > See above, 4KB chunks means 4096 separate reads for a 16M buffer. And > each one of those reads is a full round trip to user land and back. If but is this a proven problem for us? > you want to get at all close to an atomic read of the log then it needs > to be done as a single call that copies the log into a locally allocated > kernel buffer and then allows user land to read out from that buffer > rather than from the live log. 
Which can be trivially done with the > current method (at the expense of a large memory allocation) but would > be much more difficult with random access reader like this as you would > need to say the copied buffer around until the reads have all been done. > Which would presumably mean adding open/close handlers to allocate and > free that memory. as I mentioned above if we desperately need larger copies then we can use the code promoted from the SOUND subsystem but for random access reader (up to 4K) this is what this patch already provides. > >> >> and in case of some smart tool, that would understands the layout of the >> GuC log buffer, we can even fully eliminate problem of reading stale >> data, so why not to choose a more scalable solution ? > You cannot eliminate the problem of stale data. You read the header, you > read the data it was pointing to, you re-read the header and find that > the GuC has moved on. That is an infinite loop of continuously updating > pointers. I didn't say that I can create snapshot that is 100% free of stale data, what I meant was that with this proposal I can provide almost real time access to the GuC log, so with custom tool I can read pointers and and log entries as small randomly located chunks in the buffer, without the need to output whole log buffer snapshot as giant text file that I would have to parse again. > > John. > >> >>> John. >>> >>>> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> >>>> Cc: Lucas De Marchi <lucas.demarchi@intel.com> >>>> Cc: John Harrison <John.C.Harrison@Intel.com> >>>> --- >>>> Cc: linux-fsdevel@vger.kernel.org >>>> Cc: dri-devel@lists.freedesktop.org >>>> --- >>>> drivers/gpu/drm/xe/xe_guc_debugfs.c | 26 ++++++++++++++++++++++++++ >>>> 1 file changed, 26 insertions(+) >>>> >>>> diff --git a/drivers/gpu/drm/xe/xe_guc_debugfs.c >>>> b/drivers/gpu/drm/xe/xe_guc_debugfs.c >>>> index d3822cbea273..53fea952344d 100644 >>>> --- a/drivers/gpu/drm/xe/xe_guc_debugfs.c >>>> +++ b/drivers/gpu/drm/xe/xe_guc_debugfs.c >>>> @@ -8,6 +8,7 @@ >>>> #include <drm/drm_debugfs.h> >>>> #include <drm/drm_managed.h> >>>> +#include "xe_bo.h" >>>> #include "xe_device.h" >>>> #include "xe_gt.h" >>>> #include "xe_guc.h" >>>> @@ -52,6 +53,29 @@ static const struct drm_info_list debugfs_list[] = { >>>> {"guc_log", guc_log, 0}, >>>> }; >>>> +static ssize_t guc_log_read(struct file *file, char __user *buf, >>>> size_t count, loff_t *pos) >>>> +{ >>>> + struct dentry *dent = file_dentry(file); >>>> + struct dentry *uc_dent = dent->d_parent; >>>> + struct dentry *gt_dent = uc_dent->d_parent; >>>> + struct xe_gt *gt = gt_dent->d_inode->i_private; >>>> + struct xe_guc_log *log = >->uc.guc.log; >>>> + struct xe_device *xe = gt_to_xe(gt); >>>> + ssize_t ret; >>>> + >>>> + xe_pm_runtime_get(xe); >>>> + ret = xe_map_read_from(xe, buf, count, pos, &log->bo->vmap, >>>> log->bo->size); >>>> + xe_pm_runtime_put(xe); >>>> + >>>> + return ret; >>>> +} >>>> + >>>> +static const struct file_operations guc_log_ops = { >>>> + .owner = THIS_MODULE, >>>> + .read = guc_log_read, >>>> + .llseek = default_llseek, >>>> +}; >>>> + >>>> void xe_guc_debugfs_register(struct xe_guc *guc, struct dentry >>>> *parent) >>>> { >>>> struct drm_minor *minor = guc_to_xe(guc)->drm.primary; >>>> @@ -72,4 +96,6 @@ void xe_guc_debugfs_register(struct xe_guc *guc, >>>> struct dentry *parent) >>>> drm_debugfs_create_files(local, >>>> ARRAY_SIZE(debugfs_list), >>>> parent, minor); >>>> + >>>> + debugfs_create_file("guc_log_raw", 0600, parent, NULL, >>>> &guc_log_ops); >>>> } >
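[For reference, a user-space sketch of the "smart tool" / random-access reader Michal describes above: read a small header region first, then pread() only the region of interest. The debugfs path, the header size, and the data offsets are placeholders; the real GuC log buffer layout is not shown here.]

	#include <fcntl.h>
	#include <stdint.h>
	#include <stdio.h>
	#include <stdlib.h>
	#include <unistd.h>

	int main(void)
	{
		/* placeholder path; the thread refers to gt0/uc/guc_log_raw under debugfs */
		const char *path = "/sys/kernel/debug/dri/0/gt0/uc/guc_log_raw";
		uint8_t hdr[4096];
		int fd = open(path, O_RDONLY);

		if (fd < 0) {
			perror("open");
			return 1;
		}

		/* small read of the state/header area; a real tool would decode write pointers here */
		if (pread(fd, hdr, sizeof(hdr), 0) != (ssize_t)sizeof(hdr)) {
			perror("pread header");
			close(fd);
			return 1;
		}

		/* then fetch only the region the decoded pointers indicate (placeholder offsets) */
		off_t data_off = 0x1000;
		size_t data_len = 0x4000;
		uint8_t *data = malloc(data_len);

		if (data && pread(fd, data, data_len, data_off) < 0)
			perror("pread data");

		free(data);
		close(fd);
		return 0;
	}

As John notes in his reply, the header can already be stale by the time the data region is read, so this narrows, rather than removes, the consistency window.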
On 14.05.2024 22:31, Matthew Brost wrote: > On Tue, May 14, 2024 at 11:13:14AM -0700, John Harrison wrote: >> On 5/14/2024 07:58, Michal Wajdeczko wrote: >>> On 13.05.2024 18:53, John Harrison wrote: >>>> On 5/12/2024 08:36, Michal Wajdeczko wrote: >>>>> We already provide the content of the GuC log in debugsfs, but it >>>>> is in a text format where each log dword is printed as hexadecimal >>>>> number, which does not scale well with large GuC log buffers. >>>>> >>>>> To allow more efficient access to the GuC log, which could benefit >>>>> our CI systems, expose raw binary log data. In addition to less >>>>> overhead in preparing text based GuC log file, the new GuC log file >>>>> in binary format is also almost 3x smaller. >>>>> >>>>> Any existing script that expects the GuC log buffer in text format >>>>> can use command like below to convert from new binary format: >>>>> >>>>> hexdump -e '4/4 "0x%08x " "\n"' >>>>> >>>>> but this shouldn't be the case as most decoders expect GuC log data >>>>> in binary format. >>>> I strongly disagree with this. >>>> >>>> Efficiency and file size is not an issue when accessing the GuC log via >>>> debugfs on actual hardware. >>> to some extend it is as CI team used to refuse to collect GuC logs after >>> each executed test just because of it's size >> I've never heard that argument. I've heard many different arguments but not >> one about file size. The default GuC log size is pretty tiny. So size really >> is not an issue. >> >>> >>>> It is an issue when dumping via dmesg but >>>> you definitely should not be dumping binary data to dmesg. Whereas, >>> not following here - this is debugfs specific, not a dmesg printer >> Except that it is preferable to have common code for both if at all >> possible. >> >>> >>>> dumping in binary data is much more dangerous and liable to corruption >>>> because some tool along the way tries to convert to ASCII, or truncates >>>> at the first zero, etc. We request GuC logs be sent by end users, >>>> customer bug reports, etc. all doing things that we have no control over. >>> hmm, how "cp gt0/uc/guc_log_raw FILE" could end with a corrupted file ? >> Because someone then tries to email it, or attach it or copy it via Windows >> or any number of other ways in which a file can get munged. >> >>> >>>> Converting the hexdump back to binary is trivial for those tools which >>>> require it. If you follow the acquisition and decoding instructions on >>>> the wiki page then it is all done for you automatically. >>> I'm afraid I don't know where this wiki page is, but I do know that hex >>> conversion dance is not needed for me to get decoded GuC log the way I >>> used to do >> Look for the 'GuC Debug Logs' page on the developer wiki. It's pretty easy >> to find. >> >>>> These patches are trying to solve a problem which does not exist and are >>>> going to make working with GuC logs harder and more error prone. >>> it at least solves the problem of currently super inefficient way of >>> generating the GuC log in text format. >>> >>> it also opens other opportunities to develop tools that could monitor or >>> capture GuC log independently on top of what driver is able to offer >>> today (on i915 there was guc-log-relay, but it was broken for long time, >>> not sure what are the plans for Xe) >>> >>> also still not sure how it can be more error prone. >> As already explained, the plan is move to LFD - an extensible, streamable, >> logging format. Any non-trivial effort that is not helping to move to LFD is >> not worth the effort. 
>> >>> >>>> On the other hand, there are many other issues with GuC logs that it >>>> would be useful to solves - including extra meta data, reliable output >>>> via dmesg, continuous streaming, pre-sizing the debugfs file to not have >>>> to generate it ~12 times for a single read, etc. >>> this series actually solves last issue but in a bit different way (we >>> even don't need to generate full GuC log dump at all if we would like to >>> capture only part of the log if we know where to look) >> No, it doesn't solve it. Your comment below suggests it will be read in 4KB >> chunks. Which means your 16MB buffer now requires 4096 separate reads! And >> you only doing partial reads of the section you think you need is never >> going to be reliable on live system. Not sure why you would want to anyway. >> It is just making things much more complex. You now need an intelligent user >> land program to read the log out and decode at least the header section to >> know what data section to read. You can't just dump the whole thing with >> 'cat' or 'dd'. >> > > Briefly have read this thread but seconding John's opinion that anything > which breaks GuC log collection via a simple command like cat is a no hexdump or cp is also a simple command and likely we can assume that either user will know what command to use or will just type the command that we say. > go. Also anything that can allow windows to mangle the output would be > less than idle too. Lastly, GuC log collection is not a critical path > operation so it likely does not need to optimized for speed. but please remember that this patch does not change anything to the existing debugfs files, the guc_log stays as is, this new raw access is defined as another guc_log_raw file that would allow develop other use cases, beyond what is possible with naive text snapshots, like live monitor for errors, all implemented above kernel driver > > Matt > >>> >>> for reliable output via dmesg - see my proposal at [1] >>> >>> [1] https://patchwork.freedesktop.org/series/133613/ >> >>> >>>> Hmm. Actually, is this interface allowing the filesystem layers to issue >>>> multiple read calls to read the buffer out in small chunks? That is also >>>> going to break things. If the GuC is still writing to the log as the >>>> user is reading from it, there is the opportunity for each chunk to not >>>> follow on from the previous chunk because the data has just been >>>> overwritten. This is already a problem at the moment that causes issues >>>> when decoding the logs, even with an almost atomic copy of the log into >>>> a temporary buffer before reading it out. Doing the read in separate >>>> chunks is only going to make that problem even worse. >>> current solution, that converts data into hex numbers, reads log buffer >>> in chunks of 128 dwords, how proposed here solution that reads in 4K >>> chunks could be "even worse" ? >> See above, 4KB chunks means 4096 separate reads for a 16M buffer. And each >> one of those reads is a full round trip to user land and back. If you want >> to get at all close to an atomic read of the log then it needs to be done as >> a single call that copies the log into a locally allocated kernel buffer and >> then allows user land to read out from that buffer rather than from the live >> log. 
Which can be trivially done with the current method (at the expense of >> a large memory allocation) but would be much more difficult with random >> access reader like this as you would need to say the copied buffer around >> until the reads have all been done. Which would presumably mean adding >> open/close handlers to allocate and free that memory. >> >>> >>> and in case of some smart tool, that would understands the layout of the >>> GuC log buffer, we can even fully eliminate problem of reading stale >>> data, so why not to choose a more scalable solution ? >> You cannot eliminate the problem of stale data. You read the header, you >> read the data it was pointing to, you re-read the header and find that the >> GuC has moved on. That is an infinite loop of continuously updating >> pointers. >> >> John. >> >>> >>>> John. >>>> >>>>> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> >>>>> Cc: Lucas De Marchi <lucas.demarchi@intel.com> >>>>> Cc: John Harrison <John.C.Harrison@Intel.com> >>>>> --- >>>>> Cc: linux-fsdevel@vger.kernel.org >>>>> Cc: dri-devel@lists.freedesktop.org >>>>> --- >>>>> drivers/gpu/drm/xe/xe_guc_debugfs.c | 26 ++++++++++++++++++++++++++ >>>>> 1 file changed, 26 insertions(+) >>>>> >>>>> diff --git a/drivers/gpu/drm/xe/xe_guc_debugfs.c >>>>> b/drivers/gpu/drm/xe/xe_guc_debugfs.c >>>>> index d3822cbea273..53fea952344d 100644 >>>>> --- a/drivers/gpu/drm/xe/xe_guc_debugfs.c >>>>> +++ b/drivers/gpu/drm/xe/xe_guc_debugfs.c >>>>> @@ -8,6 +8,7 @@ >>>>> #include <drm/drm_debugfs.h> >>>>> #include <drm/drm_managed.h> >>>>> +#include "xe_bo.h" >>>>> #include "xe_device.h" >>>>> #include "xe_gt.h" >>>>> #include "xe_guc.h" >>>>> @@ -52,6 +53,29 @@ static const struct drm_info_list debugfs_list[] = { >>>>> {"guc_log", guc_log, 0}, >>>>> }; >>>>> +static ssize_t guc_log_read(struct file *file, char __user *buf, >>>>> size_t count, loff_t *pos) >>>>> +{ >>>>> + struct dentry *dent = file_dentry(file); >>>>> + struct dentry *uc_dent = dent->d_parent; >>>>> + struct dentry *gt_dent = uc_dent->d_parent; >>>>> + struct xe_gt *gt = gt_dent->d_inode->i_private; >>>>> + struct xe_guc_log *log = >->uc.guc.log; >>>>> + struct xe_device *xe = gt_to_xe(gt); >>>>> + ssize_t ret; >>>>> + >>>>> + xe_pm_runtime_get(xe); >>>>> + ret = xe_map_read_from(xe, buf, count, pos, &log->bo->vmap, >>>>> log->bo->size); >>>>> + xe_pm_runtime_put(xe); >>>>> + >>>>> + return ret; >>>>> +} >>>>> + >>>>> +static const struct file_operations guc_log_ops = { >>>>> + .owner = THIS_MODULE, >>>>> + .read = guc_log_read, >>>>> + .llseek = default_llseek, >>>>> +}; >>>>> + >>>>> void xe_guc_debugfs_register(struct xe_guc *guc, struct dentry *parent) >>>>> { >>>>> struct drm_minor *minor = guc_to_xe(guc)->drm.primary; >>>>> @@ -72,4 +96,6 @@ void xe_guc_debugfs_register(struct xe_guc *guc, >>>>> struct dentry *parent) >>>>> drm_debugfs_create_files(local, >>>>> ARRAY_SIZE(debugfs_list), >>>>> parent, minor); >>>>> + >>>>> + debugfs_create_file("guc_log_raw", 0600, parent, NULL, >>>>> &guc_log_ops); >>>>> } >>
On 5/14/2024 14:15, Michal Wajdeczko wrote: > On 14.05.2024 22:31, Matthew Brost wrote: >> On Tue, May 14, 2024 at 11:13:14AM -0700, John Harrison wrote: >>> On 5/14/2024 07:58, Michal Wajdeczko wrote: >>>> On 13.05.2024 18:53, John Harrison wrote: >>>>> On 5/12/2024 08:36, Michal Wajdeczko wrote: >>>>>> We already provide the content of the GuC log in debugsfs, but it >>>>>> is in a text format where each log dword is printed as hexadecimal >>>>>> number, which does not scale well with large GuC log buffers. >>>>>> >>>>>> To allow more efficient access to the GuC log, which could benefit >>>>>> our CI systems, expose raw binary log data. In addition to less >>>>>> overhead in preparing text based GuC log file, the new GuC log file >>>>>> in binary format is also almost 3x smaller. >>>>>> >>>>>> Any existing script that expects the GuC log buffer in text format >>>>>> can use command like below to convert from new binary format: >>>>>> >>>>>> hexdump -e '4/4 "0x%08x " "\n"' >>>>>> >>>>>> but this shouldn't be the case as most decoders expect GuC log data >>>>>> in binary format. >>>>> I strongly disagree with this. >>>>> >>>>> Efficiency and file size is not an issue when accessing the GuC log via >>>>> debugfs on actual hardware. >>>> to some extend it is as CI team used to refuse to collect GuC logs after >>>> each executed test just because of it's size >>> I've never heard that argument. I've heard many different arguments but not >>> one about file size. The default GuC log size is pretty tiny. So size really >>> is not an issue. >>> >>>>> It is an issue when dumping via dmesg but >>>>> you definitely should not be dumping binary data to dmesg. Whereas, >>>> not following here - this is debugfs specific, not a dmesg printer >>> Except that it is preferable to have common code for both if at all >>> possible. >>> >>>>> dumping in binary data is much more dangerous and liable to corruption >>>>> because some tool along the way tries to convert to ASCII, or truncates >>>>> at the first zero, etc. We request GuC logs be sent by end users, >>>>> customer bug reports, etc. all doing things that we have no control over. >>>> hmm, how "cp gt0/uc/guc_log_raw FILE" could end with a corrupted file ? >>> Because someone then tries to email it, or attach it or copy it via Windows >>> or any number of other ways in which a file can get munged. >>> >>>>> Converting the hexdump back to binary is trivial for those tools which >>>>> require it. If you follow the acquisition and decoding instructions on >>>>> the wiki page then it is all done for you automatically. >>>> I'm afraid I don't know where this wiki page is, but I do know that hex >>>> conversion dance is not needed for me to get decoded GuC log the way I >>>> used to do >>> Look for the 'GuC Debug Logs' page on the developer wiki. It's pretty easy >>> to find. >>> >>>>> These patches are trying to solve a problem which does not exist and are >>>>> going to make working with GuC logs harder and more error prone. >>>> it at least solves the problem of currently super inefficient way of >>>> generating the GuC log in text format. >>>> >>>> it also opens other opportunities to develop tools that could monitor or >>>> capture GuC log independently on top of what driver is able to offer >>>> today (on i915 there was guc-log-relay, but it was broken for long time, >>>> not sure what are the plans for Xe) >>>> >>>> also still not sure how it can be more error prone. 
>>> As already explained, the plan is move to LFD - an extensible, streamable, >>> logging format. Any non-trivial effort that is not helping to move to LFD is >>> not worth the effort. >>> >>>>> On the other hand, there are many other issues with GuC logs that it >>>>> would be useful to solves - including extra meta data, reliable output >>>>> via dmesg, continuous streaming, pre-sizing the debugfs file to not have >>>>> to generate it ~12 times for a single read, etc. >>>> this series actually solves last issue but in a bit different way (we >>>> even don't need to generate full GuC log dump at all if we would like to >>>> capture only part of the log if we know where to look) >>> No, it doesn't solve it. Your comment below suggests it will be read in 4KB >>> chunks. Which means your 16MB buffer now requires 4096 separate reads! And >>> you only doing partial reads of the section you think you need is never >>> going to be reliable on live system. Not sure why you would want to anyway. >>> It is just making things much more complex. You now need an intelligent user >>> land program to read the log out and decode at least the header section to >>> know what data section to read. You can't just dump the whole thing with >>> 'cat' or 'dd'. >>> >> Briefly have read this thread but seconding John's opinion that anything >> which breaks GuC log collection via a simple command like cat is a no > hexdump or cp is also a simple command and likely we can assume that > either user will know what command to use or will just type the command > that we say. > >> go. Also anything that can allow windows to mangle the output would be >> less than idle too. Lastly, GuC log collection is not a critical path >> operation so it likely does not need to optimized for speed. > but please remember that this patch does not change anything to the > existing debugfs files, the guc_log stays as is, this new raw access is > defined as another guc_log_raw file that would allow develop other use > cases, beyond what is possible with naive text snapshots, like live > monitor for errors, all implemented above kernel driver Which is another issue. We now have two interfaces to the same thing. When should someone use one or the other? Which interface should we be telling the CI people to use? It is unnecessary duplication and a source of confusion and therefore a way to cause even more problems. John. > >> Matt >> >>>> for reliable output via dmesg - see my proposal at [1] >>>> >>>> [1] https://patchwork.freedesktop.org/series/133613/ >>>>> Hmm. Actually, is this interface allowing the filesystem layers to issue >>>>> multiple read calls to read the buffer out in small chunks? That is also >>>>> going to break things. If the GuC is still writing to the log as the >>>>> user is reading from it, there is the opportunity for each chunk to not >>>>> follow on from the previous chunk because the data has just been >>>>> overwritten. This is already a problem at the moment that causes issues >>>>> when decoding the logs, even with an almost atomic copy of the log into >>>>> a temporary buffer before reading it out. Doing the read in separate >>>>> chunks is only going to make that problem even worse. >>>> current solution, that converts data into hex numbers, reads log buffer >>>> in chunks of 128 dwords, how proposed here solution that reads in 4K >>>> chunks could be "even worse" ? >>> See above, 4KB chunks means 4096 separate reads for a 16M buffer. And each >>> one of those reads is a full round trip to user land and back. 
If you want >>> to get at all close to an atomic read of the log then it needs to be done as >>> a single call that copies the log into a locally allocated kernel buffer and >>> then allows user land to read out from that buffer rather than from the live >>> log. Which can be trivially done with the current method (at the expense of >>> a large memory allocation) but would be much more difficult with random >>> access reader like this as you would need to say the copied buffer around >>> until the reads have all been done. Which would presumably mean adding >>> open/close handlers to allocate and free that memory. >>> >>>> and in case of some smart tool, that would understands the layout of the >>>> GuC log buffer, we can even fully eliminate problem of reading stale >>>> data, so why not to choose a more scalable solution ? >>> You cannot eliminate the problem of stale data. You read the header, you >>> read the data it was pointing to, you re-read the header and find that the >>> GuC has moved on. That is an infinite loop of continuously updating >>> pointers. >>> >>> John. >>> >>>>> John. >>>>> >>>>>> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> >>>>>> Cc: Lucas De Marchi <lucas.demarchi@intel.com> >>>>>> Cc: John Harrison <John.C.Harrison@Intel.com> >>>>>> --- >>>>>> Cc: linux-fsdevel@vger.kernel.org >>>>>> Cc: dri-devel@lists.freedesktop.org >>>>>> --- >>>>>> drivers/gpu/drm/xe/xe_guc_debugfs.c | 26 ++++++++++++++++++++++++++ >>>>>> 1 file changed, 26 insertions(+) >>>>>> >>>>>> diff --git a/drivers/gpu/drm/xe/xe_guc_debugfs.c >>>>>> b/drivers/gpu/drm/xe/xe_guc_debugfs.c >>>>>> index d3822cbea273..53fea952344d 100644 >>>>>> --- a/drivers/gpu/drm/xe/xe_guc_debugfs.c >>>>>> +++ b/drivers/gpu/drm/xe/xe_guc_debugfs.c >>>>>> @@ -8,6 +8,7 @@ >>>>>> #include <drm/drm_debugfs.h> >>>>>> #include <drm/drm_managed.h> >>>>>> +#include "xe_bo.h" >>>>>> #include "xe_device.h" >>>>>> #include "xe_gt.h" >>>>>> #include "xe_guc.h" >>>>>> @@ -52,6 +53,29 @@ static const struct drm_info_list debugfs_list[] = { >>>>>> {"guc_log", guc_log, 0}, >>>>>> }; >>>>>> +static ssize_t guc_log_read(struct file *file, char __user *buf, >>>>>> size_t count, loff_t *pos) >>>>>> +{ >>>>>> + struct dentry *dent = file_dentry(file); >>>>>> + struct dentry *uc_dent = dent->d_parent; >>>>>> + struct dentry *gt_dent = uc_dent->d_parent; >>>>>> + struct xe_gt *gt = gt_dent->d_inode->i_private; >>>>>> + struct xe_guc_log *log = >->uc.guc.log; >>>>>> + struct xe_device *xe = gt_to_xe(gt); >>>>>> + ssize_t ret; >>>>>> + >>>>>> + xe_pm_runtime_get(xe); >>>>>> + ret = xe_map_read_from(xe, buf, count, pos, &log->bo->vmap, >>>>>> log->bo->size); >>>>>> + xe_pm_runtime_put(xe); >>>>>> + >>>>>> + return ret; >>>>>> +} >>>>>> + >>>>>> +static const struct file_operations guc_log_ops = { >>>>>> + .owner = THIS_MODULE, >>>>>> + .read = guc_log_read, >>>>>> + .llseek = default_llseek, >>>>>> +}; >>>>>> + >>>>>> void xe_guc_debugfs_register(struct xe_guc *guc, struct dentry *parent) >>>>>> { >>>>>> struct drm_minor *minor = guc_to_xe(guc)->drm.primary; >>>>>> @@ -72,4 +96,6 @@ void xe_guc_debugfs_register(struct xe_guc *guc, >>>>>> struct dentry *parent) >>>>>> drm_debugfs_create_files(local, >>>>>> ARRAY_SIZE(debugfs_list), >>>>>> parent, minor); >>>>>> + >>>>>> + debugfs_create_file("guc_log_raw", 0600, parent, NULL, >>>>>> &guc_log_ops); >>>>>> }
On 5/14/2024 13:41, Michal Wajdeczko wrote: > On 14.05.2024 20:13, John Harrison wrote: >> On 5/14/2024 07:58, Michal Wajdeczko wrote: >>> On 13.05.2024 18:53, John Harrison wrote: >>>> On 5/12/2024 08:36, Michal Wajdeczko wrote: >>>>> We already provide the content of the GuC log in debugfs, but it >>>>> is in a text format where each log dword is printed as a hexadecimal >>>>> number, which does not scale well with large GuC log buffers. >>>>> >>>>> To allow more efficient access to the GuC log, which could benefit >>>>> our CI systems, expose raw binary log data. In addition to less >>>>> overhead in preparing the text-based GuC log file, the new GuC log file >>>>> in binary format is also almost 3x smaller. >>>>> >>>>> Any existing script that expects the GuC log buffer in text format >>>>> can use a command like the one below to convert from the new binary format: >>>>> >>>>> hexdump -e '4/4 "0x%08x " "\n"' >>>>> >>>>> but this shouldn't be the case, as most decoders expect GuC log data >>>>> in binary format. >>>> I strongly disagree with this. >>>> >>>> Efficiency and file size are not an issue when accessing the GuC log via >>>> debugfs on actual hardware. >>> to some extent it is, as the CI team used to refuse to collect GuC logs after >>> each executed test just because of its size >> I've never heard that argument. I've heard many different arguments but >> not one about file size. The default GuC log size is pretty tiny. So >> size really is not an issue. > so is it tiny or 16MB, as you mention below? The default size is tiny. The maximum allowed is 16MB. By default, you get tiny logs. When a developer is debugging a specific issue and needs larger logs, they can bump the size up to 16MB. > >>>> It is an issue when dumping via dmesg but >>>> you definitely should not be dumping binary data to dmesg. Whereas, >>> not following here - this is debugfs specific, not a dmesg printer >> Except that it is preferable to have common code for both if at all >> possible. > but here, for debugfs, it's almost no code, it's a 1-liner thanks to using > generic helpers, so there is really nothing to share as common code > > note that with this separate raw access to the GuC log over debugfs, you can > further customize the xe_guc_log_dump() function for dmesg output [2] > without worrying about the impact on generating output to debugfs > > [2] https://patchwork.freedesktop.org/series/133349/ Or, we could put all this extra effort into doing something with tangible benefit. I've probably already spent more time arguing about this patch than it took to implement it. Time I would much rather be doing something useful with. And my point was that the dump size is only relevant for dmesg. For debugfs, the size simply does not matter. So trying to optimise the debugfs dump size with the downside of making it more difficult to use and more susceptible to issues is a bad trade-off that we should not be making. > >>>> dumping binary data is much more dangerous and liable to corruption >>>> because some tool along the way tries to convert to ASCII, or truncates >>>> at the first zero, etc. We request GuC logs be sent by end users, >>>> customer bug reports, etc. all doing things that we have no control >>>> over. >>> hmm, how could "cp gt0/uc/guc_log_raw FILE" end up with a corrupted file? >> Because someone then tries to email it, or attach it or copy it via >> Windows or any number of other ways in which a file can get munged. > no comment > >>>> Converting the hexdump back to binary is trivial for those tools which >>>> require it.
If you follow the acquisition and decoding instructions on >>>> the wiki page then it is all done for you automatically. >>> I'm afraid I don't know where this wiki page is, but I do know that the hex >>> conversion dance is not needed for me to get a decoded GuC log the way I >>> used to do it >> Look for the 'GuC Debug Logs' page on the developer wiki. It's pretty >> easy to find. > ok, found it > > btw, it says "Actual log size will be significantly more (about 50MB) as > there are multiple sections." 16MB debug log, 1MB crash dump, 1MB register capture -> 18MB actual data size, which expands to about 50MB as an ASCII hexdump. > >>>> These patches are trying to solve a problem which does not exist and are >>>> going to make working with GuC logs harder and more error prone. >>> it at least solves the problem of the currently super inefficient way of >>> generating the GuC log in text format. >>> >>> it also opens other opportunities to develop tools that could monitor or >>> capture the GuC log independently, on top of what the driver is able to offer >>> today (on i915 there was guc-log-relay, but it was broken for a long time; >>> not sure what the plans are for Xe) >>> >>> also still not sure how it can be more error prone. >> As already explained, the plan is to move to LFD - an extensible, >> streamable, logging format. Any non-trivial effort that is not helping >> to move to LFD is not worth the effort. > which part of my series was non-trivial? The doing it properly part. If you want a functional streaming interface then you will need a lot more than backdoor access to the GuC log memory buffer. You will need all the user land code to interpret it, do the streaming, cope with wrap-arounds, etc. Effort which would be more usefully put towards implementing LFDs, because that gives you all of that and much, much more. > >>>> On the other hand, there are many other issues with GuC logs that it >>>> would be useful to solve - including extra metadata, reliable output >>>> via dmesg, continuous streaming, pre-sizing the debugfs file to not have >>>> to generate it ~12 times for a single read, etc. >>> this series actually solves the last issue but in a bit different way (we >>> don't even need to generate a full GuC log dump at all if we would like to >>> capture only part of the log and we know where to look) >> No, it doesn't solve it. Your comment below suggests it will be read in >> 4KB chunks. > chunks will be 4K if we stick to the simple_read_from_iomem() proposed here, > which initially uses hardcoded 4K chunks, but we could either modify it > to use larger chunks by default or extend it to take an additional param, > or promote the more powerful copy_to_user_fromio() from the sound subsystem Which is yet more effort to still solve the problem in the wrong manner. If you are trying to implement streaming logs then we should do that via LFDs, as that is a much simpler debugfs interface and only requires 'cat' on the userland side. If you are trying to solve the problem of multiple reads of the buffer for a single dump then a) that is not a problem on physical hardware and b) it does not actually solve that problem unless you take a snapshot on file open and release the snapshot on file close. > >> Which means your 16MB buffer now requires 4096 separate >> reads! And you only doing partial reads of the section you think you >> need is never going to be reliable on a live system. Not sure why you >> would want to anyway. It is just making things much more complex.
You >> now need an intelligent user land program to read the log out and decode > I don't need it. We can add it later. And we can add it on top of what we > already expose, without the need to recompile/rebuild the driver. Or we could put that effort into something which will be of significant benefit to all users of the interface. > >> at least the header section to know what data section to read. You can't >> just dump the whole thing with 'cat' or 'dd'. > only 'cat' won't work, as it's a binary file Which is still a problem. > >>> for reliable output via dmesg - see my proposal at [1] >>> >>> [1] https://patchwork.freedesktop.org/series/133613/ >>>> Hmm. Actually, is this interface allowing the filesystem layers to issue >>>> multiple read calls to read the buffer out in small chunks? That is also >>>> going to break things. If the GuC is still writing to the log as the >>>> user is reading from it, there is the opportunity for each chunk to not >>>> follow on from the previous chunk because the data has just been >>>> overwritten. This is already a problem at the moment that causes issues >>>> when decoding the logs, even with an almost atomic copy of the log into >>>> a temporary buffer before reading it out. Doing the read in separate >>>> chunks is only going to make that problem even worse. >>> the current solution, that converts data into hex numbers, reads the log buffer >>> in chunks of 128 dwords, so how could the solution proposed here, which reads in 4K >>> chunks, be "even worse"? >> See above, 4KB chunks means 4096 separate reads for a 16M buffer. And >> each one of those reads is a full round trip to user land and back. If > but is this a proven problem for us? So what problem are you trying to solve? > >> you want to get at all close to an atomic read of the log then it needs >> to be done as a single call that copies the log into a locally allocated >> kernel buffer and then allows user land to read out from that buffer >> rather than from the live log. Which can be trivially done with the >> current method (at the expense of a large memory allocation) but would >> be much more difficult with a random access reader like this, as you would >> need to keep the copied buffer around until the reads have all been done. >> Which would presumably mean adding open/close handlers to allocate and >> free that memory. > as I mentioned above, if we desperately need larger copies then we can > use the code promoted from the sound subsystem Even more effort. > > but for a random access reader (up to 4K), this is what this patch already > provides. But why do you need random access? Streaming? Then implement LFDs. > >>> and in the case of some smart tool that understands the layout of the >>> GuC log buffer, we can even fully eliminate the problem of reading stale >>> data, so why not choose a more scalable solution? >> You cannot eliminate the problem of stale data. You read the header, you >> read the data it was pointing to, you re-read the header and find that >> the GuC has moved on. That is an infinite loop of continuously updating >> pointers. > I didn't say that I can create a snapshot that is 100% free of stale data; > what I meant was that with this proposal I can provide almost real-time > access to the GuC log, so with a custom tool I can read pointers and > log entries as small, randomly located chunks in the buffer, without the > need to output the whole log buffer snapshot as a giant text file that I would > have to parse again. But why do you want to read small chunks? John. > >> John. >> >>>> John.
>>>> >>>>> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> >>>>> Cc: Lucas De Marchi <lucas.demarchi@intel.com> >>>>> Cc: John Harrison <John.C.Harrison@Intel.com> >>>>> --- >>>>> Cc: linux-fsdevel@vger.kernel.org >>>>> Cc: dri-devel@lists.freedesktop.org >>>>> --- >>>>> drivers/gpu/drm/xe/xe_guc_debugfs.c | 26 ++++++++++++++++++++++++++ >>>>> 1 file changed, 26 insertions(+) >>>>> >>>>> diff --git a/drivers/gpu/drm/xe/xe_guc_debugfs.c >>>>> b/drivers/gpu/drm/xe/xe_guc_debugfs.c >>>>> index d3822cbea273..53fea952344d 100644 >>>>> --- a/drivers/gpu/drm/xe/xe_guc_debugfs.c >>>>> +++ b/drivers/gpu/drm/xe/xe_guc_debugfs.c >>>>> @@ -8,6 +8,7 @@ >>>>> #include <drm/drm_debugfs.h> >>>>> #include <drm/drm_managed.h> >>>>> +#include "xe_bo.h" >>>>> #include "xe_device.h" >>>>> #include "xe_gt.h" >>>>> #include "xe_guc.h" >>>>> @@ -52,6 +53,29 @@ static const struct drm_info_list debugfs_list[] = { >>>>> {"guc_log", guc_log, 0}, >>>>> }; >>>>> +static ssize_t guc_log_read(struct file *file, char __user *buf, >>>>> size_t count, loff_t *pos) >>>>> +{ >>>>> + struct dentry *dent = file_dentry(file); >>>>> + struct dentry *uc_dent = dent->d_parent; >>>>> + struct dentry *gt_dent = uc_dent->d_parent; >>>>> + struct xe_gt *gt = gt_dent->d_inode->i_private; >>>>> + struct xe_guc_log *log = >->uc.guc.log; >>>>> + struct xe_device *xe = gt_to_xe(gt); >>>>> + ssize_t ret; >>>>> + >>>>> + xe_pm_runtime_get(xe); >>>>> + ret = xe_map_read_from(xe, buf, count, pos, &log->bo->vmap, >>>>> log->bo->size); >>>>> + xe_pm_runtime_put(xe); >>>>> + >>>>> + return ret; >>>>> +} >>>>> + >>>>> +static const struct file_operations guc_log_ops = { >>>>> + .owner = THIS_MODULE, >>>>> + .read = guc_log_read, >>>>> + .llseek = default_llseek, >>>>> +}; >>>>> + >>>>> void xe_guc_debugfs_register(struct xe_guc *guc, struct dentry >>>>> *parent) >>>>> { >>>>> struct drm_minor *minor = guc_to_xe(guc)->drm.primary; >>>>> @@ -72,4 +96,6 @@ void xe_guc_debugfs_register(struct xe_guc *guc, >>>>> struct dentry *parent) >>>>> drm_debugfs_create_files(local, >>>>> ARRAY_SIZE(debugfs_list), >>>>> parent, minor); >>>>> + >>>>> + debugfs_create_file("guc_log_raw", 0600, parent, NULL, >>>>> &guc_log_ops); >>>>> }
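As an aside, the "hex conversion dance" discussed above really is small. A hypothetical stand-alone converter from the existing text dump (one "0x%08x" dword per token, as produced by the current guc_log file or by the hexdump invocation in the commit message) back to raw binary could be as simple as the sketch below; it assumes a little-endian host, matching the log's dword layout.

/* Sketch only: rebuild a binary GuC log from the text hexdump on stdin.
 * Example use (file names are just placeholders):
 *   ./hex2bin < guc_log.txt > guc_log.bin
 */
#include <inttypes.h>
#include <stdio.h>

int main(void)
{
	uint32_t dword;

	/* Each token looks like "0x01234567"; stop at the first non-match */
	while (scanf(" 0x%" SCNx32, &dword) == 1)
		fwrite(&dword, sizeof(dword), 1, stdout);

	return 0;
}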
On Tue, May 14, 2024 at 11:15:55PM +0200, Michal Wajdeczko wrote: > > > On 14.05.2024 22:31, Matthew Brost wrote: > > On Tue, May 14, 2024 at 11:13:14AM -0700, John Harrison wrote: > >> On 5/14/2024 07:58, Michal Wajdeczko wrote: > >>> On 13.05.2024 18:53, John Harrison wrote: > >>>> On 5/12/2024 08:36, Michal Wajdeczko wrote: > >>>>> We already provide the content of the GuC log in debugsfs, but it > >>>>> is in a text format where each log dword is printed as hexadecimal > >>>>> number, which does not scale well with large GuC log buffers. > >>>>> > >>>>> To allow more efficient access to the GuC log, which could benefit > >>>>> our CI systems, expose raw binary log data. In addition to less > >>>>> overhead in preparing text based GuC log file, the new GuC log file > >>>>> in binary format is also almost 3x smaller. > >>>>> > >>>>> Any existing script that expects the GuC log buffer in text format > >>>>> can use command like below to convert from new binary format: > >>>>> > >>>>> hexdump -e '4/4 "0x%08x " "\n"' > >>>>> > >>>>> but this shouldn't be the case as most decoders expect GuC log data > >>>>> in binary format. > >>>> I strongly disagree with this. > >>>> > >>>> Efficiency and file size is not an issue when accessing the GuC log via > >>>> debugfs on actual hardware. > >>> to some extend it is as CI team used to refuse to collect GuC logs after > >>> each executed test just because of it's size > >> I've never heard that argument. I've heard many different arguments but not > >> one about file size. The default GuC log size is pretty tiny. So size really > >> is not an issue. > >> > >>> > >>>> It is an issue when dumping via dmesg but > >>>> you definitely should not be dumping binary data to dmesg. Whereas, > >>> not following here - this is debugfs specific, not a dmesg printer > >> Except that it is preferable to have common code for both if at all > >> possible. > >> > >>> > >>>> dumping in binary data is much more dangerous and liable to corruption > >>>> because some tool along the way tries to convert to ASCII, or truncates > >>>> at the first zero, etc. We request GuC logs be sent by end users, > >>>> customer bug reports, etc. all doing things that we have no control over. > >>> hmm, how "cp gt0/uc/guc_log_raw FILE" could end with a corrupted file ? > >> Because someone then tries to email it, or attach it or copy it via Windows > >> or any number of other ways in which a file can get munged. > >> > >>> > >>>> Converting the hexdump back to binary is trivial for those tools which > >>>> require it. If you follow the acquisition and decoding instructions on > >>>> the wiki page then it is all done for you automatically. > >>> I'm afraid I don't know where this wiki page is, but I do know that hex > >>> conversion dance is not needed for me to get decoded GuC log the way I > >>> used to do > >> Look for the 'GuC Debug Logs' page on the developer wiki. It's pretty easy > >> to find. > >> > >>>> These patches are trying to solve a problem which does not exist and are > >>>> going to make working with GuC logs harder and more error prone. > >>> it at least solves the problem of currently super inefficient way of > >>> generating the GuC log in text format. 
> >>> > >>> it also opens other opportunities to develop tools that could monitor or > >>> capture GuC log independently on top of what driver is able to offer > >>> today (on i915 there was guc-log-relay, but it was broken for long time, > >>> not sure what are the plans for Xe) > >>> > >>> also still not sure how it can be more error prone. > >> As already explained, the plan is move to LFD - an extensible, streamable, > >> logging format. Any non-trivial effort that is not helping to move to LFD is > >> not worth the effort. > >> > >>> > >>>> On the other hand, there are many other issues with GuC logs that it > >>>> would be useful to solves - including extra meta data, reliable output > >>>> via dmesg, continuous streaming, pre-sizing the debugfs file to not have > >>>> to generate it ~12 times for a single read, etc. > >>> this series actually solves last issue but in a bit different way (we > >>> even don't need to generate full GuC log dump at all if we would like to > >>> capture only part of the log if we know where to look) > >> No, it doesn't solve it. Your comment below suggests it will be read in 4KB > >> chunks. Which means your 16MB buffer now requires 4096 separate reads! And > >> you only doing partial reads of the section you think you need is never > >> going to be reliable on live system. Not sure why you would want to anyway. > >> It is just making things much more complex. You now need an intelligent user > >> land program to read the log out and decode at least the header section to > >> know what data section to read. You can't just dump the whole thing with > >> 'cat' or 'dd'. > >> > > > > Briefly have read this thread but seconding John's opinion that anything > > which breaks GuC log collection via a simple command like cat is a no > > hexdump or cp is also a simple command and likely we can assume that > either user will know what command to use or will just type the command > that we say. > > > go. Also anything that can allow windows to mangle the output would be > > less than idle too. Lastly, GuC log collection is not a critical path > > operation so it likely does not need to optimized for speed. > > but please remember that this patch does not change anything to the > existing debugfs files, the guc_log stays as is, this new raw access is > defined as another guc_log_raw file that would allow develop other use > cases, beyond what is possible with naive text snapshots, like live > monitor for errors, all implemented above kernel driver > Not opposed to this, nor is it a tremendous amount of code to carry in addition to existing GuC log interfaces. However, adding an additional interface could create some confusion. If there are indeed interesting uses for this new interface, it would be helpful to post those as well. Matt > > > > Matt > > > >>> > >>> for reliable output via dmesg - see my proposal at [1] > >>> > >>> [1] https://patchwork.freedesktop.org/series/133613/ > >> > >>> > >>>> Hmm. Actually, is this interface allowing the filesystem layers to issue > >>>> multiple read calls to read the buffer out in small chunks? That is also > >>>> going to break things. If the GuC is still writing to the log as the > >>>> user is reading from it, there is the opportunity for each chunk to not > >>>> follow on from the previous chunk because the data has just been > >>>> overwritten. 
This is already a problem at the moment that causes issues > >>>> when decoding the logs, even with an almost atomic copy of the log into > >>>> a temporary buffer before reading it out. Doing the read in separate > >>>> chunks is only going to make that problem even worse. > >>> current solution, that converts data into hex numbers, reads log buffer > >>> in chunks of 128 dwords, how proposed here solution that reads in 4K > >>> chunks could be "even worse" ? > >> See above, 4KB chunks means 4096 separate reads for a 16M buffer. And each > >> one of those reads is a full round trip to user land and back. If you want > >> to get at all close to an atomic read of the log then it needs to be done as > >> a single call that copies the log into a locally allocated kernel buffer and > >> then allows user land to read out from that buffer rather than from the live > >> log. Which can be trivially done with the current method (at the expense of > >> a large memory allocation) but would be much more difficult with random > >> access reader like this as you would need to say the copied buffer around > >> until the reads have all been done. Which would presumably mean adding > >> open/close handlers to allocate and free that memory. > >> > >>> > >>> and in case of some smart tool, that would understands the layout of the > >>> GuC log buffer, we can even fully eliminate problem of reading stale > >>> data, so why not to choose a more scalable solution ? > >> You cannot eliminate the problem of stale data. You read the header, you > >> read the data it was pointing to, you re-read the header and find that the > >> GuC has moved on. That is an infinite loop of continuously updating > >> pointers. > >> > >> John. > >> > >>> > >>>> John. > >>>> > >>>>> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> > >>>>> Cc: Lucas De Marchi <lucas.demarchi@intel.com> > >>>>> Cc: John Harrison <John.C.Harrison@Intel.com> > >>>>> --- > >>>>> Cc: linux-fsdevel@vger.kernel.org > >>>>> Cc: dri-devel@lists.freedesktop.org > >>>>> --- > >>>>> drivers/gpu/drm/xe/xe_guc_debugfs.c | 26 ++++++++++++++++++++++++++ > >>>>> 1 file changed, 26 insertions(+) > >>>>> > >>>>> diff --git a/drivers/gpu/drm/xe/xe_guc_debugfs.c > >>>>> b/drivers/gpu/drm/xe/xe_guc_debugfs.c > >>>>> index d3822cbea273..53fea952344d 100644 > >>>>> --- a/drivers/gpu/drm/xe/xe_guc_debugfs.c > >>>>> +++ b/drivers/gpu/drm/xe/xe_guc_debugfs.c > >>>>> @@ -8,6 +8,7 @@ > >>>>> #include <drm/drm_debugfs.h> > >>>>> #include <drm/drm_managed.h> > >>>>> +#include "xe_bo.h" > >>>>> #include "xe_device.h" > >>>>> #include "xe_gt.h" > >>>>> #include "xe_guc.h" > >>>>> @@ -52,6 +53,29 @@ static const struct drm_info_list debugfs_list[] = { > >>>>> {"guc_log", guc_log, 0}, > >>>>> }; > >>>>> +static ssize_t guc_log_read(struct file *file, char __user *buf, > >>>>> size_t count, loff_t *pos) > >>>>> +{ > >>>>> + struct dentry *dent = file_dentry(file); > >>>>> + struct dentry *uc_dent = dent->d_parent; > >>>>> + struct dentry *gt_dent = uc_dent->d_parent; > >>>>> + struct xe_gt *gt = gt_dent->d_inode->i_private; > >>>>> + struct xe_guc_log *log = >->uc.guc.log; > >>>>> + struct xe_device *xe = gt_to_xe(gt); > >>>>> + ssize_t ret; > >>>>> + > >>>>> + xe_pm_runtime_get(xe); > >>>>> + ret = xe_map_read_from(xe, buf, count, pos, &log->bo->vmap, > >>>>> log->bo->size); > >>>>> + xe_pm_runtime_put(xe); > >>>>> + > >>>>> + return ret; > >>>>> +} > >>>>> + > >>>>> +static const struct file_operations guc_log_ops = { > >>>>> + .owner = THIS_MODULE, > >>>>> + .read = 
guc_log_read, > >>>>> + .llseek = default_llseek, > >>>>> +}; > >>>>> + > >>>>> void xe_guc_debugfs_register(struct xe_guc *guc, struct dentry *parent) > >>>>> { > >>>>> struct drm_minor *minor = guc_to_xe(guc)->drm.primary; > >>>>> @@ -72,4 +96,6 @@ void xe_guc_debugfs_register(struct xe_guc *guc, > >>>>> struct dentry *parent) > >>>>> drm_debugfs_create_files(local, > >>>>> ARRAY_SIZE(debugfs_list), > >>>>> parent, minor); > >>>>> + > >>>>> + debugfs_create_file("guc_log_raw", 0600, parent, NULL, > >>>>> &guc_log_ops); > >>>>> } > >>
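To make the "interesting uses" Matt asks about more concrete, a hypothetical user-space reader for the proposed guc_log_raw file could pull a single small chunk at an arbitrary offset instead of dumping the whole buffer, roughly as sketched below. The debugfs path is only an assumed example, and such partial reads remain subject to the staleness caveats John raises above.

/* Sketch only: read one 4 KiB chunk of guc_log_raw at a given offset. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	/* Assumed location; the real path depends on the DRM minor and GT */
	const char *path = "/sys/kernel/debug/dri/0/gt0/uc/guc_log_raw";
	off_t offset = argc > 1 ? strtoll(argv[1], NULL, 0) : 0;
	unsigned char chunk[4096];
	ssize_t n;
	int fd;

	fd = open(path, O_RDONLY);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	n = pread(fd, chunk, sizeof(chunk), offset);
	if (n < 0) {
		perror("pread");
		close(fd);
		return 1;
	}

	fwrite(chunk, 1, (size_t)n, stdout);
	close(fd);
	return 0;
}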
diff --git a/drivers/gpu/drm/xe/xe_guc_debugfs.c b/drivers/gpu/drm/xe/xe_guc_debugfs.c
index d3822cbea273..53fea952344d 100644
--- a/drivers/gpu/drm/xe/xe_guc_debugfs.c
+++ b/drivers/gpu/drm/xe/xe_guc_debugfs.c
@@ -8,6 +8,7 @@
 #include <drm/drm_debugfs.h>
 #include <drm/drm_managed.h>
 
+#include "xe_bo.h"
 #include "xe_device.h"
 #include "xe_gt.h"
 #include "xe_guc.h"
@@ -52,6 +53,29 @@ static const struct drm_info_list debugfs_list[] = {
 	{"guc_log", guc_log, 0},
 };
 
+static ssize_t guc_log_read(struct file *file, char __user *buf, size_t count, loff_t *pos)
+{
+	struct dentry *dent = file_dentry(file);
+	struct dentry *uc_dent = dent->d_parent;
+	struct dentry *gt_dent = uc_dent->d_parent;
+	struct xe_gt *gt = gt_dent->d_inode->i_private;
+	struct xe_guc_log *log = &gt->uc.guc.log;
+	struct xe_device *xe = gt_to_xe(gt);
+	ssize_t ret;
+
+	xe_pm_runtime_get(xe);
+	ret = xe_map_read_from(xe, buf, count, pos, &log->bo->vmap, log->bo->size);
+	xe_pm_runtime_put(xe);
+
+	return ret;
+}
+
+static const struct file_operations guc_log_ops = {
+	.owner = THIS_MODULE,
+	.read = guc_log_read,
+	.llseek = default_llseek,
+};
+
 void xe_guc_debugfs_register(struct xe_guc *guc, struct dentry *parent)
 {
 	struct drm_minor *minor = guc_to_xe(guc)->drm.primary;
@@ -72,4 +96,6 @@ void xe_guc_debugfs_register(struct xe_guc *guc, struct dentry *parent)
 	drm_debugfs_create_files(local,
 				 ARRAY_SIZE(debugfs_list),
 				 parent, minor);
+
+	debugfs_create_file("guc_log_raw", 0600, parent, NULL, &guc_log_ops);
 }
We already provide the content of the GuC log in debugfs, but it is in a text format where each log dword is printed as a hexadecimal number, which does not scale well with large GuC log buffers.

To allow more efficient access to the GuC log, which could benefit our CI systems, expose raw binary log data. In addition to less overhead in preparing the text-based GuC log file, the new GuC log file in binary format is also almost 3x smaller.

Any existing script that expects the GuC log buffer in text format can use a command like the one below to convert from the new binary format:

hexdump -e '4/4 "0x%08x " "\n"'

but this shouldn't be the case, as most decoders expect GuC log data in binary format.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
---
Cc: linux-fsdevel@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
---
 drivers/gpu/drm/xe/xe_guc_debugfs.c | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)