Message ID | 20250102174317.1594-1-shiju.jose@huawei.com (mailing list archive) |
---|---|
State | Rejected |
Headers | show |
Series | [1/1] tracing: Support reading trace event format file larger than PAGE_SIZE | expand |
On Thu, 2 Jan 2025 17:43:17 +0000 <shiju.jose@huawei.com> wrote: > From: Shiju Jose <shiju.jose@huawei.com> > > When userspace reads a trace event format file, the maximum data size > that can be read is limited to PAGE_SIZE by the seq_read() and > seq_read_iter() functions. This results in userspace receiving partial > data if the format file is larger than PAGE_SIZE, requiring a workaround > to read the complete data from the format file. > > Add support for reading trace event format files larger than PAGE_SIZE when > needed by userspace. > > Signed-off-by: Shiju Jose <shiju.jose@huawei.com> How is this an issue? This is common for all pseudo files and can be handled properly from user space. Here, with this program: read.c: -------------------------8<------------------------- #include <stdlib.h> #include <stdio.h> #include <unistd.h> #include <fcntl.h> #include <sys/types.h> int main (int argc, char **argv) { char *buf; int fd; off_t size; int r, s; if (argc < 2) { printf("usage: %s file\n", argv[0]); exit(-1); } fd = open(argv[1], O_RDONLY); if (fd < 0) { perror(argv[1]); exit(-1); } size = BUFSIZ * 10; buf = malloc(size); for (s = 0, r = 1; r > 0; s += r) { r = read(fd, buf, size); if (r < 0) { perror(argv[1]); exit(-1); } printf("Read %d bytes from %s\n", r, argv[1]); } free(buf); close(fd); return 0; } ------------------------->8------------------------- $ read /proc/kallsyms Read 4091 bytes from /proc/kallsyms Read 4075 bytes from /proc/kallsyms Read 4078 bytes from /proc/kallsyms Read 4083 bytes from /proc/kallsyms Read 4093 bytes from /proc/kallsyms Read 4076 bytes from /proc/kallsyms Read 4080 bytes from /proc/kallsyms Read 4086 bytes from /proc/kallsyms Read 4080 bytes from /proc/kallsyms Read 4064 bytes from /proc/kallsyms Read 4071 bytes from /proc/kallsyms Read 4063 bytes from /proc/kallsyms Read 4069 bytes from /proc/kallsyms Read 4079 bytes from /proc/kallsyms Read 4063 bytes from /proc/kallsyms Read 4072 bytes from /proc/kallsyms Read 4046 bytes from /proc/kallsyms Read 4091 bytes from /proc/kallsyms Read 4090 bytes from /proc/kallsyms Read 4067 bytes from /proc/kallsyms Read 4080 bytes from /proc/kallsyms Read 4066 bytes from /proc/kallsyms Read 4085 bytes from /proc/kallsyms Read 4095 bytes from /proc/kallsyms Read 4076 bytes from /proc/kallsyms Read 4090 bytes from /proc/kallsyms Read 4066 bytes from /proc/kallsyms Read 4073 bytes from /proc/kallsyms Read 4091 bytes from /proc/kallsyms Read 4075 bytes from /proc/kallsyms Read 4076 bytes from /proc/kallsyms Read 4048 bytes from /proc/kallsyms Read 4074 bytes from /proc/kallsyms Read 4058 bytes from /proc/kallsyms Read 4074 bytes from /proc/kallsyms [..] Read 4052 bytes from /proc/kallsyms Read 4061 bytes from /proc/kallsyms Read 4061 bytes from /proc/kallsyms Read 4053 bytes from /proc/kallsyms Read 4083 bytes from /proc/kallsyms Read 4066 bytes from /proc/kallsyms Read 4093 bytes from /proc/kallsyms Read 4072 bytes from /proc/kallsyms Read 1982 bytes from /proc/kallsyms Read 0 bytes from /proc/kallsyms You see, it requires multiple reads to pull in an entire kernel pseudo file. None of those reads are greater than PAGE_SIZE. Why should trace format files be any different? libtracefs handles this perfectly fine: https://git.kernel.org/pub/scm/libs/libtrace/libtracefs.git/tree/src/tracefs-utils.c#n343 Looks like you are trying to change the kernel to fix a user space bug :-/ NAK! -- Steve
>-----Original Message----- >From: Steven Rostedt <rostedt@goodmis.org> >Sent: 06 January 2025 17:22 >To: Shiju Jose <shiju.jose@huawei.com> >Cc: mhiramat@kernel.org; mathieu.desnoyers@efficios.com; linux-trace- >kernel@vger.kernel.org; linux-kernel@vger.kernel.org; Linuxarm ><linuxarm@huawei.com>; Jonathan Cameron ><jonathan.cameron@huawei.com>; tanxiaofei <tanxiaofei@huawei.com>; >Zengtao (B) <prime.zeng@hisilicon.com> >Subject: Re: [PATCH 1/1] tracing: Support reading trace event format file larger >than PAGE_SIZE > >On Thu, 2 Jan 2025 17:43:17 +0000 ><shiju.jose@huawei.com> wrote: > >> From: Shiju Jose <shiju.jose@huawei.com> >> >> When userspace reads a trace event format file, the maximum data size >> that can be read is limited to PAGE_SIZE by the seq_read() and >> seq_read_iter() functions. This results in userspace receiving partial >> data if the format file is larger than PAGE_SIZE, requiring a >> workaround to read the complete data from the format file. >> >> Add support for reading trace event format files larger than PAGE_SIZE >> when needed by userspace. >> >> Signed-off-by: Shiju Jose <shiju.jose@huawei.com> > >How is this an issue? This is common for all pseudo files and can be handled >properly from user space. > >Here, with this program: > >read.c: >-------------------------8<------------------------- >#include <stdlib.h> >#include <stdio.h> >#include <unistd.h> >#include <fcntl.h> >#include <sys/types.h> > >int main (int argc, char **argv) >{ > char *buf; > int fd; > off_t size; > int r, s; > > if (argc < 2) { > printf("usage: %s file\n", argv[0]); > exit(-1); > } > > fd = open(argv[1], O_RDONLY); > if (fd < 0) { > perror(argv[1]); > exit(-1); > } > > size = BUFSIZ * 10; > > buf = malloc(size); > for (s = 0, r = 1; r > 0; s += r) { > r = read(fd, buf, size); > if (r < 0) { > perror(argv[1]); > exit(-1); > } > printf("Read %d bytes from %s\n", r, argv[1]); > } > free(buf); > close(fd); > return 0; >} >------------------------->8------------------------- > > $ read /proc/kallsyms >Read 4091 bytes from /proc/kallsyms >Read 4075 bytes from /proc/kallsyms >Read 4078 bytes from /proc/kallsyms >Read 4083 bytes from /proc/kallsyms >Read 4093 bytes from /proc/kallsyms >Read 4076 bytes from /proc/kallsyms >Read 4080 bytes from /proc/kallsyms >Read 4086 bytes from /proc/kallsyms >Read 4080 bytes from /proc/kallsyms >Read 4064 bytes from /proc/kallsyms >Read 4071 bytes from /proc/kallsyms >Read 4063 bytes from /proc/kallsyms >Read 4069 bytes from /proc/kallsyms >Read 4079 bytes from /proc/kallsyms >Read 4063 bytes from /proc/kallsyms >Read 4072 bytes from /proc/kallsyms >Read 4046 bytes from /proc/kallsyms >Read 4091 bytes from /proc/kallsyms >Read 4090 bytes from /proc/kallsyms >Read 4067 bytes from /proc/kallsyms >Read 4080 bytes from /proc/kallsyms >Read 4066 bytes from /proc/kallsyms >Read 4085 bytes from /proc/kallsyms >Read 4095 bytes from /proc/kallsyms >Read 4076 bytes from /proc/kallsyms >Read 4090 bytes from /proc/kallsyms >Read 4066 bytes from /proc/kallsyms >Read 4073 bytes from /proc/kallsyms >Read 4091 bytes from /proc/kallsyms >Read 4075 bytes from /proc/kallsyms >Read 4076 bytes from /proc/kallsyms >Read 4048 bytes from /proc/kallsyms >Read 4074 bytes from /proc/kallsyms >Read 4058 bytes from /proc/kallsyms >Read 4074 bytes from /proc/kallsyms >[..] >Read 4052 bytes from /proc/kallsyms >Read 4061 bytes from /proc/kallsyms >Read 4061 bytes from /proc/kallsyms >Read 4053 bytes from /proc/kallsyms >Read 4083 bytes from /proc/kallsyms >Read 4066 bytes from /proc/kallsyms >Read 4093 bytes from /proc/kallsyms >Read 4072 bytes from /proc/kallsyms >Read 1982 bytes from /proc/kallsyms >Read 0 bytes from /proc/kallsyms > >You see, it requires multiple reads to pull in an entire kernel pseudo file. None of >those reads are greater than PAGE_SIZE. Why should trace format files be any >different? Thanks for the reply. Yes. I had a fix/workaround in the userspace rasdaemon with multiple reads like above as reported previously in the following thread. https://lore.kernel.org/lkml/3c9808a694d242cab35bab67602edebf@huawei.com/ However thought a solution in the common kernel code for the format file may be better. I will go ahead with the user space solution. Also shared an information in the above thread about libtraceevent __parse_event() does not return error when parse_format() fail with incomplete formt data, which resulted initialization for the trace event does not fail in the user space tool. >libtracefs handles this perfectly fine: > > https://git.kernel.org/pub/scm/libs/libtrace/libtracefs.git/tree/src/tracefs- >utils.c#n343 > >Looks like you are trying to change the kernel to fix a user space bug :-/ > > NAK! > >-- Steve Thanks, Shiju
On Mon, 6 Jan 2025 18:15:36 +0000 Shiju Jose <shiju.jose@huawei.com> wrote: > >You see, it requires multiple reads to pull in an entire kernel pseudo file. None of > >those reads are greater than PAGE_SIZE. Why should trace format files be any > >different? > Thanks for the reply. > Yes. I had a fix/workaround in the userspace rasdaemon with multiple reads like above > as reported previously in the following thread. > https://lore.kernel.org/lkml/3c9808a694d242cab35bab67602edebf@huawei.com/ > However thought a solution in the common kernel code for the format file > may be better. I will go ahead with the user space solution. That would make it different than every other pseudo file in the kernel. I rather not do that. > > Also shared an information in the above thread about libtraceevent __parse_event() does not > return error when parse_format() fail with incomplete formt data, which resulted initialization for the > trace event does not fail in the user space tool. Hmm, can you send me the the format file that failed? Just the output of "cat /sys/kernel/tracing/events/<system>/<event>/format" will do. -- Steve
>-----Original Message----- >From: Steven Rostedt <rostedt@goodmis.org> >Sent: 06 January 2025 22:12 >To: Shiju Jose <shiju.jose@huawei.com> >Cc: mhiramat@kernel.org; mathieu.desnoyers@efficios.com; linux-trace- >kernel@vger.kernel.org; linux-kernel@vger.kernel.org; Linuxarm ><linuxarm@huawei.com>; Jonathan Cameron ><jonathan.cameron@huawei.com>; tanxiaofei <tanxiaofei@huawei.com>; >Zengtao (B) <prime.zeng@hisilicon.com> >Subject: Re: [PATCH 1/1] tracing: Support reading trace event format file larger >than PAGE_SIZE > >On Mon, 6 Jan 2025 18:15:36 +0000 >Shiju Jose <shiju.jose@huawei.com> wrote: > >> >You see, it requires multiple reads to pull in an entire kernel >> >pseudo file. None of those reads are greater than PAGE_SIZE. Why >> >should trace format files be any different? >> Thanks for the reply. >> Yes. I had a fix/workaround in the userspace rasdaemon with multiple >> reads like above as reported previously in the following thread. >> https://lore.kernel.org/lkml/3c9808a694d242cab35bab67602edebf@huawei.c >> om/ However thought a solution in the common kernel code for the >> format file may be better. I will go ahead with the user space >> solution. > >That would make it different than every other pseudo file in the kernel. I rather >not do that. > >> >> Also shared an information in the above thread about libtraceevent >> __parse_event() does not return error when parse_format() fail with >> incomplete formt data, which resulted initialization for the trace event does >not fail in the user space tool. > >Hmm, can you send me the the format file that failed? > >Just the output of "cat /sys/kernel/tracing/events/<system>/<event>/format" >will do. Please find attached format file, which might have failed to parse because of the unsupported formats in the libtraceevent you mentioned in the following thread. https://lore.kernel.org/lkml/20241127104132.6c1729e1@gandalf.local.home/#t Please find the improvement in the libraceevent which I mentioned for not returning error when parsing failed, =========================================== diff --git a/src/event-parse.c b/src/event-parse.c index 0427061..b9264cb 100644 --- a/src/event-parse.c +++ b/src/event-parse.c @@ -7905,9 +7905,14 @@ __parse_event(struct tep_handle *tep, const char *buf, unsigned long size, const char *sys) { - int ret = parse_format(eventp, tep, buf, size, sys); - struct tep_event *event = *eventp; + int ret; + struct tep_event *event; + + ret = parse_format(eventp, tep, buf, size, sys); + if (ret) + return ret; + event = *eventp; if (event == NULL) return ret; ========================================= > >-- Steve Thanks, Shiju
On Tue, 7 Jan 2025 11:04:57 +0000 Shiju Jose <shiju.jose@huawei.com> wrote: > Please find attached format file, which might have failed to parse because of the > unsupported formats in the libtraceevent you mentioned in the following thread. > https://lore.kernel.org/lkml/20241127104132.6c1729e1@gandalf.local.home/#t > > Please find the improvement in the libraceevent which I mentioned for not returning error > when parsing failed, > =========================================== > diff --git a/src/event-parse.c b/src/event-parse.c index 0427061..b9264cb 100644 > --- a/src/event-parse.c > +++ b/src/event-parse.c > @@ -7905,9 +7905,14 @@ __parse_event(struct tep_handle *tep, > const char *buf, unsigned long size, > const char *sys) > { > - int ret = parse_format(eventp, tep, buf, size, sys); > - struct tep_event *event = *eventp; > + int ret; > + struct tep_event *event; > + > + ret = parse_format(eventp, tep, buf, size, sys); > + if (ret) > + return ret; > > + event = *eventp; > if (event == NULL) > return ret; Actually, this is the proper patch: diff --git a/src/event-parse.c b/src/event-parse.c index 33ed7fb47fff..f2e50b0e8992 100644 --- a/src/event-parse.c +++ b/src/event-parse.c @@ -7841,7 +7841,7 @@ static enum tep_errno parse_format(struct tep_event **eventp, ret = event_read_format(event); if (ret < 0) { ret = TEP_ERRNO__READ_FORMAT_FAILED; - goto event_parse_failed; + goto event_alloc_failed; } /* As it's OK if it returns that it failed to parse the print format, as it can still read the event. But if it fails to read the fields, then it is basically useless. -- Steve
On Tue, 7 Jan 2025 18:02:24 -0500 Steven Rostedt <rostedt@goodmis.org> wrote: > diff --git a/src/event-parse.c b/src/event-parse.c > index 33ed7fb47fff..f2e50b0e8992 100644 > --- a/src/event-parse.c > +++ b/src/event-parse.c > @@ -7841,7 +7841,7 @@ static enum tep_errno parse_format(struct tep_event **eventp, > ret = event_read_format(event); > if (ret < 0) { > ret = TEP_ERRNO__READ_FORMAT_FAILED; > - goto event_parse_failed; > + goto event_alloc_failed; > } > > /* > > > As it's OK if it returns that it failed to parse the print format, as it > can still read the event. But if it fails to read the fields, then it is > basically useless. And the event you posted would still succeed. And it should succeed actually. It just failed to parse the print format. There's a lot of events that do as the print format is pretty much full C code, and libtraceveent is not a full C parser. It can also call functions that it doesn't know about which means even if it were a full C parser it would still fail to parse. But it does flag the event as "TEP_EVENT_FL_FAILED", and should display "[FAILED_TO_PARSE]" when printing the generic output. I still think failing to parse the fields is more of a critical error though. That shouldn't happen, and if it does, it should be corrected. -- Steve
>-----Original Message----- >From: Steven Rostedt <rostedt@goodmis.org> >Sent: 07 January 2025 23:02 >To: Shiju Jose <shiju.jose@huawei.com> >Cc: mhiramat@kernel.org; mathieu.desnoyers@efficios.com; linux-trace- >kernel@vger.kernel.org; linux-kernel@vger.kernel.org; Linuxarm ><linuxarm@huawei.com>; Jonathan Cameron ><jonathan.cameron@huawei.com>; tanxiaofei <tanxiaofei@huawei.com>; >Zengtao (B) <prime.zeng@hisilicon.com> >Subject: Re: [PATCH 1/1] tracing: Support reading trace event format file larger >than PAGE_SIZE > >On Tue, 7 Jan 2025 11:04:57 +0000 >Shiju Jose <shiju.jose@huawei.com> wrote: > >> Please find attached format file, which might have failed to parse >> because of the unsupported formats in the libtraceevent you mentioned in the >following thread. >> >https://lore.kernel.org/lkml/20241127104132.6c1729e1@gandalf.local.home/#t >> >> Please find the improvement in the libraceevent which I mentioned for >> not returning error when parsing failed, >> =========================================== >> diff --git a/src/event-parse.c b/src/event-parse.c index >> 0427061..b9264cb 100644 >> --- a/src/event-parse.c >> +++ b/src/event-parse.c >> @@ -7905,9 +7905,14 @@ __parse_event(struct tep_handle *tep, >> const char *buf, unsigned long size, >> const char *sys) >> { >> - int ret = parse_format(eventp, tep, buf, size, sys); >> - struct tep_event *event = *eventp; >> + int ret; >> + struct tep_event *event; >> + >> + ret = parse_format(eventp, tep, buf, size, sys); >> + if (ret) >> + return ret; >> >> + event = *eventp; >> if (event == NULL) >> return ret; > >Actually, this is the proper patch: > >diff --git a/src/event-parse.c b/src/event-parse.c index >33ed7fb47fff..f2e50b0e8992 100644 >--- a/src/event-parse.c >+++ b/src/event-parse.c >@@ -7841,7 +7841,7 @@ static enum tep_errno parse_format(struct tep_event >**eventp, > ret = event_read_format(event); > if (ret < 0) { > ret = TEP_ERRNO__READ_FORMAT_FAILED; >- goto event_parse_failed; >+ goto event_alloc_failed; > } > > /* > > >As it's OK if it returns that it failed to parse the print format, as it can still read >the event. But if it fails to read the fields, then it is basically useless. Thanks. Your patch worked fine for returning error to the user space tool for the incomplete format data case as the following check become true in the __parse_event() after freeing 'event' in the goto event_alloc_failed. if (event == NULL) return ret; > >-- Steve Thanks, Shiju
>-----Original Message----- >From: Steven Rostedt <rostedt@goodmis.org> >Sent: 07 January 2025 23:19 >To: Shiju Jose <shiju.jose@huawei.com> >Cc: mhiramat@kernel.org; mathieu.desnoyers@efficios.com; linux-trace- >kernel@vger.kernel.org; linux-kernel@vger.kernel.org; Linuxarm ><linuxarm@huawei.com>; Jonathan Cameron ><jonathan.cameron@huawei.com>; tanxiaofei <tanxiaofei@huawei.com>; >Zengtao (B) <prime.zeng@hisilicon.com> >Subject: Re: [PATCH 1/1] tracing: Support reading trace event format file larger >than PAGE_SIZE > >On Tue, 7 Jan 2025 18:02:24 -0500 >Steven Rostedt <rostedt@goodmis.org> wrote: > >> diff --git a/src/event-parse.c b/src/event-parse.c index >> 33ed7fb47fff..f2e50b0e8992 100644 >> --- a/src/event-parse.c >> +++ b/src/event-parse.c >> @@ -7841,7 +7841,7 @@ static enum tep_errno parse_format(struct >tep_event **eventp, >> ret = event_read_format(event); >> if (ret < 0) { >> ret = TEP_ERRNO__READ_FORMAT_FAILED; >> - goto event_parse_failed; >> + goto event_alloc_failed; >> } >> >> /* >> >> >> As it's OK if it returns that it failed to parse the print format, as >> it can still read the event. But if it fails to read the fields, then >> it is basically useless. > >And the event you posted would still succeed. And it should succeed actually. It >just failed to parse the print format. There's a lot of events that do as the print >format is pretty much full C code, and libtraceveent is not a full C parser. It can >also call functions that it doesn't know about which means even if it were a full C >parser it would still fail to parse. > >But it does flag the event as "TEP_EVENT_FL_FAILED", and should display >"[FAILED_TO_PARSE]" when printing the generic output. > Ok. >I still think failing to parse the fields is more of a critical error though. That >shouldn't happen, and if it does, it should be corrected. As reported previously, parsing fields succeed in the rasdaemon for CXL events, when format file is larger than PAGE_SIZE, with the rasdaemon change for reading complete format file. > >-- Steve Thanks, Shiju
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c index 1545cc8b49d0..ef33614e7f22 100644 --- a/kernel/trace/trace_events.c +++ b/kernel/trace/trace_events.c @@ -1801,6 +1801,36 @@ static int trace_format_open(struct inode *inode, struct file *file) return 0; } +/** + * trace_format_read - read() method for format file. + * @file: the file to read from + * @buf: the buffer to read to + * @size: the maximum number of bytes to read + * @ppos: the current position in the file + * + * * Return: + * * %0 - No bytes copied (EOF). + * * %>0 - Number of bytes copied. + * * %<0 - Error code. + */ +static ssize_t trace_format_read(struct file *file, char __user *buf, size_t size, loff_t *ppos) +{ + ssize_t copied = 0; + ssize_t ret; + + do { + ret = seq_read(file, buf + copied, size - copied, ppos); + if (ret < 0) + return ret; + + copied += ret; + if (copied >= size) + break; + } while (ret); + + return copied; +} + #ifdef CONFIG_PERF_EVENTS static ssize_t event_id_read(struct file *filp, char __user *ubuf, size_t cnt, loff_t *ppos) @@ -2284,7 +2314,7 @@ static const struct file_operations ftrace_enable_fops = { static const struct file_operations ftrace_event_format_fops = { .open = trace_format_open, - .read = seq_read, + .read = trace_format_read, .llseek = seq_lseek, .release = seq_release, };