Message ID | 20250401225842.261475465@goodmis.org (mailing list archive)
---|---
State | Superseded
Series | tracing: Clean up persistent ring buffer code
On Tue, Apr 01, 2025 at 06:58:12PM -0400, Steven Rostedt wrote:
> From: Steven Rostedt <rostedt@goodmis.org>
>
> Enforce that the address and the size of the memory used by the persistent
> ring buffer are page aligned. Also update the documentation to reflect this
> requirement.
>
> Link: https://lore.kernel.org/all/CAHk-=whUOfVucfJRt7E0AH+GV41ELmS4wJqxHDnui6Giddfkzw@mail.gmail.com/
>
> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
> ---
>  Documentation/admin-guide/kernel-parameters.txt |  2 ++
>  Documentation/trace/debugging.rst               |  2 ++
>  kernel/trace/trace.c                            | 12 ++++++++++++
>  3 files changed, 16 insertions(+)
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 3435a062a208..f904fd8481bd 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -7266,6 +7266,8 @@
>  			This is just one of many ways that can clear memory. Make sure your system
>  			keeps the content of memory across reboots before relying on this option.
>
> +			NB: Both the mapped address and size must be page aligned for the architecture.
> +
>  			See also Documentation/trace/debugging.rst
>
>
> diff --git a/Documentation/trace/debugging.rst b/Documentation/trace/debugging.rst
> index 54fb16239d70..d54bc500af80 100644
> --- a/Documentation/trace/debugging.rst
> +++ b/Documentation/trace/debugging.rst
> @@ -136,6 +136,8 @@ kernel, so only the same kernel is guaranteed to work if the mapping is
>  preserved. Switching to a different kernel version may find a different
>  layout and mark the buffer as invalid.
>
> +NB: Both the mapped address and size must be page aligned for the architecture.
> +
>  Using trace_printk() in the boot instance
>  -----------------------------------------
>  By default, the content of trace_printk() goes into the top level tracing
>
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index de6d7f0e6206..de9c237e5826 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -10788,6 +10788,18 @@ __init static void enable_instances(void)
>  		}
>
>  		if (start) {
> +			/* Start and size must be page aligned */
> +			if (start & ~PAGE_MASK) {
> +				pr_warn("Tracing: mapping start addr %lx is not page aligned\n",
> +					(unsigned long)start);
> +				continue;
> +			}
> +			if (size & ~PAGE_MASK) {
> +				pr_warn("Tracing: mapping size %lx is not page aligned\n",
> +					(unsigned long)size);
> +				continue;
> +			}

Better use %pa for printing physical address as on 32-bit systems
phys_addr_t may be unsigned long long:

	pr_warn("Tracing: mapping size %pa is not page aligned\n", &size);

> +
>  			addr = map_pages(start, size);
>  			if (addr) {
>  				pr_info("Tracing: mapped boot instance %s at physical memory %pa of size 0x%lx\n",
> --
> 2.47.2
On Wed, 2 Apr 2025 12:21:49 +0300
Mike Rapoport <rppt@kernel.org> wrote:

> > +			if (size & ~PAGE_MASK) {
> > +				pr_warn("Tracing: mapping size %lx is not page aligned\n",
> > +					(unsigned long)size);
> > +				continue;
> > +			}
>
> Better use %pa for printing physical address as on 32-bit systems
> phys_addr_t may be unsigned long long:
>
> 	pr_warn("Tracing: mapping size %pa is not page aligned\n", &size);

Thanks, will update.

-- Steve
On 2025-04-02 05:21, Mike Rapoport wrote:
> On Tue, Apr 01, 2025 at 06:58:12PM -0400, Steven Rostedt wrote:
>> From: Steven Rostedt <rostedt@goodmis.org>
>>
>> Enforce that the address and the size of the memory used by the persistent
>> ring buffer are page aligned. Also update the documentation to reflect this
>> requirement.

I've been loosely following this thread, and I'm confused about one
thing.

AFAIU the goal is to have the ftrace persistent ring buffer written to
through a memory range mapped by vmap_page_range(), and userspace maps
the buffer with its own virtual mappings.

With respect to architectures with aliasing dcache, is the plan:

A) To make sure all persistent ring buffer mappings are aligned on
   SHMLBA.

   Quoting "Documentation/core-api/cachetlb.rst":

     Is your port susceptible to virtual aliasing in its D-cache?
     Well, if your D-cache is virtually indexed, is larger in size than
     PAGE_SIZE, and does not prevent multiple cache lines for the same
     physical address from existing at once, you have this problem.

     If your D-cache has this problem, first define asm/shmparam.h SHMLBA
     properly, it should essentially be the size of your virtually
     addressed D-cache (or if the size is variable, the largest possible
     size). This setting will force the SYSv IPC layer to only allow user
     processes to mmap shared memory at addresses which are a multiple of
     this value.

or

B) To flush both the kernel and userspace mappings when a ring buffer
   page is handed over from writer to reader?

I've seen both approaches being discussed in the recent threads, with
some participants recommending approach (A), but then the code
revisions that follow take approach (B).

AFAIU, if we are aiming for approach (A), then I'm missing where
vmap_page_range() guarantees that the _kernel_ virtual mapping is
SHMLBA aligned. AFAIU, only user mappings are aligned on SHMLBA.

And if we are aiming for approach (A), then the explicit flushing
is not needed when handing over pages from writer to reader.

Please let me know if I'm missing something,

Thanks,

Mathieu

>>
>> Link: https://lore.kernel.org/all/CAHk-=whUOfVucfJRt7E0AH+GV41ELmS4wJqxHDnui6Giddfkzw@mail.gmail.com/
>>
>> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
>> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
>> ---
>>  Documentation/admin-guide/kernel-parameters.txt |  2 ++
>>  Documentation/trace/debugging.rst               |  2 ++
>>  kernel/trace/trace.c                            | 12 ++++++++++++
>>  3 files changed, 16 insertions(+)
>>
>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
>> index 3435a062a208..f904fd8481bd 100644
>> --- a/Documentation/admin-guide/kernel-parameters.txt
>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>> @@ -7266,6 +7266,8 @@
>>  			This is just one of many ways that can clear memory. Make sure your system
>>  			keeps the content of memory across reboots before relying on this option.
>>
>> +			NB: Both the mapped address and size must be page aligned for the architecture.
>> +
>>  			See also Documentation/trace/debugging.rst
>>
>>
>> diff --git a/Documentation/trace/debugging.rst b/Documentation/trace/debugging.rst
>> index 54fb16239d70..d54bc500af80 100644
>> --- a/Documentation/trace/debugging.rst
>> +++ b/Documentation/trace/debugging.rst
>> @@ -136,6 +136,8 @@ kernel, so only the same kernel is guaranteed to work if the mapping is
>>  preserved. Switching to a different kernel version may find a different
>>  layout and mark the buffer as invalid.
>>
>> +NB: Both the mapped address and size must be page aligned for the architecture.
>> +
>>  Using trace_printk() in the boot instance
>>  -----------------------------------------
>>  By default, the content of trace_printk() goes into the top level tracing
>>
>> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
>> index de6d7f0e6206..de9c237e5826 100644
>> --- a/kernel/trace/trace.c
>> +++ b/kernel/trace/trace.c
>> @@ -10788,6 +10788,18 @@ __init static void enable_instances(void)
>>  		}
>>
>>  		if (start) {
>> +			/* Start and size must be page aligned */
>> +			if (start & ~PAGE_MASK) {
>> +				pr_warn("Tracing: mapping start addr %lx is not page aligned\n",
>> +					(unsigned long)start);
>> +				continue;
>> +			}
>> +			if (size & ~PAGE_MASK) {
>> +				pr_warn("Tracing: mapping size %lx is not page aligned\n",
>> +					(unsigned long)size);
>> +				continue;
>> +			}
>
> Better use %pa for printing physical address as on 32-bit systems
> phys_addr_t may be unsigned long long:
>
> 	pr_warn("Tracing: mapping size %pa is not page aligned\n", &size);
>
>> +
>>  			addr = map_pages(start, size);
>>  			if (addr) {
>>  				pr_info("Tracing: mapped boot instance %s at physical memory %pa of size 0x%lx\n",
>> --
>> 2.47.2
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 3435a062a208..f904fd8481bd 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -7266,6 +7266,8 @@
 			This is just one of many ways that can clear memory. Make sure your system
 			keeps the content of memory across reboots before relying on this option.
 
+			NB: Both the mapped address and size must be page aligned for the architecture.
+
 			See also Documentation/trace/debugging.rst
 
 
diff --git a/Documentation/trace/debugging.rst b/Documentation/trace/debugging.rst
index 54fb16239d70..d54bc500af80 100644
--- a/Documentation/trace/debugging.rst
+++ b/Documentation/trace/debugging.rst
@@ -136,6 +136,8 @@ kernel, so only the same kernel is guaranteed to work if the mapping is
 preserved. Switching to a different kernel version may find a different
 layout and mark the buffer as invalid.
 
+NB: Both the mapped address and size must be page aligned for the architecture.
+
 Using trace_printk() in the boot instance
 -----------------------------------------
 By default, the content of trace_printk() goes into the top level tracing
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index de6d7f0e6206..de9c237e5826 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -10788,6 +10788,18 @@ __init static void enable_instances(void)
 		}
 
 		if (start) {
+			/* Start and size must be page aligned */
+			if (start & ~PAGE_MASK) {
+				pr_warn("Tracing: mapping start addr %lx is not page aligned\n",
+					(unsigned long)start);
+				continue;
+			}
+			if (size & ~PAGE_MASK) {
+				pr_warn("Tracing: mapping size %lx is not page aligned\n",
+					(unsigned long)size);
+				continue;
+			}
+
 			addr = map_pages(start, size);
 			if (addr) {
 				pr_info("Tracing: mapped boot instance %s at physical memory %pa of size 0x%lx\n",
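[Archive note: with this check in place, a boot command line that maps a persistent ring buffer from page-aligned reserved memory might look like the sketch below. The size, alignment, and label values are illustrative; see Documentation/trace/debugging.rst for the reserve_mem and trace_instance syntax.]

```
reserve_mem=12M:4096:trace trace_instance=boot_map@trace
```

Here the 4096 alignment argument to reserve_mem satisfies the new requirement that both the mapped address and size be page aligned on a 4 KiB-page architecture.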