diff mbox series

[v5,1/4] tracing: Enforce the persistent ring buffer to be page aligned

Message ID 20250401225842.261475465@goodmis.org (mailing list archive)
State Superseded
Series tracing: Clean up persistent ring buffer code

Commit Message

Steven Rostedt April 1, 2025, 10:58 p.m. UTC
From: Steven Rostedt <rostedt@goodmis.org>

Enforce that the address and the size of the memory used by the persistent
ring buffer are page aligned. Also update the documentation to reflect this
requirement.

Link: https://lore.kernel.org/all/CAHk-=whUOfVucfJRt7E0AH+GV41ELmS4wJqxHDnui6Giddfkzw@mail.gmail.com/

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
 Documentation/admin-guide/kernel-parameters.txt |  2 ++
 Documentation/trace/debugging.rst               |  2 ++
 kernel/trace/trace.c                            | 12 ++++++++++++
 3 files changed, 16 insertions(+)

Comments

Mike Rapoport April 2, 2025, 9:21 a.m. UTC | #1
On Tue, Apr 01, 2025 at 06:58:12PM -0400, Steven Rostedt wrote:
> From: Steven Rostedt <rostedt@goodmis.org>
> 
> Enforce that the address and the size of the memory used by the persistent
> ring buffer is page aligned. Also update the documentation to reflect this
> requirement.
> 
> Link: https://lore.kernel.org/all/CAHk-=whUOfVucfJRt7E0AH+GV41ELmS4wJqxHDnui6Giddfkzw@mail.gmail.com/
> 
> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
> ---
>  Documentation/admin-guide/kernel-parameters.txt |  2 ++
>  Documentation/trace/debugging.rst               |  2 ++
>  kernel/trace/trace.c                            | 12 ++++++++++++
>  3 files changed, 16 insertions(+)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 3435a062a208..f904fd8481bd 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -7266,6 +7266,8 @@
>  			This is just one of many ways that can clear memory. Make sure your system
>  			keeps the content of memory across reboots before relying on this option.
>  
> +			NB: Both the mapped address and size must be page aligned for the architecture.
> +
>  			See also Documentation/trace/debugging.rst
>  
>  
> diff --git a/Documentation/trace/debugging.rst b/Documentation/trace/debugging.rst
> index 54fb16239d70..d54bc500af80 100644
> --- a/Documentation/trace/debugging.rst
> +++ b/Documentation/trace/debugging.rst
> @@ -136,6 +136,8 @@ kernel, so only the same kernel is guaranteed to work if the mapping is
>  preserved. Switching to a different kernel version may find a different
>  layout and mark the buffer as invalid.
>  
> +NB: Both the mapped address and size must be page aligned for the architecture.
> +
>  Using trace_printk() in the boot instance
>  -----------------------------------------
>  By default, the content of trace_printk() goes into the top level tracing
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index de6d7f0e6206..de9c237e5826 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -10788,6 +10788,18 @@ __init static void enable_instances(void)
>  		}
>  
>  		if (start) {
> +			/* Start and size must be page aligned */
> +			if (start & ~PAGE_MASK) {
> +				pr_warn("Tracing: mapping start addr %lx is not page aligned\n",
> +					(unsigned long)start);
> +				continue;
> +			}
> +			if (size & ~PAGE_MASK) {
> +				pr_warn("Tracing: mapping size %lx is not page aligned\n",
> +					(unsigned long)size);
> +				continue;
> +			}

Better use %pa for printing physical address as on 32-bit systems
phys_addr_t may be unsigned long long:

	pr_warn("Tracing: mapping size %pa is not page aligned\n", &size);

> +
>  			addr = map_pages(start, size);
>  			if (addr) {
>  				pr_info("Tracing: mapped boot instance %s at physical memory %pa of size 0x%lx\n",
> -- 
> 2.47.2
> 
>
Steven Rostedt April 2, 2025, 2:26 p.m. UTC | #2
On Wed, 2 Apr 2025 12:21:49 +0300
Mike Rapoport <rppt@kernel.org> wrote:

> > +			if (size & ~PAGE_MASK) {
> > +				pr_warn("Tracing: mapping size %lx is not page aligned\n",
> > +					(unsigned long)size);
> > +				continue;
> > +			}  
> 
> Better use %pa for printing physical address as on 32-bit systems
> phys_addr_t may be unsigned long long:
> 
> 	pr_warn("Tracing: mapping size %pa is not page aligned\n", &size);

Thanks, will update.

-- Steve
Mathieu Desnoyers April 2, 2025, 3:01 p.m. UTC | #3
On 2025-04-02 05:21, Mike Rapoport wrote:
> On Tue, Apr 01, 2025 at 06:58:12PM -0400, Steven Rostedt wrote:
>> From: Steven Rostedt <rostedt@goodmis.org>
>>
>> Enforce that the address and the size of the memory used by the persistent
>> ring buffer is page aligned. Also update the documentation to reflect this
>> requirement.

I've been loosely following this thread, and I'm confused about one
thing.

AFAIU the goal is to have the ftrace persistent ring buffer written to
through a memory range mapped by vmap_page_range(), and userspace maps
the buffer with its own virtual mappings.

With respect to architectures with aliasing dcache, is the plan:

A) To make sure all persistent ring buffer mappings are aligned on
    SHMLBA:

Quoting "Documentation/core-api/cachetlb.rst":

   Is your port susceptible to virtual aliasing in its D-cache?
   Well, if your D-cache is virtually indexed, is larger in size than
   PAGE_SIZE, and does not prevent multiple cache lines for the same
   physical address from existing at once, you have this problem.

   If your D-cache has this problem, first define asm/shmparam.h SHMLBA
   properly, it should essentially be the size of your virtually
   addressed D-cache (or if the size is variable, the largest possible
   size).  This setting will force the SYSv IPC layer to only allow user
   processes to mmap shared memory at address which are a multiple of
   this value.

or

B) to flush both the kernel and userspace mappings when a ring buffer
    page is handed over from writer to reader ?

I've seen both approaches being discussed in the recent threads, with
some participants recommending approach (A), but then the code
revisions that follow take approach (B).

AFAIU, if we are aiming for approach (A), then I'm missing where
vmap_page_range() guarantees that the _kernel_ virtual mapping is
SHMLBA aligned. AFAIU, only user mappings are aligned on SHMLBA.

And if we are aiming towards approach (A), then the explicit flushing
is not needed when handing over pages from writer to reader.

Please let me know if I'm missing something,

Thanks,

Mathieu

>>
>> Link: https://lore.kernel.org/all/CAHk-=whUOfVucfJRt7E0AH+GV41ELmS4wJqxHDnui6Giddfkzw@mail.gmail.com/
>>
>> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
>> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
>> ---
>>   Documentation/admin-guide/kernel-parameters.txt |  2 ++
>>   Documentation/trace/debugging.rst               |  2 ++
>>   kernel/trace/trace.c                            | 12 ++++++++++++
>>   3 files changed, 16 insertions(+)
>>
>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
>> index 3435a062a208..f904fd8481bd 100644
>> --- a/Documentation/admin-guide/kernel-parameters.txt
>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>> @@ -7266,6 +7266,8 @@
>>   			This is just one of many ways that can clear memory. Make sure your system
>>   			keeps the content of memory across reboots before relying on this option.
>>   
>> +			NB: Both the mapped address and size must be page aligned for the architecture.
>> +
>>   			See also Documentation/trace/debugging.rst
>>   
>>   
>> diff --git a/Documentation/trace/debugging.rst b/Documentation/trace/debugging.rst
>> index 54fb16239d70..d54bc500af80 100644
>> --- a/Documentation/trace/debugging.rst
>> +++ b/Documentation/trace/debugging.rst
>> @@ -136,6 +136,8 @@ kernel, so only the same kernel is guaranteed to work if the mapping is
>>   preserved. Switching to a different kernel version may find a different
>>   layout and mark the buffer as invalid.
>>   
>> +NB: Both the mapped address and size must be page aligned for the architecture.
>> +
>>   Using trace_printk() in the boot instance
>>   -----------------------------------------
>>   By default, the content of trace_printk() goes into the top level tracing
>> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
>> index de6d7f0e6206..de9c237e5826 100644
>> --- a/kernel/trace/trace.c
>> +++ b/kernel/trace/trace.c
>> @@ -10788,6 +10788,18 @@ __init static void enable_instances(void)
>>   		}
>>   
>>   		if (start) {
>> +			/* Start and size must be page aligned */
>> +			if (start & ~PAGE_MASK) {
>> +				pr_warn("Tracing: mapping start addr %lx is not page aligned\n",
>> +					(unsigned long)start);
>> +				continue;
>> +			}
>> +			if (size & ~PAGE_MASK) {
>> +				pr_warn("Tracing: mapping size %lx is not page aligned\n",
>> +					(unsigned long)size);
>> +				continue;
>> +			}
> 
> Better use %pa for printing physical address as on 32-bit systems
> phys_addr_t may be unsigned long long:
> 
> 	pr_warn("Tracing: mapping size %pa is not page aligned\n", &size);
> 
>> +
>>   			addr = map_pages(start, size);
>>   			if (addr) {
>>   				pr_info("Tracing: mapped boot instance %s at physical memory %pa of size 0x%lx\n",
>> -- 
>> 2.47.2
>>
>>
>
Mathieu Desnoyers April 2, 2025, 3:03 p.m. UTC | #4
On 2025-04-02 11:01, Mathieu Desnoyers wrote:
> On 2025-04-02 05:21, Mike Rapoport wrote:
>> On Tue, Apr 01, 2025 at 06:58:12PM -0400, Steven Rostedt wrote:
>>> From: Steven Rostedt <rostedt@goodmis.org>
>>>
>>> Enforce that the address and the size of the memory used by the 
>>> persistent
>>> ring buffer is page aligned. Also update the documentation to reflect 
>>> this
>>> requirement.
> 

[ Please disregard this duplicate message and consider
   https://lore.kernel.org/lkml/c3e395d7-0c64-44d0-a0a7-57205b2ab712@efficios.com/T/#m461e6111397b33c037f6fb746ed74ffbd0a4340f
   instead. ]

Patch

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 3435a062a208..f904fd8481bd 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -7266,6 +7266,8 @@ 
 			This is just one of many ways that can clear memory. Make sure your system
 			keeps the content of memory across reboots before relying on this option.
 
+			NB: Both the mapped address and size must be page aligned for the architecture.
+
 			See also Documentation/trace/debugging.rst
 
 
diff --git a/Documentation/trace/debugging.rst b/Documentation/trace/debugging.rst
index 54fb16239d70..d54bc500af80 100644
--- a/Documentation/trace/debugging.rst
+++ b/Documentation/trace/debugging.rst
@@ -136,6 +136,8 @@  kernel, so only the same kernel is guaranteed to work if the mapping is
 preserved. Switching to a different kernel version may find a different
 layout and mark the buffer as invalid.
 
+NB: Both the mapped address and size must be page aligned for the architecture.
+
 Using trace_printk() in the boot instance
 -----------------------------------------
 By default, the content of trace_printk() goes into the top level tracing
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index de6d7f0e6206..de9c237e5826 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -10788,6 +10788,18 @@  __init static void enable_instances(void)
 		}
 
 		if (start) {
+			/* Start and size must be page aligned */
+			if (start & ~PAGE_MASK) {
+				pr_warn("Tracing: mapping start addr %lx is not page aligned\n",
+					(unsigned long)start);
+				continue;
+			}
+			if (size & ~PAGE_MASK) {
+				pr_warn("Tracing: mapping size %lx is not page aligned\n",
+					(unsigned long)size);
+				continue;
+			}
+
 			addr = map_pages(start, size);
 			if (addr) {
 				pr_info("Tracing: mapped boot instance %s at physical memory %pa of size 0x%lx\n",