[v4,01/13] ring-buffer: Allow mapped field to be set without mapping

Message ID 20240611192907.402447387@goodmis.org (mailing list archive)
State Superseded
Series tracing: Persistent traces across a reboot or crash

Commit Message

Steven Rostedt June 11, 2024, 7:28 p.m. UTC
From: "Steven Rostedt (Google)" <rostedt@goodmis.org>

In preparation for having the ring buffer mapped to a dedicated location,
which will have the same restrictions as user space memory mapped buffers,
allow it to use the "mapped" field of the ring_buffer_per_cpu structure
without having the user space meta page mapping.

When this starts using the mapped field, it will need to handle adding a
user space mapping to (and removing it from) a ring buffer that is using a
dedicated memory range.

Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
 kernel/trace/ring_buffer.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

Comments

Guenter Roeck June 11, 2024, 10:43 p.m. UTC | #1
On 6/11/24 12:28, Steven Rostedt wrote:
> From: "Steven Rostedt (Google)" <rostedt@goodmis.org>
> 
> In preparation for having the ring buffer mapped to a dedicated location,
> which will have the same restrictions as user space memory mapped buffers,
> allow it to use the "mapped" field of the ring_buffer_per_cpu structure
> without having the user space meta page mapping.
> 
> When this starts using the mapped field, it will need to handle adding a
> user space mapping to (and removing it from) a ring buffer that is using a
> dedicated memory range.
> 
> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
> ---
>   kernel/trace/ring_buffer.c | 11 ++++++++---
>   1 file changed, 8 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
> index 28853966aa9a..78beaccf9c8c 100644
> --- a/kernel/trace/ring_buffer.c
> +++ b/kernel/trace/ring_buffer.c
> @@ -5224,6 +5224,9 @@ static void rb_update_meta_page(struct ring_buffer_per_cpu *cpu_buffer)
>   {
>   	struct trace_buffer_meta *meta = cpu_buffer->meta_page;
>   
> +	if (!meta)
> +		return;
> +
>   	meta->reader.read = cpu_buffer->reader_page->read;
>   	meta->reader.id = cpu_buffer->reader_page->id;
>   	meta->reader.lost_events = cpu_buffer->lost_events;
> @@ -6167,7 +6170,7 @@ rb_get_mapped_buffer(struct trace_buffer *buffer, int cpu)
>   
>   	mutex_lock(&cpu_buffer->mapping_lock);
>   
> -	if (!cpu_buffer->mapped) {
> +	if (!cpu_buffer->mapped || !cpu_buffer->meta_page) {
>   		mutex_unlock(&cpu_buffer->mapping_lock);
>   		return ERR_PTR(-ENODEV);
>   	}
> @@ -6359,12 +6362,13 @@ int ring_buffer_map(struct trace_buffer *buffer, int cpu,
>   	 */
>   	raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
>   	rb_setup_ids_meta_page(cpu_buffer, subbuf_ids);
> +

Picky again. Is that a leftover from something? I don't see an immediate reason
for the added newline.

>   	raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
>   
>   	err = __rb_map_vma(cpu_buffer, vma);
>   	if (!err) {
>   		raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
> -		cpu_buffer->mapped = 1;
> +		cpu_buffer->mapped++;
>   		raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
>   	} else {
>   		kfree(cpu_buffer->subbuf_ids);
> @@ -6403,7 +6407,8 @@ int ring_buffer_unmap(struct trace_buffer *buffer, int cpu)
>   	mutex_lock(&buffer->mutex);
>   	raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
>   
> -	cpu_buffer->mapped = 0;
> +	WARN_ON_ONCE(!cpu_buffer->mapped);
> +	cpu_buffer->mapped--;

This will wrap to UINT_MAX if it was 0. Is that intentional?
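
For reference, the wrap is just C's well-defined modular arithmetic on
unsigned types ("mapped" is an unsigned int, hence UINT_MAX). A minimal
user-space demonstration, not kernel code:

	#include <stdio.h>

	int main(void)
	{
		unsigned int mapped = 0;	/* same type as cpu_buffer->mapped */

		/* unsigned underflow is well defined: wraps modulo 2^32 */
		mapped--;
		printf("%u\n", mapped);		/* prints 4294967295 (UINT_MAX) */
		return 0;
	}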

>   
>   	raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
>
Steven Rostedt June 11, 2024, 10:53 p.m. UTC | #2
On Tue, 11 Jun 2024 15:43:59 -0700
Guenter Roeck <linux@roeck-us.net> wrote:

> On 6/11/24 12:28, Steven Rostedt wrote:
> > From: "Steven Rostedt (Google)" <rostedt@goodmis.org>
> > 
> > In preparation for having the ring buffer mapped to a dedicated location,
> > which will have the same restrictions as user space memory mapped buffers,
> > allow it to use the "mapped" field of the ring_buffer_per_cpu structure
> > without having the user space meta page mapping.
> > 
> > When this starts using the mapped field, it will need to handle adding a
> > user space mapping to (and removing it from) a ring buffer that is using a
> > dedicated memory range.
> > 
> > Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
> > ---
> >   kernel/trace/ring_buffer.c | 11 ++++++++---
> >   1 file changed, 8 insertions(+), 3 deletions(-)
> > 
> > diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
> > index 28853966aa9a..78beaccf9c8c 100644
> > --- a/kernel/trace/ring_buffer.c
> > +++ b/kernel/trace/ring_buffer.c
> > @@ -5224,6 +5224,9 @@ static void rb_update_meta_page(struct ring_buffer_per_cpu *cpu_buffer)
> >   {
> >   	struct trace_buffer_meta *meta = cpu_buffer->meta_page;
> >   
> > +	if (!meta)
> > +		return;
> > +
> >   	meta->reader.read = cpu_buffer->reader_page->read;
> >   	meta->reader.id = cpu_buffer->reader_page->id;
> >   	meta->reader.lost_events = cpu_buffer->lost_events;
> > @@ -6167,7 +6170,7 @@ rb_get_mapped_buffer(struct trace_buffer *buffer, int cpu)
> >   
> >   	mutex_lock(&cpu_buffer->mapping_lock);
> >   
> > -	if (!cpu_buffer->mapped) {
> > +	if (!cpu_buffer->mapped || !cpu_buffer->meta_page) {
> >   		mutex_unlock(&cpu_buffer->mapping_lock);
> >   		return ERR_PTR(-ENODEV);
> >   	}
> > @@ -6359,12 +6362,13 @@ int ring_buffer_map(struct trace_buffer *buffer, int cpu,
> >   	 */
> >   	raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
> >   	rb_setup_ids_meta_page(cpu_buffer, subbuf_ids);
> > +  
> 
> Picky again. Is that a leftover from something? I don't see an immediate reason
> for the added newline.

Hmm, I could remove it.

> 
> >   	raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
> >   
> >   	err = __rb_map_vma(cpu_buffer, vma);
> >   	if (!err) {
> >   		raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
> > -		cpu_buffer->mapped = 1;
> > +		cpu_buffer->mapped++;
> >   		raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
> >   	} else {
> >   		kfree(cpu_buffer->subbuf_ids);
> > @@ -6403,7 +6407,8 @@ int ring_buffer_unmap(struct trace_buffer *buffer, int cpu)
> >   	mutex_lock(&buffer->mutex);
> >   	raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
> >   
> > -	cpu_buffer->mapped = 0;
> > +	WARN_ON_ONCE(!cpu_buffer->mapped);
> > +	cpu_buffer->mapped--;  
> 
> This will wrap to UINT_MAX if it was 0. Is that intentional?

If mapped is non-zero, it limits what can be done with the ring buffer.
If it enters here as zero, we are really in an unknown state, so yeah,
wrapping will just keep it limited, which is a good thing.

Do you want me to add a comment there?
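
For illustration, such a comment might look something like this (just a
sketch, not necessarily what would go in):

	/*
	 * "mapped" should never be zero here. If it is, the accounting
	 * is already broken, and letting the unsigned decrement wrap to
	 * UINT_MAX keeps the buffer treated as mapped (and restricted)
	 * rather than releasing it while something may still use it.
	 */
	WARN_ON_ONCE(!cpu_buffer->mapped);
	cpu_buffer->mapped--;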

-- Steve


> 
> >   
> >   	raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
> >
Guenter Roeck June 11, 2024, 11:53 p.m. UTC | #3
On 6/11/24 15:53, Steven Rostedt wrote:
> On Tue, 11 Jun 2024 15:43:59 -0700
> Guenter Roeck <linux@roeck-us.net> wrote:
> 
>> On 6/11/24 12:28, Steven Rostedt wrote:
>>> From: "Steven Rostedt (Google)" <rostedt@goodmis.org>
>>>
>>> In preparation for having the ring buffer mapped to a dedicated location,
>>> which will have the same restrictions as user space memory mapped buffers,
>>> allow it to use the "mapped" field of the ring_buffer_per_cpu structure
>>> without having the user space meta page mapping.
>>>
>>> When this starts using the mapped field, it will need to handle adding a
>>> user space mapping to (and removing it from) a ring buffer that is using a
>>> dedicated memory range.
>>>
>>> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
>>> ---
>>>    kernel/trace/ring_buffer.c | 11 ++++++++---
>>>    1 file changed, 8 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
>>> index 28853966aa9a..78beaccf9c8c 100644
>>> --- a/kernel/trace/ring_buffer.c
>>> +++ b/kernel/trace/ring_buffer.c
>>> @@ -5224,6 +5224,9 @@ static void rb_update_meta_page(struct ring_buffer_per_cpu *cpu_buffer)
>>>    {
>>>    	struct trace_buffer_meta *meta = cpu_buffer->meta_page;
>>>    
>>> +	if (!meta)
>>> +		return;
>>> +
>>>    	meta->reader.read = cpu_buffer->reader_page->read;
>>>    	meta->reader.id = cpu_buffer->reader_page->id;
>>>    	meta->reader.lost_events = cpu_buffer->lost_events;
>>> @@ -6167,7 +6170,7 @@ rb_get_mapped_buffer(struct trace_buffer *buffer, int cpu)
>>>    
>>>    	mutex_lock(&cpu_buffer->mapping_lock);
>>>    
>>> -	if (!cpu_buffer->mapped) {
>>> +	if (!cpu_buffer->mapped || !cpu_buffer->meta_page) {
>>>    		mutex_unlock(&cpu_buffer->mapping_lock);
>>>    		return ERR_PTR(-ENODEV);
>>>    	}
>>> @@ -6359,12 +6362,13 @@ int ring_buffer_map(struct trace_buffer *buffer, int cpu,
>>>    	 */
>>>    	raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
>>>    	rb_setup_ids_meta_page(cpu_buffer, subbuf_ids);
>>> +
>>
>> Picky again. Is that a leftover from something? I don't see an immediate reason
>> for the added newline.
> 
> Hmm, I could remove it.
> 
>>
>>>    	raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
>>>    
>>>    	err = __rb_map_vma(cpu_buffer, vma);
>>>    	if (!err) {
>>>    		raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
>>> -		cpu_buffer->mapped = 1;
>>> +		cpu_buffer->mapped++;
>>>    		raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
>>>    	} else {
>>>    		kfree(cpu_buffer->subbuf_ids);
>>> @@ -6403,7 +6407,8 @@ int ring_buffer_unmap(struct trace_buffer *buffer, int cpu)
>>>    	mutex_lock(&buffer->mutex);
>>>    	raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
>>>    
>>> -	cpu_buffer->mapped = 0;
>>> +	WARN_ON_ONCE(!cpu_buffer->mapped);
>>> +	cpu_buffer->mapped--;
>>
>> This will wrap to UINT_MAX if it was 0. Is that intentional?
> 
> If mapped is non-zero, it limits what can be done with the ring buffer.
> If it enters here as zero, we are really in an unknown state, so yeah,
> wrapping will just keep it limited, which is a good thing.
> 
> Do you want me to add a comment there?
> 

Maybe. I just wondered if something like
	if (!WARN_ON_ONCE(!cpu_buffer->mapped))
		cpu_buffer->mapped--;

would be better than wrapping because 'mapped' is used as a flag elsewhere,
but then I can see that it is also manipulated in __rb_inc_dec_mapped(),
and that it is checked against UINT_MAX there (and not decremented if it is 0).
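
For context, that helper is roughly the following (paraphrased from my
reading of the code; details may differ):

	static int __rb_inc_dec_mapped(struct ring_buffer_per_cpu *cpu_buffer,
				       bool inc)
	{
		unsigned long flags;

		lockdep_assert_held(&cpu_buffer->mapping_lock);

		/* refuse to overflow or underflow the counter */
		if (inc && cpu_buffer->mapped == UINT_MAX)
			return -EBUSY;

		if (WARN_ON(!inc && cpu_buffer->mapped == 0))
			return -EINVAL;

		mutex_lock(&cpu_buffer->buffer->mutex);
		raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);

		if (inc)
			cpu_buffer->mapped++;
		else
			cpu_buffer->mapped--;

		raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
		mutex_unlock(&cpu_buffer->buffer->mutex);

		return 0;
	}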

Maybe explain why sometimes __rb_inc_dec_mapped() is called to increment
or decrement ->mapped, and sometimes it is done directly? I can see that
the function also acquires the buffer mutex, which isn't needed at the places
where mapped is incremented/decremented directly, but common code would
still be nice, and it is odd to see over/underflows handled sometimes but
not always.

Thanks,
Guenter
Steven Rostedt June 12, 2024, 1:39 a.m. UTC | #4
On Tue, 11 Jun 2024 16:53:43 -0700
Guenter Roeck <linux@roeck-us.net> wrote:

> >>> @@ -6403,7 +6407,8 @@ int ring_buffer_unmap(struct trace_buffer *buffer, int cpu)
> >>>    	mutex_lock(&buffer->mutex);
> >>>    	raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
> >>>    
> >>> -	cpu_buffer->mapped = 0;
> >>> +	WARN_ON_ONCE(!cpu_buffer->mapped);
> >>> +	cpu_buffer->mapped--;  
> >>
> >> This will wrap to UINT_MAX if it was 0. Is that intentional?
> > 
> > If mapped is non-zero, it limits what can be done with the ring buffer.
> > If it enters here as zero, we are really in an unknown state, so yeah,
> > wrapping will just keep it limited, which is a good thing.
> > 
> > Do you want me to add a comment there?
> >   
> 
> Maybe. I just wondered if something like
> 	if (!WARN_ON_ONCE(!cpu_buffer->mapped))
> 		cpu_buffer->mapped--;
> 
> would be better than wrapping because 'mapped' is used as a flag elsewhere,
> but then I can see that it is also manipulated in __rb_inc_dec_mapped(),
> and that it is checked against UINT_MAX there (and not decremented if it is 0).

Yeah, __rb_inc_dec_mapped() is used there because it is called when
external sources map the ring buffer.

Here it is incremented and decremented internally. That is, we increment
it the first time the ring buffer is mapped, and decrement it again the
last time it is unmapped.
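
In other words, roughly this split (an illustrative fragment, not the
actual code; locking is omitted):

	/* External (user-space mmap) path goes through the helper: */
	err = __rb_inc_dec_mapped(cpu_buffer, true);	/* on mmap */
	err = __rb_inc_dec_mapped(cpu_buffer, false);	/* on munmap */

	/*
	 * Internal use touches the field directly, under reader_lock,
	 * once when the first mapping is set up and once when the last
	 * one is torn down:
	 */
	cpu_buffer->mapped++;	/* first mapping: restrictions apply */
	cpu_buffer->mapped--;	/* last unmapping: restrictions lifted */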

I could add the above logic as well. I hit a bug in my more vigorous
testing, so I need to make another revision anyway.

> 
> Maybe explain why sometimes __rb_inc_dec_mapped() is called to
> increment or decrement ->mapped, and sometimes it is done directly?
> I can see that the function also acquires the buffer mutex, which
> isn't needed at the places where mapped is incremented/decremented
> directly, but common code would still be nice, and it is odd to see
> over/underflows handled sometimes but not always.

Sure. I'll add comments explaining more.

Thanks,

-- Steve
Steven Rostedt June 12, 2024, 1:57 a.m. UTC | #5
On Tue, 11 Jun 2024 21:39:37 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> > 
> > Maybe explain why sometimes __rb_inc_dec_mapped() is called to
> > increment or decrement ->mapped, and sometimes it is done directly?
> > I can see that the function also acquires the buffer mutex, which
> > isn't needed at the places where mapped is incremented/decremented
> > directly, but common code would still be nice, and it is odd to see
> > over/underflows handled sometimes but not always.  
> 
> Sure. I'll add comments explaining more.

And I found a bug with this code. It assumes that mapped will be equal
to 1 if it's the last mapping. That will no longer be the case.
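
That is, a check along these lines (illustrative, not a literal quote of
the code):

	/* On unmap, treating "last user mapping" as: */
	if (cpu_buffer->mapped == 1) {
		/* tear down the user-space mapping state */
	}

	/*
	 * That no longer works: with an internal mapping also holding a
	 * reference, the last user-space unmap can see mapped == 2, and
	 * the user-mapping teardown would never run.
	 */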

-- Steve

Patch

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 28853966aa9a..78beaccf9c8c 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -5224,6 +5224,9 @@  static void rb_update_meta_page(struct ring_buffer_per_cpu *cpu_buffer)
 {
 	struct trace_buffer_meta *meta = cpu_buffer->meta_page;
 
+	if (!meta)
+		return;
+
 	meta->reader.read = cpu_buffer->reader_page->read;
 	meta->reader.id = cpu_buffer->reader_page->id;
 	meta->reader.lost_events = cpu_buffer->lost_events;
@@ -6167,7 +6170,7 @@  rb_get_mapped_buffer(struct trace_buffer *buffer, int cpu)
 
 	mutex_lock(&cpu_buffer->mapping_lock);
 
-	if (!cpu_buffer->mapped) {
+	if (!cpu_buffer->mapped || !cpu_buffer->meta_page) {
 		mutex_unlock(&cpu_buffer->mapping_lock);
 		return ERR_PTR(-ENODEV);
 	}
@@ -6359,12 +6362,13 @@  int ring_buffer_map(struct trace_buffer *buffer, int cpu,
 	 */
 	raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
 	rb_setup_ids_meta_page(cpu_buffer, subbuf_ids);
+
 	raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
 
 	err = __rb_map_vma(cpu_buffer, vma);
 	if (!err) {
 		raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
-		cpu_buffer->mapped = 1;
+		cpu_buffer->mapped++;
 		raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
 	} else {
 		kfree(cpu_buffer->subbuf_ids);
@@ -6403,7 +6407,8 @@  int ring_buffer_unmap(struct trace_buffer *buffer, int cpu)
 	mutex_lock(&buffer->mutex);
 	raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
 
-	cpu_buffer->mapped = 0;
+	WARN_ON_ONCE(!cpu_buffer->mapped);
+	cpu_buffer->mapped--;
 
 	raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);