Message ID | 20230426010446.10753-1-Tze-nan.Wu@mediatek.com (mailing list archive) |
---|---|
State | New, archived |
Series | [v5] ring-buffer: Ensure proper resetting of atomic variables in ring_buffer_reset_online_cpus |
For some reason, this email did not make it to
linux-trace-kernel@vger.kernel.org, and therefore did not make it into
patchwork?

John?

-- Steve


On Wed, 26 Apr 2023 09:04:44 +0800
Tze-nan.Wu <Tze-nan.Wu@mediatek.com> wrote:

> From: "Tze-nan Wu" <Tze-nan.Wu@mediatek.com>
> 
> In ring_buffer_reset_online_cpus, the buffer_size_kb write operation
> may permanently fail if the cpu_online_mask changes between two
> for_each_online_buffer_cpu loops. The number of increases and decreases
> on both cpu_buffer->resize_disabled and cpu_buffer->record_disabled may be
> inconsistent, causing some CPUs to have non-zero values for these atomic
> variables after the function returns.
> 
> This issue can be reproduced by "echo 0 > trace" while hotplugging cpu.
> After reproducing success, we can find out buffer_size_kb will not be
> functional anymore.
> 
> To prevent leaving 'resize_disabled' and 'record_disabled' non-zero after
> ring_buffer_reset_online_cpus returns, we ensure that each atomic variable
> has been set up before atomic_sub() to it.
> 
> Cc: stable@vger.kernel.org
> Cc: npiggin@gmail.com
> Fixes: b23d7a5f4a07 ("ring-buffer: speed up buffer resets by avoiding synchronize_rcu for each CPU")
> Reviewed-by: Cheng-Jui Wang <cheng-jui.wang@mediatek.com>
> Signed-off-by: Tze-nan Wu <Tze-nan.Wu@mediatek.com>
> ---
> Changes from v4 to v5: https://lore.kernel.org/lkml/20230412112401.25081-1-Tze-nan.Wu@mediatek.com/
>   - Move the define before the function
> ---
>  kernel/trace/ring_buffer.c | 16 +++++++++++++---
>  1 file changed, 13 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
> index 76a2d91eecad..253ef85a9ec3 100644
> --- a/kernel/trace/ring_buffer.c
> +++ b/kernel/trace/ring_buffer.c
> @@ -5345,6 +5345,9 @@ void ring_buffer_reset_cpu(struct trace_buffer *buffer, int cpu)
>  }
>  EXPORT_SYMBOL_GPL(ring_buffer_reset_cpu);
>  
> +/* Flag to ensure proper resetting of atomic variables */
> +#define RESET_BIT (1 << 30)
> +
>  /**
>   * ring_buffer_reset_online_cpus - reset a ring buffer per CPU buffer
>   * @buffer: The ring buffer to reset a per cpu buffer of
> @@ -5361,20 +5364,27 @@ void ring_buffer_reset_online_cpus(struct trace_buffer *buffer)
>  	for_each_online_buffer_cpu(buffer, cpu) {
>  		cpu_buffer = buffer->buffers[cpu];
>  
> -		atomic_inc(&cpu_buffer->resize_disabled);
> +		atomic_add(RESET_BIT, &cpu_buffer->resize_disabled);
>  		atomic_inc(&cpu_buffer->record_disabled);
>  	}
>  
>  	/* Make sure all commits have finished */
>  	synchronize_rcu();
>  
> -	for_each_online_buffer_cpu(buffer, cpu) {
> +	for_each_buffer_cpu(buffer, cpu) {
>  		cpu_buffer = buffer->buffers[cpu];
>  
> +		/*
> +		 * If a CPU came online during the synchronize_rcu(), then
> +		 * ignore it.
> +		 */
> +		if (!(atomic_read(&cpu_buffer->resize_disabled) & RESET_BIT))
> +			continue;
> +
>  		reset_disabled_cpu_buffer(cpu_buffer);
>  
>  		atomic_dec(&cpu_buffer->record_disabled);
> -		atomic_dec(&cpu_buffer->resize_disabled);
> +		atomic_sub(RESET_BIT, &cpu_buffer->resize_disabled);
>  	}
>  
>  	mutex_unlock(&buffer->mutex);

On Tue, 25 Apr 2023 21:17:37 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> For some reason, this email did not make it to
> linux-trace-kernel@vger.kernel.org, and therefore did not make it into
> patchwork?
> 

And the email was definitely Cc'd properly, as my reply made it to lore,
but not the original email.

  https://lore.kernel.org/linux-trace-kernel/20230425211737.757208b3@gandalf.local.home/

-- Steve

Ran afoul of the taboo filters:

Illegal-Object: Syntax error in From: address found on vger.kernel.org:
	From: Tze-nan.Wu<Tze-nan.Wu@mediatek.com>
	^-missing end of mailbox

I've seen a few of these but I've been unable to replicate the problem.
I've a suspicion but so far been unable to prove the theory on what's wrong.

- John

On 4/25/23 18:17, Steven Rostedt wrote:
> 
> For some reason, this email did not make it to
> linux-trace-kernel@vger.kernel.org, and therefore did not make it into
> patchwork?
> 
> John?
> 
> -- Steve
> 
> 
> On Wed, 26 Apr 2023 09:04:44 +0800
> Tze-nan.Wu <Tze-nan.Wu@mediatek.com> wrote:
> 
>> From: "Tze-nan Wu" <Tze-nan.Wu@mediatek.com>
>>
>> In ring_buffer_reset_online_cpus, the buffer_size_kb write operation
>> may permanently fail if the cpu_online_mask changes between two
>> for_each_online_buffer_cpu loops. The number of increases and decreases
>> on both cpu_buffer->resize_disabled and cpu_buffer->record_disabled may be
>> inconsistent, causing some CPUs to have non-zero values for these atomic
>> variables after the function returns.
>>
>> This issue can be reproduced by "echo 0 > trace" while hotplugging cpu.
>> After reproducing success, we can find out buffer_size_kb will not be
>> functional anymore.
>>
>> To prevent leaving 'resize_disabled' and 'record_disabled' non-zero after
>> ring_buffer_reset_online_cpus returns, we ensure that each atomic variable
>> has been set up before atomic_sub() to it.

On Tue, 2023-04-25 at 19:58 -0700, John 'Warthog9' Hawley wrote:
> Ran afoul of the taboo filters:
> 
> Illegal-Object: Syntax error in From: address found on
> vger.kernel.org:
> 	From: Tze-nan.Wu<Tze-nan.Wu@mediatek.com>
> 	^-missing end of mailbox
> 
> I've seen a few of these but I've been unable to replicate the
> problem.
> I've a suspicion but so far been unable to prove the theory on what's
> wrong.
> 
> - John
> 

So I sent the patch again, but this time I changed the name field in
.gitconfig from "Tze-nan.Wu" (internal use) to "Tze-nan Wu".

This keeps the line "From: Tze-nan.Wu<Tze-nan.Wu@mediatek.com>" from being
added to the mail when we execute "git send-email", and it works: the last
patch I sent is not filtered out anymore.

---
Here is the major difference between the patches:

old patch (fail):
  From: "Tze-nan Wu" <Tze-nan.Wu@mediatek.com>   <== (modified by editor from Tze-nan.Wu to Tze-nan Wu)
---
new patch (success):
  From: Tze-nan Wu <Tze-nan.Wu@mediatek.com>
---

- Tzenan

> On 4/25/23 18:17, Steven Rostedt wrote:
> > 
> > For some reason, this email did not make it to
> > linux-trace-kernel@vger.kernel.org, and therefore did not make it
> > into patchwork?
> > 
> > John?
> > 
> > -- Steve
> > 
> > 
> > On Wed, 26 Apr 2023 09:04:44 +0800
> > Tze-nan.Wu <Tze-nan.Wu@mediatek.com> wrote:
> > 
> > > From: "Tze-nan Wu" <Tze-nan.Wu@mediatek.com>
> > > 
> > > In ring_buffer_reset_online_cpus, the buffer_size_kb write operation
> > > may permanently fail if the cpu_online_mask changes between two
> > > for_each_online_buffer_cpu loops. The number of increases and decreases
> > > on both cpu_buffer->resize_disabled and cpu_buffer->record_disabled may be
> > > inconsistent, causing some CPUs to have non-zero values for these atomic
> > > variables after the function returns.
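
One possible reading of the "missing end of mailbox" rejection, which the thread itself does not confirm, is that a display name containing a period is not a valid unquoted phrase under the strict RFC 5322 grammar, because '.' is not in the "atext" character set. The userspace sketch below checks a display name against that rule. The function names are hypothetical, and this is not the filter that vger.kernel.org actually runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* RFC 5322 "atext": ASCII letters, digits and a fixed set of specials.
 * Note that '.' is not part of the set. */
bool is_atext(unsigned char c)
{
	if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
	    (c >= '0' && c <= '9'))
		return true;
	return c != '\0' && strchr("!#$%&'*+-/=?^_`{|}~", c) != NULL;
}

/* Hypothetical helper: true if the name is one or more atoms separated by
 * blanks, i.e. it could appear unquoted before <addr> in a From: header. */
bool display_name_is_plain_phrase(const char *name)
{
	bool seen_atom = false;

	assert(name != NULL);

	for (const unsigned char *p = (const unsigned char *)name; *p; p++) {
		if (*p == ' ' || *p == '\t')
			continue;
		if (!is_atext(*p))
			return false;
		seen_atom = true;
	}
	return seen_atom;
}
```

Under this check "Tze-nan.Wu" is rejected and "Tze-nan Wu" is accepted, which is consistent with the behaviour reported above but does not prove that this is the rule the list server applies.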

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 76a2d91eecad..253ef85a9ec3 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -5345,6 +5345,9 @@ void ring_buffer_reset_cpu(struct trace_buffer *buffer, int cpu)
 }
 EXPORT_SYMBOL_GPL(ring_buffer_reset_cpu);
 
+/* Flag to ensure proper resetting of atomic variables */
+#define RESET_BIT (1 << 30)
+
 /**
  * ring_buffer_reset_online_cpus - reset a ring buffer per CPU buffer
  * @buffer: The ring buffer to reset a per cpu buffer of
@@ -5361,20 +5364,27 @@ void ring_buffer_reset_online_cpus(struct trace_buffer *buffer)
 	for_each_online_buffer_cpu(buffer, cpu) {
 		cpu_buffer = buffer->buffers[cpu];
 
-		atomic_inc(&cpu_buffer->resize_disabled);
+		atomic_add(RESET_BIT, &cpu_buffer->resize_disabled);
 		atomic_inc(&cpu_buffer->record_disabled);
 	}
 
 	/* Make sure all commits have finished */
 	synchronize_rcu();
 
-	for_each_online_buffer_cpu(buffer, cpu) {
+	for_each_buffer_cpu(buffer, cpu) {
 		cpu_buffer = buffer->buffers[cpu];
 
+		/*
+		 * If a CPU came online during the synchronize_rcu(), then
+		 * ignore it.
+		 */
+		if (!(atomic_read(&cpu_buffer->resize_disabled) & RESET_BIT))
+			continue;
+
 		reset_disabled_cpu_buffer(cpu_buffer);
 
 		atomic_dec(&cpu_buffer->record_disabled);
-		atomic_dec(&cpu_buffer->resize_disabled);
+		atomic_sub(RESET_BIT, &cpu_buffer->resize_disabled);
 	}
 
 	mutex_unlock(&buffer->mutex);
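
The counter imbalance described in the commit message can be modelled in userspace. The sketch below is not kernel code: struct cpu_buf, struct model, model_init(), reset_old(), reset_new() and all_zero() are hypothetical names, C11 atomics stand in for the kernel's atomic_t, and the hotplug_cpu/new_state parameters simulate a change to the online mask at the point where synchronize_rcu() would run. Only RESET_BIT and the two counter names are taken from the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NR_CPUS   4
#define RESET_BIT (1 << 30)

/* Hypothetical stand-in for the two counters of a per-CPU ring buffer. */
struct cpu_buf {
	atomic_int resize_disabled;
	atomic_int record_disabled;
};

/* Hypothetical model: one buffer per possible CPU plus an online mask. */
struct model {
	struct cpu_buf bufs[NR_CPUS];
	bool online[NR_CPUS];
};

void model_init(struct model *m)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		atomic_init(&m->bufs[cpu].resize_disabled, 0);
		atomic_init(&m->bufs[cpu].record_disabled, 0);
		m->online[cpu] = true;
	}
}

/* Pre-patch shape: both loops visit whichever CPUs are online at that time. */
void reset_old(struct model *m, int hotplug_cpu, bool new_state)
{
	assert(hotplug_cpu >= 0 && hotplug_cpu < NR_CPUS);

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (!m->online[cpu])
			continue;
		atomic_fetch_add(&m->bufs[cpu].resize_disabled, 1);
		atomic_fetch_add(&m->bufs[cpu].record_disabled, 1);
	}

	/* Simulated hotplug event where synchronize_rcu() would run. */
	m->online[hotplug_cpu] = new_state;

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (!m->online[cpu])
			continue;
		atomic_fetch_sub(&m->bufs[cpu].record_disabled, 1);
		atomic_fetch_sub(&m->bufs[cpu].resize_disabled, 1);
	}
}

/* Patched shape: mark with RESET_BIT, then undo only where the mark is set. */
void reset_new(struct model *m, int hotplug_cpu, bool new_state)
{
	assert(hotplug_cpu >= 0 && hotplug_cpu < NR_CPUS);

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (!m->online[cpu])
			continue;
		atomic_fetch_add(&m->bufs[cpu].resize_disabled, RESET_BIT);
		atomic_fetch_add(&m->bufs[cpu].record_disabled, 1);
	}

	/* Simulated hotplug event where synchronize_rcu() would run. */
	m->online[hotplug_cpu] = new_state;

	/* Walk every buffer, online or not, and skip the unmarked ones. */
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (!(atomic_load(&m->bufs[cpu].resize_disabled) & RESET_BIT))
			continue;
		atomic_fetch_sub(&m->bufs[cpu].record_disabled, 1);
		atomic_fetch_sub(&m->bufs[cpu].resize_disabled, RESET_BIT);
	}
}

/* True if no CPU is left with a non-zero counter. */
bool all_zero(struct model *m)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (atomic_load(&m->bufs[cpu].resize_disabled) != 0 ||
		    atomic_load(&m->bufs[cpu].record_disabled) != 0)
			return false;
	}
	return true;
}
```

In this model, taking a CPU offline between the loops leaves its counters stuck at 1 with reset_old(), and bringing a CPU online between the loops drives them to -1. With reset_new() both cases end with every counter back at zero, because the second loop depends on the mark rather than on the current online mask.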