diff mbox series

perf/core: Add pmu stop before unthrottling to prevent WARNING

Message ID 20230602094841.1225-1-eric.lin@sifive.com (mailing list archive)
State Handled Elsewhere
Headers show
Series perf/core: Add pmu stop before unthrottling to prevent WARNING | expand

Checks

Context Check Description
conchuod/cover_letter success Single patches do not need cover letters
conchuod/tree_selection success Guessed tree name to be for-next at HEAD ac9a78681b92
conchuod/fixes_present success Fixes tag not required for -next series
conchuod/maintainers_pattern success MAINTAINERS pattern errors before the patch: 6 and now 6
conchuod/verify_signedoff success Signed-off-by tag matches author and committer
conchuod/kdoc success Errors and warnings before: 0 this patch: 0
conchuod/build_rv64_clang_allmodconfig success Errors and warnings before: 14 this patch: 14
conchuod/module_param success Was 0 now: 0
conchuod/build_rv64_gcc_allmodconfig success Errors and warnings before: 71 this patch: 71
conchuod/build_rv32_defconfig success Build OK
conchuod/dtb_warn_rv64 success Errors and warnings before: 3 this patch: 3
conchuod/header_inline success No static functions without inline keyword in header files
conchuod/checkpatch success total: 0 errors, 0 warnings, 0 checks, 7 lines checked
conchuod/build_rv64_nommu_k210_defconfig success Build OK
conchuod/verify_fixes success No Fixes tag
conchuod/build_rv64_nommu_virt_defconfig success Build OK

Commit Message

Eric Lin June 2, 2023, 9:48 a.m. UTC
Currently, during the perf sampling, if the perf interrupt takes too long,
perf framework will lower the perf_event_max_sample_rate. This will limit
the number of samples per timer tick (max_samples_per_tick) and set hwc->interrupts
to MAX_INTERRUPTS within the __perf_event_account_interrupt() function.

Afterward, the perf framework will unthrottle the event in the timer interrupt
handler, which triggers the driver's *_pmu_start() function. Most of the driver's
*_pmu_start() functions will check the event->hw.state to determine whether this
event has stopped. If the event has not stopped, a WARN_ON_ONCE() warning
will be triggered as shown below:

[ 2110.224723] ------------[ cut here ]------------
[ 2110.224851] WARNING: CPU: 0 PID: 240 at drivers/perf/riscv_pmu.c:184 riscv_pmu_start+0x7c/0x8e
[ 2110.225242] Modules linked in:
[ 2110.225380] CPU: 0 PID: 240 Comm: ls Not tainted 6.4-rc4-g19d0788e9ef2 #1
[ 2110.225574] Hardware name: SiFive (DT)
[ 2110.225657] epc : riscv_pmu_start+0x7c/0x8e
[ 2110.225834]  ra : riscv_pmu_start+0x28/0x8e
[ 2110.225998] epc : ffffffff80aef864 ra : ffffffff80aef810 sp : ffff8f80004db6f0
[ 2110.226135]  gp : ffffffff81c83750 tp : ffffaf80069f9bc0 t0 : ffff8f80004db6c0
[ 2110.226245]  t1 : 0000000000000000 t2 : 000000000000001f s0 : ffff8f80004db720
[ 2110.226367]  s1 : ffffaf8008ca1068 a0 : 0000ffffffffffff a1 : 0000000000000000
[ 2110.226488]  a2 : 0000000000000001 a3 : 0000000000000870 a4 : 0000000000000000
[ 2110.226605]  a5 : 0000000000000000 a6 : 0000000000000840 a7 : 0000000000000030
[ 2110.226721]  s2 : 0000000000000000 s3 : ffffaf8005165800 s4 : ffffaf800424da00
[ 2110.226838]  s5 : ffffffffffffffff s6 : ffffffff81cc7590 s7 : 0000000000000000
[ 2110.226955]  s8 : 0000000000000006 s9 : 0000000000000001 s10: ffffaf807efbc340
[ 2110.227064]  s11: ffffaf807efbbf00 t3 : ffffaf8006a16028 t4 : 00000000dbfbb796
[ 2110.227180]  t5 : 0000000700000000 t6 : ffffaf8005269870
[ 2110.227277] status: 0000000200000100 badaddr: 0000000000000000 cause: 0000000000000003
[ 2110.227407] [<ffffffff80aef864>] riscv_pmu_start+0x7c/0x8e
[ 2110.227622] [<ffffffff80185b56>] perf_adjust_freq_unthr_context+0x15e/0x174
[ 2110.227961] [<ffffffff80188642>] perf_event_task_tick+0x88/0x9c
[ 2110.228235] [<ffffffff800626a8>] scheduler_tick+0xfe/0x27c
[ 2110.228463] [<ffffffff800b5640>] update_process_times+0x9a/0xba
[ 2110.228690] [<ffffffff800c5bd4>] tick_sched_handle+0x32/0x66
[ 2110.229007] [<ffffffff800c5e0c>] tick_sched_timer+0x64/0xb0
[ 2110.229253] [<ffffffff800b5e50>] __hrtimer_run_queues+0x156/0x2f4
[ 2110.229446] [<ffffffff800b6bdc>] hrtimer_interrupt+0xe2/0x1fe
[ 2110.229637] [<ffffffff80acc9e8>] riscv_timer_interrupt+0x38/0x42
[ 2110.229984] [<ffffffff80090a16>] handle_percpu_devid_irq+0x90/0x1d2
[ 2110.230162] [<ffffffff8008a9f4>] generic_handle_domain_irq+0x28/0x36

To prevent this warning, we should call the driver's *_pmu_stop() function before unthrottling

Signed-off-by: Eric Lin <eric.lin@sifive.com>
---
 kernel/events/core.c | 1 +
 1 file changed, 1 insertion(+)

Comments

Eric Lin June 21, 2023, 4:24 a.m. UTC | #1
CC: Stephane Eranian

On Fri, Jun 2, 2023 at 5:49 PM Eric Lin <eric.lin@sifive.com> wrote:
>
> Currently, during the perf sampling, if the perf interrupt takes too long,
> perf framework will lower the perf_event_max_sample_rate. This will limit
> the number of samples per timer tick (max_samples_per_tick) and set hwc->interrupts
> to MAX_INTERRUPTS within the __perf_event_account_interrupt() function.
>
> Afterward, the perf framework will unthrottle the event in the timer interrupt
> handler, which triggers the driver's *_pmu_start() function. Most of the driver's
> *_pmu_start() functions will check the event->hw.state to determine whether this
> event has stopped. If the event has not stopped, a WARN_ON_ONCE() warning
> will be triggered as shown below:
>
> [ 2110.224723] ------------[ cut here ]------------
> [ 2110.224851] WARNING: CPU: 0 PID: 240 at drivers/perf/riscv_pmu.c:184 riscv_pmu_start+0x7c/0x8e
> [ 2110.225242] Modules linked in:
> [ 2110.225380] CPU: 0 PID: 240 Comm: ls Not tainted 6.4-rc4-g19d0788e9ef2 #1
> [ 2110.225574] Hardware name: SiFive (DT)
> [ 2110.225657] epc : riscv_pmu_start+0x7c/0x8e
> [ 2110.225834]  ra : riscv_pmu_start+0x28/0x8e
> [ 2110.225998] epc : ffffffff80aef864 ra : ffffffff80aef810 sp : ffff8f80004db6f0
> [ 2110.226135]  gp : ffffffff81c83750 tp : ffffaf80069f9bc0 t0 : ffff8f80004db6c0
> [ 2110.226245]  t1 : 0000000000000000 t2 : 000000000000001f s0 : ffff8f80004db720
> [ 2110.226367]  s1 : ffffaf8008ca1068 a0 : 0000ffffffffffff a1 : 0000000000000000
> [ 2110.226488]  a2 : 0000000000000001 a3 : 0000000000000870 a4 : 0000000000000000
> [ 2110.226605]  a5 : 0000000000000000 a6 : 0000000000000840 a7 : 0000000000000030
> [ 2110.226721]  s2 : 0000000000000000 s3 : ffffaf8005165800 s4 : ffffaf800424da00
> [ 2110.226838]  s5 : ffffffffffffffff s6 : ffffffff81cc7590 s7 : 0000000000000000
> [ 2110.226955]  s8 : 0000000000000006 s9 : 0000000000000001 s10: ffffaf807efbc340
> [ 2110.227064]  s11: ffffaf807efbbf00 t3 : ffffaf8006a16028 t4 : 00000000dbfbb796
> [ 2110.227180]  t5 : 0000000700000000 t6 : ffffaf8005269870
> [ 2110.227277] status: 0000000200000100 badaddr: 0000000000000000 cause: 0000000000000003
> [ 2110.227407] [<ffffffff80aef864>] riscv_pmu_start+0x7c/0x8e
> [ 2110.227622] [<ffffffff80185b56>] perf_adjust_freq_unthr_context+0x15e/0x174
> [ 2110.227961] [<ffffffff80188642>] perf_event_task_tick+0x88/0x9c
> [ 2110.228235] [<ffffffff800626a8>] scheduler_tick+0xfe/0x27c
> [ 2110.228463] [<ffffffff800b5640>] update_process_times+0x9a/0xba
> [ 2110.228690] [<ffffffff800c5bd4>] tick_sched_handle+0x32/0x66
> [ 2110.229007] [<ffffffff800c5e0c>] tick_sched_timer+0x64/0xb0
> [ 2110.229253] [<ffffffff800b5e50>] __hrtimer_run_queues+0x156/0x2f4
> [ 2110.229446] [<ffffffff800b6bdc>] hrtimer_interrupt+0xe2/0x1fe
> [ 2110.229637] [<ffffffff80acc9e8>] riscv_timer_interrupt+0x38/0x42
> [ 2110.229984] [<ffffffff80090a16>] handle_percpu_devid_irq+0x90/0x1d2
> [ 2110.230162] [<ffffffff8008a9f4>] generic_handle_domain_irq+0x28/0x36
>
> To prevent this warning, we should call the driver's *_pmu_stop() function before unthrottling
>
> Signed-off-by: Eric Lin <eric.lin@sifive.com>
> ---
>  kernel/events/core.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index db016e418931..098c875abe88 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -4128,6 +4128,7 @@ perf_adjust_freq_unthr_context(struct perf_event_context *ctx, bool unthrottle)
>
>                 if (hwc->interrupts == MAX_INTERRUPTS) {
>                         hwc->interrupts = 0;
> +                       event->pmu->stop(event, 0);
>                         perf_log_throttle(event, 1);
>                         event->pmu->start(event, 0);
>                 }
> --
> 2.17.1
>
Stephane Eranian June 21, 2023, 6:18 a.m. UTC | #2
On Tue, Jun 20, 2023 at 9:25 PM Eric Lin <eric.lin@sifive.com> wrote:
>
> CC: Stephane Eranian
>
> On Fri, Jun 2, 2023 at 5:49 PM Eric Lin <eric.lin@sifive.com> wrote:
> >
> > Currently, during the perf sampling, if the perf interrupt takes too long,
> > perf framework will lower the perf_event_max_sample_rate. This will limit
> > the number of samples per timer tick (max_samples_per_tick) and set hwc->interrupts
> > to MAX_INTERRUPTS within the __perf_event_account_interrupt() function.
> >
> > Afterward, the perf framework will unthrottle the event in the timer interrupt
> > handler, which triggers the driver's *_pmu_start() function. Most of the driver's
> > *_pmu_start() functions will check the event->hw.state to determine whether this
> > event has stopped. If the event has not stopped, a WARN_ON_ONCE() warning
> > will be triggered as shown below:
> >
> > [ 2110.224723] ------------[ cut here ]------------
> > [ 2110.224851] WARNING: CPU: 0 PID: 240 at drivers/perf/riscv_pmu.c:184 riscv_pmu_start+0x7c/0x8e
> > [ 2110.225242] Modules linked in:
> > [ 2110.225380] CPU: 0 PID: 240 Comm: ls Not tainted 6.4-rc4-g19d0788e9ef2 #1
> > [ 2110.225574] Hardware name: SiFive (DT)
> > [ 2110.225657] epc : riscv_pmu_start+0x7c/0x8e
> > [ 2110.225834]  ra : riscv_pmu_start+0x28/0x8e
> > [ 2110.225998] epc : ffffffff80aef864 ra : ffffffff80aef810 sp : ffff8f80004db6f0
> > [ 2110.226135]  gp : ffffffff81c83750 tp : ffffaf80069f9bc0 t0 : ffff8f80004db6c0
> > [ 2110.226245]  t1 : 0000000000000000 t2 : 000000000000001f s0 : ffff8f80004db720
> > [ 2110.226367]  s1 : ffffaf8008ca1068 a0 : 0000ffffffffffff a1 : 0000000000000000
> > [ 2110.226488]  a2 : 0000000000000001 a3 : 0000000000000870 a4 : 0000000000000000
> > [ 2110.226605]  a5 : 0000000000000000 a6 : 0000000000000840 a7 : 0000000000000030
> > [ 2110.226721]  s2 : 0000000000000000 s3 : ffffaf8005165800 s4 : ffffaf800424da00
> > [ 2110.226838]  s5 : ffffffffffffffff s6 : ffffffff81cc7590 s7 : 0000000000000000
> > [ 2110.226955]  s8 : 0000000000000006 s9 : 0000000000000001 s10: ffffaf807efbc340
> > [ 2110.227064]  s11: ffffaf807efbbf00 t3 : ffffaf8006a16028 t4 : 00000000dbfbb796
> > [ 2110.227180]  t5 : 0000000700000000 t6 : ffffaf8005269870
> > [ 2110.227277] status: 0000000200000100 badaddr: 0000000000000000 cause: 0000000000000003
> > [ 2110.227407] [<ffffffff80aef864>] riscv_pmu_start+0x7c/0x8e
> > [ 2110.227622] [<ffffffff80185b56>] perf_adjust_freq_unthr_context+0x15e/0x174
> > [ 2110.227961] [<ffffffff80188642>] perf_event_task_tick+0x88/0x9c
> > [ 2110.228235] [<ffffffff800626a8>] scheduler_tick+0xfe/0x27c
> > [ 2110.228463] [<ffffffff800b5640>] update_process_times+0x9a/0xba
> > [ 2110.228690] [<ffffffff800c5bd4>] tick_sched_handle+0x32/0x66
> > [ 2110.229007] [<ffffffff800c5e0c>] tick_sched_timer+0x64/0xb0
> > [ 2110.229253] [<ffffffff800b5e50>] __hrtimer_run_queues+0x156/0x2f4
> > [ 2110.229446] [<ffffffff800b6bdc>] hrtimer_interrupt+0xe2/0x1fe
> > [ 2110.229637] [<ffffffff80acc9e8>] riscv_timer_interrupt+0x38/0x42
> > [ 2110.229984] [<ffffffff80090a16>] handle_percpu_devid_irq+0x90/0x1d2
> > [ 2110.230162] [<ffffffff8008a9f4>] generic_handle_domain_irq+0x28/0x36
> >
> > To prevent this warning, we should call the driver's *_pmu_stop() function before unthrottling
> >
> > Signed-off-by: Eric Lin <eric.lin@sifive.com>
> > ---
> >  kernel/events/core.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > index db016e418931..098c875abe88 100644
> > --- a/kernel/events/core.c
> > +++ b/kernel/events/core.c
> > @@ -4128,6 +4128,7 @@ perf_adjust_freq_unthr_context(struct perf_event_context *ctx, bool unthrottle)
> >
> >                 if (hwc->interrupts == MAX_INTERRUPTS) {
> >                         hwc->interrupts = 0;
> > +                       event->pmu->stop(event, 0);

But how could the event have been stopped with a call to pmu->stop()
during throttling?

>
> >                         perf_log_throttle(event, 1);
> >                         event->pmu->start(event, 0);
> >                 }
> > --
> > 2.17.1
> >
Peter Zijlstra June 21, 2023, 11:58 a.m. UTC | #3
On Tue, Jun 20, 2023 at 11:18:05PM -0700, Stephane Eranian wrote:
> On Tue, Jun 20, 2023 at 9:25 PM Eric Lin <eric.lin@sifive.com> wrote:
> >
> > CC: Stephane Eranian
> >
> > On Fri, Jun 2, 2023 at 5:49 PM Eric Lin <eric.lin@sifive.com> wrote:
> > >
> > > Currently, during the perf sampling, if the perf interrupt takes too long,
> > > perf framework will lower the perf_event_max_sample_rate. This will limit
> > > the number of samples per timer tick (max_samples_per_tick) and set hwc->interrupts
> > > to MAX_INTERRUPTS within the __perf_event_account_interrupt() function.
> > >
> > > Afterward, the perf framework will unthrottle the event in the timer interrupt
> > > handler, which triggers the driver's *_pmu_start() function. Most of the driver's
> > > *_pmu_start() functions will check the event->hw.state to determine whether this
> > > event has stopped. If the event has not stopped, a WARN_ON_ONCE() warning
> > > will be triggered as shown below:
> > >
> > > [ 2110.224723] ------------[ cut here ]------------
> > > [ 2110.224851] WARNING: CPU: 0 PID: 240 at drivers/perf/riscv_pmu.c:184 riscv_pmu_start+0x7c/0x8e
> > > [ 2110.225242] Modules linked in:
> > > [ 2110.225380] CPU: 0 PID: 240 Comm: ls Not tainted 6.4-rc4-g19d0788e9ef2 #1
> > > [ 2110.225574] Hardware name: SiFive (DT)
> > > [ 2110.225657] epc : riscv_pmu_start+0x7c/0x8e
> > > [ 2110.225834]  ra : riscv_pmu_start+0x28/0x8e
> > > [ 2110.225998] epc : ffffffff80aef864 ra : ffffffff80aef810 sp : ffff8f80004db6f0
> > > [ 2110.226135]  gp : ffffffff81c83750 tp : ffffaf80069f9bc0 t0 : ffff8f80004db6c0
> > > [ 2110.226245]  t1 : 0000000000000000 t2 : 000000000000001f s0 : ffff8f80004db720
> > > [ 2110.226367]  s1 : ffffaf8008ca1068 a0 : 0000ffffffffffff a1 : 0000000000000000
> > > [ 2110.226488]  a2 : 0000000000000001 a3 : 0000000000000870 a4 : 0000000000000000
> > > [ 2110.226605]  a5 : 0000000000000000 a6 : 0000000000000840 a7 : 0000000000000030
> > > [ 2110.226721]  s2 : 0000000000000000 s3 : ffffaf8005165800 s4 : ffffaf800424da00
> > > [ 2110.226838]  s5 : ffffffffffffffff s6 : ffffffff81cc7590 s7 : 0000000000000000
> > > [ 2110.226955]  s8 : 0000000000000006 s9 : 0000000000000001 s10: ffffaf807efbc340
> > > [ 2110.227064]  s11: ffffaf807efbbf00 t3 : ffffaf8006a16028 t4 : 00000000dbfbb796
> > > [ 2110.227180]  t5 : 0000000700000000 t6 : ffffaf8005269870
> > > [ 2110.227277] status: 0000000200000100 badaddr: 0000000000000000 cause: 0000000000000003
> > > [ 2110.227407] [<ffffffff80aef864>] riscv_pmu_start+0x7c/0x8e
> > > [ 2110.227622] [<ffffffff80185b56>] perf_adjust_freq_unthr_context+0x15e/0x174
> > > [ 2110.227961] [<ffffffff80188642>] perf_event_task_tick+0x88/0x9c
> > > [ 2110.228235] [<ffffffff800626a8>] scheduler_tick+0xfe/0x27c
> > > [ 2110.228463] [<ffffffff800b5640>] update_process_times+0x9a/0xba
> > > [ 2110.228690] [<ffffffff800c5bd4>] tick_sched_handle+0x32/0x66
> > > [ 2110.229007] [<ffffffff800c5e0c>] tick_sched_timer+0x64/0xb0
> > > [ 2110.229253] [<ffffffff800b5e50>] __hrtimer_run_queues+0x156/0x2f4
> > > [ 2110.229446] [<ffffffff800b6bdc>] hrtimer_interrupt+0xe2/0x1fe
> > > [ 2110.229637] [<ffffffff80acc9e8>] riscv_timer_interrupt+0x38/0x42
> > > [ 2110.229984] [<ffffffff80090a16>] handle_percpu_devid_irq+0x90/0x1d2
> > > [ 2110.230162] [<ffffffff8008a9f4>] generic_handle_domain_irq+0x28/0x36
> > >
> > > To prevent this warning, we should call the driver's *_pmu_stop() function before unthrottling
> > >
> > > Signed-off-by: Eric Lin <eric.lin@sifive.com>
> > > ---
> > >  kernel/events/core.c | 1 +
> > >  1 file changed, 1 insertion(+)
> > >
> > > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > > index db016e418931..098c875abe88 100644
> > > --- a/kernel/events/core.c
> > > +++ b/kernel/events/core.c
> > > @@ -4128,6 +4128,7 @@ perf_adjust_freq_unthr_context(struct perf_event_context *ctx, bool unthrottle)
> > >
> > >                 if (hwc->interrupts == MAX_INTERRUPTS) {
> > >                         hwc->interrupts = 0;
> > > +                       event->pmu->stop(event, 0);
> 
> But how could the event have been stopped with a call to pmu->stop()
> during throttling?

Yeah, Changelog fails to explain how we got to the faulty state -- and
without that we can't judge if the proposed solution actually fixes the
problem or not.

> >
> > >                         perf_log_throttle(event, 1);
> > >                         event->pmu->start(event, 0);
> > >                 }
> > > --
> > > 2.17.1
> > >
Eric Lin June 27, 2023, 9:03 a.m. UTC | #4
On Wed, Jun 21, 2023 at 01:58:11PM +0200, Peter Zijlstra wrote:
> On Tue, Jun 20, 2023 at 11:18:05PM -0700, Stephane Eranian wrote:
> > On Tue, Jun 20, 2023 at 9:25 PM Eric Lin <eric.lin@sifive.com> wrote:
> > >
> > > CC: Stephane Eranian
> > >
> > > On Fri, Jun 2, 2023 at 5:49 PM Eric Lin <eric.lin@sifive.com> wrote:
> > > >
> > > > Currently, during the perf sampling, if the perf interrupt takes too long,
> > > > perf framework will lower the perf_event_max_sample_rate. This will limit
> > > > the number of samples per timer tick (max_samples_per_tick) and set hwc->interrupts
> > > > to MAX_INTERRUPTS within the __perf_event_account_interrupt() function.
> > > >
> > > > Afterward, the perf framework will unthrottle the event in the timer interrupt
> > > > handler, which triggers the driver's *_pmu_start() function. Most of the driver's
> > > > *_pmu_start() functions will check the event->hw.state to determine whether this
> > > > event has stopped. If the event has not stopped, a WARN_ON_ONCE() warning
> > > > will be triggered as shown below:
> > > >
> > > > [ 2110.224723] ------------[ cut here ]------------
> > > > [ 2110.224851] WARNING: CPU: 0 PID: 240 at drivers/perf/riscv_pmu.c:184 riscv_pmu_start+0x7c/0x8e
> > > > [ 2110.225242] Modules linked in:
> > > > [ 2110.225380] CPU: 0 PID: 240 Comm: ls Not tainted 6.4-rc4-g19d0788e9ef2 #1
> > > > [ 2110.225574] Hardware name: SiFive (DT)
> > > > [ 2110.225657] epc : riscv_pmu_start+0x7c/0x8e
> > > > [ 2110.225834]  ra : riscv_pmu_start+0x28/0x8e
> > > > [ 2110.225998] epc : ffffffff80aef864 ra : ffffffff80aef810 sp : ffff8f80004db6f0
> > > > [ 2110.226135]  gp : ffffffff81c83750 tp : ffffaf80069f9bc0 t0 : ffff8f80004db6c0
> > > > [ 2110.226245]  t1 : 0000000000000000 t2 : 000000000000001f s0 : ffff8f80004db720
> > > > [ 2110.226367]  s1 : ffffaf8008ca1068 a0 : 0000ffffffffffff a1 : 0000000000000000
> > > > [ 2110.226488]  a2 : 0000000000000001 a3 : 0000000000000870 a4 : 0000000000000000
> > > > [ 2110.226605]  a5 : 0000000000000000 a6 : 0000000000000840 a7 : 0000000000000030
> > > > [ 2110.226721]  s2 : 0000000000000000 s3 : ffffaf8005165800 s4 : ffffaf800424da00
> > > > [ 2110.226838]  s5 : ffffffffffffffff s6 : ffffffff81cc7590 s7 : 0000000000000000
> > > > [ 2110.226955]  s8 : 0000000000000006 s9 : 0000000000000001 s10: ffffaf807efbc340
> > > > [ 2110.227064]  s11: ffffaf807efbbf00 t3 : ffffaf8006a16028 t4 : 00000000dbfbb796
> > > > [ 2110.227180]  t5 : 0000000700000000 t6 : ffffaf8005269870
> > > > [ 2110.227277] status: 0000000200000100 badaddr: 0000000000000000 cause: 0000000000000003
> > > > [ 2110.227407] [<ffffffff80aef864>] riscv_pmu_start+0x7c/0x8e
> > > > [ 2110.227622] [<ffffffff80185b56>] perf_adjust_freq_unthr_context+0x15e/0x174
> > > > [ 2110.227961] [<ffffffff80188642>] perf_event_task_tick+0x88/0x9c
> > > > [ 2110.228235] [<ffffffff800626a8>] scheduler_tick+0xfe/0x27c
> > > > [ 2110.228463] [<ffffffff800b5640>] update_process_times+0x9a/0xba
> > > > [ 2110.228690] [<ffffffff800c5bd4>] tick_sched_handle+0x32/0x66
> > > > [ 2110.229007] [<ffffffff800c5e0c>] tick_sched_timer+0x64/0xb0
> > > > [ 2110.229253] [<ffffffff800b5e50>] __hrtimer_run_queues+0x156/0x2f4
> > > > [ 2110.229446] [<ffffffff800b6bdc>] hrtimer_interrupt+0xe2/0x1fe
> > > > [ 2110.229637] [<ffffffff80acc9e8>] riscv_timer_interrupt+0x38/0x42
> > > > [ 2110.229984] [<ffffffff80090a16>] handle_percpu_devid_irq+0x90/0x1d2
> > > > [ 2110.230162] [<ffffffff8008a9f4>] generic_handle_domain_irq+0x28/0x36
> > > >
> > > > To prevent this warning, we should call the driver's *_pmu_stop() function before unthrottling
> > > >
> > > > Signed-off-by: Eric Lin <eric.lin@sifive.com>
> > > > ---
> > > >  kernel/events/core.c | 1 +
> > > >  1 file changed, 1 insertion(+)
> > > >
> > > > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > > > index db016e418931..098c875abe88 100644
> > > > --- a/kernel/events/core.c
> > > > +++ b/kernel/events/core.c
> > > > @@ -4128,6 +4128,7 @@ perf_adjust_freq_unthr_context(struct perf_event_context *ctx, bool unthrottle)
> > > >
> > > >                 if (hwc->interrupts == MAX_INTERRUPTS) {
> > > >                         hwc->interrupts = 0;
> > > > +                       event->pmu->stop(event, 0);
> > 
> > But how could the event have been stopped with a call to pmu->stop()
> > during throttling?
> 
> Yeah, Changelog fails to explain how we got to the faulty state -- and
> without that we can't judge if the proposed solution actually fixes the
> problem or not.
> 

Hi Stephane, Peter,

Most of the pmu driver will call *_pmu_stop(event,0) in the *_pmu_handle_irq() function and update the hwc->state with PERF_HES_STOPPED flag
as below:

arch/alpha/kernel/perf_event.c:856:             if (perf_event_overflow(event, &data, regs)) {
arch/alpha/kernel/perf_event.c-857-                     /* Interrupts coming too quickly; "throttle" the
arch/alpha/kernel/perf_event.c-858-                      * counter, i.e., disable it for a little while.
arch/alpha/kernel/perf_event.c-859-                      */
arch/alpha/kernel/perf_event.c-860-                     alpha_pmu_stop(event, 0);
arch/alpha/kernel/perf_event.c-861-             }
-----
arch/arc/kernel/perf_event.c:603:                       if (perf_event_overflow(event, &data, regs))
arch/arc/kernel/perf_event.c-604-                               arc_pmu_stop(event, 0);
arch/arc/kernel/perf_event.c-605-               }
-----
arch/x86/events/amd/core.c:935:         if (perf_event_overflow(event, &data, regs))
arch/x86/events/amd/core.c-936-                 x86_pmu_stop(event, 0);
arch/x86/events/amd/core.c-937- }
-----

However, some of the pmu drivers stop the event in the *_pmu_handle_irq() without update the hwc->state with PERF_HES_STOPPED flag
as below:

arch/arm/kernel/perf_event_v7.c:994:            if (perf_event_overflow(event, &data, regs))
arch/arm/kernel/perf_event_v7.c-995-                    cpu_pmu->disable(event); // <== not update with PERF_HES_STOPPED
arch/arm/kernel/perf_event_v7.c-996-    }
------
arch/csky/kernel/perf_event.c:1142:             if (perf_event_overflow(event, &data, regs))
arch/csky/kernel/perf_event.c-1143-                     csky_pmu_stop_event(event); // <== not update with PERF_HES_STOPPED
arch/csky/kernel/perf_event.c-1144-     }
-------
arch/loongarch/kernel/perf_event.c:492: if (perf_event_overflow(event, data, regs))
arch/loongarch/kernel/perf_event.c-493-         loongarch_pmu_disable_event(idx);    // <== not update with PERF_HES_STOPPED
arch/loongarch/kernel/perf_event.c-494-}
-------
arch/mips/kernel/perf_event_mipsxx.c:794:       if (perf_event_overflow(event, data, regs))
arch/mips/kernel/perf_event_mipsxx.c-795-               mipsxx_pmu_disable_event(idx); // <== not update with PERF_HES_STOPPED
arch/mips/kernel/perf_event_mipsxx.c-796-}
....

Furthermore, these drivers did not add event->hw.state checking in *_pmu_start() before starting the event like x86 does:

1497 static void x86_pmu_start(struct perf_event *event, int flags)
1498 {
1499     struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
1500     int idx = event->hw.idx;
1501
1502     if (WARN_ON_ONCE(!(event->hw.state & PERF_HES_STOPPED)))
1503         return;
1504

As a result, these drivers won't trigger the WARN_ON_ONCE warning as shown in this patch.

However, if a pmu driver like RISC-V pmu which didn't call *_pmu_stop(event,0) without update the hwc->state with PERF_HES_STOPPED flag in the *_pmu_handle_irq() function
but has event->hw.state checking in *_pmu_start(), it could trigger the WARN_ON_ONCE warning as shown in this patch.

Therefore, I think we need to call pmu->stop() before unthrottling the event to prevent this warning.

Or maybe we can add hwc->state checking before we call pmu->stop() as below:

--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4128,7 +4128,8 @@ perf_adjust_freq_unthr_context(struct perf_event_context *ctx, bool unthrottle)

                if (hwc->interrupts == MAX_INTERRUPTS) {
                        hwc->interrupts = 0;
-                       event->pmu->stop(event, 0);
+                       if (!(hwc->state & PERF_HES_STOPPED))
+                               event->pmu->stop(event, 0);
                        perf_log_throttle(event, 1);
                        event->pmu->start(event, 0);
                }

Thanks.

Best Regards,
Eric Lin.


> > >
> > > >                         perf_log_throttle(event, 1);
> > > >                         event->pmu->start(event, 0);
> > > >                 }
> > > > --
> > > > 2.17.1
> > > >
Eric Lin June 27, 2023, 9:08 a.m. UTC | #5
On Wed, Jun 21, 2023 at 01:58:11PM +0200, Peter Zijlstra wrote:
> On Tue, Jun 20, 2023 at 11:18:05PM -0700, Stephane Eranian wrote:
> > On Tue, Jun 20, 2023 at 9:25 PM Eric Lin <eric.lin@sifive.com> wrote:
> > >
> > > CC: Stephane Eranian
> > >
> > > On Fri, Jun 2, 2023 at 5:49 PM Eric Lin <eric.lin@sifive.com> wrote:
> > > >
> > > > Currently, during the perf sampling, if the perf interrupt takes too long,
> > > > perf framework will lower the perf_event_max_sample_rate. This will limit
> > > > the number of samples per timer tick (max_samples_per_tick) and set hwc->interrupts
> > > > to MAX_INTERRUPTS within the __perf_event_account_interrupt() function.
> > > >
> > > > Afterward, the perf framework will unthrottle the event in the timer interrupt
> > > > handler, which triggers the driver's *_pmu_start() function. Most of the driver's
> > > > *_pmu_start() functions will check the event->hw.state to determine whether this
> > > > event has stopped. If the event has not stopped, a WARN_ON_ONCE() warning
> > > > will be triggered as shown below:
> > > >
> > > > [ 2110.224723] ------------[ cut here ]------------
> > > > [ 2110.224851] WARNING: CPU: 0 PID: 240 at drivers/perf/riscv_pmu.c:184 riscv_pmu_start+0x7c/0x8e
> > > > [ 2110.225242] Modules linked in:
> > > > [ 2110.225380] CPU: 0 PID: 240 Comm: ls Not tainted 6.4-rc4-g19d0788e9ef2 #1
> > > > [ 2110.225574] Hardware name: SiFive (DT)
> > > > [ 2110.225657] epc : riscv_pmu_start+0x7c/0x8e
> > > > [ 2110.225834]  ra : riscv_pmu_start+0x28/0x8e
> > > > [ 2110.225998] epc : ffffffff80aef864 ra : ffffffff80aef810 sp : ffff8f80004db6f0
> > > > [ 2110.226135]  gp : ffffffff81c83750 tp : ffffaf80069f9bc0 t0 : ffff8f80004db6c0
> > > > [ 2110.226245]  t1 : 0000000000000000 t2 : 000000000000001f s0 : ffff8f80004db720
> > > > [ 2110.226367]  s1 : ffffaf8008ca1068 a0 : 0000ffffffffffff a1 : 0000000000000000
> > > > [ 2110.226488]  a2 : 0000000000000001 a3 : 0000000000000870 a4 : 0000000000000000
> > > > [ 2110.226605]  a5 : 0000000000000000 a6 : 0000000000000840 a7 : 0000000000000030
> > > > [ 2110.226721]  s2 : 0000000000000000 s3 : ffffaf8005165800 s4 : ffffaf800424da00
> > > > [ 2110.226838]  s5 : ffffffffffffffff s6 : ffffffff81cc7590 s7 : 0000000000000000
> > > > [ 2110.226955]  s8 : 0000000000000006 s9 : 0000000000000001 s10: ffffaf807efbc340
> > > > [ 2110.227064]  s11: ffffaf807efbbf00 t3 : ffffaf8006a16028 t4 : 00000000dbfbb796
> > > > [ 2110.227180]  t5 : 0000000700000000 t6 : ffffaf8005269870
> > > > [ 2110.227277] status: 0000000200000100 badaddr: 0000000000000000 cause: 0000000000000003
> > > > [ 2110.227407] [<ffffffff80aef864>] riscv_pmu_start+0x7c/0x8e
> > > > [ 2110.227622] [<ffffffff80185b56>] perf_adjust_freq_unthr_context+0x15e/0x174
> > > > [ 2110.227961] [<ffffffff80188642>] perf_event_task_tick+0x88/0x9c
> > > > [ 2110.228235] [<ffffffff800626a8>] scheduler_tick+0xfe/0x27c
> > > > [ 2110.228463] [<ffffffff800b5640>] update_process_times+0x9a/0xba
> > > > [ 2110.228690] [<ffffffff800c5bd4>] tick_sched_handle+0x32/0x66
> > > > [ 2110.229007] [<ffffffff800c5e0c>] tick_sched_timer+0x64/0xb0
> > > > [ 2110.229253] [<ffffffff800b5e50>] __hrtimer_run_queues+0x156/0x2f4
> > > > [ 2110.229446] [<ffffffff800b6bdc>] hrtimer_interrupt+0xe2/0x1fe
> > > > [ 2110.229637] [<ffffffff80acc9e8>] riscv_timer_interrupt+0x38/0x42
> > > > [ 2110.229984] [<ffffffff80090a16>] handle_percpu_devid_irq+0x90/0x1d2
> > > > [ 2110.230162] [<ffffffff8008a9f4>] generic_handle_domain_irq+0x28/0x36
> > > >
> > > > To prevent this warning, we should call the driver's *_pmu_stop() function before unthrottling
> > > >
> > > > Signed-off-by: Eric Lin <eric.lin@sifive.com>
> > > > ---
> > > >  kernel/events/core.c | 1 +
> > > >  1 file changed, 1 insertion(+)
> > > >
> > > > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > > > index db016e418931..098c875abe88 100644
> > > > --- a/kernel/events/core.c
> > > > +++ b/kernel/events/core.c
> > > > @@ -4128,6 +4128,7 @@ perf_adjust_freq_unthr_context(struct perf_event_context *ctx, bool unthrottle)
> > > >
> > > >                 if (hwc->interrupts == MAX_INTERRUPTS) {
> > > >                         hwc->interrupts = 0;
> > > > +                       event->pmu->stop(event, 0);
> >
> > But how could the event have been stopped with a call to pmu->stop()
> > during throttling?
>
> Yeah, Changelog fails to explain how we got to the faulty state -- and
> without that we can't judge if the proposed solution actually fixes the
> problem or not.
>

Hi Stephane, Peter,

Most of the pmu driver will call *_pmu_stop(event,0) in the
*_pmu_handle_irq() function and update the hwc->state with
PERF_HES_STOPPED flag as below:

arch/alpha/kernel/perf_event.c:856:             if
(perf_event_overflow(event, &data, regs)) {
arch/alpha/kernel/perf_event.c-857-                     /* Interrupts
coming too quickly; "throttle" the
arch/alpha/kernel/perf_event.c-858-                      * counter,
i.e., disable it for a little while.
arch/alpha/kernel/perf_event.c-859-                      */
arch/alpha/kernel/perf_event.c-860-
alpha_pmu_stop(event, 0);
arch/alpha/kernel/perf_event.c-861-             }
-----
arch/arc/kernel/perf_event.c:603:                       if
(perf_event_overflow(event, &data, regs))
arch/arc/kernel/perf_event.c-604-
arc_pmu_stop(event, 0);
arch/arc/kernel/perf_event.c-605-               }
-----
arch/x86/events/amd/core.c:935:         if (perf_event_overflow(event,
&data, regs))
arch/x86/events/amd/core.c-936-                 x86_pmu_stop(event, 0);
arch/x86/events/amd/core.c-937- }
-----

However, some of the pmu drivers stop the event in the
*_pmu_handle_irq() without updating the hwc->state with
PERF_HES_STOPPED flag as below:

arch/arm/kernel/perf_event_v7.c:994:            if
(perf_event_overflow(event, &data, regs))
arch/arm/kernel/perf_event_v7.c-995-
cpu_pmu->disable(event); // <== not update with PERF_HES_STOPPED
arch/arm/kernel/perf_event_v7.c-996-    }
------
arch/csky/kernel/perf_event.c:1142:             if
(perf_event_overflow(event, &data, regs))
arch/csky/kernel/perf_event.c-1143-
csky_pmu_stop_event(event); // <== not update with PERF_HES_STOPPED
arch/csky/kernel/perf_event.c-1144-     }
-------
arch/loongarch/kernel/perf_event.c:492: if (perf_event_overflow(event,
data, regs))
arch/loongarch/kernel/perf_event.c-493-
loongarch_pmu_disable_event(idx);    // <== not update with
PERF_HES_STOPPED
arch/loongarch/kernel/perf_event.c-494-}
-------
arch/mips/kernel/perf_event_mipsxx.c:794:       if
(perf_event_overflow(event, data, regs))
arch/mips/kernel/perf_event_mipsxx.c-795-
mipsxx_pmu_disable_event(idx); // <== not update with PERF_HES_STOPPED
arch/mips/kernel/perf_event_mipsxx.c-796-}
....

Furthermore, these drivers did not add event->hw.state checking in
*_pmu_start() before starting the event like x86 does:

1497 static void x86_pmu_start(struct perf_event *event, int flags)
1498 {
1499     struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
1500     int idx = event->hw.idx;
1501
1502     if (WARN_ON_ONCE(!(event->hw.state & PERF_HES_STOPPED)))
1503         return;
1504

As a result, these drivers won't trigger the WARN_ON_ONCE warning as
shown in this patch.

However, if a pmu driver like RISC-V pmu which didn't call
*_pmu_stop(event,0) without updating the hwc->state with
PERF_HES_STOPPED flag in the *_pmu_handle_irq() function
but has event->hw.state checking in *_pmu_start(), it could trigger
the WARN_ON_ONCE warning as shown in this patch.

Therefore, I think we need to call pmu->stop() before unthrottling the
event to prevent this warning.

Or maybe we can add hwc->state checking before we call pmu->stop() as below:

--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4128,7 +4128,8 @@ perf_adjust_freq_unthr_context(struct
perf_event_context *ctx, bool unthrottle)

                if (hwc->interrupts == MAX_INTERRUPTS) {
                        hwc->interrupts = 0;
-                       event->pmu->stop(event, 0);
+                       if (!(hwc->state & PERF_HES_STOPPED))
+                               event->pmu->stop(event, 0);
                        perf_log_throttle(event, 1);
                        event->pmu->start(event, 0);
                }

Thanks.

Best Regards,
Eric Lin.


> > >
> > > >                         perf_log_throttle(event, 1);
> > > >                         event->pmu->start(event, 0);
> > > >                 }
> > > > --
> > > > 2.17.1
> > > >
Peter Zijlstra June 27, 2023, 9:38 a.m. UTC | #6
On Tue, Jun 27, 2023 at 05:08:07PM +0800, Eric Lin wrote:

> > Yeah, Changelog fails to explain how we got to the faulty state -- and
> > without that we can't judge if the proposed solution actually fixes the
> > problem or not.
> >
> 
> Hi Stephane, Peter,
> 
> Most of the pmu driver will call *_pmu_stop(event,0) in the
> *_pmu_handle_irq() function and update the hwc->state with
> PERF_HES_STOPPED flag as below:
> 
> arch/alpha/kernel/perf_event.c:856:             if
> (perf_event_overflow(event, &data, regs)) {
> arch/alpha/kernel/perf_event.c-857-                     /* Interrupts
> coming too quickly; "throttle" the
> arch/alpha/kernel/perf_event.c-858-                      * counter,
> i.e., disable it for a little while.
> arch/alpha/kernel/perf_event.c-859-                      */
> arch/alpha/kernel/perf_event.c-860-
> alpha_pmu_stop(event, 0);
> arch/alpha/kernel/perf_event.c-861-             }
> -----
> arch/arc/kernel/perf_event.c:603:                       if
> (perf_event_overflow(event, &data, regs))
> arch/arc/kernel/perf_event.c-604-
> arc_pmu_stop(event, 0);
> arch/arc/kernel/perf_event.c-605-               }
> -----
> arch/x86/events/amd/core.c:935:         if (perf_event_overflow(event,
> &data, regs))
> arch/x86/events/amd/core.c-936-                 x86_pmu_stop(event, 0);
> arch/x86/events/amd/core.c-937- }
> -----
> 
> However, some of the pmu drivers stop the event in the
> *_pmu_handle_irq() without updating the hwc->state with
> PERF_HES_STOPPED flag as below:
> 
> arch/arm/kernel/perf_event_v7.c:994:            if
> (perf_event_overflow(event, &data, regs))
> arch/arm/kernel/perf_event_v7.c-995-
> cpu_pmu->disable(event); // <== not update with PERF_HES_STOPPED
> arch/arm/kernel/perf_event_v7.c-996-    }
> ------
> arch/csky/kernel/perf_event.c:1142:             if
> (perf_event_overflow(event, &data, regs))
> arch/csky/kernel/perf_event.c-1143-
> csky_pmu_stop_event(event); // <== not update with PERF_HES_STOPPED
> arch/csky/kernel/perf_event.c-1144-     }
> -------
> arch/loongarch/kernel/perf_event.c:492: if (perf_event_overflow(event,
> data, regs))
> arch/loongarch/kernel/perf_event.c-493-
> loongarch_pmu_disable_event(idx);    // <== not update with
> PERF_HES_STOPPED
> arch/loongarch/kernel/perf_event.c-494-}
> -------
> arch/mips/kernel/perf_event_mipsxx.c:794:       if
> (perf_event_overflow(event, data, regs))
> arch/mips/kernel/perf_event_mipsxx.c-795-
> mipsxx_pmu_disable_event(idx); // <== not update with PERF_HES_STOPPED
> arch/mips/kernel/perf_event_mipsxx.c-796-}
> ....
> 
> Furthermore, these drivers did not add event->hw.state checking in
> *_pmu_start() before starting the event like x86 does:
> 
> 1497 static void x86_pmu_start(struct perf_event *event, int flags)
> 1498 {
> 1499     struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
> 1500     int idx = event->hw.idx;
> 1501
> 1502     if (WARN_ON_ONCE(!(event->hw.state & PERF_HES_STOPPED)))
> 1503         return;
> 1504
> 
> As a result, these drivers won't trigger the WARN_ON_ONCE warning as
> shown in this patch.
> 
> However, if a pmu driver like RISC-V pmu which didn't call
> *_pmu_stop(event,0) without updating the hwc->state with
> PERF_HES_STOPPED flag in the *_pmu_handle_irq() function
> but has event->hw.state checking in *_pmu_start(), it could trigger
> the WARN_ON_ONCE warning as shown in this patch.
> 
> Therefore, I think we need to call pmu->stop() before unthrottling the
> event to prevent this warning.

How is that not a pmu driver problem ? I'd think we should be fixing
those drivers. Mark, do you have have any memories of how the ARM driver
came to be this way?
diff mbox series

Patch

diff --git a/kernel/events/core.c b/kernel/events/core.c
index db016e418931..098c875abe88 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4128,6 +4128,7 @@  perf_adjust_freq_unthr_context(struct perf_event_context *ctx, bool unthrottle)
 
 		if (hwc->interrupts == MAX_INTERRUPTS) {
 			hwc->interrupts = 0;
+			event->pmu->stop(event, 0);
 			perf_log_throttle(event, 1);
 			event->pmu->start(event, 0);
 		}