Message ID | 20230602094841.1225-1-eric.lin@sifive.com (mailing list archive) |
---|---|
State | Handled Elsewhere |
Headers | show |
Series | perf/core: Add pmu stop before unthrottling to prevent WARNING | expand |
Context | Check | Description |
---|---|---|
conchuod/cover_letter | success | Single patches do not need cover letters |
conchuod/tree_selection | success | Guessed tree name to be for-next at HEAD ac9a78681b92 |
conchuod/fixes_present | success | Fixes tag not required for -next series |
conchuod/maintainers_pattern | success | MAINTAINERS pattern errors before the patch: 6 and now 6 |
conchuod/verify_signedoff | success | Signed-off-by tag matches author and committer |
conchuod/kdoc | success | Errors and warnings before: 0 this patch: 0 |
conchuod/build_rv64_clang_allmodconfig | success | Errors and warnings before: 14 this patch: 14 |
conchuod/module_param | success | Was 0 now: 0 |
conchuod/build_rv64_gcc_allmodconfig | success | Errors and warnings before: 71 this patch: 71 |
conchuod/build_rv32_defconfig | success | Build OK |
conchuod/dtb_warn_rv64 | success | Errors and warnings before: 3 this patch: 3 |
conchuod/header_inline | success | No static functions without inline keyword in header files |
conchuod/checkpatch | success | total: 0 errors, 0 warnings, 0 checks, 7 lines checked |
conchuod/build_rv64_nommu_k210_defconfig | success | Build OK |
conchuod/verify_fixes | success | No Fixes tag |
conchuod/build_rv64_nommu_virt_defconfig | success | Build OK |
CC: Stephane Eranian On Fri, Jun 2, 2023 at 5:49 PM Eric Lin <eric.lin@sifive.com> wrote: > > Currently, during the perf sampling, if the perf interrupt takes too long, > perf framework will lower the perf_event_max_sample_rate. This will limit > the number of samples per timer tick (max_samples_per_tick) and set hwc->interrupts > to MAX_INTERRUPTS within the __perf_event_account_interrupt() function. > > Afterward, the perf framework will unthrottle the event in the timer interrupt > handler, which triggers the driver's *_pmu_start() function. Most of the driver's > *_pmu_start() functions will check the event->hw.state to determine whether this > event has stopped. If the event has not stopped, a WARN_ON_ONCE() warning > will be triggered as shown below: > > [ 2110.224723] ------------[ cut here ]------------ > [ 2110.224851] WARNING: CPU: 0 PID: 240 at drivers/perf/riscv_pmu.c:184 riscv_pmu_start+0x7c/0x8e > [ 2110.225242] Modules linked in: > [ 2110.225380] CPU: 0 PID: 240 Comm: ls Not tainted 6.4-rc4-g19d0788e9ef2 #1 > [ 2110.225574] Hardware name: SiFive (DT) > [ 2110.225657] epc : riscv_pmu_start+0x7c/0x8e > [ 2110.225834] ra : riscv_pmu_start+0x28/0x8e > [ 2110.225998] epc : ffffffff80aef864 ra : ffffffff80aef810 sp : ffff8f80004db6f0 > [ 2110.226135] gp : ffffffff81c83750 tp : ffffaf80069f9bc0 t0 : ffff8f80004db6c0 > [ 2110.226245] t1 : 0000000000000000 t2 : 000000000000001f s0 : ffff8f80004db720 > [ 2110.226367] s1 : ffffaf8008ca1068 a0 : 0000ffffffffffff a1 : 0000000000000000 > [ 2110.226488] a2 : 0000000000000001 a3 : 0000000000000870 a4 : 0000000000000000 > [ 2110.226605] a5 : 0000000000000000 a6 : 0000000000000840 a7 : 0000000000000030 > [ 2110.226721] s2 : 0000000000000000 s3 : ffffaf8005165800 s4 : ffffaf800424da00 > [ 2110.226838] s5 : ffffffffffffffff s6 : ffffffff81cc7590 s7 : 0000000000000000 > [ 2110.226955] s8 : 0000000000000006 s9 : 0000000000000001 s10: ffffaf807efbc340 > [ 2110.227064] s11: ffffaf807efbbf00 t3 : ffffaf8006a16028 t4 : 00000000dbfbb796 > [ 2110.227180] t5 : 0000000700000000 t6 : ffffaf8005269870 > [ 2110.227277] status: 0000000200000100 badaddr: 0000000000000000 cause: 0000000000000003 > [ 2110.227407] [<ffffffff80aef864>] riscv_pmu_start+0x7c/0x8e > [ 2110.227622] [<ffffffff80185b56>] perf_adjust_freq_unthr_context+0x15e/0x174 > [ 2110.227961] [<ffffffff80188642>] perf_event_task_tick+0x88/0x9c > [ 2110.228235] [<ffffffff800626a8>] scheduler_tick+0xfe/0x27c > [ 2110.228463] [<ffffffff800b5640>] update_process_times+0x9a/0xba > [ 2110.228690] [<ffffffff800c5bd4>] tick_sched_handle+0x32/0x66 > [ 2110.229007] [<ffffffff800c5e0c>] tick_sched_timer+0x64/0xb0 > [ 2110.229253] [<ffffffff800b5e50>] __hrtimer_run_queues+0x156/0x2f4 > [ 2110.229446] [<ffffffff800b6bdc>] hrtimer_interrupt+0xe2/0x1fe > [ 2110.229637] [<ffffffff80acc9e8>] riscv_timer_interrupt+0x38/0x42 > [ 2110.229984] [<ffffffff80090a16>] handle_percpu_devid_irq+0x90/0x1d2 > [ 2110.230162] [<ffffffff8008a9f4>] generic_handle_domain_irq+0x28/0x36 > > To prevent this warning, we should call the driver's *_pmu_stop() function before unthrottling > > Signed-off-by: Eric Lin <eric.lin@sifive.com> > --- > kernel/events/core.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/kernel/events/core.c b/kernel/events/core.c > index db016e418931..098c875abe88 100644 > --- a/kernel/events/core.c > +++ b/kernel/events/core.c > @@ -4128,6 +4128,7 @@ perf_adjust_freq_unthr_context(struct perf_event_context *ctx, bool unthrottle) > > if (hwc->interrupts == MAX_INTERRUPTS) { > hwc->interrupts = 0; > + event->pmu->stop(event, 0); > perf_log_throttle(event, 1); > event->pmu->start(event, 0); > } > -- > 2.17.1 >
On Tue, Jun 20, 2023 at 9:25 PM Eric Lin <eric.lin@sifive.com> wrote: > > CC: Stephane Eranian > > On Fri, Jun 2, 2023 at 5:49 PM Eric Lin <eric.lin@sifive.com> wrote: > > > > Currently, during the perf sampling, if the perf interrupt takes too long, > > perf framework will lower the perf_event_max_sample_rate. This will limit > > the number of samples per timer tick (max_samples_per_tick) and set hwc->interrupts > > to MAX_INTERRUPTS within the __perf_event_account_interrupt() function. > > > > Afterward, the perf framework will unthrottle the event in the timer interrupt > > handler, which triggers the driver's *_pmu_start() function. Most of the driver's > > *_pmu_start() functions will check the event->hw.state to determine whether this > > event has stopped. If the event has not stopped, a WARN_ON_ONCE() warning > > will be triggered as shown below: > > > > [ 2110.224723] ------------[ cut here ]------------ > > [ 2110.224851] WARNING: CPU: 0 PID: 240 at drivers/perf/riscv_pmu.c:184 riscv_pmu_start+0x7c/0x8e > > [ 2110.225242] Modules linked in: > > [ 2110.225380] CPU: 0 PID: 240 Comm: ls Not tainted 6.4-rc4-g19d0788e9ef2 #1 > > [ 2110.225574] Hardware name: SiFive (DT) > > [ 2110.225657] epc : riscv_pmu_start+0x7c/0x8e > > [ 2110.225834] ra : riscv_pmu_start+0x28/0x8e > > [ 2110.225998] epc : ffffffff80aef864 ra : ffffffff80aef810 sp : ffff8f80004db6f0 > > [ 2110.226135] gp : ffffffff81c83750 tp : ffffaf80069f9bc0 t0 : ffff8f80004db6c0 > > [ 2110.226245] t1 : 0000000000000000 t2 : 000000000000001f s0 : ffff8f80004db720 > > [ 2110.226367] s1 : ffffaf8008ca1068 a0 : 0000ffffffffffff a1 : 0000000000000000 > > [ 2110.226488] a2 : 0000000000000001 a3 : 0000000000000870 a4 : 0000000000000000 > > [ 2110.226605] a5 : 0000000000000000 a6 : 0000000000000840 a7 : 0000000000000030 > > [ 2110.226721] s2 : 0000000000000000 s3 : ffffaf8005165800 s4 : ffffaf800424da00 > > [ 2110.226838] s5 : ffffffffffffffff s6 : ffffffff81cc7590 s7 : 0000000000000000 > > [ 2110.226955] s8 : 0000000000000006 s9 : 0000000000000001 s10: ffffaf807efbc340 > > [ 2110.227064] s11: ffffaf807efbbf00 t3 : ffffaf8006a16028 t4 : 00000000dbfbb796 > > [ 2110.227180] t5 : 0000000700000000 t6 : ffffaf8005269870 > > [ 2110.227277] status: 0000000200000100 badaddr: 0000000000000000 cause: 0000000000000003 > > [ 2110.227407] [<ffffffff80aef864>] riscv_pmu_start+0x7c/0x8e > > [ 2110.227622] [<ffffffff80185b56>] perf_adjust_freq_unthr_context+0x15e/0x174 > > [ 2110.227961] [<ffffffff80188642>] perf_event_task_tick+0x88/0x9c > > [ 2110.228235] [<ffffffff800626a8>] scheduler_tick+0xfe/0x27c > > [ 2110.228463] [<ffffffff800b5640>] update_process_times+0x9a/0xba > > [ 2110.228690] [<ffffffff800c5bd4>] tick_sched_handle+0x32/0x66 > > [ 2110.229007] [<ffffffff800c5e0c>] tick_sched_timer+0x64/0xb0 > > [ 2110.229253] [<ffffffff800b5e50>] __hrtimer_run_queues+0x156/0x2f4 > > [ 2110.229446] [<ffffffff800b6bdc>] hrtimer_interrupt+0xe2/0x1fe > > [ 2110.229637] [<ffffffff80acc9e8>] riscv_timer_interrupt+0x38/0x42 > > [ 2110.229984] [<ffffffff80090a16>] handle_percpu_devid_irq+0x90/0x1d2 > > [ 2110.230162] [<ffffffff8008a9f4>] generic_handle_domain_irq+0x28/0x36 > > > > To prevent this warning, we should call the driver's *_pmu_stop() function before unthrottling > > > > Signed-off-by: Eric Lin <eric.lin@sifive.com> > > --- > > kernel/events/core.c | 1 + > > 1 file changed, 1 insertion(+) > > > > diff --git a/kernel/events/core.c b/kernel/events/core.c > > index db016e418931..098c875abe88 100644 > > --- a/kernel/events/core.c > > +++ b/kernel/events/core.c > > @@ -4128,6 +4128,7 @@ perf_adjust_freq_unthr_context(struct perf_event_context *ctx, bool unthrottle) > > > > if (hwc->interrupts == MAX_INTERRUPTS) { > > hwc->interrupts = 0; > > + event->pmu->stop(event, 0); But how could the event have been stopped with a call to pmu->stop() during throttling? > > > perf_log_throttle(event, 1); > > event->pmu->start(event, 0); > > } > > -- > > 2.17.1 > >
On Tue, Jun 20, 2023 at 11:18:05PM -0700, Stephane Eranian wrote: > On Tue, Jun 20, 2023 at 9:25 PM Eric Lin <eric.lin@sifive.com> wrote: > > > > CC: Stephane Eranian > > > > On Fri, Jun 2, 2023 at 5:49 PM Eric Lin <eric.lin@sifive.com> wrote: > > > > > > Currently, during the perf sampling, if the perf interrupt takes too long, > > > perf framework will lower the perf_event_max_sample_rate. This will limit > > > the number of samples per timer tick (max_samples_per_tick) and set hwc->interrupts > > > to MAX_INTERRUPTS within the __perf_event_account_interrupt() function. > > > > > > Afterward, the perf framework will unthrottle the event in the timer interrupt > > > handler, which triggers the driver's *_pmu_start() function. Most of the driver's > > > *_pmu_start() functions will check the event->hw.state to determine whether this > > > event has stopped. If the event has not stopped, a WARN_ON_ONCE() warning > > > will be triggered as shown below: > > > > > > [ 2110.224723] ------------[ cut here ]------------ > > > [ 2110.224851] WARNING: CPU: 0 PID: 240 at drivers/perf/riscv_pmu.c:184 riscv_pmu_start+0x7c/0x8e > > > [ 2110.225242] Modules linked in: > > > [ 2110.225380] CPU: 0 PID: 240 Comm: ls Not tainted 6.4-rc4-g19d0788e9ef2 #1 > > > [ 2110.225574] Hardware name: SiFive (DT) > > > [ 2110.225657] epc : riscv_pmu_start+0x7c/0x8e > > > [ 2110.225834] ra : riscv_pmu_start+0x28/0x8e > > > [ 2110.225998] epc : ffffffff80aef864 ra : ffffffff80aef810 sp : ffff8f80004db6f0 > > > [ 2110.226135] gp : ffffffff81c83750 tp : ffffaf80069f9bc0 t0 : ffff8f80004db6c0 > > > [ 2110.226245] t1 : 0000000000000000 t2 : 000000000000001f s0 : ffff8f80004db720 > > > [ 2110.226367] s1 : ffffaf8008ca1068 a0 : 0000ffffffffffff a1 : 0000000000000000 > > > [ 2110.226488] a2 : 0000000000000001 a3 : 0000000000000870 a4 : 0000000000000000 > > > [ 2110.226605] a5 : 0000000000000000 a6 : 0000000000000840 a7 : 0000000000000030 > > > [ 2110.226721] s2 : 0000000000000000 s3 : ffffaf8005165800 s4 : ffffaf800424da00 > > > [ 2110.226838] s5 : ffffffffffffffff s6 : ffffffff81cc7590 s7 : 0000000000000000 > > > [ 2110.226955] s8 : 0000000000000006 s9 : 0000000000000001 s10: ffffaf807efbc340 > > > [ 2110.227064] s11: ffffaf807efbbf00 t3 : ffffaf8006a16028 t4 : 00000000dbfbb796 > > > [ 2110.227180] t5 : 0000000700000000 t6 : ffffaf8005269870 > > > [ 2110.227277] status: 0000000200000100 badaddr: 0000000000000000 cause: 0000000000000003 > > > [ 2110.227407] [<ffffffff80aef864>] riscv_pmu_start+0x7c/0x8e > > > [ 2110.227622] [<ffffffff80185b56>] perf_adjust_freq_unthr_context+0x15e/0x174 > > > [ 2110.227961] [<ffffffff80188642>] perf_event_task_tick+0x88/0x9c > > > [ 2110.228235] [<ffffffff800626a8>] scheduler_tick+0xfe/0x27c > > > [ 2110.228463] [<ffffffff800b5640>] update_process_times+0x9a/0xba > > > [ 2110.228690] [<ffffffff800c5bd4>] tick_sched_handle+0x32/0x66 > > > [ 2110.229007] [<ffffffff800c5e0c>] tick_sched_timer+0x64/0xb0 > > > [ 2110.229253] [<ffffffff800b5e50>] __hrtimer_run_queues+0x156/0x2f4 > > > [ 2110.229446] [<ffffffff800b6bdc>] hrtimer_interrupt+0xe2/0x1fe > > > [ 2110.229637] [<ffffffff80acc9e8>] riscv_timer_interrupt+0x38/0x42 > > > [ 2110.229984] [<ffffffff80090a16>] handle_percpu_devid_irq+0x90/0x1d2 > > > [ 2110.230162] [<ffffffff8008a9f4>] generic_handle_domain_irq+0x28/0x36 > > > > > > To prevent this warning, we should call the driver's *_pmu_stop() function before unthrottling > > > > > > Signed-off-by: Eric Lin <eric.lin@sifive.com> > > > --- > > > kernel/events/core.c | 1 + > > > 1 file changed, 1 insertion(+) > > > > > > diff --git a/kernel/events/core.c b/kernel/events/core.c > > > index db016e418931..098c875abe88 100644 > > > --- a/kernel/events/core.c > > > +++ b/kernel/events/core.c > > > @@ -4128,6 +4128,7 @@ perf_adjust_freq_unthr_context(struct perf_event_context *ctx, bool unthrottle) > > > > > > if (hwc->interrupts == MAX_INTERRUPTS) { > > > hwc->interrupts = 0; > > > + event->pmu->stop(event, 0); > > But how could the event have been stopped with a call to pmu->stop() > during throttling? Yeah, Changelog fails to explain how we got to the faulty state -- and without that we can't judge if the proposed solution actually fixes the problem or not. > > > > > perf_log_throttle(event, 1); > > > event->pmu->start(event, 0); > > > } > > > -- > > > 2.17.1 > > >
On Wed, Jun 21, 2023 at 01:58:11PM +0200, Peter Zijlstra wrote: > On Tue, Jun 20, 2023 at 11:18:05PM -0700, Stephane Eranian wrote: > > On Tue, Jun 20, 2023 at 9:25 PM Eric Lin <eric.lin@sifive.com> wrote: > > > > > > CC: Stephane Eranian > > > > > > On Fri, Jun 2, 2023 at 5:49 PM Eric Lin <eric.lin@sifive.com> wrote: > > > > > > > > Currently, during the perf sampling, if the perf interrupt takes too long, > > > > perf framework will lower the perf_event_max_sample_rate. This will limit > > > > the number of samples per timer tick (max_samples_per_tick) and set hwc->interrupts > > > > to MAX_INTERRUPTS within the __perf_event_account_interrupt() function. > > > > > > > > Afterward, the perf framework will unthrottle the event in the timer interrupt > > > > handler, which triggers the driver's *_pmu_start() function. Most of the driver's > > > > *_pmu_start() functions will check the event->hw.state to determine whether this > > > > event has stopped. If the event has not stopped, a WARN_ON_ONCE() warning > > > > will be triggered as shown below: > > > > > > > > [ 2110.224723] ------------[ cut here ]------------ > > > > [ 2110.224851] WARNING: CPU: 0 PID: 240 at drivers/perf/riscv_pmu.c:184 riscv_pmu_start+0x7c/0x8e > > > > [ 2110.225242] Modules linked in: > > > > [ 2110.225380] CPU: 0 PID: 240 Comm: ls Not tainted 6.4-rc4-g19d0788e9ef2 #1 > > > > [ 2110.225574] Hardware name: SiFive (DT) > > > > [ 2110.225657] epc : riscv_pmu_start+0x7c/0x8e > > > > [ 2110.225834] ra : riscv_pmu_start+0x28/0x8e > > > > [ 2110.225998] epc : ffffffff80aef864 ra : ffffffff80aef810 sp : ffff8f80004db6f0 > > > > [ 2110.226135] gp : ffffffff81c83750 tp : ffffaf80069f9bc0 t0 : ffff8f80004db6c0 > > > > [ 2110.226245] t1 : 0000000000000000 t2 : 000000000000001f s0 : ffff8f80004db720 > > > > [ 2110.226367] s1 : ffffaf8008ca1068 a0 : 0000ffffffffffff a1 : 0000000000000000 > > > > [ 2110.226488] a2 : 0000000000000001 a3 : 0000000000000870 a4 : 0000000000000000 > > > > [ 2110.226605] a5 : 0000000000000000 a6 : 0000000000000840 a7 : 0000000000000030 > > > > [ 2110.226721] s2 : 0000000000000000 s3 : ffffaf8005165800 s4 : ffffaf800424da00 > > > > [ 2110.226838] s5 : ffffffffffffffff s6 : ffffffff81cc7590 s7 : 0000000000000000 > > > > [ 2110.226955] s8 : 0000000000000006 s9 : 0000000000000001 s10: ffffaf807efbc340 > > > > [ 2110.227064] s11: ffffaf807efbbf00 t3 : ffffaf8006a16028 t4 : 00000000dbfbb796 > > > > [ 2110.227180] t5 : 0000000700000000 t6 : ffffaf8005269870 > > > > [ 2110.227277] status: 0000000200000100 badaddr: 0000000000000000 cause: 0000000000000003 > > > > [ 2110.227407] [<ffffffff80aef864>] riscv_pmu_start+0x7c/0x8e > > > > [ 2110.227622] [<ffffffff80185b56>] perf_adjust_freq_unthr_context+0x15e/0x174 > > > > [ 2110.227961] [<ffffffff80188642>] perf_event_task_tick+0x88/0x9c > > > > [ 2110.228235] [<ffffffff800626a8>] scheduler_tick+0xfe/0x27c > > > > [ 2110.228463] [<ffffffff800b5640>] update_process_times+0x9a/0xba > > > > [ 2110.228690] [<ffffffff800c5bd4>] tick_sched_handle+0x32/0x66 > > > > [ 2110.229007] [<ffffffff800c5e0c>] tick_sched_timer+0x64/0xb0 > > > > [ 2110.229253] [<ffffffff800b5e50>] __hrtimer_run_queues+0x156/0x2f4 > > > > [ 2110.229446] [<ffffffff800b6bdc>] hrtimer_interrupt+0xe2/0x1fe > > > > [ 2110.229637] [<ffffffff80acc9e8>] riscv_timer_interrupt+0x38/0x42 > > > > [ 2110.229984] [<ffffffff80090a16>] handle_percpu_devid_irq+0x90/0x1d2 > > > > [ 2110.230162] [<ffffffff8008a9f4>] generic_handle_domain_irq+0x28/0x36 > > > > > > > > To prevent this warning, we should call the driver's *_pmu_stop() function before unthrottling > > > > > > > > Signed-off-by: Eric Lin <eric.lin@sifive.com> > > > > --- > > > > kernel/events/core.c | 1 + > > > > 1 file changed, 1 insertion(+) > > > > > > > > diff --git a/kernel/events/core.c b/kernel/events/core.c > > > > index db016e418931..098c875abe88 100644 > > > > --- a/kernel/events/core.c > > > > +++ b/kernel/events/core.c > > > > @@ -4128,6 +4128,7 @@ perf_adjust_freq_unthr_context(struct perf_event_context *ctx, bool unthrottle) > > > > > > > > if (hwc->interrupts == MAX_INTERRUPTS) { > > > > hwc->interrupts = 0; > > > > + event->pmu->stop(event, 0); > > > > But how could the event have been stopped with a call to pmu->stop() > > during throttling? > > Yeah, Changelog fails to explain how we got to the faulty state -- and > without that we can't judge if the proposed solution actually fixes the > problem or not. > Hi Stephane, Peter, Most of the pmu driver will call *_pmu_stop(event,0) in the *_pmu_handle_irq() function and update the hwc->state with PERF_HES_STOPPED flag as below: arch/alpha/kernel/perf_event.c:856: if (perf_event_overflow(event, &data, regs)) { arch/alpha/kernel/perf_event.c-857- /* Interrupts coming too quickly; "throttle" the arch/alpha/kernel/perf_event.c-858- * counter, i.e., disable it for a little while. arch/alpha/kernel/perf_event.c-859- */ arch/alpha/kernel/perf_event.c-860- alpha_pmu_stop(event, 0); arch/alpha/kernel/perf_event.c-861- } ----- arch/arc/kernel/perf_event.c:603: if (perf_event_overflow(event, &data, regs)) arch/arc/kernel/perf_event.c-604- arc_pmu_stop(event, 0); arch/arc/kernel/perf_event.c-605- } ----- arch/x86/events/amd/core.c:935: if (perf_event_overflow(event, &data, regs)) arch/x86/events/amd/core.c-936- x86_pmu_stop(event, 0); arch/x86/events/amd/core.c-937- } ----- However, some of the pmu drivers stop the event in the *_pmu_handle_irq() without update the hwc->state with PERF_HES_STOPPED flag as below: arch/arm/kernel/perf_event_v7.c:994: if (perf_event_overflow(event, &data, regs)) arch/arm/kernel/perf_event_v7.c-995- cpu_pmu->disable(event); // <== not update with PERF_HES_STOPPED arch/arm/kernel/perf_event_v7.c-996- } ------ arch/csky/kernel/perf_event.c:1142: if (perf_event_overflow(event, &data, regs)) arch/csky/kernel/perf_event.c-1143- csky_pmu_stop_event(event); // <== not update with PERF_HES_STOPPED arch/csky/kernel/perf_event.c-1144- } ------- arch/loongarch/kernel/perf_event.c:492: if (perf_event_overflow(event, data, regs)) arch/loongarch/kernel/perf_event.c-493- loongarch_pmu_disable_event(idx); // <== not update with PERF_HES_STOPPED arch/loongarch/kernel/perf_event.c-494-} ------- arch/mips/kernel/perf_event_mipsxx.c:794: if (perf_event_overflow(event, data, regs)) arch/mips/kernel/perf_event_mipsxx.c-795- mipsxx_pmu_disable_event(idx); // <== not update with PERF_HES_STOPPED arch/mips/kernel/perf_event_mipsxx.c-796-} .... Furthermore, these drivers did not add event->hw.state checking in *_pmu_start() before starting the event like x86 does: 1497 static void x86_pmu_start(struct perf_event *event, int flags) 1498 { 1499 struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events); 1500 int idx = event->hw.idx; 1501 1502 if (WARN_ON_ONCE(!(event->hw.state & PERF_HES_STOPPED))) 1503 return; 1504 As a result, these drivers won't trigger the WARN_ON_ONCE warning as shown in this patch. However, if a pmu driver like RISC-V pmu which didn't call *_pmu_stop(event,0) without update the hwc->state with PERF_HES_STOPPED flag in the *_pmu_handle_irq() function but has event->hw.state checking in *_pmu_start(), it could trigger the WARN_ON_ONCE warning as shown in this patch. Therefore, I think we need to call pmu->stop() before unthrottling the event to prevent this warning. Or maybe we can add hwc->state checking before we call pmu->stop() as below: --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -4128,7 +4128,8 @@ perf_adjust_freq_unthr_context(struct perf_event_context *ctx, bool unthrottle) if (hwc->interrupts == MAX_INTERRUPTS) { hwc->interrupts = 0; - event->pmu->stop(event, 0); + if (!(hwc->state & PERF_HES_STOPPED)) + event->pmu->stop(event, 0); perf_log_throttle(event, 1); event->pmu->start(event, 0); } Thanks. Best Regards, Eric Lin. > > > > > > > perf_log_throttle(event, 1); > > > > event->pmu->start(event, 0); > > > > } > > > > -- > > > > 2.17.1 > > > >
On Wed, Jun 21, 2023 at 01:58:11PM +0200, Peter Zijlstra wrote: > On Tue, Jun 20, 2023 at 11:18:05PM -0700, Stephane Eranian wrote: > > On Tue, Jun 20, 2023 at 9:25 PM Eric Lin <eric.lin@sifive.com> wrote: > > > > > > CC: Stephane Eranian > > > > > > On Fri, Jun 2, 2023 at 5:49 PM Eric Lin <eric.lin@sifive.com> wrote: > > > > > > > > Currently, during the perf sampling, if the perf interrupt takes too long, > > > > perf framework will lower the perf_event_max_sample_rate. This will limit > > > > the number of samples per timer tick (max_samples_per_tick) and set hwc->interrupts > > > > to MAX_INTERRUPTS within the __perf_event_account_interrupt() function. > > > > > > > > Afterward, the perf framework will unthrottle the event in the timer interrupt > > > > handler, which triggers the driver's *_pmu_start() function. Most of the driver's > > > > *_pmu_start() functions will check the event->hw.state to determine whether this > > > > event has stopped. If the event has not stopped, a WARN_ON_ONCE() warning > > > > will be triggered as shown below: > > > > > > > > [ 2110.224723] ------------[ cut here ]------------ > > > > [ 2110.224851] WARNING: CPU: 0 PID: 240 at drivers/perf/riscv_pmu.c:184 riscv_pmu_start+0x7c/0x8e > > > > [ 2110.225242] Modules linked in: > > > > [ 2110.225380] CPU: 0 PID: 240 Comm: ls Not tainted 6.4-rc4-g19d0788e9ef2 #1 > > > > [ 2110.225574] Hardware name: SiFive (DT) > > > > [ 2110.225657] epc : riscv_pmu_start+0x7c/0x8e > > > > [ 2110.225834] ra : riscv_pmu_start+0x28/0x8e > > > > [ 2110.225998] epc : ffffffff80aef864 ra : ffffffff80aef810 sp : ffff8f80004db6f0 > > > > [ 2110.226135] gp : ffffffff81c83750 tp : ffffaf80069f9bc0 t0 : ffff8f80004db6c0 > > > > [ 2110.226245] t1 : 0000000000000000 t2 : 000000000000001f s0 : ffff8f80004db720 > > > > [ 2110.226367] s1 : ffffaf8008ca1068 a0 : 0000ffffffffffff a1 : 0000000000000000 > > > > [ 2110.226488] a2 : 0000000000000001 a3 : 0000000000000870 a4 : 0000000000000000 > > > > [ 2110.226605] a5 : 0000000000000000 a6 : 0000000000000840 a7 : 0000000000000030 > > > > [ 2110.226721] s2 : 0000000000000000 s3 : ffffaf8005165800 s4 : ffffaf800424da00 > > > > [ 2110.226838] s5 : ffffffffffffffff s6 : ffffffff81cc7590 s7 : 0000000000000000 > > > > [ 2110.226955] s8 : 0000000000000006 s9 : 0000000000000001 s10: ffffaf807efbc340 > > > > [ 2110.227064] s11: ffffaf807efbbf00 t3 : ffffaf8006a16028 t4 : 00000000dbfbb796 > > > > [ 2110.227180] t5 : 0000000700000000 t6 : ffffaf8005269870 > > > > [ 2110.227277] status: 0000000200000100 badaddr: 0000000000000000 cause: 0000000000000003 > > > > [ 2110.227407] [<ffffffff80aef864>] riscv_pmu_start+0x7c/0x8e > > > > [ 2110.227622] [<ffffffff80185b56>] perf_adjust_freq_unthr_context+0x15e/0x174 > > > > [ 2110.227961] [<ffffffff80188642>] perf_event_task_tick+0x88/0x9c > > > > [ 2110.228235] [<ffffffff800626a8>] scheduler_tick+0xfe/0x27c > > > > [ 2110.228463] [<ffffffff800b5640>] update_process_times+0x9a/0xba > > > > [ 2110.228690] [<ffffffff800c5bd4>] tick_sched_handle+0x32/0x66 > > > > [ 2110.229007] [<ffffffff800c5e0c>] tick_sched_timer+0x64/0xb0 > > > > [ 2110.229253] [<ffffffff800b5e50>] __hrtimer_run_queues+0x156/0x2f4 > > > > [ 2110.229446] [<ffffffff800b6bdc>] hrtimer_interrupt+0xe2/0x1fe > > > > [ 2110.229637] [<ffffffff80acc9e8>] riscv_timer_interrupt+0x38/0x42 > > > > [ 2110.229984] [<ffffffff80090a16>] handle_percpu_devid_irq+0x90/0x1d2 > > > > [ 2110.230162] [<ffffffff8008a9f4>] generic_handle_domain_irq+0x28/0x36 > > > > > > > > To prevent this warning, we should call the driver's *_pmu_stop() function before unthrottling > > > > > > > > Signed-off-by: Eric Lin <eric.lin@sifive.com> > > > > --- > > > > kernel/events/core.c | 1 + > > > > 1 file changed, 1 insertion(+) > > > > > > > > diff --git a/kernel/events/core.c b/kernel/events/core.c > > > > index db016e418931..098c875abe88 100644 > > > > --- a/kernel/events/core.c > > > > +++ b/kernel/events/core.c > > > > @@ -4128,6 +4128,7 @@ perf_adjust_freq_unthr_context(struct perf_event_context *ctx, bool unthrottle) > > > > > > > > if (hwc->interrupts == MAX_INTERRUPTS) { > > > > hwc->interrupts = 0; > > > > + event->pmu->stop(event, 0); > > > > But how could the event have been stopped with a call to pmu->stop() > > during throttling? > > Yeah, Changelog fails to explain how we got to the faulty state -- and > without that we can't judge if the proposed solution actually fixes the > problem or not. > Hi Stephane, Peter, Most of the pmu driver will call *_pmu_stop(event,0) in the *_pmu_handle_irq() function and update the hwc->state with PERF_HES_STOPPED flag as below: arch/alpha/kernel/perf_event.c:856: if (perf_event_overflow(event, &data, regs)) { arch/alpha/kernel/perf_event.c-857- /* Interrupts coming too quickly; "throttle" the arch/alpha/kernel/perf_event.c-858- * counter, i.e., disable it for a little while. arch/alpha/kernel/perf_event.c-859- */ arch/alpha/kernel/perf_event.c-860- alpha_pmu_stop(event, 0); arch/alpha/kernel/perf_event.c-861- } ----- arch/arc/kernel/perf_event.c:603: if (perf_event_overflow(event, &data, regs)) arch/arc/kernel/perf_event.c-604- arc_pmu_stop(event, 0); arch/arc/kernel/perf_event.c-605- } ----- arch/x86/events/amd/core.c:935: if (perf_event_overflow(event, &data, regs)) arch/x86/events/amd/core.c-936- x86_pmu_stop(event, 0); arch/x86/events/amd/core.c-937- } ----- However, some of the pmu drivers stop the event in the *_pmu_handle_irq() without updating the hwc->state with PERF_HES_STOPPED flag as below: arch/arm/kernel/perf_event_v7.c:994: if (perf_event_overflow(event, &data, regs)) arch/arm/kernel/perf_event_v7.c-995- cpu_pmu->disable(event); // <== not update with PERF_HES_STOPPED arch/arm/kernel/perf_event_v7.c-996- } ------ arch/csky/kernel/perf_event.c:1142: if (perf_event_overflow(event, &data, regs)) arch/csky/kernel/perf_event.c-1143- csky_pmu_stop_event(event); // <== not update with PERF_HES_STOPPED arch/csky/kernel/perf_event.c-1144- } ------- arch/loongarch/kernel/perf_event.c:492: if (perf_event_overflow(event, data, regs)) arch/loongarch/kernel/perf_event.c-493- loongarch_pmu_disable_event(idx); // <== not update with PERF_HES_STOPPED arch/loongarch/kernel/perf_event.c-494-} ------- arch/mips/kernel/perf_event_mipsxx.c:794: if (perf_event_overflow(event, data, regs)) arch/mips/kernel/perf_event_mipsxx.c-795- mipsxx_pmu_disable_event(idx); // <== not update with PERF_HES_STOPPED arch/mips/kernel/perf_event_mipsxx.c-796-} .... Furthermore, these drivers did not add event->hw.state checking in *_pmu_start() before starting the event like x86 does: 1497 static void x86_pmu_start(struct perf_event *event, int flags) 1498 { 1499 struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events); 1500 int idx = event->hw.idx; 1501 1502 if (WARN_ON_ONCE(!(event->hw.state & PERF_HES_STOPPED))) 1503 return; 1504 As a result, these drivers won't trigger the WARN_ON_ONCE warning as shown in this patch. However, if a pmu driver like RISC-V pmu which didn't call *_pmu_stop(event,0) without updating the hwc->state with PERF_HES_STOPPED flag in the *_pmu_handle_irq() function but has event->hw.state checking in *_pmu_start(), it could trigger the WARN_ON_ONCE warning as shown in this patch. Therefore, I think we need to call pmu->stop() before unthrottling the event to prevent this warning. Or maybe we can add hwc->state checking before we call pmu->stop() as below: --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -4128,7 +4128,8 @@ perf_adjust_freq_unthr_context(struct perf_event_context *ctx, bool unthrottle) if (hwc->interrupts == MAX_INTERRUPTS) { hwc->interrupts = 0; - event->pmu->stop(event, 0); + if (!(hwc->state & PERF_HES_STOPPED)) + event->pmu->stop(event, 0); perf_log_throttle(event, 1); event->pmu->start(event, 0); } Thanks. Best Regards, Eric Lin. > > > > > > > perf_log_throttle(event, 1); > > > > event->pmu->start(event, 0); > > > > } > > > > -- > > > > 2.17.1 > > > >
On Tue, Jun 27, 2023 at 05:08:07PM +0800, Eric Lin wrote: > > Yeah, Changelog fails to explain how we got to the faulty state -- and > > without that we can't judge if the proposed solution actually fixes the > > problem or not. > > > > Hi Stephane, Peter, > > Most of the pmu driver will call *_pmu_stop(event,0) in the > *_pmu_handle_irq() function and update the hwc->state with > PERF_HES_STOPPED flag as below: > > arch/alpha/kernel/perf_event.c:856: if > (perf_event_overflow(event, &data, regs)) { > arch/alpha/kernel/perf_event.c-857- /* Interrupts > coming too quickly; "throttle" the > arch/alpha/kernel/perf_event.c-858- * counter, > i.e., disable it for a little while. > arch/alpha/kernel/perf_event.c-859- */ > arch/alpha/kernel/perf_event.c-860- > alpha_pmu_stop(event, 0); > arch/alpha/kernel/perf_event.c-861- } > ----- > arch/arc/kernel/perf_event.c:603: if > (perf_event_overflow(event, &data, regs)) > arch/arc/kernel/perf_event.c-604- > arc_pmu_stop(event, 0); > arch/arc/kernel/perf_event.c-605- } > ----- > arch/x86/events/amd/core.c:935: if (perf_event_overflow(event, > &data, regs)) > arch/x86/events/amd/core.c-936- x86_pmu_stop(event, 0); > arch/x86/events/amd/core.c-937- } > ----- > > However, some of the pmu drivers stop the event in the > *_pmu_handle_irq() without updating the hwc->state with > PERF_HES_STOPPED flag as below: > > arch/arm/kernel/perf_event_v7.c:994: if > (perf_event_overflow(event, &data, regs)) > arch/arm/kernel/perf_event_v7.c-995- > cpu_pmu->disable(event); // <== not update with PERF_HES_STOPPED > arch/arm/kernel/perf_event_v7.c-996- } > ------ > arch/csky/kernel/perf_event.c:1142: if > (perf_event_overflow(event, &data, regs)) > arch/csky/kernel/perf_event.c-1143- > csky_pmu_stop_event(event); // <== not update with PERF_HES_STOPPED > arch/csky/kernel/perf_event.c-1144- } > ------- > arch/loongarch/kernel/perf_event.c:492: if (perf_event_overflow(event, > data, regs)) > arch/loongarch/kernel/perf_event.c-493- > loongarch_pmu_disable_event(idx); // <== not update with > PERF_HES_STOPPED > arch/loongarch/kernel/perf_event.c-494-} > ------- > arch/mips/kernel/perf_event_mipsxx.c:794: if > (perf_event_overflow(event, data, regs)) > arch/mips/kernel/perf_event_mipsxx.c-795- > mipsxx_pmu_disable_event(idx); // <== not update with PERF_HES_STOPPED > arch/mips/kernel/perf_event_mipsxx.c-796-} > .... > > Furthermore, these drivers did not add event->hw.state checking in > *_pmu_start() before starting the event like x86 does: > > 1497 static void x86_pmu_start(struct perf_event *event, int flags) > 1498 { > 1499 struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events); > 1500 int idx = event->hw.idx; > 1501 > 1502 if (WARN_ON_ONCE(!(event->hw.state & PERF_HES_STOPPED))) > 1503 return; > 1504 > > As a result, these drivers won't trigger the WARN_ON_ONCE warning as > shown in this patch. > > However, if a pmu driver like RISC-V pmu which didn't call > *_pmu_stop(event,0) without updating the hwc->state with > PERF_HES_STOPPED flag in the *_pmu_handle_irq() function > but has event->hw.state checking in *_pmu_start(), it could trigger > the WARN_ON_ONCE warning as shown in this patch. > > Therefore, I think we need to call pmu->stop() before unthrottling the > event to prevent this warning. How is that not a pmu driver problem ? I'd think we should be fixing those drivers. Mark, do you have have any memories of how the ARM driver came to be this way?
diff --git a/kernel/events/core.c b/kernel/events/core.c index db016e418931..098c875abe88 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -4128,6 +4128,7 @@ perf_adjust_freq_unthr_context(struct perf_event_context *ctx, bool unthrottle) if (hwc->interrupts == MAX_INTERRUPTS) { hwc->interrupts = 0; + event->pmu->stop(event, 0); perf_log_throttle(event, 1); event->pmu->start(event, 0); }
Currently, during the perf sampling, if the perf interrupt takes too long, perf framework will lower the perf_event_max_sample_rate. This will limit the number of samples per timer tick (max_samples_per_tick) and set hwc->interrupts to MAX_INTERRUPTS within the __perf_event_account_interrupt() function. Afterward, the perf framework will unthrottle the event in the timer interrupt handler, which triggers the driver's *_pmu_start() function. Most of the driver's *_pmu_start() functions will check the event->hw.state to determine whether this event has stopped. If the event has not stopped, a WARN_ON_ONCE() warning will be triggered as shown below: [ 2110.224723] ------------[ cut here ]------------ [ 2110.224851] WARNING: CPU: 0 PID: 240 at drivers/perf/riscv_pmu.c:184 riscv_pmu_start+0x7c/0x8e [ 2110.225242] Modules linked in: [ 2110.225380] CPU: 0 PID: 240 Comm: ls Not tainted 6.4-rc4-g19d0788e9ef2 #1 [ 2110.225574] Hardware name: SiFive (DT) [ 2110.225657] epc : riscv_pmu_start+0x7c/0x8e [ 2110.225834] ra : riscv_pmu_start+0x28/0x8e [ 2110.225998] epc : ffffffff80aef864 ra : ffffffff80aef810 sp : ffff8f80004db6f0 [ 2110.226135] gp : ffffffff81c83750 tp : ffffaf80069f9bc0 t0 : ffff8f80004db6c0 [ 2110.226245] t1 : 0000000000000000 t2 : 000000000000001f s0 : ffff8f80004db720 [ 2110.226367] s1 : ffffaf8008ca1068 a0 : 0000ffffffffffff a1 : 0000000000000000 [ 2110.226488] a2 : 0000000000000001 a3 : 0000000000000870 a4 : 0000000000000000 [ 2110.226605] a5 : 0000000000000000 a6 : 0000000000000840 a7 : 0000000000000030 [ 2110.226721] s2 : 0000000000000000 s3 : ffffaf8005165800 s4 : ffffaf800424da00 [ 2110.226838] s5 : ffffffffffffffff s6 : ffffffff81cc7590 s7 : 0000000000000000 [ 2110.226955] s8 : 0000000000000006 s9 : 0000000000000001 s10: ffffaf807efbc340 [ 2110.227064] s11: ffffaf807efbbf00 t3 : ffffaf8006a16028 t4 : 00000000dbfbb796 [ 2110.227180] t5 : 0000000700000000 t6 : ffffaf8005269870 [ 2110.227277] status: 0000000200000100 badaddr: 0000000000000000 cause: 0000000000000003 [ 2110.227407] [<ffffffff80aef864>] riscv_pmu_start+0x7c/0x8e [ 2110.227622] [<ffffffff80185b56>] perf_adjust_freq_unthr_context+0x15e/0x174 [ 2110.227961] [<ffffffff80188642>] perf_event_task_tick+0x88/0x9c [ 2110.228235] [<ffffffff800626a8>] scheduler_tick+0xfe/0x27c [ 2110.228463] [<ffffffff800b5640>] update_process_times+0x9a/0xba [ 2110.228690] [<ffffffff800c5bd4>] tick_sched_handle+0x32/0x66 [ 2110.229007] [<ffffffff800c5e0c>] tick_sched_timer+0x64/0xb0 [ 2110.229253] [<ffffffff800b5e50>] __hrtimer_run_queues+0x156/0x2f4 [ 2110.229446] [<ffffffff800b6bdc>] hrtimer_interrupt+0xe2/0x1fe [ 2110.229637] [<ffffffff80acc9e8>] riscv_timer_interrupt+0x38/0x42 [ 2110.229984] [<ffffffff80090a16>] handle_percpu_devid_irq+0x90/0x1d2 [ 2110.230162] [<ffffffff8008a9f4>] generic_handle_domain_irq+0x28/0x36 To prevent this warning, we should call the driver's *_pmu_stop() function before unthrottling Signed-off-by: Eric Lin <eric.lin@sifive.com> --- kernel/events/core.c | 1 + 1 file changed, 1 insertion(+)