Message ID | cover.1650320598.git.robin.murphy@arm.com (mailing list archive) |
---|---|
Headers | show |
Series | perf/arm-cmn: Add CMN-650 and CMN-700 | expand |
On Mon, 18 Apr 2022, Robin Murphy wrote: > Hi Will, > > As promised last time, here are the patches adding yet more inscrutable > support for the other new members of the CMN family. I'm not sure if any > of the folks who want this are the kind to give upstream feedback, but > we can always live in hope... A colleague has kindly managed to briefly > sanity-check the CMN-700 support under RTL emulation, but otherwise I've > only been able to test the refactorings in terms of not breaking my > little CMN-600 platform. Still, I'm pretty confident that that's caught > all the stupids, and the rest is mostly just yet more events, so I > don't think anything's particularly high-risk here. > > Cheers, > Robin. I tested the patch set on CMN-650 (and CMN-600). Apart from extra events on r2p0, everything seemed to work fine. Tested-by: Ilkka Koskinen <ilkka@os.amperecomputing.com> > > > Robin Murphy (4): > dt-bindings: perf: arm-cmn: Add CMN-650 and CMN-700 > perf/arm-cmn: Add CMN-650 support > perf/arm-cmn: Refactor occupancy filter selector > perf/arm-cmn: Add CMN-700 support > > .../devicetree/bindings/perf/arm,cmn.yaml | 2 + > drivers/perf/arm-cmn.c | 606 ++++++++++++++---- > 2 files changed, 485 insertions(+), 123 deletions(-) > > -- > 2.35.3.dirty > > > _______________________________________________ > linux-arm-kernel mailing list > linux-arm-kernel@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel >
On Mon, 18 Apr 2022 23:57:37 +0100, Robin Murphy wrote: > As promised last time, here are the patches adding yet more inscrutable > support for the other new members of the CMN family. I'm not sure if any > of the folks who want this are the kind to give upstream feedback, but > we can always live in hope... A colleague has kindly managed to briefly > sanity-check the CMN-700 support under RTL emulation, but otherwise I've > only been able to test the refactorings in terms of not breaking my > little CMN-600 platform. Still, I'm pretty confident that that's caught > all the stupids, and the rest is mostly just yet more events, so I > don't think anything's particularly high-risk here. > > [...] Applied to will (for-next/perf), thanks! [1/4] dt-bindings: perf: arm-cmn: Add CMN-650 and CMN-700 https://git.kernel.org/will/c/2b60a22b70fa [2/4] perf/arm-cmn: Add CMN-650 support https://git.kernel.org/will/c/8e504d93acb6 [3/4] perf/arm-cmn: Refactor occupancy filter selector https://git.kernel.org/will/c/65adf71398f5 [4/4] perf/arm-cmn: Add CMN-700 support https://git.kernel.org/will/c/23760a014417 Cheers,
On Mon, Apr 18, 2022 at 11:57:37PM +0100, Robin Murphy wrote: > Hi Will, > > As promised last time, here are the patches adding yet more inscrutable > support for the other new members of the CMN family. I'm not sure if any > of the folks who want this are the kind to give upstream feedback, but > we can always live in hope... A colleague has kindly managed to briefly > sanity-check the CMN-700 support under RTL emulation, but otherwise I've > only been able to test the refactorings in terms of not breaking my > little CMN-600 platform. Still, I'm pretty confident that that's caught > all the stupids, and the rest is mostly just yet more events, so I > don't think anything's particularly high-risk here. > > Cheers, > Robin. > > > Robin Murphy (4): > dt-bindings: perf: arm-cmn: Add CMN-650 and CMN-700 > perf/arm-cmn: Add CMN-650 support > perf/arm-cmn: Refactor occupancy filter selector > perf/arm-cmn: Add CMN-700 support > > .../devicetree/bindings/perf/arm,cmn.yaml | 2 + > drivers/perf/arm-cmn.c | 606 ++++++++++++++---- > 2 files changed, 485 insertions(+), 123 deletions(-) Reverting this series fixed a null-ptr-deref while running some syscall fuzzers by an unprivileged user on a Neoverse-N1 server with two CMN-600 (one for each socket). KASAN: null-ptr-deref in range [0x0000000000000038-0x000000000000003f] Mem abort info: ESR = 0x0000000096000004 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x04: level 0 translation fault Data abort info: ISV = 0, ISS = 0x00000004 CM = 0, WnR = 0 [dfff800000000007] address between user and kernel address ranges Internal error: Oops: 96000004 [#1] PREEMPT SMP CPU: 7 PID: 115283 Comm: trinity-c57 Not tainted 5.18.0-rc6-next-20220510-dirty #93 pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : arm_cmn_event_init [arm_cmn] lr : perf_try_init_event sp : ffff800058eb7a00 x29: ffff800058eb7a10 x28: ffff080232bed880 x27: ffff400ccef2e120 x26: ffff400ccef2e280 x25: 000000000000001f x24: 0000000000000003 x23: 0000000000000001 x22: dfff800000000000 x21: 0000000000000219 x20: ffffb0f539491e88 x19: 0000000000000000 x18: 0000000000000000 x17: 0000000000000000 x16: 0000000000000000 x15: 0000aaaae0821540 x14: 0000000000000028 x13: 0000000000000000 x12: ffff70000b1d6f03 x11: 1ffff0000b1d6f02 x10: ffff70000b1d6f02 x9 : ffffb0f572260468 x8 : ffff800058eb7817 x7 : ffff400ccef2e200 x6 : 1fffe80199de5c40 x5 : ffffb0f53948fa00 x4 : 0000000000000000 x3 : ffff080232bed8e0 x2 : ffffb0f539491efc x1 : 0000000000000007 x0 : 0000000000000038 Call trace: arm_cmn_event_init [arm_cmn] perf_try_init_event perf_init_event perf_event_alloc __do_sys_perf_event_open __arm64_sys_perf_event_open invoke_syscall el0_svc_common.constprop.0 do_el0_svc el0_svc el0t_64_sync_handler el0t_64_sync Code: 35002a21 f9400293 9100e260 d343fc01 (38f66821) ---[ end trace 0000000000000000 ]--- Kernel panic - not syncing: Oops: Fatal exception SMP: stopping secondary CPUs Kernel Offset: 0x30f569c50000 from 0xffff800008000000 PHYS_OFFSET: 0x80000000 CPU features: 0x000,0042e015,19801c82 Memory Limit: none 0x3e6c is in arm_cmn_event_init (drivers/perf/arm-cmn.c:1528). 1523 } 1524 1525 1526 static int arm_cmn_event_init(struct perf_event *event) 1527 { 1528 struct arm_cmn *cmn = to_cmn(event->pmu); 1529 struct arm_cmn_hw_event *hw = to_cmn_hw(event); 1530 struct arm_cmn_node *dn; 1531 enum cmn_node_type type; 1532 bool bynodeid; # "map_1" has the same content. $ sudo cat /sys/kernel/debug/arm-cmn/map X 0 1 2 3 4 5 6 7 Y P D+--------+--------+--------+--------+--------+--------+--------+--------+ 5 | XP #40 | XP #41 | XP #42 | XP #43 | XP #44 | XP #45 | XP #46 | XP #47 | | DTC 0 | DTC 0 | DTC 0 | DTC 0 | DTC 0 | DTC 0 | DTC 0 | DTC 0 | |........|........|........|........|........|........|........|........| 0 | CXRH | HN-I | HN-I | HN-I | HN-D | HN-I | HN-I | CXRH | 0| #2 | #3 | #4 | #5 | #0 | #6 | #7 | #3 | 1| | | | | | | | | 1 | RN-D | RN-D | RN-D | RN-I | RN-I | RN-D | RN-D | RN-D | 0| #2 | #3 | #4 | #10 | #11 | #5 | #6 | #7 | 1| | | | | | | | | -----+--------+--------+--------+--------+--------+--------+--------+--------+ 4 | XP #32 | XP #33 | XP #34 | XP #35 | XP #36 | XP #37 | XP #38 | XP #39 | | DTC 0 | DTC 0 | DTC 0 | DTC 0 | DTC 0 | DTC 0 | DTC 0 | DTC 0 | |........|........|........|........|........|........|........|........| 0 | SN-F | RN-F_B | RN-I | RN-F_B | RN-F_B | RN-I | RN-F_B | SN-F | 0| | | #8 | | | #9 | | | 1| | | | | | | | | 1 | | RN-F_B | | RN-F_B | RN-F_B | | RN-F_B | | 0| #24 | | #26 | | | #28 | | #30 | 1| #25 | | #27 | | | #29 | | #31 | -----+--------+--------+--------+--------+--------+--------+--------+--------+ 3 | XP #24 | XP #25 | XP #26 | XP #27 | XP #28 | XP #29 | XP #30 | XP #31 | | DTC 0 | DTC 0 | DTC 0 | DTC 0 | DTC 0 | DTC 0 | DTC 0 | DTC 0 | |........|........|........|........|........|........|........|........| 0 | SN-F | RN-F_B | RN-I | RN-F_B | RN-F_B | RN-I | RN-F_B | SN-F | 0| | | #6 | | | #7 | | | 1| | | | | | | | | 1 | | RN-F_B | | RN-F_B | RN-F_B | | RN-F_B | | 0| #16 | | #18 | | | #20 | | #22 | 1| #17 | | #19 | | | #21 | | #23 | -----+--------+--------+--------+--------+--------+--------+--------+--------+ 2 | XP #16 | XP #17 | XP #18 | XP #19 | XP #20 | XP #21 | XP #22 | XP #23 | | DTC 0 | DTC 0 | DTC 0 | DTC 0 | DTC 0 | DTC 0 | DTC 0 | DTC 0 | |........|........|........|........|........|........|........|........| 0 | SN-F | RN-F_B | RN-I | RN-F_B | RN-F_B | RN-I | RN-F_B | SN-F | 0| | | #4 | | | #5 | | | 1| | | | | | | | | 1 | | RN-F_B | | RN-F_B | RN-F_B | | RN-F_B | | 0| #8 | | #10 | | | #12 | | #14 | 1| #9 | | #11 | | | #13 | | #15 | -----+--------+--------+--------+--------+--------+--------+--------+--------+ 1 | XP #8 | XP #9 | XP #10 | XP #11 | XP #12 | XP #13 | XP #14 | XP #15 | | DTC 0 | DTC 0 | DTC 0 | DTC 0 | DTC 0 | DTC 0 | DTC 0 | DTC 0 | |........|........|........|........|........|........|........|........| 0 | SN-F | RN-F_B | RN-I | RN-F_B | RN-F_B | RN-I | RN-F_B | SN-F | 0| | | #2 | | | #3 | | | 1| | | | | | | | | 1 | | RN-F_B | | RN-F_B | RN-F_B | | RN-F_B | | 0| #0 | | #2 | | | #4 | | #6 | 1| #1 | | #3 | | | #5 | | #7 | -----+--------+--------+--------+--------+--------+--------+--------+--------+ 0 | XP #0 | XP #1 | XP #2 | XP #3 | XP #4 | XP #5 | XP #6 | XP #7 | | DTC 0 | DTC 0 | DTC 0 | DTC 0 | DTC 0 | DTC 0 | DTC 0 | DTC 0 | |........|........|........|........|........|........|........|........| 0 | CXRH | RN-F_B | RN-I | RN-F_B | RN-F_B | RN-I | RN-F_B | CXRH | 0| #0 | | #0 | | | #1 | | #1 | 1| | | | | | | | | 1 | RN-D | RN-F_B | HN-I | RN-F_B | RN-F_B | HN-I | RN-F_B | RN-D | 0| #0 | | #1 | | | #2 | | #1 | 1| | | | | | | | | -----+--------+--------+--------+--------+--------+--------+--------+--------+
On 2022-05-10 18:15, Qian Cai wrote: > On Mon, Apr 18, 2022 at 11:57:37PM +0100, Robin Murphy wrote: >> Hi Will, >> >> As promised last time, here are the patches adding yet more inscrutable >> support for the other new members of the CMN family. I'm not sure if any >> of the folks who want this are the kind to give upstream feedback, but >> we can always live in hope... A colleague has kindly managed to briefly >> sanity-check the CMN-700 support under RTL emulation, but otherwise I've >> only been able to test the refactorings in terms of not breaking my >> little CMN-600 platform. Still, I'm pretty confident that that's caught >> all the stupids, and the rest is mostly just yet more events, so I >> don't think anything's particularly high-risk here. >> >> Cheers, >> Robin. >> >> >> Robin Murphy (4): >> dt-bindings: perf: arm-cmn: Add CMN-650 and CMN-700 >> perf/arm-cmn: Add CMN-650 support >> perf/arm-cmn: Refactor occupancy filter selector >> perf/arm-cmn: Add CMN-700 support >> >> .../devicetree/bindings/perf/arm,cmn.yaml | 2 + >> drivers/perf/arm-cmn.c | 606 ++++++++++++++---- >> 2 files changed, 485 insertions(+), 123 deletions(-) > > Reverting this series fixed a null-ptr-deref while running some syscall > fuzzers by an unprivileged user on a Neoverse-N1 server with two CMN-600 > (one for each socket). > > KASAN: null-ptr-deref in range [0x0000000000000038-0x000000000000003f] > Mem abort info: > ESR = 0x0000000096000004 > EC = 0x25: DABT (current EL), IL = 32 bits > SET = 0, FnV = 0 > EA = 0, S1PTW = 0 > FSC = 0x04: level 0 translation fault > Data abort info: > ISV = 0, ISS = 0x00000004 > CM = 0, WnR = 0 > [dfff800000000007] address between user and kernel address ranges > Internal error: Oops: 96000004 [#1] PREEMPT SMP > CPU: 7 PID: 115283 Comm: trinity-c57 Not tainted 5.18.0-rc6-next-20220510-dirty #93 > pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) > pc : arm_cmn_event_init [arm_cmn] > lr : perf_try_init_event > sp : ffff800058eb7a00 > x29: ffff800058eb7a10 x28: ffff080232bed880 x27: ffff400ccef2e120 > x26: ffff400ccef2e280 x25: 000000000000001f x24: 0000000000000003 > x23: 0000000000000001 x22: dfff800000000000 x21: 0000000000000219 > x20: ffffb0f539491e88 x19: 0000000000000000 x18: 0000000000000000 > x17: 0000000000000000 x16: 0000000000000000 x15: 0000aaaae0821540 > x14: 0000000000000028 x13: 0000000000000000 x12: ffff70000b1d6f03 > x11: 1ffff0000b1d6f02 x10: ffff70000b1d6f02 x9 : ffffb0f572260468 > x8 : ffff800058eb7817 x7 : ffff400ccef2e200 x6 : 1fffe80199de5c40 > x5 : ffffb0f53948fa00 x4 : 0000000000000000 x3 : ffff080232bed8e0 > x2 : ffffb0f539491efc x1 : 0000000000000007 x0 : 0000000000000038 > Call trace: > arm_cmn_event_init [arm_cmn] > perf_try_init_event > perf_init_event > perf_event_alloc > __do_sys_perf_event_open > __arm64_sys_perf_event_open > invoke_syscall > el0_svc_common.constprop.0 > do_el0_svc > el0_svc > el0t_64_sync_handler > el0t_64_sync > Code: 35002a21 f9400293 9100e260 d343fc01 (38f66821) > ---[ end trace 0000000000000000 ]--- > Kernel panic - not syncing: Oops: Fatal exception > SMP: stopping secondary CPUs > Kernel Offset: 0x30f569c50000 from 0xffff800008000000 > PHYS_OFFSET: 0x80000000 > CPU features: 0x000,0042e015,19801c82 > Memory Limit: none > > > 0x3e6c is in arm_cmn_event_init (drivers/perf/arm-cmn.c:1528). > 1523 } > 1524 > 1525 > 1526 static int arm_cmn_event_init(struct perf_event *event) > 1527 { > 1528 struct arm_cmn *cmn = to_cmn(event->pmu); > 1529 struct arm_cmn_hw_event *hw = to_cmn_hw(event); > 1530 struct arm_cmn_node *dn; > 1531 enum cmn_node_type type; > 1532 bool bynodeid; Hmm, can you narrow it down a bit more to a particular patch and/or a specific reproducer? I don't see how it could really be crashing dereferencing event->pmu, especially since none of the code this early in event_init has even changed recently :/ AFAICS the most plausible thing at offset 0x38 from anywhere would be event->pmu->type, but again, if event->pmu is NULL here that's surely a fault in perf core rather than the driver? > # "map_1" has the same content. > $ sudo cat /sys/kernel/debug/arm-cmn/map > X 0 1 2 3 4 5 6 7 > Y P D+--------+--------+--------+--------+--------+--------+--------+--------+ > 5 | XP #40 | XP #41 | XP #42 | XP #43 | XP #44 | XP #45 | XP #46 | XP #47 | > | DTC 0 | DTC 0 | DTC 0 | DTC 0 | DTC 0 | DTC 0 | DTC 0 | DTC 0 | > |........|........|........|........|........|........|........|........| > 0 | CXRH | HN-I | HN-I | HN-I | HN-D | HN-I | HN-I | CXRH | > 0| #2 | #3 | #4 | #5 | #0 | #6 | #7 | #3 | > 1| | | | | | | | | > 1 | RN-D | RN-D | RN-D | RN-I | RN-I | RN-D | RN-D | RN-D | > 0| #2 | #3 | #4 | #10 | #11 | #5 | #6 | #7 | > 1| | | | | | | | | > -----+--------+--------+--------+--------+--------+--------+--------+--------+ > 4 | XP #32 | XP #33 | XP #34 | XP #35 | XP #36 | XP #37 | XP #38 | XP #39 | > | DTC 0 | DTC 0 | DTC 0 | DTC 0 | DTC 0 | DTC 0 | DTC 0 | DTC 0 | > |........|........|........|........|........|........|........|........| > 0 | SN-F | RN-F_B | RN-I | RN-F_B | RN-F_B | RN-I | RN-F_B | SN-F | > 0| | | #8 | | | #9 | | | > 1| | | | | | | | | > 1 | | RN-F_B | | RN-F_B | RN-F_B | | RN-F_B | | > 0| #24 | | #26 | | | #28 | | #30 | > 1| #25 | | #27 | | | #29 | | #31 | Something's definitely screwy here though... these ID numbers should have a device type above them. Was this dumped with patch #1 applied, and if not do they show as "????" when it is? Thanks, Robin. > -----+--------+--------+--------+--------+--------+--------+--------+--------+ > 3 | XP #24 | XP #25 | XP #26 | XP #27 | XP #28 | XP #29 | XP #30 | XP #31 | > | DTC 0 | DTC 0 | DTC 0 | DTC 0 | DTC 0 | DTC 0 | DTC 0 | DTC 0 | > |........|........|........|........|........|........|........|........| > 0 | SN-F | RN-F_B | RN-I | RN-F_B | RN-F_B | RN-I | RN-F_B | SN-F | > 0| | | #6 | | | #7 | | | > 1| | | | | | | | | > 1 | | RN-F_B | | RN-F_B | RN-F_B | | RN-F_B | | > 0| #16 | | #18 | | | #20 | | #22 | > 1| #17 | | #19 | | | #21 | | #23 | > -----+--------+--------+--------+--------+--------+--------+--------+--------+ > 2 | XP #16 | XP #17 | XP #18 | XP #19 | XP #20 | XP #21 | XP #22 | XP #23 | > | DTC 0 | DTC 0 | DTC 0 | DTC 0 | DTC 0 | DTC 0 | DTC 0 | DTC 0 | > |........|........|........|........|........|........|........|........| > 0 | SN-F | RN-F_B | RN-I | RN-F_B | RN-F_B | RN-I | RN-F_B | SN-F | > 0| | | #4 | | | #5 | | | > 1| | | | | | | | | > 1 | | RN-F_B | | RN-F_B | RN-F_B | | RN-F_B | | > 0| #8 | | #10 | | | #12 | | #14 | > 1| #9 | | #11 | | | #13 | | #15 | > -----+--------+--------+--------+--------+--------+--------+--------+--------+ > 1 | XP #8 | XP #9 | XP #10 | XP #11 | XP #12 | XP #13 | XP #14 | XP #15 | > | DTC 0 | DTC 0 | DTC 0 | DTC 0 | DTC 0 | DTC 0 | DTC 0 | DTC 0 | > |........|........|........|........|........|........|........|........| > 0 | SN-F | RN-F_B | RN-I | RN-F_B | RN-F_B | RN-I | RN-F_B | SN-F | > 0| | | #2 | | | #3 | | | > 1| | | | | | | | | > 1 | | RN-F_B | | RN-F_B | RN-F_B | | RN-F_B | | > 0| #0 | | #2 | | | #4 | | #6 | > 1| #1 | | #3 | | | #5 | | #7 | > -----+--------+--------+--------+--------+--------+--------+--------+--------+ > 0 | XP #0 | XP #1 | XP #2 | XP #3 | XP #4 | XP #5 | XP #6 | XP #7 | > | DTC 0 | DTC 0 | DTC 0 | DTC 0 | DTC 0 | DTC 0 | DTC 0 | DTC 0 | > |........|........|........|........|........|........|........|........| > 0 | CXRH | RN-F_B | RN-I | RN-F_B | RN-F_B | RN-I | RN-F_B | CXRH | > 0| #0 | | #0 | | | #1 | | #1 | > 1| | | | | | | | | > 1 | RN-D | RN-F_B | HN-I | RN-F_B | RN-F_B | HN-I | RN-F_B | RN-D | > 0| #0 | | #1 | | | #2 | | #1 | > 1| | | | | | | | | > -----+--------+--------+--------+--------+--------+--------+--------+--------+
On Tue, May 10, 2022 at 06:50:56PM +0100, Robin Murphy wrote: > Hmm, can you narrow it down a bit more to a particular patch and/or a > specific reproducer? I don't see how it could really be crashing > dereferencing event->pmu, especially since none of the code this early in > event_init has even changed recently :/ Right, there is an old bug that sometimes the debuginfo on kernel modules is not accurate on arm64, so gdb and faddr2line gave the wrong line. Anyway, using the printk() method, we were faulted on this line. + /* This is sufficiently annoying to recalculate, so cache it */ + hw->filter_sel = arm_cmn_filter_sel(cmn->model, type, eventid); Also, it can be reproduced within 5-min by running, $ trinity -C 128 -cperf_event_open > Something's definitely screwy here though... these ID numbers should have a > device type above them. Was this dumped with patch #1 applied, and if not do > they show as "????" when it is? Yes, with the series applied, we have, X 0 1 2 3 4 5 6 7 Y P D+--------+--------+--------+--------+--------+--------+--------+--------+ 5 | XP #40 | XP #41 | XP #42 | XP #43 | XP #44 | XP #45 | XP #46 | XP #47 | | DTC 0 | DTC 0 | DTC 0 | DTC 0 | DTC 0 | DTC 0 | DTC 0 | DTC 0 | |........|........|........|........|........|........|........|........| 0 | CXRH | HN-I | HN-I | HN-I | HN-D | HN-I | HN-I | CXRH | 0| #2 | #3 | #4 | #5 | #0 | #6 | #7 | #3 | 1| | | | | | | | | 1 | RN-D | RN-D | RN-D | RN-I | RN-I | RN-D | RN-D | RN-D | 0| #2 | #3 | #4 | #10 | #11 | #5 | #6 | #7 | 1| | | | | | | | | -----+--------+--------+--------+--------+--------+--------+--------+--------+ 4 | XP #32 | XP #33 | XP #34 | XP #35 | XP #36 | XP #37 | XP #38 | XP #39 | | DTC 0 | DTC 0 | DTC 0 | DTC 0 | DTC 0 | DTC 0 | DTC 0 | DTC 0 | |........|........|........|........|........|........|........|........| 0 | SN-F | RN-F_B | RN-I | RN-F_B | RN-F_B | RN-I | RN-F_B | SN-F | 0| | | #8 | | | #9 | | | 1| | | | | | | | | 1 | ???? | RN-F_B | ???? | RN-F_B | RN-F_B | ???? | RN-F_B | ???? | 0| #24 | | #26 | | | #28 | | #30 | 1| #25 | | #27 | | | #29 | | #31 | -----+--------+--------+--------+--------+--------+--------+--------+--------+ 3 | XP #24 | XP #25 | XP #26 | XP #27 | XP #28 | XP #29 | XP #30 | XP #31 | | DTC 0 | DTC 0 | DTC 0 | DTC 0 | DTC 0 | DTC 0 | DTC 0 | DTC 0 | |........|........|........|........|........|........|........|........| 0 | SN-F | RN-F_B | RN-I | RN-F_B | RN-F_B | RN-I | RN-F_B | SN-F | 0| | | #6 | | | #7 | | | 1| | | | | | | | | 1 | ???? | RN-F_B | ???? | RN-F_B | RN-F_B | ???? | RN-F_B | ???? | 0| #16 | | #18 | | | #20 | | #22 | 1| #17 | | #19 | | | #21 | | #23 | -----+--------+--------+--------+--------+--------+--------+--------+--------+ 2 | XP #16 | XP #17 | XP #18 | XP #19 | XP #20 | XP #21 | XP #22 | XP #23 | | DTC 0 | DTC 0 | DTC 0 | DTC 0 | DTC 0 | DTC 0 | DTC 0 | DTC 0 | |........|........|........|........|........|........|........|........| 0 | SN-F | RN-F_B | RN-I | RN-F_B | RN-F_B | RN-I | RN-F_B | SN-F | 0| | | #4 | | | #5 | | | 1| | | | | | | | | 1 | ???? | RN-F_B | ???? | RN-F_B | RN-F_B | ???? | RN-F_B | ???? | 0| #8 | | #10 | | | #12 | | #14 | 1| #9 | | #11 | | | #13 | | #15 | -----+--------+--------+--------+--------+--------+--------+--------+--------+ 1 | XP #8 | XP #9 | XP #10 | XP #11 | XP #12 | XP #13 | XP #14 | XP #15 | | DTC 0 | DTC 0 | DTC 0 | DTC 0 | DTC 0 | DTC 0 | DTC 0 | DTC 0 | |........|........|........|........|........|........|........|........| 0 | SN-F | RN-F_B | RN-I | RN-F_B | RN-F_B | RN-I | RN-F_B | SN-F | 0| | | #2 | | | #3 | | | 1| | | | | | | | | 1 | ???? | RN-F_B | ???? | RN-F_B | RN-F_B | ???? | RN-F_B | ???? | 0| #0 | | #2 | | | #4 | | #6 | 1| #1 | | #3 | | | #5 | | #7 | -----+--------+--------+--------+--------+--------+--------+--------+--------+ 0 | XP #0 | XP #1 | XP #2 | XP #3 | XP #4 | XP #5 | XP #6 | XP #7 | | DTC 0 | DTC 0 | DTC 0 | DTC 0 | DTC 0 | DTC 0 | DTC 0 | DTC 0 | |........|........|........|........|........|........|........|........| 0 | CXRH | RN-F_B | RN-I | RN-F_B | RN-F_B | RN-I | RN-F_B | CXRH | 0| #0 | | #0 | | | #1 | | #1 | 1| | | | | | | | | 1 | RN-D | RN-F_B | HN-I | RN-F_B | RN-F_B | HN-I | RN-F_B | RN-D | 0| #0 | | #1 | | | #2 | | #1 | 1| | | | | | | | | -----+--------+--------+--------+--------+--------+--------+--------+--------+
On Tue, May 10, 2022 at 06:50:56PM +0100, Robin Murphy wrote: > Hmm, can you narrow it down a bit more to a particular patch and/or a > specific reproducer? I don't see how it could really be crashing > dereferencing event->pmu, especially since none of the code this early in > event_init has even changed recently :/ Specifically, One of "e" == NULL in arm_cmn_filter_sel(). e = container_of(arm_cmn_event_attrs[i], typeof(*e), attr.attr); Then, the next line will dereference it.
On 2022-05-10 20:38, Qian Cai wrote: > On Tue, May 10, 2022 at 06:50:56PM +0100, Robin Murphy wrote: >> Hmm, can you narrow it down a bit more to a particular patch and/or a >> specific reproducer? I don't see how it could really be crashing >> dereferencing event->pmu, especially since none of the code this early in >> event_init has even changed recently :/ > > Right, there is an old bug that sometimes the debuginfo on kernel > modules is not accurate on arm64, so gdb and faddr2line gave the wrong > line. Anyway, using the printk() method, we were faulted on this line. > > + /* This is sufficiently annoying to recalculate, so cache it */ > + hw->filter_sel = arm_cmn_filter_sel(cmn->model, type, eventid); Aha, that's a much more believable culprit - 0x38 also stood out as an offset in the attribute structures. > Also, it can be reproduced within 5-min by running, > > $ trinity -C 128 -cperf_event_open I can already guess what I've done, and just reproduced the crash directly on the first try :) I'll work on the fix tomorrow, thanks for the report and info! >> Something's definitely screwy here though... these ID numbers should have a >> device type above them. Was this dumped with patch #1 applied, and if not do >> they show as "????" when it is? > > Yes, with the series applied, we have, Ah, turns out I can no longer get away with being lazy and not masking the device_type field properly when reading connect_info, as some of the adjacent bits are no longer reserved since the debugfs code was first written - I've not played with a CAL config of any of the newer stuff, or I might have noticed sooner. But at least that tweak to differentiate unknown from unconnected has now paid for itself :D Cheers, Robin.