Message ID | 20240603092812.46616-1-yangyicong@huawei.com (mailing list archive) |
---|---|
Headers | show |
Series | Perf avoid opening events on offline CPUs | expand |
On Mon, Jun 3, 2024 at 2:33 AM Yicong Yang <yangyicong@huawei.com> wrote: > > From: Yicong Yang <yangyicong@hisilicon.com> > > If user doesn't specify the CPUs, perf will try to open events on CPUs > of the PMU which is initialized from the PMU's "cpumask" or "cpus" sysfs > attributes if provided. But we doesn't check whether the CPUs provided > by the PMU are all online. So we may open events on offline CPUs if PMU > driver provide offline CPUs and then we'll be rejected by the kernel: > > [root@localhost yang]# echo 0 > /sys/devices/system/cpu/cpu0/online Generally Linux won't let you take CPU0 off line, I'm not able to follow this step on x86 Linux. Fwiw, I routinely run perf with the core hyperthread siblings offline. Thanks, Ian > [root@localhost yang]# ./perf_static stat -e armv8_pmuv3_0/cycles/ --timeout 100 > Error: > The sys_perf_event_open() syscall returned with 19 (No such device) for event (cpu-clock). > /bin/dmesg | grep -i perf may provide additional information. > > This patchset tries to avoid this case by: > - Double check the PMU's cpumask in the perf tool and only include online CPUs > - Trying to make the PMU drivers only export online CPUs in its "cpus" or "cpumask" > attributes > > Previously discussion can be found at [1]. Will suggested to do it in userspace. > I think it makes sense to do a double check in the perf tool in case the driver > doesn't do this. So PATCH 1/3 is added in this version. > > [1] https://lore.kernel.org/linux-arm-kernel/20240410095833.63934-1-yangyicong@huawei.com/ > > Yicong Yang (3): > perf pmu: Limit PMU cpumask to online CPUs > perf: arm_pmu: Only show online CPUs in device's "cpus" attribute > perf: arm_spe: Only show online CPUs in device's "cpumask" attribute > > drivers/perf/arm_pmu.c | 24 +++++++++++++++++++++++- > drivers/perf/arm_spe_pmu.c | 22 +++++++++++++++++++++- > tools/perf/util/pmu.c | 13 +++++++++++-- > 3 files changed, 55 insertions(+), 4 deletions(-) > > -- > 2.24.0 >
On 2024/6/4 0:42, Ian Rogers wrote: > On Mon, Jun 3, 2024 at 2:33 AM Yicong Yang <yangyicong@huawei.com> wrote: >> >> From: Yicong Yang <yangyicong@hisilicon.com> >> >> If user doesn't specify the CPUs, perf will try to open events on CPUs >> of the PMU which is initialized from the PMU's "cpumask" or "cpus" sysfs >> attributes if provided. But we doesn't check whether the CPUs provided >> by the PMU are all online. So we may open events on offline CPUs if PMU >> driver provide offline CPUs and then we'll be rejected by the kernel: >> >> [root@localhost yang]# echo 0 > /sys/devices/system/cpu/cpu0/online > > Generally Linux won't let you take CPU0 off line, I'm not able to > follow this step on x86 Linux. Fwiw, I routinely run perf with the > core hyperthread siblings offline. > It doesn't matter if it's the CPU0 offline or other CPUs. There's no restriction for CPU0 can go offline or not on arm64 and I just use this for example. I cannot reproduce it on x86. I think it may because we're initializing the pmu->cpus in different routines in pmu_cpumask(). There's no "cpus" for x86's core pmu on my x86 machine: root@ubuntu204:~# ls /sys/bus/event_source/devices/cpu/ allow_tsx_force_abort caps events format freeze_on_smi perf_event_mux_interval_ms power rdpmc subsystem type uevent So pmu_cpumask() will infer it as an core pmu and initialize the cpus with online CPUs [1]. For arm64 there lies a "cpus" sysfs attributes so pmu->cpus are initialized from the "cpus" without checking each CPUs is online or not. That's what proposed in Patch 1/3. There's a "cpus" sysfs for x86's hybrid machine, reading from the code [2]. And it seems always reflect the online CPUs supported by that PMU. [1] https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmu.c?h=perf-tools-next#n779 [2] https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree//arch/x86/events/intel/core.c#n5736 Thanks.
On Tue, Jun 4, 2024 at 1:04 AM Yicong Yang <yangyicong@huawei.com> wrote: > > On 2024/6/4 0:42, Ian Rogers wrote: > > On Mon, Jun 3, 2024 at 2:33 AM Yicong Yang <yangyicong@huawei.com> wrote: > >> > >> From: Yicong Yang <yangyicong@hisilicon.com> > >> > >> If user doesn't specify the CPUs, perf will try to open events on CPUs > >> of the PMU which is initialized from the PMU's "cpumask" or "cpus" sysfs > >> attributes if provided. But we doesn't check whether the CPUs provided > >> by the PMU are all online. So we may open events on offline CPUs if PMU > >> driver provide offline CPUs and then we'll be rejected by the kernel: > >> > >> [root@localhost yang]# echo 0 > /sys/devices/system/cpu/cpu0/online > > > > Generally Linux won't let you take CPU0 off line, I'm not able to > > follow this step on x86 Linux. Fwiw, I routinely run perf with the > > core hyperthread siblings offline. > > > > It doesn't matter if it's the CPU0 offline or other CPUs. There's no restriction > for CPU0 can go offline or not on arm64 and I just use this for example. > > I cannot reproduce it on x86. I think it may because we're initializing the > pmu->cpus in different routines in pmu_cpumask(). There's no "cpus" > for x86's core pmu on my x86 machine: > root@ubuntu204:~# ls /sys/bus/event_source/devices/cpu/ > allow_tsx_force_abort caps events format freeze_on_smi perf_event_mux_interval_ms power rdpmc subsystem type uevent > > So pmu_cpumask() will infer it as an core pmu and initialize the cpus > with online CPUs [1]. For arm64 there lies a "cpus" sysfs attributes > so pmu->cpus are initialized from the "cpus" without checking each > CPUs is online or not. That's what proposed in Patch 1/3. > > There's a "cpus" sysfs for x86's hybrid machine, reading from the code [2]. > And it seems always reflect the online CPUs supported by that PMU. Thanks Yicong, looking on a hybrid machine and taking cpu1 offline I see the PMU's "cpus" not containing the offline CPU. I think this supports that the PMU driver should be fixed for ARM, but we'll also need a workaround in the perf tool for older kernels. Thanks, Ian > [1] https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmu.c?h=perf-tools-next#n779 > [2] https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree//arch/x86/events/intel/core.c#n5736 > > Thanks.
On Mon, Jun 03, 2024 at 05:28:09PM +0800, Yicong Yang wrote: > From: Yicong Yang <yangyicong@hisilicon.com> > > If user doesn't specify the CPUs, perf will try to open events on CPUs > of the PMU which is initialized from the PMU's "cpumask" or "cpus" sysfs > attributes if provided. But we doesn't check whether the CPUs provided > by the PMU are all online. So we may open events on offline CPUs if PMU > driver provide offline CPUs and then we'll be rejected by the kernel: > > [root@localhost yang]# echo 0 > /sys/devices/system/cpu/cpu0/online > [root@localhost yang]# ./perf_static stat -e armv8_pmuv3_0/cycles/ --timeout 100 > Error: > The sys_perf_event_open() syscall returned with 19 (No such device) for event (cpu-clock). > /bin/dmesg | grep -i perf may provide additional information. I still don't see the value in this. CPUs can come and go asynchronously, so this is all horribly racy. Furthermore, there are other (racy) ways to find out which CPUs are online and whatever we do in the kernel now isn't going to help userspace running on older kernels. Will
On Mon, Jul 1, 2024 at 7:22 AM Will Deacon <will@kernel.org> wrote: > > On Mon, Jun 03, 2024 at 05:28:09PM +0800, Yicong Yang wrote: > > From: Yicong Yang <yangyicong@hisilicon.com> > > > > If user doesn't specify the CPUs, perf will try to open events on CPUs > > of the PMU which is initialized from the PMU's "cpumask" or "cpus" sysfs > > attributes if provided. But we doesn't check whether the CPUs provided > > by the PMU are all online. So we may open events on offline CPUs if PMU > > driver provide offline CPUs and then we'll be rejected by the kernel: > > > > [root@localhost yang]# echo 0 > /sys/devices/system/cpu/cpu0/online > > [root@localhost yang]# ./perf_static stat -e armv8_pmuv3_0/cycles/ --timeout 100 > > Error: > > The sys_perf_event_open() syscall returned with 19 (No such device) for event (cpu-clock). > > /bin/dmesg | grep -i perf may provide additional information. > > I still don't see the value in this. CPUs can come and go asynchronously, > so this is all horribly racy. Furthermore, there are other (racy) ways > to find out which CPUs are online and whatever we do in the kernel now > isn't going to help userspace running on older kernels. Hi Will, you are assuming here that a counter should be opened on all CPUs. This is true for "core" PMUs for events like "cycles" but isn't true for uncore PMUs. For an uncore PMU on dual socket x86 it may have a cpumask advertising opening on CPUs "0,18". If CPU 18 is taken offline then the cpumask becomes say "0,19". In the perf tool we don't need to determine the topology of the system and see that 19 is next to 18 and online, the PMU driver does it for us. If we intersect the cpumask of an uncore PMU with online CPUs and the cpumask were still "0,18", then we'd end up only opening the event on one socket (for CPU 0). What the cpumask is providing is the list of default CPUs we want to open the event upon, like a "-C" command line option. Yes this is racy if you are running perf and taking CPUs on and offline, but it's what happens on x86 and we live with it okay. That's not to say we can't do a smarter topology system, cover races with things like BPF, and so on. These things just aren't where the perf tool code is today and could face challenges wrt permissions, older kernel compatibility and so on. In the perf tool code today: - No cpumask/cpus file was provided on non-hybrid/BIG.little systems and so such a cpumask is taken to mean open on all online CPUs. If the cpumask exists but is empty then we also do this. I see broken ARM memory controller PMUs with empty cpumask files. - A cpus file with a list of online CPUs. This is used by core PMUs on hybrid/BIG.little systems. x86 doesn't place offline CPUs in this file but ARM does. My hope is that ARM can be consistent with x86. - A cpumask with a list of say one CPU per socket. This is generally used by uncore PMUs, CPUs not in the cpumask are still valid and are sometimes used to spread interrupt load. The majority of ARM PMU drivers seem broken here, either wrt offline CPUs, not providing a cpumask, etc. Thanks, Ian
From: Yicong Yang <yangyicong@hisilicon.com> If user doesn't specify the CPUs, perf will try to open events on CPUs of the PMU which is initialized from the PMU's "cpumask" or "cpus" sysfs attributes if provided. But we doesn't check whether the CPUs provided by the PMU are all online. So we may open events on offline CPUs if PMU driver provide offline CPUs and then we'll be rejected by the kernel: [root@localhost yang]# echo 0 > /sys/devices/system/cpu/cpu0/online [root@localhost yang]# ./perf_static stat -e armv8_pmuv3_0/cycles/ --timeout 100 Error: The sys_perf_event_open() syscall returned with 19 (No such device) for event (cpu-clock). /bin/dmesg | grep -i perf may provide additional information. This patchset tries to avoid this case by: - Double check the PMU's cpumask in the perf tool and only include online CPUs - Trying to make the PMU drivers only export online CPUs in its "cpus" or "cpumask" attributes Previously discussion can be found at [1]. Will suggested to do it in userspace. I think it makes sense to do a double check in the perf tool in case the driver doesn't do this. So PATCH 1/3 is added in this version. [1] https://lore.kernel.org/linux-arm-kernel/20240410095833.63934-1-yangyicong@huawei.com/ Yicong Yang (3): perf pmu: Limit PMU cpumask to online CPUs perf: arm_pmu: Only show online CPUs in device's "cpus" attribute perf: arm_spe: Only show online CPUs in device's "cpumask" attribute drivers/perf/arm_pmu.c | 24 +++++++++++++++++++++++- drivers/perf/arm_spe_pmu.c | 22 +++++++++++++++++++++- tools/perf/util/pmu.c | 13 +++++++++++-- 3 files changed, 55 insertions(+), 4 deletions(-)