mbox series

[0/3] Perf avoid opening events on offline CPUs

Message ID 20240603092812.46616-1-yangyicong@huawei.com (mailing list archive)
Headers show
Series Perf avoid opening events on offline CPUs | expand

Message

Yicong Yang June 3, 2024, 9:28 a.m. UTC
From: Yicong Yang <yangyicong@hisilicon.com>

If user doesn't specify the CPUs, perf will try to open events on CPUs
of the PMU which is initialized from the PMU's "cpumask" or "cpus" sysfs
attributes if provided. But we doesn't check whether the CPUs provided
by the PMU are all online. So we may open events on offline CPUs if PMU
driver provide offline CPUs and then we'll be rejected by the kernel:

[root@localhost yang]# echo 0 > /sys/devices/system/cpu/cpu0/online
[root@localhost yang]# ./perf_static stat -e armv8_pmuv3_0/cycles/ --timeout 100
Error:
The sys_perf_event_open() syscall returned with 19 (No such device) for event (cpu-clock).
/bin/dmesg | grep -i perf may provide additional information.

This patchset tries to avoid this case by:
- Double check the PMU's cpumask in the perf tool and only include online CPUs
- Trying to make the PMU drivers only export online CPUs in its "cpus" or "cpumask"
  attributes

Previously discussion can be found at [1]. Will suggested to do it in userspace.
I think it makes sense to do a double check in the perf tool in case the driver
doesn't do this. So PATCH 1/3 is added in this version.

[1] https://lore.kernel.org/linux-arm-kernel/20240410095833.63934-1-yangyicong@huawei.com/

Yicong Yang (3):
  perf pmu: Limit PMU cpumask to online CPUs
  perf: arm_pmu: Only show online CPUs in device's "cpus" attribute
  perf: arm_spe: Only show online CPUs in device's "cpumask" attribute

 drivers/perf/arm_pmu.c     | 24 +++++++++++++++++++++++-
 drivers/perf/arm_spe_pmu.c | 22 +++++++++++++++++++++-
 tools/perf/util/pmu.c      | 13 +++++++++++--
 3 files changed, 55 insertions(+), 4 deletions(-)

Comments

Ian Rogers June 3, 2024, 4:42 p.m. UTC | #1
On Mon, Jun 3, 2024 at 2:33 AM Yicong Yang <yangyicong@huawei.com> wrote:
>
> From: Yicong Yang <yangyicong@hisilicon.com>
>
> If user doesn't specify the CPUs, perf will try to open events on CPUs
> of the PMU which is initialized from the PMU's "cpumask" or "cpus" sysfs
> attributes if provided. But we doesn't check whether the CPUs provided
> by the PMU are all online. So we may open events on offline CPUs if PMU
> driver provide offline CPUs and then we'll be rejected by the kernel:
>
> [root@localhost yang]# echo 0 > /sys/devices/system/cpu/cpu0/online

Generally Linux won't let you take CPU0 off line, I'm not able to
follow this step on x86 Linux. Fwiw, I routinely run perf with the
core hyperthread siblings offline.

Thanks,
Ian

> [root@localhost yang]# ./perf_static stat -e armv8_pmuv3_0/cycles/ --timeout 100
> Error:
> The sys_perf_event_open() syscall returned with 19 (No such device) for event (cpu-clock).
> /bin/dmesg | grep -i perf may provide additional information.
>
> This patchset tries to avoid this case by:
> - Double check the PMU's cpumask in the perf tool and only include online CPUs
> - Trying to make the PMU drivers only export online CPUs in its "cpus" or "cpumask"
>   attributes
>
> Previously discussion can be found at [1]. Will suggested to do it in userspace.
> I think it makes sense to do a double check in the perf tool in case the driver
> doesn't do this. So PATCH 1/3 is added in this version.
>
> [1] https://lore.kernel.org/linux-arm-kernel/20240410095833.63934-1-yangyicong@huawei.com/
>
> Yicong Yang (3):
>   perf pmu: Limit PMU cpumask to online CPUs
>   perf: arm_pmu: Only show online CPUs in device's "cpus" attribute
>   perf: arm_spe: Only show online CPUs in device's "cpumask" attribute
>
>  drivers/perf/arm_pmu.c     | 24 +++++++++++++++++++++++-
>  drivers/perf/arm_spe_pmu.c | 22 +++++++++++++++++++++-
>  tools/perf/util/pmu.c      | 13 +++++++++++--
>  3 files changed, 55 insertions(+), 4 deletions(-)
>
> --
> 2.24.0
>
Yicong Yang June 4, 2024, 8:03 a.m. UTC | #2
On 2024/6/4 0:42, Ian Rogers wrote:
> On Mon, Jun 3, 2024 at 2:33 AM Yicong Yang <yangyicong@huawei.com> wrote:
>>
>> From: Yicong Yang <yangyicong@hisilicon.com>
>>
>> If user doesn't specify the CPUs, perf will try to open events on CPUs
>> of the PMU which is initialized from the PMU's "cpumask" or "cpus" sysfs
>> attributes if provided. But we doesn't check whether the CPUs provided
>> by the PMU are all online. So we may open events on offline CPUs if PMU
>> driver provide offline CPUs and then we'll be rejected by the kernel:
>>
>> [root@localhost yang]# echo 0 > /sys/devices/system/cpu/cpu0/online
> 
> Generally Linux won't let you take CPU0 off line, I'm not able to
> follow this step on x86 Linux. Fwiw, I routinely run perf with the
> core hyperthread siblings offline.
> 

It doesn't matter if it's the CPU0 offline or other CPUs. There's no restriction
for CPU0 can go offline or not on arm64 and I just use this for example.

I cannot reproduce it on x86. I think it may because we're initializing the
pmu->cpus in different routines in pmu_cpumask(). There's no "cpus"
for x86's core pmu on my x86 machine:
root@ubuntu204:~# ls /sys/bus/event_source/devices/cpu/
allow_tsx_force_abort  caps  events  format  freeze_on_smi  perf_event_mux_interval_ms  power  rdpmc  subsystem  type  uevent

So pmu_cpumask() will infer it as an core pmu and initialize the cpus
with online CPUs [1]. For arm64 there lies a "cpus" sysfs attributes
so pmu->cpus are initialized from the "cpus" without checking each
CPUs is online or not. That's what proposed in Patch 1/3.

There's a "cpus" sysfs for x86's hybrid machine, reading from the code [2].
And it seems always reflect the online CPUs supported by that PMU.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmu.c?h=perf-tools-next#n779
[2] https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree//arch/x86/events/intel/core.c#n5736

Thanks.
Ian Rogers June 6, 2024, 7:04 a.m. UTC | #3
On Tue, Jun 4, 2024 at 1:04 AM Yicong Yang <yangyicong@huawei.com> wrote:
>
> On 2024/6/4 0:42, Ian Rogers wrote:
> > On Mon, Jun 3, 2024 at 2:33 AM Yicong Yang <yangyicong@huawei.com> wrote:
> >>
> >> From: Yicong Yang <yangyicong@hisilicon.com>
> >>
> >> If user doesn't specify the CPUs, perf will try to open events on CPUs
> >> of the PMU which is initialized from the PMU's "cpumask" or "cpus" sysfs
> >> attributes if provided. But we doesn't check whether the CPUs provided
> >> by the PMU are all online. So we may open events on offline CPUs if PMU
> >> driver provide offline CPUs and then we'll be rejected by the kernel:
> >>
> >> [root@localhost yang]# echo 0 > /sys/devices/system/cpu/cpu0/online
> >
> > Generally Linux won't let you take CPU0 off line, I'm not able to
> > follow this step on x86 Linux. Fwiw, I routinely run perf with the
> > core hyperthread siblings offline.
> >
>
> It doesn't matter if it's the CPU0 offline or other CPUs. There's no restriction
> for CPU0 can go offline or not on arm64 and I just use this for example.
>
> I cannot reproduce it on x86. I think it may because we're initializing the
> pmu->cpus in different routines in pmu_cpumask(). There's no "cpus"
> for x86's core pmu on my x86 machine:
> root@ubuntu204:~# ls /sys/bus/event_source/devices/cpu/
> allow_tsx_force_abort  caps  events  format  freeze_on_smi  perf_event_mux_interval_ms  power  rdpmc  subsystem  type  uevent
>
> So pmu_cpumask() will infer it as an core pmu and initialize the cpus
> with online CPUs [1]. For arm64 there lies a "cpus" sysfs attributes
> so pmu->cpus are initialized from the "cpus" without checking each
> CPUs is online or not. That's what proposed in Patch 1/3.
>
> There's a "cpus" sysfs for x86's hybrid machine, reading from the code [2].
> And it seems always reflect the online CPUs supported by that PMU.

Thanks Yicong, looking on a hybrid machine and taking cpu1 offline I
see the PMU's "cpus" not containing the offline CPU. I think this
supports that the PMU driver should be fixed for ARM, but we'll also
need a workaround in the perf tool for older kernels.

Thanks,
Ian

> [1] https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmu.c?h=perf-tools-next#n779
> [2] https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree//arch/x86/events/intel/core.c#n5736
>
> Thanks.
Will Deacon July 1, 2024, 2:22 p.m. UTC | #4
On Mon, Jun 03, 2024 at 05:28:09PM +0800, Yicong Yang wrote:
> From: Yicong Yang <yangyicong@hisilicon.com>
> 
> If user doesn't specify the CPUs, perf will try to open events on CPUs
> of the PMU which is initialized from the PMU's "cpumask" or "cpus" sysfs
> attributes if provided. But we doesn't check whether the CPUs provided
> by the PMU are all online. So we may open events on offline CPUs if PMU
> driver provide offline CPUs and then we'll be rejected by the kernel:
> 
> [root@localhost yang]# echo 0 > /sys/devices/system/cpu/cpu0/online
> [root@localhost yang]# ./perf_static stat -e armv8_pmuv3_0/cycles/ --timeout 100
> Error:
> The sys_perf_event_open() syscall returned with 19 (No such device) for event (cpu-clock).
> /bin/dmesg | grep -i perf may provide additional information.

I still don't see the value in this. CPUs can come and go asynchronously,
so this is all horribly racy. Furthermore, there are other (racy) ways
to find out which CPUs are online and whatever we do in the kernel now
isn't going to help userspace running on older kernels.

Will
Ian Rogers July 3, 2024, 4:55 a.m. UTC | #5
On Mon, Jul 1, 2024 at 7:22 AM Will Deacon <will@kernel.org> wrote:
>
> On Mon, Jun 03, 2024 at 05:28:09PM +0800, Yicong Yang wrote:
> > From: Yicong Yang <yangyicong@hisilicon.com>
> >
> > If user doesn't specify the CPUs, perf will try to open events on CPUs
> > of the PMU which is initialized from the PMU's "cpumask" or "cpus" sysfs
> > attributes if provided. But we doesn't check whether the CPUs provided
> > by the PMU are all online. So we may open events on offline CPUs if PMU
> > driver provide offline CPUs and then we'll be rejected by the kernel:
> >
> > [root@localhost yang]# echo 0 > /sys/devices/system/cpu/cpu0/online
> > [root@localhost yang]# ./perf_static stat -e armv8_pmuv3_0/cycles/ --timeout 100
> > Error:
> > The sys_perf_event_open() syscall returned with 19 (No such device) for event (cpu-clock).
> > /bin/dmesg | grep -i perf may provide additional information.
>
> I still don't see the value in this. CPUs can come and go asynchronously,
> so this is all horribly racy. Furthermore, there are other (racy) ways
> to find out which CPUs are online and whatever we do in the kernel now
> isn't going to help userspace running on older kernels.

Hi Will,

you are assuming here that a counter should be opened on all CPUs.
This is true for "core" PMUs for events like "cycles" but isn't true
for uncore PMUs. For an uncore PMU on dual socket x86 it may have a
cpumask advertising opening on CPUs "0,18". If CPU 18 is taken offline
then the cpumask becomes say  "0,19". In the perf tool we don't need
to determine the topology of the system and see that 19 is next to 18
and online, the PMU driver does it for us. If we intersect the cpumask
of an uncore PMU with online CPUs and the cpumask were still "0,18",
then we'd end up only opening the event on one socket (for CPU 0).

What the cpumask is providing is the list of default CPUs we want to
open the event upon, like a "-C" command line option. Yes this is racy
if you are running perf and taking CPUs on and offline, but it's what
happens on x86 and we live with it okay. That's not to say we can't do
a smarter topology system, cover races with things like BPF, and so
on. These things just aren't where the perf tool code is today and
could face challenges wrt permissions, older kernel compatibility and
so on. In the perf tool code today:

 - No cpumask/cpus file was provided on non-hybrid/BIG.little systems
and so such a cpumask is taken to mean open on all online CPUs. If the
cpumask exists but is empty then we also do this. I see broken ARM
memory controller PMUs with empty cpumask files.
 - A cpus file with a list of online CPUs. This is used by core PMUs
on hybrid/BIG.little systems. x86 doesn't place offline CPUs in this
file but ARM does. My hope is that ARM can be consistent with x86.
 - A cpumask with a list of say one CPU per socket. This is generally
used by uncore PMUs, CPUs not in the cpumask are still valid and are
sometimes used to spread interrupt load. The majority of ARM PMU
drivers seem broken here, either wrt offline CPUs, not providing a
cpumask, etc.

Thanks,
Ian