
[1/3] perf pmu: Limit PMU cpumask to online CPUs

Message ID 20240603092812.46616-2-yangyicong@huawei.com (mailing list archive)
State New, archived
Series Perf avoid opening events on offline CPUs

Commit Message

Yicong Yang June 3, 2024, 9:28 a.m. UTC
From: Yicong Yang <yangyicong@hisilicon.com>

We initialize the PMU's cpumask from the "cpumask" or "cpus" sysfs
attribute, if provided by the driver, without checking whether the
CPUs are online. If the CPUs provided by the driver include offline
CPUs, we'll try to open events on those offline CPUs and be rejected
by the kernel:

[root@localhost yang]# echo 0 > /sys/devices/system/cpu/cpu0/online
[root@localhost yang]# ./perf_static stat -e armv8_pmuv3_0/cycles/ --timeout 100
Error:
The sys_perf_event_open() syscall returned with 19 (No such device) for event (cpu-clock).
/bin/dmesg | grep -i perf may provide additional information.

So it's better to double check in userspace and only include the
online CPUs from "cpumask" or "cpus", to avoid opening events on
offline CPUs.

Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
---
 tools/perf/util/pmu.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

Comments

Ian Rogers June 3, 2024, 4:52 p.m. UTC | #1
On Mon, Jun 3, 2024 at 2:33 AM Yicong Yang <yangyicong@huawei.com> wrote:
>
> From: Yicong Yang <yangyicong@hisilicon.com>
>
> We'll initialize the PMU's cpumask from "cpumask" or "cpus" sysfs
> attributes if provided by the driver without checking the CPUs
> are online or not. In such case that CPUs provided by the driver
> contains the offline CPUs, we'll try to open event on the offline
> CPUs and then rejected by the kernel:
>
> [root@localhost yang]# echo 0 > /sys/devices/system/cpu/cpu0/online
> [root@localhost yang]# ./perf_static stat -e armv8_pmuv3_0/cycles/ --timeout 100
> Error:
> The sys_perf_event_open() syscall returned with 19 (No such device) for event (cpu-clock).
> /bin/dmesg | grep -i perf may provide additional information.
>
> So it's better to do a double check in the userspace and only include
> the online CPUs from "cpumask" or "cpus" to avoid opening events on
> offline CPUs.

I see where you are coming from with this but I think it is wrong. The
cpus for an uncore PMU are a hint of the CPU to open on rather than
the set of valid CPUs. For example:
```
$ cat /sys/devices/uncore_imc_free_running_0/cpumask
0
$ perf stat -vv -e uncore_imc_free_running_0/data_read/ -C 1 -a sleep 0.1
Using CPUID GenuineIntel-6-8D-1
Attempt to add: uncore_imc_free_running_0/data_read/
..after resolving event: uncore_imc_free_running_0/event=0xff,umask=0x20/
Control descriptor is not initialized
------------------------------------------------------------
perf_event_attr:
  type                             24 (uncore_imc_free_running_0)
  size                             136
  config                           0x20ff (data_read)
  sample_type                      IDENTIFIER
  read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
  disabled                         1
  inherit                          1
  exclude_guest                    1
------------------------------------------------------------
sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0x8
sys_perf_event_open failed, error -22
switching off cloexec flag
------------------------------------------------------------
perf_event_attr:
  type                             24 (uncore_imc_free_running_0)
  size                             136
  config                           0x20ff (data_read)
  sample_type                      IDENTIFIER
  read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
  disabled                         1
  inherit                          1
  exclude_guest                    1
------------------------------------------------------------
sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0
sys_perf_event_open failed, error -22
switching off exclude_guest, exclude_host
------------------------------------------------------------
perf_event_attr:
  type                             24 (uncore_imc_free_running_0)
  size                             136
  config                           0x20ff (data_read)
  sample_type                      IDENTIFIER
  read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
  disabled                         1
  inherit                          1
------------------------------------------------------------
sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0 = 3
uncore_imc_free_running_0/data_read/: 1: 4005984 102338957 102338957
uncore_imc_free_running_0/data_read/: 4005984 102338957 102338957

 Performance counter stats for 'system wide':

            244.51 MiB  uncore_imc_free_running_0/data_read/

       0.102320376 seconds time elapsed
```
So the CPU mask of the PMU says to open on CPU 0, but when I passed
"-C 1" on the command line it opened on CPU 1. If the cpumask file
contained an offline CPU then this change would make the CPU map in
the tool empty, even though a different CPU may be programmable and
online.

Fwiw, the tool determines whether the mask is the set of all valid
CPUs or just a hint by using the notion of a PMU being "core" or not.
That notion considers whether the mask was loaded from a "cpumask" or
a "cpus" file:
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmu.c?h=perf-tools-next#n810
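
To make the distinction concrete, here is a minimal standalone C sketch
(my illustration, not the perf tool's actual logic; the helper names and
exact checks are assumptions): a PMU exposing a "cpus" file is treated as
a core PMU whose map lists the valid CPUs, while a "cpumask" file marks an
uncore PMU whose map is only a default CPU to open on.
```
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Illustration of the core-vs-hint distinction: a PMU with a "cpus" file
 * is treated as core (the map lists the valid CPUs), one with only a
 * "cpumask" file is treated as uncore (the map is a default/hint).
 * pmu_has_file() and describe_pmu_mask() are illustrative helpers, not
 * part of the perf tool.
 */
static bool pmu_has_file(const char *pmu, const char *file)
{
	char path[4096];

	snprintf(path, sizeof(path),
		 "/sys/bus/event_source/devices/%s/%s", pmu, file);
	return access(path, F_OK) == 0;
}

static void describe_pmu_mask(const char *pmu)
{
	if (pmu_has_file(pmu, "cpus"))
		printf("%s: \"cpus\" lists the valid CPUs (core PMU)\n", pmu);
	else if (pmu_has_file(pmu, "cpumask"))
		printf("%s: \"cpumask\" is a default/hint CPU (uncore PMU)\n", pmu);
	else
		printf("%s: no mask file, assume all online CPUs\n", pmu);
}

int main(void)
{
	describe_pmu_mask("armv8_pmuv3_0");
	describe_pmu_mask("uncore_imc_free_running_0");
	return 0;
}
```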

Thanks,
Ian

> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
> ---
>  tools/perf/util/pmu.c | 13 +++++++++++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
> index 888ce9912275..51e8d10ee28b 100644
> --- a/tools/perf/util/pmu.c
> +++ b/tools/perf/util/pmu.c
> @@ -771,8 +771,17 @@ static struct perf_cpu_map *pmu_cpumask(int dirfd, const char *name, bool is_cor
>                         continue;
>                 cpus = perf_cpu_map__read(file);
>                 fclose(file);
> -               if (cpus)
> -                       return cpus;
> +               if (cpus) {
> +                       struct perf_cpu_map *intersect __maybe_unused;
> +
> +                       if (perf_cpu_map__is_subset(cpu_map__online(), cpus))
> +                               return cpus;
> +
> +                       intersect = perf_cpu_map__intersect(cpus, cpu_map__online());
> +                       perf_cpu_map__put(cpus);
> +                       if (intersect)
> +                               return intersect;
> +               }
>         }
>
>         /* Nothing found, for core PMUs assume this means all CPUs. */
> --
> 2.24.0
>
Yicong Yang June 4, 2024, 11:12 a.m. UTC | #2
On 2024/6/4 0:52, Ian Rogers wrote:
> On Mon, Jun 3, 2024 at 2:33 AM Yicong Yang <yangyicong@huawei.com> wrote:
>>
>> From: Yicong Yang <yangyicong@hisilicon.com>
>>
>> We'll initialize the PMU's cpumask from "cpumask" or "cpus" sysfs
>> attributes if provided by the driver without checking the CPUs
>> are online or not. In such case that CPUs provided by the driver
>> contains the offline CPUs, we'll try to open event on the offline
>> CPUs and then rejected by the kernel:
>>
>> [root@localhost yang]# echo 0 > /sys/devices/system/cpu/cpu0/online
>> [root@localhost yang]# ./perf_static stat -e armv8_pmuv3_0/cycles/ --timeout 100
>> Error:
>> The sys_perf_event_open() syscall returned with 19 (No such device) for event (cpu-clock).
>> /bin/dmesg | grep -i perf may provide additional information.
>>
>> So it's better to do a double check in the userspace and only include
>> the online CPUs from "cpumask" or "cpus" to avoid opening events on
>> offline CPUs.
> 
> I see where you are coming from with this but I think it is wrong. The
> cpus for an uncore PMU are a hint of the CPU to open on rather than
> the set of valid CPUs. For example:
> ```
> $ cat /sys/devices/uncore_imc_free_running_0/cpumask
> 0
> $ perf stat -vv -e uncore_imc_free_running_0/data_read/ -C 1 -a sleep 0.1
> Using CPUID GenuineIntel-6-8D-1
> Attempt to add: uncore_imc_free_running_0/data_read/
> ..after resolving event: uncore_imc_free_running_0/event=0xff,umask=0x20/
> Control descriptor is not initialized
> ------------------------------------------------------------
> perf_event_attr:
>   type                             24 (uncore_imc_free_running_0)
>   size                             136
>   config                           0x20ff (data_read)
>   sample_type                      IDENTIFIER
>   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
>   disabled                         1
>   inherit                          1
>   exclude_guest                    1
> ------------------------------------------------------------
> sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0x8
> sys_perf_event_open failed, error -22
> switching off cloexec flag
> ------------------------------------------------------------
> perf_event_attr:
>   type                             24 (uncore_imc_free_running_0)
>   size                             136
>   config                           0x20ff (data_read)
>   sample_type                      IDENTIFIER
>   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
>   disabled                         1
>   inherit                          1
>   exclude_guest                    1
> ------------------------------------------------------------
> sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0
> sys_perf_event_open failed, error -22
> switching off exclude_guest, exclude_host
> ------------------------------------------------------------
> perf_event_attr:
>   type                             24 (uncore_imc_free_running_0)
>   size                             136
>   config                           0x20ff (data_read)
>   sample_type                      IDENTIFIER
>   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
>   disabled                         1
>   inherit                          1
> ------------------------------------------------------------
> sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0 = 3
> uncore_imc_free_running_0/data_read/: 1: 4005984 102338957 102338957
> uncore_imc_free_running_0/data_read/: 4005984 102338957 102338957
> 
>  Performance counter stats for 'system wide':
> 
>             244.51 MiB  uncore_imc_free_running_0/data_read/
> 
>        0.102320376 seconds time elapsed
> ```
> So the CPU mask of the PMU says to open on CPU 0, but on the command
> line when I passed "-C 1" it opened it on CPU 1. If the cpumask file
> contained an offline CPU then this change would make it so the CPU map
> in the tool were empty, however, a different CPU may be programmable
> and online.
> 
> Fwiw, the tool will determine whether the mask is for all valid or a
> hint by using the notion of a PMU being "core" or not. That notion
> considers whether the mask was loading from a "cpumask" or "cpus"
> file:
> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmu.c?h=perf-tools-next#n810
> 

I see, you're correct on this. Maybe I didn't state it clearly in the commit:
this patch doesn't intend to change the case where the user specifies the CPUs
explicitly. It only makes a difference when the user doesn't specify a CPU list
while the PMU's 'cpus' or 'cpumask' contains offline CPUs: with this patch we'll
use only the online CPUs from the PMU's 'cpus' or 'cpumask', whereas before we
might have used the offline CPUs as well. We still honor the user's input via
the handling in __perf_evlist__propagate_maps().
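
Roughly, the precedence being referred to is: a CPU list given explicitly
by the user (e.g. -C) wins, otherwise the PMU's own 'cpus'/'cpumask' is
used, otherwise all online CPUs. A simplified standalone sketch of that
precedence (my own types and names, not libperf's
__perf_evlist__propagate_maps()):
```
#include <stddef.h>
#include <stdio.h>

/* Illustrative stand-in for a CPU map; not libperf's perf_cpu_map. */
struct cpu_list {
	const int *cpus;
	size_t nr;
};

/*
 * Simplified model of the precedence described above: the user's -C list
 * overrides the PMU's own mask, which in turn overrides "all online CPUs".
 */
static const struct cpu_list *
pick_event_cpus(const struct cpu_list *user_requested,
		const struct cpu_list *pmu_own_cpus,
		const struct cpu_list *all_online)
{
	if (user_requested)
		return user_requested;
	if (pmu_own_cpus)
		return pmu_own_cpus;
	return all_online;
}

int main(void)
{
	const int user[] = { 1 };
	const int pmu[] = { 0 };
	const struct cpu_list user_map = { user, 1 };
	const struct cpu_list pmu_map = { pmu, 1 };
	const struct cpu_list *chosen = pick_event_cpus(&user_map, &pmu_map, NULL);

	/* Mirrors the "-C 1" example above: the user's CPU 1 wins over the mask's CPU 0. */
	printf("open on CPU %d\n", chosen->cpus[0]);
	return 0;
}
```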

Thanks.
Ian Rogers June 4, 2024, 2:12 p.m. UTC | #3
On Tue, Jun 4, 2024 at 4:12 AM Yicong Yang <yangyicong@huawei.com> wrote:
>
> On 2024/6/4 0:52, Ian Rogers wrote:
> > On Mon, Jun 3, 2024 at 2:33 AM Yicong Yang <yangyicong@huawei.com> wrote:
> > Fwiw, the tool will determine whether the mask is for all valid or a
> > hint by using the notion of a PMU being "core" or not. That notion
> > considers whether the mask was loading from a "cpumask" or "cpus"
> > file:
> > https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmu.c?h=perf-tools-next#n810
> >
>
> I see, you're correct on this. Maybe I didn't mention it clearly in the commit,
> this patch doesn't intend to change the case that user specify the CPUs explicitly.
> It'll have difference in case user doesn't specify the CPU list while the PMU's
> 'cpus' or 'cpumask' contains offline CPUs: with this patch we'll use the online
> CPUs in the PMU's 'cpus' or 'cpumask' but before this we may use the offline CPUs.
> We still honor the user's input by the handling in __perf_evlist__propagate_maps().

Thanks Yicong,

I'm having a hard time believing this bug has been there all along
(forever), and I also wonder what's happening with big.LITTLE/hybrid
systems that advertise CPUs in similar ways. I do see that on
armv8_pmuv3_0 the "cpus" file will contain offline CPUs, while on x86
the lack of such a file causes a fallback to the set of online CPUs.
So there is a divergence in the PMUs' behavior that may well manifest
itself in the perf tool like this. If the "cpus" file were empty we
might fix the behavior, but I don't think that's what's wanted. I also
worry that a PMU driver change will mean the perf tool is still not
working correctly for people who don't rebuild their kernel.

I'm still thinking about this but thanks for the updates,
Ian
Namhyung Kim June 7, 2024, 12:09 a.m. UTC | #4
Hello,

On Mon, Jun 03, 2024 at 09:52:15AM -0700, Ian Rogers wrote:
> On Mon, Jun 3, 2024 at 2:33 AM Yicong Yang <yangyicong@huawei.com> wrote:
> >
> > From: Yicong Yang <yangyicong@hisilicon.com>
> >
> > We'll initialize the PMU's cpumask from "cpumask" or "cpus" sysfs
> > attributes if provided by the driver without checking the CPUs
> > are online or not. In such case that CPUs provided by the driver
> > contains the offline CPUs, we'll try to open event on the offline
> > CPUs and then rejected by the kernel:
> >
> > [root@localhost yang]# echo 0 > /sys/devices/system/cpu/cpu0/online
> > [root@localhost yang]# ./perf_static stat -e armv8_pmuv3_0/cycles/ --timeout 100
> > Error:
> > The sys_perf_event_open() syscall returned with 19 (No such device) for event (cpu-clock).
> > /bin/dmesg | grep -i perf may provide additional information.
> >
> > So it's better to do a double check in the userspace and only include
> > the online CPUs from "cpumask" or "cpus" to avoid opening events on
> > offline CPUs.
> 
> I see where you are coming from with this but I think it is wrong. The
> cpus for an uncore PMU are a hint of the CPU to open on rather than
> the set of valid CPUs. For example:
> ```
> $ cat /sys/devices/uncore_imc_free_running_0/cpumask
> 0
> $ perf stat -vv -e uncore_imc_free_running_0/data_read/ -C 1 -a sleep 0.1
> Using CPUID GenuineIntel-6-8D-1
> Attempt to add: uncore_imc_free_running_0/data_read/
> ..after resolving event: uncore_imc_free_running_0/event=0xff,umask=0x20/
> Control descriptor is not initialized
> ------------------------------------------------------------
> perf_event_attr:
>   type                             24 (uncore_imc_free_running_0)
>   size                             136
>   config                           0x20ff (data_read)
>   sample_type                      IDENTIFIER
>   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
>   disabled                         1
>   inherit                          1
>   exclude_guest                    1
> ------------------------------------------------------------
> sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0x8
> sys_perf_event_open failed, error -22
> switching off cloexec flag
> ------------------------------------------------------------
> perf_event_attr:
>   type                             24 (uncore_imc_free_running_0)
>   size                             136
>   config                           0x20ff (data_read)
>   sample_type                      IDENTIFIER
>   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
>   disabled                         1
>   inherit                          1
>   exclude_guest                    1
> ------------------------------------------------------------
> sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0
> sys_perf_event_open failed, error -22
> switching off exclude_guest, exclude_host
> ------------------------------------------------------------
> perf_event_attr:
>   type                             24 (uncore_imc_free_running_0)
>   size                             136
>   config                           0x20ff (data_read)
>   sample_type                      IDENTIFIER
>   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
>   disabled                         1
>   inherit                          1
> ------------------------------------------------------------
> sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0 = 3
> uncore_imc_free_running_0/data_read/: 1: 4005984 102338957 102338957
> uncore_imc_free_running_0/data_read/: 4005984 102338957 102338957
> 
>  Performance counter stats for 'system wide':
> 
>             244.51 MiB  uncore_imc_free_running_0/data_read/
> 
>        0.102320376 seconds time elapsed
> ```
> So the CPU mask of the PMU says to open on CPU 0, but on the command
> line when I passed "-C 1" it opened it on CPU 1. If the cpumask file
> contained an offline CPU then this change would make it so the CPU map
> in the tool were empty, however, a different CPU may be programmable
> and online.

I think the Intel uncore PMU driver ignores the CPU parameter and sets
it to CPU 0 internally in this case.  See uncore_pmu_event_init() at
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/events/intel/uncore.c#n761

> 
> Fwiw, the tool will determine whether the mask is for all valid or a
> hint by using the notion of a PMU being "core" or not. That notion
> considers whether the mask was loading from a "cpumask" or "cpus"
> file:
> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmu.c?h=perf-tools-next#n810
> 
> Thanks,
> Ian
> 
> > Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
> > ---
> >  tools/perf/util/pmu.c | 13 +++++++++++--
> >  1 file changed, 11 insertions(+), 2 deletions(-)
> >
> > diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
> > index 888ce9912275..51e8d10ee28b 100644
> > --- a/tools/perf/util/pmu.c
> > +++ b/tools/perf/util/pmu.c
> > @@ -771,8 +771,17 @@ static struct perf_cpu_map *pmu_cpumask(int dirfd, const char *name, bool is_cor
> >                         continue;
> >                 cpus = perf_cpu_map__read(file);
> >                 fclose(file);
> > -               if (cpus)
> > -                       return cpus;
> > +               if (cpus) {
> > +                       struct perf_cpu_map *intersect __maybe_unused;
> > +
> > +                       if (perf_cpu_map__is_subset(cpu_map__online(), cpus))
> > +                               return cpus;
> > +
> > +                       intersect = perf_cpu_map__intersect(cpus, cpu_map__online());

So IIUC this is for core PMUs with a "cpus" file, right?  I guess
uncore drivers already handle "cpumask" properly...

Thanks,
Namhyung


> > +                       perf_cpu_map__put(cpus);
> > +                       if (intersect)
> > +                               return intersect;
> > +               }
> >         }
> >
> >         /* Nothing found, for core PMUs assume this means all CPUs. */
> > --
> > 2.24.0
> >
Ian Rogers June 7, 2024, 12:36 a.m. UTC | #5
On Thu, Jun 6, 2024 at 5:09 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> Hello,
>
> On Mon, Jun 03, 2024 at 09:52:15AM -0700, Ian Rogers wrote:
> > On Mon, Jun 3, 2024 at 2:33 AM Yicong Yang <yangyicong@huawei.com> wrote:
> > >
> > > From: Yicong Yang <yangyicong@hisilicon.com>
> > >
> > > We'll initialize the PMU's cpumask from "cpumask" or "cpus" sysfs
> > > attributes if provided by the driver without checking the CPUs
> > > are online or not. In such case that CPUs provided by the driver
> > > contains the offline CPUs, we'll try to open event on the offline
> > > CPUs and then rejected by the kernel:
> > >
> > > [root@localhost yang]# echo 0 > /sys/devices/system/cpu/cpu0/online
> > > [root@localhost yang]# ./perf_static stat -e armv8_pmuv3_0/cycles/ --timeout 100
> > > Error:
> > > The sys_perf_event_open() syscall returned with 19 (No such device) for event (cpu-clock).
> > > /bin/dmesg | grep -i perf may provide additional information.
> > >
> > > So it's better to do a double check in the userspace and only include
> > > the online CPUs from "cpumask" or "cpus" to avoid opening events on
> > > offline CPUs.
> >
> > I see where you are coming from with this but I think it is wrong. The
> > cpus for an uncore PMU are a hint of the CPU to open on rather than
> > the set of valid CPUs. For example:
> > ```
> > $ cat /sys/devices/uncore_imc_free_running_0/cpumask
> > 0
> > $ perf stat -vv -e uncore_imc_free_running_0/data_read/ -C 1 -a sleep 0.1
> > Using CPUID GenuineIntel-6-8D-1
> > Attempt to add: uncore_imc_free_running_0/data_read/
> > ..after resolving event: uncore_imc_free_running_0/event=0xff,umask=0x20/
> > Control descriptor is not initialized
> > ------------------------------------------------------------
> > perf_event_attr:
> >   type                             24 (uncore_imc_free_running_0)
> >   size                             136
> >   config                           0x20ff (data_read)
> >   sample_type                      IDENTIFIER
> >   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
> >   disabled                         1
> >   inherit                          1
> >   exclude_guest                    1
> > ------------------------------------------------------------
> > sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0x8
> > sys_perf_event_open failed, error -22
> > switching off cloexec flag
> > ------------------------------------------------------------
> > perf_event_attr:
> >   type                             24 (uncore_imc_free_running_0)
> >   size                             136
> >   config                           0x20ff (data_read)
> >   sample_type                      IDENTIFIER
> >   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
> >   disabled                         1
> >   inherit                          1
> >   exclude_guest                    1
> > ------------------------------------------------------------
> > sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0
> > sys_perf_event_open failed, error -22
> > switching off exclude_guest, exclude_host
> > ------------------------------------------------------------
> > perf_event_attr:
> >   type                             24 (uncore_imc_free_running_0)
> >   size                             136
> >   config                           0x20ff (data_read)
> >   sample_type                      IDENTIFIER
> >   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
> >   disabled                         1
> >   inherit                          1
> > ------------------------------------------------------------
> > sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0 = 3
> > uncore_imc_free_running_0/data_read/: 1: 4005984 102338957 102338957
> > uncore_imc_free_running_0/data_read/: 4005984 102338957 102338957
> >
> >  Performance counter stats for 'system wide':
> >
> >             244.51 MiB  uncore_imc_free_running_0/data_read/
> >
> >        0.102320376 seconds time elapsed
> > ```
> > So the CPU mask of the PMU says to open on CPU 0, but on the command
> > line when I passed "-C 1" it opened it on CPU 1. If the cpumask file
> > contained an offline CPU then this change would make it so the CPU map
> > in the tool were empty, however, a different CPU may be programmable
> > and online.
>
> I think Intel uncore PMU driver ignores the CPU parameter and set it to
> CPU 0 in this case internally.  See uncore_pmu_event_init() at
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/events/intel/uncore.c#n761

Hmm.. maybe that's just the default when the CPU isn't set. Wrt hot
plugging, on a 2-socket Skylake:
```
$ cat /sys/devices/uncore_imc_0/cpumask
0,18
$ echo 0 > /sys/devices/system/cpu/cpu18/online
$ cat /sys/devices/uncore_imc_0/cpumask
0,19
```
So the cpumask does reflect the online/offline state of the CPUs.

> >
> > Fwiw, the tool will determine whether the mask is for all valid or a
> > hint by using the notion of a PMU being "core" or not. That notion
> > considers whether the mask was loading from a "cpumask" or "cpus"
> > file:
> > https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmu.c?h=perf-tools-next#n810
> >
> > Thanks,
> > Ian
> >
> > > Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
> > > ---
> > >  tools/perf/util/pmu.c | 13 +++++++++++--
> > >  1 file changed, 11 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
> > > index 888ce9912275..51e8d10ee28b 100644
> > > --- a/tools/perf/util/pmu.c
> > > +++ b/tools/perf/util/pmu.c
> > > @@ -771,8 +771,17 @@ static struct perf_cpu_map *pmu_cpumask(int dirfd, const char *name, bool is_cor
> > >                         continue;
> > >                 cpus = perf_cpu_map__read(file);
> > >                 fclose(file);
> > > -               if (cpus)
> > > -                       return cpus;
> > > +               if (cpus) {
> > > +                       struct perf_cpu_map *intersect __maybe_unused;
> > > +
> > > +                       if (perf_cpu_map__is_subset(cpu_map__online(), cpus))
> > > +                               return cpus;
> > > +
> > > +                       intersect = perf_cpu_map__intersect(cpus, cpu_map__online());
>
> So IIUC this is for core PMUs with "cpus" file, right?  I guess uncore
> drivers already handles "cpumask" properly..

So I think this is an ARM-specific bug:

Core PMUs:
x86 uses /sys/devices/cpu and s390 uses cpum_cf; these lack a cpus or
cpumask file and so we default to opening events on all online CPUs.
The fact that these are core PMUs is hardcoded in the tool:
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmu.c?h=perf-tools-next#n1747
x86 hybrid has /sys/devices/cpu_(core|atom)/cpus - the set of CPUs is
updated to reflect CPUs going online and offline.
ARM's /sys/devices/armv8_pmuv3_0 isn't a hardcoded core PMU and so we
expect its cpus file to contain only online CPUs, but it currently
also erroneously contains offline ones.

Uncore PMUs:
x86 has /sys/devices/<uncore...>/cpumask, where the cpumask is updated
as CPUs go online and offline.
ARM has things like the dmc620 PMU, which appears to be missing a
cpumask in certain cases, leading to the perf tool treating it like
the x86 core PMU and opening events for it on every CPU:
```
# ls /sys/devices/arm_dmc620_10008c000/
events  format  perf_event_mux_interval_ms  power  subsystem  type  uevent
```

I think we need a PMU test so that bugs like this can be reported, but
we may also need to add tool workarounds for PMUs that are broken. I
can imagine it will be tricky to test uncore PMUs that default to CPU0,
as we often can't offline that CPU.
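
For what such a test might check, here is a rough standalone sketch (my
illustration, not an existing perf test): walk the PMUs under
/sys/bus/event_source/devices and flag any CPU listed in a "cpus" or
"cpumask" file whose per-CPU online file reads 0 (a missing online file,
as for cpu0 on many systems, is treated as online).
```
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return 1 if cpuN is online; a missing online file (e.g. cpu0) counts as online. */
static int cpu_is_online(int cpu)
{
	char path[256], buf[8] = "1";
	FILE *f;

	snprintf(path, sizeof(path), "/sys/devices/system/cpu/cpu%d/online", cpu);
	f = fopen(path, "r");
	if (!f)
		return 1;
	if (!fgets(buf, sizeof(buf), f))
		buf[0] = '1';
	fclose(f);
	return buf[0] == '1';
}

/* Check every CPU in a "0-3,8"-style list read from the PMU's cpus/cpumask file. */
static void check_pmu_mask(const char *pmu, const char *file)
{
	char path[512], list[4096];
	char *tok, *save = NULL;
	FILE *f;

	snprintf(path, sizeof(path), "/sys/bus/event_source/devices/%s/%s", pmu, file);
	f = fopen(path, "r");
	if (!f)
		return;
	if (!fgets(list, sizeof(list), f)) {
		fclose(f);
		return;
	}
	fclose(f);

	for (tok = strtok_r(list, ",\n", &save); tok; tok = strtok_r(NULL, ",\n", &save)) {
		int lo, hi, cpu;

		if (sscanf(tok, "%d-%d", &lo, &hi) != 2)
			lo = hi = atoi(tok);
		for (cpu = lo; cpu <= hi; cpu++)
			if (!cpu_is_online(cpu))
				printf("FAIL: %s %s lists offline cpu%d\n", pmu, file, cpu);
	}
}

int main(void)
{
	DIR *dir = opendir("/sys/bus/event_source/devices");
	struct dirent *d;

	if (!dir)
		return 1;
	while ((d = readdir(dir)) != NULL) {
		if (d->d_name[0] == '.')
			continue;
		check_pmu_mask(d->d_name, "cpus");
		check_pmu_mask(d->d_name, "cpumask");
	}
	closedir(dir);
	return 0;
}
```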

Thanks,
Ian

> Thanks,
> Namhyung
>
>
> > > +                       perf_cpu_map__put(cpus);
> > > +                       if (intersect)
> > > +                               return intersect;
> > > +               }
> > >         }
> > >
> > >         /* Nothing found, for core PMUs assume this means all CPUs. */
> > > --
> > > 2.24.0
> > >
Ian Rogers June 7, 2024, 7:04 a.m. UTC | #6
On Thu, Jun 6, 2024 at 5:36 PM Ian Rogers <irogers@google.com> wrote:
>
> On Thu, Jun 6, 2024 at 5:09 PM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > Hello,
> >
> > On Mon, Jun 03, 2024 at 09:52:15AM -0700, Ian Rogers wrote:
> > > On Mon, Jun 3, 2024 at 2:33 AM Yicong Yang <yangyicong@huawei.com> wrote:
> > > >
> > > > From: Yicong Yang <yangyicong@hisilicon.com>
> > > >
> > > > We'll initialize the PMU's cpumask from "cpumask" or "cpus" sysfs
> > > > attributes if provided by the driver without checking the CPUs
> > > > are online or not. In such case that CPUs provided by the driver
> > > > contains the offline CPUs, we'll try to open event on the offline
> > > > CPUs and then rejected by the kernel:
> > > >
> > > > [root@localhost yang]# echo 0 > /sys/devices/system/cpu/cpu0/online
> > > > [root@localhost yang]# ./perf_static stat -e armv8_pmuv3_0/cycles/ --timeout 100
> > > > Error:
> > > > The sys_perf_event_open() syscall returned with 19 (No such device) for event (cpu-clock).
> > > > /bin/dmesg | grep -i perf may provide additional information.
> > > >
> > > > So it's better to do a double check in the userspace and only include
> > > > the online CPUs from "cpumask" or "cpus" to avoid opening events on
> > > > offline CPUs.
> > >
> > > I see where you are coming from with this but I think it is wrong. The
> > > cpus for an uncore PMU are a hint of the CPU to open on rather than
> > > the set of valid CPUs. For example:
> > > ```
> > > $ cat /sys/devices/uncore_imc_free_running_0/cpumask
> > > 0
> > > $ perf stat -vv -e uncore_imc_free_running_0/data_read/ -C 1 -a sleep 0.1
> > > Using CPUID GenuineIntel-6-8D-1
> > > Attempt to add: uncore_imc_free_running_0/data_read/
> > > ..after resolving event: uncore_imc_free_running_0/event=0xff,umask=0x20/
> > > Control descriptor is not initialized
> > > ------------------------------------------------------------
> > > perf_event_attr:
> > >   type                             24 (uncore_imc_free_running_0)
> > >   size                             136
> > >   config                           0x20ff (data_read)
> > >   sample_type                      IDENTIFIER
> > >   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
> > >   disabled                         1
> > >   inherit                          1
> > >   exclude_guest                    1
> > > ------------------------------------------------------------
> > > sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0x8
> > > sys_perf_event_open failed, error -22
> > > switching off cloexec flag
> > > ------------------------------------------------------------
> > > perf_event_attr:
> > >   type                             24 (uncore_imc_free_running_0)
> > >   size                             136
> > >   config                           0x20ff (data_read)
> > >   sample_type                      IDENTIFIER
> > >   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
> > >   disabled                         1
> > >   inherit                          1
> > >   exclude_guest                    1
> > > ------------------------------------------------------------
> > > sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0
> > > sys_perf_event_open failed, error -22
> > > switching off exclude_guest, exclude_host
> > > ------------------------------------------------------------
> > > perf_event_attr:
> > >   type                             24 (uncore_imc_free_running_0)
> > >   size                             136
> > >   config                           0x20ff (data_read)
> > >   sample_type                      IDENTIFIER
> > >   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
> > >   disabled                         1
> > >   inherit                          1
> > > ------------------------------------------------------------
> > > sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0 = 3
> > > uncore_imc_free_running_0/data_read/: 1: 4005984 102338957 102338957
> > > uncore_imc_free_running_0/data_read/: 4005984 102338957 102338957
> > >
> > >  Performance counter stats for 'system wide':
> > >
> > >             244.51 MiB  uncore_imc_free_running_0/data_read/
> > >
> > >        0.102320376 seconds time elapsed
> > > ```
> > > So the CPU mask of the PMU says to open on CPU 0, but on the command
> > > line when I passed "-C 1" it opened it on CPU 1. If the cpumask file
> > > contained an offline CPU then this change would make it so the CPU map
> > > in the tool were empty, however, a different CPU may be programmable
> > > and online.
> >
> > I think Intel uncore PMU driver ignores the CPU parameter and set it to
> > CPU 0 in this case internally.  See uncore_pmu_event_init() at
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/events/intel/uncore.c#n761
>
> Hmm.. maybe that's just the option if not set. Wrt hot plugging, on a
> 2 socket skylake:
> ```
> $ cat /sys/devices/uncore_imc_0/cpumask
> 0,18
> $ echo 0 > /sys/devices/system/cpu/cpu18/online
> $ cat /sys/devices/uncore_imc_0/cpumask
> 0,19
> ```
> So the cpumask should be reflecting the online/offline nature of CPUs.
>
> > >
> > > Fwiw, the tool will determine whether the mask is for all valid or a
> > > hint by using the notion of a PMU being "core" or not. That notion
> > > considers whether the mask was loading from a "cpumask" or "cpus"
> > > file:
> > > https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmu.c?h=perf-tools-next#n810
> > >
> > > Thanks,
> > > Ian
> > >
> > > > Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
> > > > ---
> > > >  tools/perf/util/pmu.c | 13 +++++++++++--
> > > >  1 file changed, 11 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
> > > > index 888ce9912275..51e8d10ee28b 100644
> > > > --- a/tools/perf/util/pmu.c
> > > > +++ b/tools/perf/util/pmu.c
> > > > @@ -771,8 +771,17 @@ static struct perf_cpu_map *pmu_cpumask(int dirfd, const char *name, bool is_cor
> > > >                         continue;
> > > >                 cpus = perf_cpu_map__read(file);
> > > >                 fclose(file);
> > > > -               if (cpus)
> > > > -                       return cpus;
> > > > +               if (cpus) {
> > > > +                       struct perf_cpu_map *intersect __maybe_unused;
> > > > +
> > > > +                       if (perf_cpu_map__is_subset(cpu_map__online(), cpus))
> > > > +                               return cpus;
> > > > +
> > > > +                       intersect = perf_cpu_map__intersect(cpus, cpu_map__online());
> >
> > So IIUC this is for core PMUs with "cpus" file, right?  I guess uncore
> > drivers already handles "cpumask" properly..
>
> So I think this is an ARM specific bug:
>
> Core PMUs:
> x86 uses /sys/devices/cpu, s390 uses cpum_cf, these lack a cpus or
> cpumask and so we default to opening events on all online processors.
> The fact these are core PMUs is hardcoded in the tool:
> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmu.c?h=perf-tools-next#n1747
> x86 hybrid /sys/devices/cpu_(core|atom)/cpus - the set of CPUs is
> updated to reflect online and offline
> ARM the /sys/devices/armv8_pmuv3_0 isn't a hardcoded core PMU and so
> we expect the cpus to contain online CPUs, but it currently also
> erroneously contains offline ones.
>
> Uncore PMUs:
> x86 has /sys/devices/<uncore...>/cpumask where the cpumask reflects
> online and offline CPUs
> ARM has things like the dmc620 PMU, it appears to be missing cpumask
> in certain cases leading to the perf tool treating it like the x86
> core PMU and opening events for it on every CPU:
> ```
> # ls /sys/devices/arm_dmc620_10008c000/
> events  format  perf_event_mux_interval_ms  power  subsystem  type  uevent
> ```
>
> I think we need a PMU test so that bugs like this can be reported, but
> we may also need to add tool workarounds for PMUs that are broken. I
> can imagine it will be tricky to test uncore PMUs that default to CPU0
> as often we can't offline that.

I think we should make this an ARM-specific fix-up like:
https://lore.kernel.org/lkml/20240607065343.695369-1-irogers@google.com/
The PMU needs fixing up as in the rest of the change, but as perf can
be run on older kernels I think this workaround will remain necessary.
The arm_cmn PMU appears to handle CPU hot plugging; the arm_dmc620
lacks a cpumask altogether on the test machine I can access. I suspect
we may want a better uncore fix, as we may turn a CPU map with one CPU
into an empty CPU map: for example, if the cpumask is "0" and CPU0 is
offline, then "1" would be a better alternative than the empty CPU
map. I couldn't find a PMU to test this with.
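
A small standalone sketch of that fallback idea (illustrative only;
cpu_is_online() and pick_default_cpu() are my names, not libperf or perf
tool APIs): if the single advertised CPU is offline, pick the first online
CPU rather than ending up with an empty map.
```
#include <stdio.h>

/* Illustrative only: treat a missing per-CPU online file (e.g. cpu0) as online. */
static int cpu_is_online(int cpu)
{
	char path[256], buf[8] = "1";
	FILE *f;

	snprintf(path, sizeof(path), "/sys/devices/system/cpu/cpu%d/online", cpu);
	f = fopen(path, "r");
	if (!f)
		return 1;
	if (!fgets(buf, sizeof(buf), f))
		buf[0] = '1';
	fclose(f);
	return buf[0] == '1';
}

/*
 * If the PMU's advertised default CPU is offline, fall back to the first
 * online CPU rather than leaving the tool with an empty CPU map.
 */
static int pick_default_cpu(int advertised, int nr_cpus)
{
	int cpu;

	if (cpu_is_online(advertised))
		return advertised;
	for (cpu = 0; cpu < nr_cpus; cpu++)
		if (cpu_is_online(cpu))
			return cpu;
	return -1; /* no online CPU at all */
}

int main(void)
{
	printf("default CPU: %d\n", pick_default_cpu(0, 8));
	return 0;
}
```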

Thanks,
Ian

> Thanks,
> Ian
>
> > Thanks,
> > Namhyung
> >
> >
> > > > +                       perf_cpu_map__put(cpus);
> > > > +                       if (intersect)
> > > > +                               return intersect;
> > > > +               }
> > > >         }
> > > >
> > > >         /* Nothing found, for core PMUs assume this means all CPUs. */
> > > > --
> > > > 2.24.0
> > > >

Patch

diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index 888ce9912275..51e8d10ee28b 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -771,8 +771,17 @@  static struct perf_cpu_map *pmu_cpumask(int dirfd, const char *name, bool is_cor
 			continue;
 		cpus = perf_cpu_map__read(file);
 		fclose(file);
-		if (cpus)
-			return cpus;
+		if (cpus) {
+			struct perf_cpu_map *intersect __maybe_unused;
+
+			if (perf_cpu_map__is_subset(cpu_map__online(), cpus))
+				return cpus;
+
+			intersect = perf_cpu_map__intersect(cpus, cpu_map__online());
+			perf_cpu_map__put(cpus);
+			if (intersect)
+				return intersect;
+		}
 	}
 
 	/* Nothing found, for core PMUs assume this means all CPUs. */