Message ID | 20240603092812.46616-2-yangyicong@huawei.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Perf avoid opening events on offline CPUs | expand |
On Mon, Jun 3, 2024 at 2:33 AM Yicong Yang <yangyicong@huawei.com> wrote: > > From: Yicong Yang <yangyicong@hisilicon.com> > > We'll initialize the PMU's cpumask from "cpumask" or "cpus" sysfs > attributes if provided by the driver without checking the CPUs > are online or not. In such case that CPUs provided by the driver > contains the offline CPUs, we'll try to open event on the offline > CPUs and then rejected by the kernel: > > [root@localhost yang]# echo 0 > /sys/devices/system/cpu/cpu0/online > [root@localhost yang]# ./perf_static stat -e armv8_pmuv3_0/cycles/ --timeout 100 > Error: > The sys_perf_event_open() syscall returned with 19 (No such device) for event (cpu-clock). > /bin/dmesg | grep -i perf may provide additional information. > > So it's better to do a double check in the userspace and only include > the online CPUs from "cpumask" or "cpus" to avoid opening events on > offline CPUs. I see where you are coming from with this but I think it is wrong. The cpus for an uncore PMU are a hint of the CPU to open on rather than the set of valid CPUs. For example: ``` $ cat /sys/devices/uncore_imc_free_running_0/cpumask 0 $ perf stat -vv -e uncore_imc_free_running_0/data_read/ -C 1 -a sleep 0.1 Using CPUID GenuineIntel-6-8D-1 Attempt to add: uncore_imc_free_running_0/data_read/ ..after resolving event: uncore_imc_free_running_0/event=0xff,umask=0x20/ Control descriptor is not initialized ------------------------------------------------------------ perf_event_attr: type 24 (uncore_imc_free_running_0) size 136 config 0x20ff (data_read) sample_type IDENTIFIER read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING disabled 1 inherit 1 exclude_guest 1 ------------------------------------------------------------ sys_perf_event_open: pid -1 cpu 1 group_fd -1 flags 0x8 sys_perf_event_open failed, error -22 switching off cloexec flag ------------------------------------------------------------ perf_event_attr: type 24 (uncore_imc_free_running_0) size 136 config 0x20ff (data_read) sample_type IDENTIFIER read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING disabled 1 inherit 1 exclude_guest 1 ------------------------------------------------------------ sys_perf_event_open: pid -1 cpu 1 group_fd -1 flags 0 sys_perf_event_open failed, error -22 switching off exclude_guest, exclude_host ------------------------------------------------------------ perf_event_attr: type 24 (uncore_imc_free_running_0) size 136 config 0x20ff (data_read) sample_type IDENTIFIER read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING disabled 1 inherit 1 ------------------------------------------------------------ sys_perf_event_open: pid -1 cpu 1 group_fd -1 flags 0 = 3 uncore_imc_free_running_0/data_read/: 1: 4005984 102338957 102338957 uncore_imc_free_running_0/data_read/: 4005984 102338957 102338957 Performance counter stats for 'system wide': 244.51 MiB uncore_imc_free_running_0/data_read/ 0.102320376 seconds time elapsed ``` So the CPU mask of the PMU says to open on CPU 0, but on the command line when I passed "-C 1" it opened it on CPU 1. If the cpumask file contained an offline CPU then this change would make it so the CPU map in the tool were empty, however, a different CPU may be programmable and online. Fwiw, the tool will determine whether the mask is for all valid or a hint by using the notion of a PMU being "core" or not. That notion considers whether the mask was loading from a "cpumask" or "cpus" file: https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmu.c?h=perf-tools-next#n810 Thanks, Ian > Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> > --- > tools/perf/util/pmu.c | 13 +++++++++++-- > 1 file changed, 11 insertions(+), 2 deletions(-) > > diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c > index 888ce9912275..51e8d10ee28b 100644 > --- a/tools/perf/util/pmu.c > +++ b/tools/perf/util/pmu.c > @@ -771,8 +771,17 @@ static struct perf_cpu_map *pmu_cpumask(int dirfd, const char *name, bool is_cor > continue; > cpus = perf_cpu_map__read(file); > fclose(file); > - if (cpus) > - return cpus; > + if (cpus) { > + struct perf_cpu_map *intersect __maybe_unused; > + > + if (perf_cpu_map__is_subset(cpu_map__online(), cpus)) > + return cpus; > + > + intersect = perf_cpu_map__intersect(cpus, cpu_map__online()); > + perf_cpu_map__put(cpus); > + if (intersect) > + return intersect; > + } > } > > /* Nothing found, for core PMUs assume this means all CPUs. */ > -- > 2.24.0 >
On 2024/6/4 0:52, Ian Rogers wrote: > On Mon, Jun 3, 2024 at 2:33 AM Yicong Yang <yangyicong@huawei.com> wrote: >> >> From: Yicong Yang <yangyicong@hisilicon.com> >> >> We'll initialize the PMU's cpumask from "cpumask" or "cpus" sysfs >> attributes if provided by the driver without checking the CPUs >> are online or not. In such case that CPUs provided by the driver >> contains the offline CPUs, we'll try to open event on the offline >> CPUs and then rejected by the kernel: >> >> [root@localhost yang]# echo 0 > /sys/devices/system/cpu/cpu0/online >> [root@localhost yang]# ./perf_static stat -e armv8_pmuv3_0/cycles/ --timeout 100 >> Error: >> The sys_perf_event_open() syscall returned with 19 (No such device) for event (cpu-clock). >> /bin/dmesg | grep -i perf may provide additional information. >> >> So it's better to do a double check in the userspace and only include >> the online CPUs from "cpumask" or "cpus" to avoid opening events on >> offline CPUs. > > I see where you are coming from with this but I think it is wrong. The > cpus for an uncore PMU are a hint of the CPU to open on rather than > the set of valid CPUs. For example: > ``` > $ cat /sys/devices/uncore_imc_free_running_0/cpumask > 0 > $ perf stat -vv -e uncore_imc_free_running_0/data_read/ -C 1 -a sleep 0.1 > Using CPUID GenuineIntel-6-8D-1 > Attempt to add: uncore_imc_free_running_0/data_read/ > ..after resolving event: uncore_imc_free_running_0/event=0xff,umask=0x20/ > Control descriptor is not initialized > ------------------------------------------------------------ > perf_event_attr: > type 24 (uncore_imc_free_running_0) > size 136 > config 0x20ff (data_read) > sample_type IDENTIFIER > read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING > disabled 1 > inherit 1 > exclude_guest 1 > ------------------------------------------------------------ > sys_perf_event_open: pid -1 cpu 1 group_fd -1 flags 0x8 > sys_perf_event_open failed, error -22 > switching off cloexec flag > ------------------------------------------------------------ > perf_event_attr: > type 24 (uncore_imc_free_running_0) > size 136 > config 0x20ff (data_read) > sample_type IDENTIFIER > read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING > disabled 1 > inherit 1 > exclude_guest 1 > ------------------------------------------------------------ > sys_perf_event_open: pid -1 cpu 1 group_fd -1 flags 0 > sys_perf_event_open failed, error -22 > switching off exclude_guest, exclude_host > ------------------------------------------------------------ > perf_event_attr: > type 24 (uncore_imc_free_running_0) > size 136 > config 0x20ff (data_read) > sample_type IDENTIFIER > read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING > disabled 1 > inherit 1 > ------------------------------------------------------------ > sys_perf_event_open: pid -1 cpu 1 group_fd -1 flags 0 = 3 > uncore_imc_free_running_0/data_read/: 1: 4005984 102338957 102338957 > uncore_imc_free_running_0/data_read/: 4005984 102338957 102338957 > > Performance counter stats for 'system wide': > > 244.51 MiB uncore_imc_free_running_0/data_read/ > > 0.102320376 seconds time elapsed > ``` > So the CPU mask of the PMU says to open on CPU 0, but on the command > line when I passed "-C 1" it opened it on CPU 1. If the cpumask file > contained an offline CPU then this change would make it so the CPU map > in the tool were empty, however, a different CPU may be programmable > and online. > > Fwiw, the tool will determine whether the mask is for all valid or a > hint by using the notion of a PMU being "core" or not. That notion > considers whether the mask was loading from a "cpumask" or "cpus" > file: > https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmu.c?h=perf-tools-next#n810 > I see, you're correct on this. Maybe I didn't mention it clearly in the commit, this patch doesn't intend to change the case that user specify the CPUs explicitly. It'll have difference in case user doesn't specify the CPU list while the PMU's 'cpus' or 'cpumask' contains offline CPUs: with this patch we'll use the online CPUs in the PMU's 'cpus' or 'cpumask' but before this we may use the offline CPUs. We still honor the user's input by the handling in __perf_evlist__propagate_maps(). Thanks.
On Tue, Jun 4, 2024 at 4:12 AM Yicong Yang <yangyicong@huawei.com> wrote: > > On 2024/6/4 0:52, Ian Rogers wrote: > > On Mon, Jun 3, 2024 at 2:33 AM Yicong Yang <yangyicong@huawei.com> wrote: > > Fwiw, the tool will determine whether the mask is for all valid or a > > hint by using the notion of a PMU being "core" or not. That notion > > considers whether the mask was loading from a "cpumask" or "cpus" > > file: > > https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmu.c?h=perf-tools-next#n810 > > > > I see, you're correct on this. Maybe I didn't mention it clearly in the commit, > this patch doesn't intend to change the case that user specify the CPUs explicitly. > It'll have difference in case user doesn't specify the CPU list while the PMU's > 'cpus' or 'cpumask' contains offline CPUs: with this patch we'll use the online > CPUs in the PMU's 'cpus' or 'cpumask' but before this we may use the offline CPUs. > We still honor the user's input by the handling in __perf_evlist__propagate_maps(). Thanks Yicong, I'm having a hard time believing this bug has been there all along (forever) and also what's happening with BIG.little/hybrid that advertise CPUs in similar ways. I do see that on armv8_pmuv3_0 the "cpus" file will have offline cpus while on x86 a lack of such a file causes a fallback on using the set of online CPUs. So there is a divergence in the PMUs behavior that may well manifest itself in the perf tool like this. If the "cpus" file were empty we might fix the behavior but I don't think that's what's wanted. I also wonder that a PMU driver change will mean the perf tool is still not working correctly for people who don't rebuild their kernel. I'm still thinking about this but thanks for the updates, Ian
Hello, On Mon, Jun 03, 2024 at 09:52:15AM -0700, Ian Rogers wrote: > On Mon, Jun 3, 2024 at 2:33 AM Yicong Yang <yangyicong@huawei.com> wrote: > > > > From: Yicong Yang <yangyicong@hisilicon.com> > > > > We'll initialize the PMU's cpumask from "cpumask" or "cpus" sysfs > > attributes if provided by the driver without checking the CPUs > > are online or not. In such case that CPUs provided by the driver > > contains the offline CPUs, we'll try to open event on the offline > > CPUs and then rejected by the kernel: > > > > [root@localhost yang]# echo 0 > /sys/devices/system/cpu/cpu0/online > > [root@localhost yang]# ./perf_static stat -e armv8_pmuv3_0/cycles/ --timeout 100 > > Error: > > The sys_perf_event_open() syscall returned with 19 (No such device) for event (cpu-clock). > > /bin/dmesg | grep -i perf may provide additional information. > > > > So it's better to do a double check in the userspace and only include > > the online CPUs from "cpumask" or "cpus" to avoid opening events on > > offline CPUs. > > I see where you are coming from with this but I think it is wrong. The > cpus for an uncore PMU are a hint of the CPU to open on rather than > the set of valid CPUs. For example: > ``` > $ cat /sys/devices/uncore_imc_free_running_0/cpumask > 0 > $ perf stat -vv -e uncore_imc_free_running_0/data_read/ -C 1 -a sleep 0.1 > Using CPUID GenuineIntel-6-8D-1 > Attempt to add: uncore_imc_free_running_0/data_read/ > ..after resolving event: uncore_imc_free_running_0/event=0xff,umask=0x20/ > Control descriptor is not initialized > ------------------------------------------------------------ > perf_event_attr: > type 24 (uncore_imc_free_running_0) > size 136 > config 0x20ff (data_read) > sample_type IDENTIFIER > read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING > disabled 1 > inherit 1 > exclude_guest 1 > ------------------------------------------------------------ > sys_perf_event_open: pid -1 cpu 1 group_fd -1 flags 0x8 > sys_perf_event_open failed, error -22 > switching off cloexec flag > ------------------------------------------------------------ > perf_event_attr: > type 24 (uncore_imc_free_running_0) > size 136 > config 0x20ff (data_read) > sample_type IDENTIFIER > read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING > disabled 1 > inherit 1 > exclude_guest 1 > ------------------------------------------------------------ > sys_perf_event_open: pid -1 cpu 1 group_fd -1 flags 0 > sys_perf_event_open failed, error -22 > switching off exclude_guest, exclude_host > ------------------------------------------------------------ > perf_event_attr: > type 24 (uncore_imc_free_running_0) > size 136 > config 0x20ff (data_read) > sample_type IDENTIFIER > read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING > disabled 1 > inherit 1 > ------------------------------------------------------------ > sys_perf_event_open: pid -1 cpu 1 group_fd -1 flags 0 = 3 > uncore_imc_free_running_0/data_read/: 1: 4005984 102338957 102338957 > uncore_imc_free_running_0/data_read/: 4005984 102338957 102338957 > > Performance counter stats for 'system wide': > > 244.51 MiB uncore_imc_free_running_0/data_read/ > > 0.102320376 seconds time elapsed > ``` > So the CPU mask of the PMU says to open on CPU 0, but on the command > line when I passed "-C 1" it opened it on CPU 1. If the cpumask file > contained an offline CPU then this change would make it so the CPU map > in the tool were empty, however, a different CPU may be programmable > and online. I think Intel uncore PMU driver ignores the CPU parameter and set it to CPU 0 in this case internally. See uncore_pmu_event_init() at https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/events/intel/uncore.c#n761 > > Fwiw, the tool will determine whether the mask is for all valid or a > hint by using the notion of a PMU being "core" or not. That notion > considers whether the mask was loading from a "cpumask" or "cpus" > file: > https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmu.c?h=perf-tools-next#n810 > > Thanks, > Ian > > > Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> > > --- > > tools/perf/util/pmu.c | 13 +++++++++++-- > > 1 file changed, 11 insertions(+), 2 deletions(-) > > > > diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c > > index 888ce9912275..51e8d10ee28b 100644 > > --- a/tools/perf/util/pmu.c > > +++ b/tools/perf/util/pmu.c > > @@ -771,8 +771,17 @@ static struct perf_cpu_map *pmu_cpumask(int dirfd, const char *name, bool is_cor > > continue; > > cpus = perf_cpu_map__read(file); > > fclose(file); > > - if (cpus) > > - return cpus; > > + if (cpus) { > > + struct perf_cpu_map *intersect __maybe_unused; > > + > > + if (perf_cpu_map__is_subset(cpu_map__online(), cpus)) > > + return cpus; > > + > > + intersect = perf_cpu_map__intersect(cpus, cpu_map__online()); So IIUC this is for core PMUs with "cpus" file, right? I guess uncore drivers already handles "cpumask" properly.. Thanks, Namhyung > > + perf_cpu_map__put(cpus); > > + if (intersect) > > + return intersect; > > + } > > } > > > > /* Nothing found, for core PMUs assume this means all CPUs. */ > > -- > > 2.24.0 > >
On Thu, Jun 6, 2024 at 5:09 PM Namhyung Kim <namhyung@kernel.org> wrote: > > Hello, > > On Mon, Jun 03, 2024 at 09:52:15AM -0700, Ian Rogers wrote: > > On Mon, Jun 3, 2024 at 2:33 AM Yicong Yang <yangyicong@huawei.com> wrote: > > > > > > From: Yicong Yang <yangyicong@hisilicon.com> > > > > > > We'll initialize the PMU's cpumask from "cpumask" or "cpus" sysfs > > > attributes if provided by the driver without checking the CPUs > > > are online or not. In such case that CPUs provided by the driver > > > contains the offline CPUs, we'll try to open event on the offline > > > CPUs and then rejected by the kernel: > > > > > > [root@localhost yang]# echo 0 > /sys/devices/system/cpu/cpu0/online > > > [root@localhost yang]# ./perf_static stat -e armv8_pmuv3_0/cycles/ --timeout 100 > > > Error: > > > The sys_perf_event_open() syscall returned with 19 (No such device) for event (cpu-clock). > > > /bin/dmesg | grep -i perf may provide additional information. > > > > > > So it's better to do a double check in the userspace and only include > > > the online CPUs from "cpumask" or "cpus" to avoid opening events on > > > offline CPUs. > > > > I see where you are coming from with this but I think it is wrong. The > > cpus for an uncore PMU are a hint of the CPU to open on rather than > > the set of valid CPUs. For example: > > ``` > > $ cat /sys/devices/uncore_imc_free_running_0/cpumask > > 0 > > $ perf stat -vv -e uncore_imc_free_running_0/data_read/ -C 1 -a sleep 0.1 > > Using CPUID GenuineIntel-6-8D-1 > > Attempt to add: uncore_imc_free_running_0/data_read/ > > ..after resolving event: uncore_imc_free_running_0/event=0xff,umask=0x20/ > > Control descriptor is not initialized > > ------------------------------------------------------------ > > perf_event_attr: > > type 24 (uncore_imc_free_running_0) > > size 136 > > config 0x20ff (data_read) > > sample_type IDENTIFIER > > read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING > > disabled 1 > > inherit 1 > > exclude_guest 1 > > ------------------------------------------------------------ > > sys_perf_event_open: pid -1 cpu 1 group_fd -1 flags 0x8 > > sys_perf_event_open failed, error -22 > > switching off cloexec flag > > ------------------------------------------------------------ > > perf_event_attr: > > type 24 (uncore_imc_free_running_0) > > size 136 > > config 0x20ff (data_read) > > sample_type IDENTIFIER > > read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING > > disabled 1 > > inherit 1 > > exclude_guest 1 > > ------------------------------------------------------------ > > sys_perf_event_open: pid -1 cpu 1 group_fd -1 flags 0 > > sys_perf_event_open failed, error -22 > > switching off exclude_guest, exclude_host > > ------------------------------------------------------------ > > perf_event_attr: > > type 24 (uncore_imc_free_running_0) > > size 136 > > config 0x20ff (data_read) > > sample_type IDENTIFIER > > read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING > > disabled 1 > > inherit 1 > > ------------------------------------------------------------ > > sys_perf_event_open: pid -1 cpu 1 group_fd -1 flags 0 = 3 > > uncore_imc_free_running_0/data_read/: 1: 4005984 102338957 102338957 > > uncore_imc_free_running_0/data_read/: 4005984 102338957 102338957 > > > > Performance counter stats for 'system wide': > > > > 244.51 MiB uncore_imc_free_running_0/data_read/ > > > > 0.102320376 seconds time elapsed > > ``` > > So the CPU mask of the PMU says to open on CPU 0, but on the command > > line when I passed "-C 1" it opened it on CPU 1. If the cpumask file > > contained an offline CPU then this change would make it so the CPU map > > in the tool were empty, however, a different CPU may be programmable > > and online. > > I think Intel uncore PMU driver ignores the CPU parameter and set it to > CPU 0 in this case internally. See uncore_pmu_event_init() at > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/events/intel/uncore.c#n761 Hmm.. maybe that's just the option if not set. Wrt hot plugging, on a 2 socket skylake: ``` $ cat /sys/devices/uncore_imc_0/cpumask 0,18 $ echo 0 > /sys/devices/system/cpu/cpu18/online $ cat /sys/devices/uncore_imc_0/cpumask 0,19 ``` So the cpumask should be reflecting the online/offline nature of CPUs. > > > > Fwiw, the tool will determine whether the mask is for all valid or a > > hint by using the notion of a PMU being "core" or not. That notion > > considers whether the mask was loading from a "cpumask" or "cpus" > > file: > > https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmu.c?h=perf-tools-next#n810 > > > > Thanks, > > Ian > > > > > Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> > > > --- > > > tools/perf/util/pmu.c | 13 +++++++++++-- > > > 1 file changed, 11 insertions(+), 2 deletions(-) > > > > > > diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c > > > index 888ce9912275..51e8d10ee28b 100644 > > > --- a/tools/perf/util/pmu.c > > > +++ b/tools/perf/util/pmu.c > > > @@ -771,8 +771,17 @@ static struct perf_cpu_map *pmu_cpumask(int dirfd, const char *name, bool is_cor > > > continue; > > > cpus = perf_cpu_map__read(file); > > > fclose(file); > > > - if (cpus) > > > - return cpus; > > > + if (cpus) { > > > + struct perf_cpu_map *intersect __maybe_unused; > > > + > > > + if (perf_cpu_map__is_subset(cpu_map__online(), cpus)) > > > + return cpus; > > > + > > > + intersect = perf_cpu_map__intersect(cpus, cpu_map__online()); > > So IIUC this is for core PMUs with "cpus" file, right? I guess uncore > drivers already handles "cpumask" properly.. So I think this is an ARM specific bug: Core PMUs: x86 uses /sys/devices/cpu, s390 uses cpum_cf, these lack a cpus or cpumask and so we default to opening events on all online processors. The fact these are core PMUs is hardcoded in the tool: https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmu.c?h=perf-tools-next#n1747 x86 hybrid /sys/devices/cpu_(core|atom)/cpus - the set of CPUs is updated to reflect online and offline ARM the /sys/devices/armv8_pmuv3_0 isn't a hardcoded core PMU and so we expect the cpus to contain online CPUs, but it currently also erroneously contains offline ones. Uncore PMUs: x86 has /sys/devices/<uncore...>/cpumask where the cpumask reflects online and offline CPUs ARM has things like the dmc620 PMU, it appears to be missing cpumask in certain cases leading to the perf tool treating it like the x86 core PMU and opening events for it on every CPU: ``` # ls /sys/devices/arm_dmc620_10008c000/ events format perf_event_mux_interval_ms power subsystem type uevent ``` I think we need a PMU test so that bugs like this can be reported, but we may also need to add tool workarounds for PMUs that are broken. I can imagine it will be tricky to test uncore PMUs that default to CPU0 as often we can't offline that. Thanks, Ian > Thanks, > Namhyung > > > > > + perf_cpu_map__put(cpus); > > > + if (intersect) > > > + return intersect; > > > + } > > > } > > > > > > /* Nothing found, for core PMUs assume this means all CPUs. */ > > > -- > > > 2.24.0 > > >
On Thu, Jun 6, 2024 at 5:36 PM Ian Rogers <irogers@google.com> wrote: > > On Thu, Jun 6, 2024 at 5:09 PM Namhyung Kim <namhyung@kernel.org> wrote: > > > > Hello, > > > > On Mon, Jun 03, 2024 at 09:52:15AM -0700, Ian Rogers wrote: > > > On Mon, Jun 3, 2024 at 2:33 AM Yicong Yang <yangyicong@huawei.com> wrote: > > > > > > > > From: Yicong Yang <yangyicong@hisilicon.com> > > > > > > > > We'll initialize the PMU's cpumask from "cpumask" or "cpus" sysfs > > > > attributes if provided by the driver without checking the CPUs > > > > are online or not. In such case that CPUs provided by the driver > > > > contains the offline CPUs, we'll try to open event on the offline > > > > CPUs and then rejected by the kernel: > > > > > > > > [root@localhost yang]# echo 0 > /sys/devices/system/cpu/cpu0/online > > > > [root@localhost yang]# ./perf_static stat -e armv8_pmuv3_0/cycles/ --timeout 100 > > > > Error: > > > > The sys_perf_event_open() syscall returned with 19 (No such device) for event (cpu-clock). > > > > /bin/dmesg | grep -i perf may provide additional information. > > > > > > > > So it's better to do a double check in the userspace and only include > > > > the online CPUs from "cpumask" or "cpus" to avoid opening events on > > > > offline CPUs. > > > > > > I see where you are coming from with this but I think it is wrong. The > > > cpus for an uncore PMU are a hint of the CPU to open on rather than > > > the set of valid CPUs. For example: > > > ``` > > > $ cat /sys/devices/uncore_imc_free_running_0/cpumask > > > 0 > > > $ perf stat -vv -e uncore_imc_free_running_0/data_read/ -C 1 -a sleep 0.1 > > > Using CPUID GenuineIntel-6-8D-1 > > > Attempt to add: uncore_imc_free_running_0/data_read/ > > > ..after resolving event: uncore_imc_free_running_0/event=0xff,umask=0x20/ > > > Control descriptor is not initialized > > > ------------------------------------------------------------ > > > perf_event_attr: > > > type 24 (uncore_imc_free_running_0) > > > size 136 > > > config 0x20ff (data_read) > > > sample_type IDENTIFIER > > > read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING > > > disabled 1 > > > inherit 1 > > > exclude_guest 1 > > > ------------------------------------------------------------ > > > sys_perf_event_open: pid -1 cpu 1 group_fd -1 flags 0x8 > > > sys_perf_event_open failed, error -22 > > > switching off cloexec flag > > > ------------------------------------------------------------ > > > perf_event_attr: > > > type 24 (uncore_imc_free_running_0) > > > size 136 > > > config 0x20ff (data_read) > > > sample_type IDENTIFIER > > > read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING > > > disabled 1 > > > inherit 1 > > > exclude_guest 1 > > > ------------------------------------------------------------ > > > sys_perf_event_open: pid -1 cpu 1 group_fd -1 flags 0 > > > sys_perf_event_open failed, error -22 > > > switching off exclude_guest, exclude_host > > > ------------------------------------------------------------ > > > perf_event_attr: > > > type 24 (uncore_imc_free_running_0) > > > size 136 > > > config 0x20ff (data_read) > > > sample_type IDENTIFIER > > > read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING > > > disabled 1 > > > inherit 1 > > > ------------------------------------------------------------ > > > sys_perf_event_open: pid -1 cpu 1 group_fd -1 flags 0 = 3 > > > uncore_imc_free_running_0/data_read/: 1: 4005984 102338957 102338957 > > > uncore_imc_free_running_0/data_read/: 4005984 102338957 102338957 > > > > > > Performance counter stats for 'system wide': > > > > > > 244.51 MiB uncore_imc_free_running_0/data_read/ > > > > > > 0.102320376 seconds time elapsed > > > ``` > > > So the CPU mask of the PMU says to open on CPU 0, but on the command > > > line when I passed "-C 1" it opened it on CPU 1. If the cpumask file > > > contained an offline CPU then this change would make it so the CPU map > > > in the tool were empty, however, a different CPU may be programmable > > > and online. > > > > I think Intel uncore PMU driver ignores the CPU parameter and set it to > > CPU 0 in this case internally. See uncore_pmu_event_init() at > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/events/intel/uncore.c#n761 > > Hmm.. maybe that's just the option if not set. Wrt hot plugging, on a > 2 socket skylake: > ``` > $ cat /sys/devices/uncore_imc_0/cpumask > 0,18 > $ echo 0 > /sys/devices/system/cpu/cpu18/online > $ cat /sys/devices/uncore_imc_0/cpumask > 0,19 > ``` > So the cpumask should be reflecting the online/offline nature of CPUs. > > > > > > > Fwiw, the tool will determine whether the mask is for all valid or a > > > hint by using the notion of a PMU being "core" or not. That notion > > > considers whether the mask was loading from a "cpumask" or "cpus" > > > file: > > > https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmu.c?h=perf-tools-next#n810 > > > > > > Thanks, > > > Ian > > > > > > > Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> > > > > --- > > > > tools/perf/util/pmu.c | 13 +++++++++++-- > > > > 1 file changed, 11 insertions(+), 2 deletions(-) > > > > > > > > diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c > > > > index 888ce9912275..51e8d10ee28b 100644 > > > > --- a/tools/perf/util/pmu.c > > > > +++ b/tools/perf/util/pmu.c > > > > @@ -771,8 +771,17 @@ static struct perf_cpu_map *pmu_cpumask(int dirfd, const char *name, bool is_cor > > > > continue; > > > > cpus = perf_cpu_map__read(file); > > > > fclose(file); > > > > - if (cpus) > > > > - return cpus; > > > > + if (cpus) { > > > > + struct perf_cpu_map *intersect __maybe_unused; > > > > + > > > > + if (perf_cpu_map__is_subset(cpu_map__online(), cpus)) > > > > + return cpus; > > > > + > > > > + intersect = perf_cpu_map__intersect(cpus, cpu_map__online()); > > > > So IIUC this is for core PMUs with "cpus" file, right? I guess uncore > > drivers already handles "cpumask" properly.. > > So I think this is an ARM specific bug: > > Core PMUs: > x86 uses /sys/devices/cpu, s390 uses cpum_cf, these lack a cpus or > cpumask and so we default to opening events on all online processors. > The fact these are core PMUs is hardcoded in the tool: > https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmu.c?h=perf-tools-next#n1747 > x86 hybrid /sys/devices/cpu_(core|atom)/cpus - the set of CPUs is > updated to reflect online and offline > ARM the /sys/devices/armv8_pmuv3_0 isn't a hardcoded core PMU and so > we expect the cpus to contain online CPUs, but it currently also > erroneously contains offline ones. > > Uncore PMUs: > x86 has /sys/devices/<uncore...>/cpumask where the cpumask reflects > online and offline CPUs > ARM has things like the dmc620 PMU, it appears to be missing cpumask > in certain cases leading to the perf tool treating it like the x86 > core PMU and opening events for it on every CPU: > ``` > # ls /sys/devices/arm_dmc620_10008c000/ > events format perf_event_mux_interval_ms power subsystem type uevent > ``` > > I think we need a PMU test so that bugs like this can be reported, but > we may also need to add tool workarounds for PMUs that are broken. I > can imagine it will be tricky to test uncore PMUs that default to CPU0 > as often we can't offline that. I think we should make this an ARM specific fix up like: https://lore.kernel.org/lkml/20240607065343.695369-1-irogers@google.com/ The PMU needs fixing up like in the rest of the change, but as perf can be run on older kernels I think this workaround will remain necessary. The arm_cmn PMU appears to handle CPU hot plugging, the arm_dmc620 lacks a cpumask altogether on the test machine I can access. I suspect we may want a better uncore fix as we may change a CPU map of 1 CPU into an empty CPU map, for example, if the cpumask is "0" and CPU0 is offline, then "1" would be a better alternative than the empty CPU map. I couldn't find a PMU to test this with. Thanks, Ian > Thanks, > Ian > > > Thanks, > > Namhyung > > > > > > > > + perf_cpu_map__put(cpus); > > > > + if (intersect) > > > > + return intersect; > > > > + } > > > > } > > > > > > > > /* Nothing found, for core PMUs assume this means all CPUs. */ > > > > -- > > > > 2.24.0 > > > >
diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c index 888ce9912275..51e8d10ee28b 100644 --- a/tools/perf/util/pmu.c +++ b/tools/perf/util/pmu.c @@ -771,8 +771,17 @@ static struct perf_cpu_map *pmu_cpumask(int dirfd, const char *name, bool is_cor continue; cpus = perf_cpu_map__read(file); fclose(file); - if (cpus) - return cpus; + if (cpus) { + struct perf_cpu_map *intersect __maybe_unused; + + if (perf_cpu_map__is_subset(cpu_map__online(), cpus)) + return cpus; + + intersect = perf_cpu_map__intersect(cpus, cpu_map__online()); + perf_cpu_map__put(cpus); + if (intersect) + return intersect; + } } /* Nothing found, for core PMUs assume this means all CPUs. */