[v6,00/15] perf c2c: Support data source and display for Arm64

Message ID 20220811062451.435810-1-leo.yan@linaro.org (mailing list archive)

Leo Yan Aug. 11, 2022, 6:24 a.m. UTC
Arm64 Neoverse CPUs support data source in Arm SPE trace, which allows
us to detect cache line contention and transfers.

This patch set has been rebased on the acme/perf/core branch with the latest
commit b39c9e1b101d ("perf machine: Fix missing free of
machine->kallsyms_filename").

For the build to succeed, a compilation fix [1] has been sent to LKML;
this patch set depends on it.  This patch set has been verified for both
x86 perf memory events and Arm SPE events.

[1] https://lore.kernel.org/lkml/20220811044341.426796-1-leo.yan@linaro.org/

Changes from v5:
* Removed the patch "perf: Add SNOOP_PEER flag to perf mem data struct"
  (Arnaldo);
* Removed the patch "perf arm-spe: Don't set data source if it's not a
  memory operation", which has already been merged in the mainline
  kernel; dropping it avoids a merge conflict;
* Rebased on the latest acme perf/core branch, with no code changes
  compared to the previous version.

Changes from v4:
* Included Ali's patch set for adding data source in Arm SPE samples;
* Added Ian's ACK and Ali's review and test tags;
* Updated the documentation for the default peer display for Arm64 (Ali).

Changes from v3:
* Changed to display remote and local peer accesses (Joe);
* Fixed the usage info for display types (Joe);
* Do not display HITM dimensions when using the 'peer' display, and the
  HITM display doesn't show any 'peer' dimensions (James);
* Split to smaller patches for adding dimensions of peer operations;
* Updated documentation to reflect the latest GUI and stdio.


Ali Saidi (2):
  perf tools: sync addition of PERF_MEM_SNOOPX_PEER
  perf arm-spe: Use SPE data source for neoverse cores

Leo Yan (13):
  perf mem: Print snoop peer flag
  perf mem: Add statistics for peer snooping
  perf c2c: Output statistics for peer snooping
  perf c2c: Add dimensions for peer load operations
  perf c2c: Add dimensions of peer metrics for cache line view
  perf c2c: Add mean dimensions for peer operations
  perf c2c: Use explicit names for display macros
  perf c2c: Rename dimension from 'percent_hitm' to
    'percent_costly_snoop'
  perf c2c: Refactor node header
  perf c2c: Refactor display string
  perf c2c: Sort on peer snooping for load operations
  perf c2c: Use 'peer' as default display for Arm64
  perf c2c: Update documentation for new display option 'peer'

 tools/include/uapi/linux/perf_event.h         |   2 +-
 tools/perf/Documentation/perf-c2c.txt         |  31 +-
 tools/perf/builtin-c2c.c                      | 454 ++++++++++++++----
 .../util/arm-spe-decoder/arm-spe-decoder.c    |   1 +
 .../util/arm-spe-decoder/arm-spe-decoder.h    |  12 +
 tools/perf/util/arm-spe.c                     | 130 ++++-
 tools/perf/util/mem-events.c                  |  46 +-
 tools/perf/util/mem-events.h                  |   3 +
 8 files changed, 547 insertions(+), 132 deletions(-)

Arnaldo Carvalho de Melo Aug. 11, 2022, 10:25 p.m. UTC | #1
Em Thu, Aug 11, 2022 at 02:24:36PM +0800, Leo Yan escreveu:
> Arm64 Neoverse CPUs supports data source in Arm SPE trace, this allows
> us to detect cache line contention and transfers.
> 
> This patch set has been rebased on the acme/perf/core branch with the latest
> commit b39c9e1b101d ("perf machine: Fix missing free of
> machine->kallsyms_filename").
> 
> To make building success, a compilation fixing commit [1] has been sent
> to LKML, this patch set is dependent on it.  This patch set has been verified
> for both x86 perf memory events and Arm SPE events.
> 
> [1] https://lore.kernel.org/lkml/20220811044341.426796-1-leo.yan@linaro.org/

So, I tentatively applied this set after applying the patch for
<asm/sysreg.h>, and it's all now out in tmp.perf/core in my git tree,
please check.

I'm doing the usual set of container build tests, but any additional
checking, including on the committer note I added to the first patch in
this series, clarifying it is not really a "sync" with the kernel
headers, is more than welcome.

- Arnaldo
 
Leo Yan Aug. 12, 2022, 1:26 a.m. UTC | #2
Hi Arnaldo,

On Thu, Aug 11, 2022 at 07:25:35PM -0300, Arnaldo Carvalho de Melo wrote:
> Em Thu, Aug 11, 2022 at 02:24:36PM +0800, Leo Yan escreveu:
> > Arm64 Neoverse CPUs supports data source in Arm SPE trace, this allows
> > us to detect cache line contention and transfers.
> > 
> > This patch set has been rebased on the acme/perf/core branch with the latest
> > commit b39c9e1b101d ("perf machine: Fix missing free of
> > machine->kallsyms_filename").
> > 
> > To make building success, a compilation fixing commit [1] has been sent
> > to LKML, this patch set is dependent on it.  This patch set has been verified
> > for both x86 perf memory events and Arm SPE events.
> > 
> > [1] https://lore.kernel.org/lkml/20220811044341.426796-1-leo.yan@linaro.org/
> 
> So, I tentatively applied this set after applying the patch for
> <asm/sysreg.h>, and its all now out in tmp.perf/core in my git tree,
> please check.

After discussing with Suzuki, he pointed out that adding the asm include
path in that way is not ideal.  With the patch on the tmp.perf/core
branch, two include paths are added into CFLAGS for arm-spe.c:

  -I$(srctree)/tools/arch/$(SRCARCH)/include/
  -I$(srctree)/tools/arch/arm64/include/

When we build perf on x86_64, $(srctree)/tools/arch/x86/include/ takes
precedence over $(srctree)/tools/arch/arm64/include/; if we include a
header without a relative path in C code, like
"#include <asm/cputype.h>", the compiler may find a file of the same
name in x86's asm folder rather than in arm64's asm folder.
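
To illustrate the precedence concern, here is a minimal Python sketch
that mimics how a compiler walks its -I list in order and stops at the
first match.  It is not the perf build system: the temporary directory
tree, the header name "asm/example.h" and the helpers resolve_include(),
make_tree() and demo() are all hypothetical and exist only for this
illustration.

```python
import os
import tempfile


def resolve_include(header, include_dirs):
    """Return the first include_dirs entry holding `header`, or None.

    This mirrors the usual -I rule: directories are searched in the
    order given and the first match wins.
    """
    for inc_dir in include_dirs:
        candidate = os.path.join(inc_dir, header)
        if os.path.isfile(candidate):
            return candidate
    return None


def make_tree(root, arches, header):
    """Create a throwaway tools/arch/<arch>/include/<header> per arch."""
    dirs = {}
    for arch in arches:
        inc_dir = os.path.join(root, "tools", "arch", arch, "include")
        path = os.path.join(inc_dir, header)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w") as f:
            f.write("/* %s copy */\n" % arch)
        dirs[arch] = inc_dir
    return dirs


def demo(order, header="asm/example.h"):
    """Resolve `header` with the include dirs listed in `order`.

    Returns the path of the header that was picked, relative to the
    temporary source tree.
    """
    with tempfile.TemporaryDirectory() as root:
        dirs = make_tree(root, ["x86", "arm64"], header)
        found = resolve_include(header, [dirs[a] for a in order])
        return os.path.relpath(found, root)


if __name__ == "__main__":
    # Hypothetical x86_64 build: the $(SRCARCH) path is listed first.
    print(demo(["x86", "arm64"]))
    # The order we would want for arm-spe.c: arm64 path listed first.
    print(demo(["arm64", "x86"]))
```

With the x86 directory listed first, the x86 copy of the same-named
header is picked; only when the arm64 directory comes first does the
arm64 copy win.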

Yesterday I spent a couple of hours looking for other methods (like
filter-out, CFLAGS_REMOVE, etc.) in the makefile, but had no luck
giving precedence to $(srctree)/tools/arch/arm64/include/.

So the current patches on the tmp.perf/core branch build successfully,
but if there is a better method to resolve the header path precedence
issue, I would prefer to improve this so that we don't need to worry
about it later.  Any suggestions?

> I'm doing the usual set of container build tests, but any additional
> checking, including on the committer note I added to the first patch in
> this series, claryfing it is not really a "sync" with the kernel
> headers, is more than welcome.

I'm fine with adding my Signed-off to the signature chain.
Thanks for the amendment.

Thanks,
Leo