diff mbox series

[v4,4/4] perf mem: Support HITM for when mem_lvl_num is any

Message ID 20220324183323.31414-5-alisaidi@amazon.com (mailing list archive)
State New, archived
Headers show
Series perf: arm-spe: Decode SPE source and use for perf c2c | expand

Commit Message

Ali Saidi March 24, 2022, 6:33 p.m. UTC
For loads that hit in a the LLC snoop filter and are fulfilled from a
higher level cache on arm64 Neoverse cores, it's not usually clear what
the true level of the cache the data came from (i.e. a transfer from a
core could come from it's L1 or L2). Instead of making an assumption of
where the line came from, add support for incrementing HITM if the
source is CACHE_ANY.

Since other architectures don't seem to populate the mem_lvl_num field
here there shouldn't be a change in functionality.

Signed-off-by: Ali Saidi <alisaidi@amazon.com>
Tested-by: German Gomez <german.gomez@arm.com>
Reviewed-by: German Gomez <german.gomez@arm.com>
---
 tools/perf/util/mem-events.c | 9 +++++++++
 1 file changed, 9 insertions(+)

Comments

Leo Yan March 26, 2022, 6:23 a.m. UTC | #1
On Thu, Mar 24, 2022 at 06:33:23PM +0000, Ali Saidi wrote:
> For loads that hit in a the LLC snoop filter and are fulfilled from a
> higher level cache on arm64 Neoverse cores, it's not usually clear what
> the true level of the cache the data came from (i.e. a transfer from a
> core could come from it's L1 or L2). Instead of making an assumption of
> where the line came from, add support for incrementing HITM if the
> source is CACHE_ANY.
> 
> Since other architectures don't seem to populate the mem_lvl_num field
> here there shouldn't be a change in functionality.
> 
> Signed-off-by: Ali Saidi <alisaidi@amazon.com>
> Tested-by: German Gomez <german.gomez@arm.com>
> Reviewed-by: German Gomez <german.gomez@arm.com>
> ---
>  tools/perf/util/mem-events.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/tools/perf/util/mem-events.c b/tools/perf/util/mem-events.c
> index e5e405185498..084977cfebef 100644
> --- a/tools/perf/util/mem-events.c
> +++ b/tools/perf/util/mem-events.c
> @@ -539,6 +539,15 @@ do {				\
>  					stats->ld_llchit++;
>  			}
>  
> +			/*
> +			 * A hit in another cores cache must mean a llc snoop
> +			 * filter hit
> +			 */
> +			if (lnum == P(LVLNUM, ANY_CACHE)) {
> +				if (snoop & P(SNOOP, HITM))
> +					HITM_INC(lcl_hitm);
> +			}

This might break the memory profiling result for x86, see file
arch/x86/events/intel/ds.c:

  97 void __init intel_pmu_pebs_data_source_skl(bool pmem)
  98 {
  99         u64 pmem_or_l4 = pmem ? LEVEL(PMEM) : LEVEL(L4);
  ...
 105         pebs_data_source[0x0d] = OP_LH | LEVEL(ANY_CACHE) | REM | P(SNOOP, HITM);
 106 }

Which means that it's possible that it's a remote access and the cache
level is ANY_CACHE, it's good to add checking for bit
PERF_MEM_REMOTE_REMOTE:

	u64 remote = data_src->mem_remote;

	/*
	 * A hit in another cores cache must mean a llc snoop
	 * filter hit
	 */
	if (lnum == P(LVLNUM, ANY_CACHE) && remote != P(REMOTE, REMOTE)) {
	        if (snoop & P(SNOOP, HITM))
	                HITM_INC(lcl_hitm);
	}

Appreciate German's reviewing and testing, and sorry I jumped in very
late.

Thanks,
Leo

> +
>  			if (lvl & P(LVL, LOC_RAM) || lnum == P(LVLNUM, RAM)) {
>  				stats->lcl_dram++;
>  				if (snoop & P(SNOOP, HIT))
> -- 
> 2.32.0
>
Arnaldo Carvalho de Melo March 26, 2022, 1:30 p.m. UTC | #2
Em Sat, Mar 26, 2022 at 02:23:03PM +0800, Leo Yan escreveu:
> On Thu, Mar 24, 2022 at 06:33:23PM +0000, Ali Saidi wrote:
> > For loads that hit in a the LLC snoop filter and are fulfilled from a
> > higher level cache on arm64 Neoverse cores, it's not usually clear what
> > the true level of the cache the data came from (i.e. a transfer from a
> > core could come from it's L1 or L2). Instead of making an assumption of
> > where the line came from, add support for incrementing HITM if the
> > source is CACHE_ANY.
> > 
> > Since other architectures don't seem to populate the mem_lvl_num field
> > here there shouldn't be a change in functionality.
> > 
> > Signed-off-by: Ali Saidi <alisaidi@amazon.com>
> > Tested-by: German Gomez <german.gomez@arm.com>
> > Reviewed-by: German Gomez <german.gomez@arm.com>
> > ---
> >  tools/perf/util/mem-events.c | 9 +++++++++
> >  1 file changed, 9 insertions(+)
> > 
> > diff --git a/tools/perf/util/mem-events.c b/tools/perf/util/mem-events.c
> > index e5e405185498..084977cfebef 100644
> > --- a/tools/perf/util/mem-events.c
> > +++ b/tools/perf/util/mem-events.c
> > @@ -539,6 +539,15 @@ do {				\
> >  					stats->ld_llchit++;
> >  			}
> >  
> > +			/*
> > +			 * A hit in another cores cache must mean a llc snoop
> > +			 * filter hit
> > +			 */
> > +			if (lnum == P(LVLNUM, ANY_CACHE)) {
> > +				if (snoop & P(SNOOP, HITM))
> > +					HITM_INC(lcl_hitm);
> > +			}
> 
> This might break the memory profiling result for x86, see file
> arch/x86/events/intel/ds.c:
> 
>   97 void __init intel_pmu_pebs_data_source_skl(bool pmem)
>   98 {
>   99         u64 pmem_or_l4 = pmem ? LEVEL(PMEM) : LEVEL(L4);
>   ...
>  105         pebs_data_source[0x0d] = OP_LH | LEVEL(ANY_CACHE) | REM | P(SNOOP, HITM);
>  106 }
> 
> Which means that it's possible that it's a remote access and the cache
> level is ANY_CACHE, it's good to add checking for bit
> PERF_MEM_REMOTE_REMOTE:
> 
> 	u64 remote = data_src->mem_remote;
> 
> 	/*
> 	 * A hit in another cores cache must mean a llc snoop
> 	 * filter hit
> 	 */
> 	if (lnum == P(LVLNUM, ANY_CACHE) && remote != P(REMOTE, REMOTE)) {
> 	        if (snoop & P(SNOOP, HITM))
> 	                HITM_INC(lcl_hitm);
> 	}
> 
> Appreciate German's reviewing and testing, and sorry I jumped in very
> late.

I have not published this on perf/core, its just in tmp.perf/core while
tests ran, so I'll remove this specific patch and rerun tests, thanks
for reviewing.

- Arnaldo
Ali Saidi March 26, 2022, 7:14 p.m. UTC | #3
On Sat, 26 Mar 2022 22:23:03 +0000, Leo Yan wrote:
> On Thu, Mar 24, 2022 at 06:33:23PM +0000, Ali Saidi wrote:
> > For loads that hit in a the LLC snoop filter and are fulfilled from a
> > higher level cache on arm64 Neoverse cores, it's not usually clear what
> > the true level of the cache the data came from (i.e. a transfer from a
> > core could come from it's L1 or L2). Instead of making an assumption of
> > where the line came from, add support for incrementing HITM if the
> > source is CACHE_ANY.A
[snip]
> 
> This might break the memory profiling result for x86, see file
> arch/x86/events/intel/ds.c:
> 
>   97 void __init intel_pmu_pebs_data_source_skl(bool pmem)
>   98 {
>   99         u64 pmem_or_l4 = pmem ? LEVEL(PMEM) : LEVEL(L4);
>   ...
>  105         pebs_data_source[0x0d] = OP_LH | LEVEL(ANY_CACHE) | REM | P(SNOOP, HITM);
>  106 }
> 

Thanks for catching this Leo, I'll add your fix.

Ali
diff mbox series

Patch

diff --git a/tools/perf/util/mem-events.c b/tools/perf/util/mem-events.c
index e5e405185498..084977cfebef 100644
--- a/tools/perf/util/mem-events.c
+++ b/tools/perf/util/mem-events.c
@@ -539,6 +539,15 @@  do {				\
 					stats->ld_llchit++;
 			}
 
+			/*
+			 * A hit in another cores cache must mean a llc snoop
+			 * filter hit
+			 */
+			if (lnum == P(LVLNUM, ANY_CACHE)) {
+				if (snoop & P(SNOOP, HITM))
+					HITM_INC(lcl_hitm);
+			}
+
 			if (lvl & P(LVL, LOC_RAM) || lnum == P(LVLNUM, RAM)) {
 				stats->lcl_dram++;
 				if (snoop & P(SNOOP, HIT))