Message ID | 20231024092634.7122-22-ilpo.jarvinen@linux.intel.com
---|---
State | New
Series | selftests/resctrl: CAT test improvements & generalized test framework
On 2023-10-27 at 16:30:38 +0300, Ilpo Järvinen wrote:
>On Fri, 27 Oct 2023, Maciej Wieczór-Retman wrote:
>
>> On 2023-10-24 at 12:26:31 +0300, Ilpo Järvinen wrote:
>> >Resource id is acquired differently depending on CPU. AMD tests use id
>> >from L3 cache, whereas CPUs from other vendors base the id on topology
>> >package id. In order to support L2 CAT test, this has to be
>> >generalized.
>> >
>> >The driver side code seems to get the resource ids from cache ids so
>> >the approach used by the AMD branch seems to match the kernel-side
>> >code. It will also work with L2 resource IDs as long as the cache level
>> >is generalized.
>> >
>> >Using the topology id was always fragile due to mismatch with the
>> >kernel-side way to acquire the resource id. It got incorrect resource
>> >id, e.g., when Cluster-on-Die (CoD) is enabled for CPU (but CoD is not
>> >well suited for resctrl in the first place so it has not been a big
>> >issues if test don't work correctly with it).
>>
>> "not been a big issues" -> "not been a big issue"?
>>
>> "if test don't work correctly" -> "if test doesn't work correctly" / "if tests
>> don't work correctly"?
>>
>> >
>> >Taking all the above into account, generalize the resource id
>> >calculation by taking it from the cache id and do not hard-code the
>> >cache level.
>>
>> In x86/resctrl files group of CPUs sharing a resctrl resource is called a
>> domain. Because of that I think "resource id" terms should be substituted with
>> "domain id" to match core resctrl code.
>
>Okay, I was just using the terminology which pre-existed in the selftest
>code.
>
>I've really tried to look how I should call it but things were quite
>inconsistent. The selftest uses resource id and reads it from cache id.
>Documentation uses "instances of that resource" or "cache id" or "domain
>x".
>
>
>Documentation/arch/x86/resctrl.rst is very misleading btw and should be
>IMHO updated:
>
>Memory bandwidth Allocation (default mode)
>------------------------------------------
>
>Memory b/w domain is L3 cache.
>::
>
>	MB:<cache_id0>=bandwidth0;<cache_id1>=bandwidth1;...
>
>It claims "domain" is L3 cache (and the value we're talking about is
>called "cache_id" here which is what you just called "domain id"). I
>suppose it should say "b/w resource is L3 cache"?
>

I believe so.

After some looking around it looks like the "L3 domain" naming is also
present in rdtgroup.c:mkdir_mondata_all() and "mon_data" description in
the docs:

Docs:
	"mon_data":
		This contains a set of files organized by L3 domain and by
		RDT event. E.g. on a system with two L3 domains there will

rdtgroup.c:
	/*
	 * This creates a directory mon_data which contains the monitored data.
	 *
	 * mon_data has one directory for each domain which are named
	 * in the format mon_<domain_name>_<domain_id>. For ex: A mon_data
	 * with L3 domain looks as below:
	 * ./mon_data:
	 *     mon_L3_00
	 *     mon_L3_01
	 *     mon_L3_02
	 * ...
	 *
	 * Each domain directory has one file per event:
	 * ./mon_L3_00/:
	 *     llc_occupancy
	 *
	 */
	static int mkdir_mondata_all(struct kernfs_node *parent_kn,
				     struct rdtgroup *prgrp,
				     struct kernfs_node **dest_kn)

From the reading I've done on resctrl my understanding was that L2/L3 are
resources that can be shared between domains.
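The mapping discussed above, from a cache id read out of sysfs to the mon_L3_<domain_id> directory naming quoted from rdtgroup.c, can be shown with a minimal standalone sketch. This is illustrative only and not part of the patch; the sysfs path, the choice of CPU 0, index3 being the L3 cache, and the two-digit directory suffix are assumptions drawn from the quoted comment:

/*
 * Illustrative sketch only: print the L3 cache id ("domain id") of CPU 0
 * and the mon_data directory name resctrl would use for that domain.
 * Assumes index3 is the L3 cache and resctrl is mounted at /sys/fs/resctrl.
 */
#include <stdio.h>

int main(void)
{
	const char *path = "/sys/devices/system/cpu/cpu0/cache/index3/id";
	FILE *fp;
	int id;

	fp = fopen(path, "r");
	if (!fp) {
		perror(path);
		return 1;
	}
	if (fscanf(fp, "%d", &id) != 1) {
		fclose(fp);
		fprintf(stderr, "Failed to parse %s\n", path);
		return 1;
	}
	fclose(fp);

	printf("CPU0 L3 cache id (domain id): %d\n", id);
	printf("Monitoring files would be under /sys/fs/resctrl/mon_data/mon_L3_%02d/\n", id);
	return 0;
}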
diff --git a/tools/testing/selftests/resctrl/resctrl.h b/tools/testing/selftests/resctrl/resctrl.h
index 99fbce4794bc..c148998bf6ad 100644
--- a/tools/testing/selftests/resctrl/resctrl.h
+++ b/tools/testing/selftests/resctrl/resctrl.h
@@ -121,7 +121,7 @@ extern char llc_occup_path[1024];
 int get_vendor(void);
 bool check_resctrlfs_support(void);
 int filter_dmesg(void);
-int get_resource_id(int cpu_no, int *resource_id);
+int get_resource_id(const char *resource, int cpu_no, int *resource_id);
 int mount_resctrlfs(void);
 int umount_resctrlfs(void);
 int validate_bw_report_request(char *bw_report);
diff --git a/tools/testing/selftests/resctrl/resctrl_val.c b/tools/testing/selftests/resctrl/resctrl_val.c
index 43ca026c6e0f..d210daafc5af 100644
--- a/tools/testing/selftests/resctrl/resctrl_val.c
+++ b/tools/testing/selftests/resctrl/resctrl_val.c
@@ -415,7 +415,7 @@ static void initialize_mem_bw_resctrl(const char *ctrlgrp, const char *mongrp,
 {
 	int resource_id;

-	if (get_resource_id(cpu_no, &resource_id) < 0) {
+	if (get_resource_id("MB", cpu_no, &resource_id) < 0) {
 		perror("Could not get resource_id");
 		return;
 	}
@@ -584,7 +584,7 @@ static void initialize_llc_occu_resctrl(const char *ctrlgrp, const char *mongrp,
 {
 	int resource_id;

-	if (get_resource_id(cpu_no, &resource_id) < 0) {
+	if (get_resource_id("L3", cpu_no, &resource_id) < 0) {
 		perror("# Unable to resource_id");
 		return;
 	}
diff --git a/tools/testing/selftests/resctrl/resctrlfs.c b/tools/testing/selftests/resctrl/resctrlfs.c
index b98d88f7cbc4..8e79b646d7cb 100644
--- a/tools/testing/selftests/resctrl/resctrlfs.c
+++ b/tools/testing/selftests/resctrl/resctrlfs.c
@@ -111,24 +111,33 @@ static int get_cache_level(const char *cache_type)
 	return -1;
 }

+static int get_resource_cache_level(const char *resource)
+{
+	/* "MB" use L3 (LLC) resource id */
+	if (!strcmp(resource, "MB"))
+		return 3;
+	return get_cache_level(resource);
+}
+
 /*
- * get_resource_id - Get socket number/l3 id for a specified CPU
- * @cpu_no:	CPU number
- * @resource_id: Socket number or l3_id
+ * get_resource_id - Get resctrl resource id for a specified CPU
+ * @resource:	resource
+ * @cpu_no:	CPU number
+ * @resource_id: resource id (cache id; for MB, L3 cache id)
  *
  * Return: >= 0 on success, < 0 on failure.
  */
-int get_resource_id(int cpu_no, int *resource_id)
+int get_resource_id(const char *resource, int cpu_no, int *resource_id)
 {
 	char phys_pkg_path[1024];
+	int cache_num;
 	FILE *fp;

-	if (get_vendor() == ARCH_AMD)
-		sprintf(phys_pkg_path, "%s%d/cache/index3/id",
-			PHYS_ID_PATH, cpu_no);
-	else
-		sprintf(phys_pkg_path, "%s%d/topology/physical_package_id",
-			PHYS_ID_PATH, cpu_no);
+	cache_num = get_resource_cache_level(resource);
+	if (cache_num < 0)
+		return cache_num;
+
+	sprintf(phys_pkg_path, "%s%d/cache/index%d/id", PHYS_ID_PATH, cpu_no, cache_num);

 	fp = fopen(phys_pkg_path, "r");
 	if (!fp) {
@@ -526,7 +535,7 @@ int write_schemata(char *ctrlgrp, char *schemata, int cpu_no, const char *resour
 		return -1;
 	}

-	if (get_resource_id(cpu_no, &resource_id) < 0) {
+	if (get_resource_id(resource, cpu_no, &resource_id) < 0) {
 		sprintf(reason, "Failed to get resource id");
 		ret = -1;
Resource id is acquired differently depending on CPU. AMD tests use id
from L3 cache, whereas CPUs from other vendors base the id on topology
package id. In order to support L2 CAT test, this has to be
generalized.

The driver side code seems to get the resource ids from cache ids so
the approach used by the AMD branch seems to match the kernel-side
code. It will also work with L2 resource IDs as long as the cache level
is generalized.

Using the topology id was always fragile due to mismatch with the
kernel-side way to acquire the resource id. It got incorrect resource
id, e.g., when Cluster-on-Die (CoD) is enabled for CPU (but CoD is not
well suited for resctrl in the first place so it has not been a big
issues if test don't work correctly with it).

Taking all the above into account, generalize the resource id
calculation by taking it from the cache id and do not hard-code the
cache level.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
---
 tools/testing/selftests/resctrl/resctrl.h     |  2 +-
 tools/testing/selftests/resctrl/resctrl_val.c |  4 +--
 tools/testing/selftests/resctrl/resctrlfs.c   | 31 ++++++++++++-------
 3 files changed, 23 insertions(+), 14 deletions(-)
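The fragility described in the commit message, where the topology package id and the cache id can disagree, can be checked with a small standalone sketch. It is purely illustrative and not part of the patch; the sysfs paths, the helper name, and the choice of CPU are assumptions. On a system where one package contains more than one L3 instance (e.g. with Cluster-on-Die enabled, as noted above), the two values differ for CPUs outside the first cluster, which is exactly when the old physical_package_id based lookup resolved the wrong resctrl domain:

/*
 * Illustrative sketch only: compare the topology package id with the
 * L3 cache id for a given CPU. When the two differ, resolving the
 * resctrl domain from physical_package_id picks the wrong id; the
 * patch avoids this by always deriving the id from the cache.
 */
#include <stdio.h>

static int read_sysfs_int(const char *fmt, int cpu)
{
	char path[256];
	FILE *fp;
	int val;

	snprintf(path, sizeof(path), fmt, cpu);
	fp = fopen(path, "r");
	if (!fp)
		return -1;
	if (fscanf(fp, "%d", &val) != 1)
		val = -1;
	fclose(fp);
	return val;
}

int main(void)
{
	int cpu = 0;
	int pkg = read_sysfs_int("/sys/devices/system/cpu/cpu%d/topology/physical_package_id", cpu);
	int l3 = read_sysfs_int("/sys/devices/system/cpu/cpu%d/cache/index3/id", cpu);

	printf("CPU%d: physical_package_id=%d, L3 cache id=%d%s\n",
	       cpu, pkg, l3, pkg == l3 ? "" : " (ids diverge)");
	return 0;
}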