From patchwork Wed Apr 12 15:37:54 2023
X-Patchwork-Submitter: Waiman Long
X-Patchwork-Id: 13209308
From: Waiman Long
To: Tejun Heo, Zefan Li, Johannes Weiner, Jonathan Corbet, Shuah Khan
Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org, Juri Lelli, Valentin Schneider, Frederic Weisbecker, Waiman Long
Subject: [RFC PATCH 1/5] cgroup/cpuset: Extract out CS_CPU_EXCLUSIVE & CS_SCHED_LOAD_BALANCE handling
Date: Wed, 12 Apr 2023 11:37:54 -0400
Message-Id: <20230412153758.3088111-2-longman@redhat.com>
X-Mailing-List: linux-kselftest@vger.kernel.org

Extract out the setting of the CS_CPU_EXCLUSIVE and CS_SCHED_LOAD_BALANCE flags, as well as the rebuilding of scheduling domains, into the new update_partition_exclusive() and update_partition_sd_lb() helper functions to simplify the logic. The update_partition_exclusive() helper is called mainly at the beginning of its caller, but it may also be called at the end. The update_partition_sd_lb() helper is called at the end of its caller.

This patch should reduce the chance that a cpuset partition will end up in an incorrect state.
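For context, the flag handling that these helpers consolidate is driven from userspace through the "cpuset.cpus.partition" file. A minimal, illustrative sequence that exercises both helpers might look like the following (the cgroup name "grp" and the CPU range are arbitrary examples; a standard cgroup v2 mount at /sys/fs/cgroup and root privileges are assumed):

```
# echo +cpuset > /sys/fs/cgroup/cgroup.subtree_control
# mkdir /sys/fs/cgroup/grp
# echo 1-2 > /sys/fs/cgroup/grp/cpuset.cpus
# echo root > /sys/fs/cgroup/grp/cpuset.cpus.partition      # sets CS_CPU_EXCLUSIVE, rebuilds sched domains
# echo isolated > /sys/fs/cgroup/grp/cpuset.cpus.partition  # clears CS_SCHED_LOAD_BALANCE via update_partition_sd_lb()
# echo member > /sys/fs/cgroup/grp/cpuset.cpus.partition    # restores load balancing, drops CPU exclusivity
```

Each write above goes through update_prstate(), so the same two helpers now cover every one of these transitions.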
Signed-off-by: Waiman Long --- kernel/cgroup/cpuset.c | 124 ++++++++++++++++++++++++----------------- 1 file changed, 72 insertions(+), 52 deletions(-) diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c index 937ef4d60cd4..83a7193e0f2c 100644 --- a/kernel/cgroup/cpuset.c +++ b/kernel/cgroup/cpuset.c @@ -1252,7 +1252,7 @@ static void update_tasks_cpumask(struct cpuset *cs, struct cpumask *new_cpus) static void compute_effective_cpumask(struct cpumask *new_cpus, struct cpuset *cs, struct cpuset *parent) { - if (parent->nr_subparts_cpus) { + if (parent->nr_subparts_cpus && is_partition_valid(cs)) { cpumask_or(new_cpus, parent->effective_cpus, parent->subparts_cpus); cpumask_and(new_cpus, new_cpus, cs->cpus_allowed); @@ -1274,6 +1274,43 @@ enum subparts_cmd { static int update_flag(cpuset_flagbits_t bit, struct cpuset *cs, int turning_on); + +/* + * Update partition exclusive flag + * + * Return: 0 if successful, an error code otherwise + */ +static int update_partition_exclusive(struct cpuset *cs, int new_prs) +{ + bool exclusive = (new_prs > 0); + + if (exclusive && !is_cpu_exclusive(cs)) { + if (update_flag(CS_CPU_EXCLUSIVE, cs, 1)) + return PERR_NOTEXCL; + } else if (!exclusive && is_cpu_exclusive(cs)) { + /* Turning off CS_CPU_EXCLUSIVE will not return error */ + update_flag(CS_CPU_EXCLUSIVE, cs, 0); + } + return 0; +} + +/* + * Update partition load balance flag and/or rebuild sched domain + * + * Changing load balance flag will automatically call + * rebuild_sched_domains_locked(). 
+ */ +static void update_partition_sd_lb(struct cpuset *cs, int old_prs) +{ + int new_prs = cs->partition_root_state; + bool new_lb = (new_prs != PRS_ISOLATED); + + if (new_lb != !!is_sched_load_balance(cs)) + update_flag(CS_SCHED_LOAD_BALANCE, cs, new_lb); + else if ((new_prs > 0) || (old_prs > 0)) + rebuild_sched_domains_locked(); +} + /** * update_parent_subparts_cpumask - update subparts_cpus mask of parent cpuset * @cs: The cpuset that requests change in partition root state @@ -1477,14 +1514,13 @@ static int update_parent_subparts_cpumask(struct cpuset *cs, int cmd, /* * Transitioning between invalid to valid or vice versa may require - * changing CS_CPU_EXCLUSIVE and CS_SCHED_LOAD_BALANCE. + * changing CS_CPU_EXCLUSIVE. */ if (old_prs != new_prs) { - if (is_prs_invalid(old_prs) && !is_cpu_exclusive(cs) && - (update_flag(CS_CPU_EXCLUSIVE, cs, 1) < 0)) - return PERR_NOTEXCL; - if (is_prs_invalid(new_prs) && is_cpu_exclusive(cs)) - update_flag(CS_CPU_EXCLUSIVE, cs, 0); + int err = update_partition_exclusive(cs, new_prs); + + if (err) + return err; } /* @@ -1521,15 +1557,16 @@ static int update_parent_subparts_cpumask(struct cpuset *cs, int cmd, update_tasks_cpumask(parent, tmp->addmask); /* - * Set or clear CS_SCHED_LOAD_BALANCE when partcmd_update, if necessary. - * rebuild_sched_domains_locked() may be called. + * For partcmd_update without newmask, it is being called from + * cpuset_hotplug_workfn() where cpus_read_lock() wasn't taken. + * Update the load balance flag and scheduling domain if + * cpus_read_trylock() is successful. 
*/ - if (old_prs != new_prs) { - if (old_prs == PRS_ISOLATED) - update_flag(CS_SCHED_LOAD_BALANCE, cs, 1); - else if (new_prs == PRS_ISOLATED) - update_flag(CS_SCHED_LOAD_BALANCE, cs, 0); + if ((cmd == partcmd_update) && !newmask && cpus_read_trylock()) { + update_partition_sd_lb(cs, old_prs); + cpus_read_unlock(); } + notify_partition_change(cs, old_prs); return 0; } @@ -1744,6 +1781,7 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs, int retval; struct tmpmasks tmp; bool invalidate = false; + int old_prs = cs->partition_root_state; /* top_cpuset.cpus_allowed tracks cpu_online_mask; it's read-only */ if (cs == &top_cpuset) @@ -1863,6 +1901,9 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs, */ if (parent->child_ecpus_count) update_sibling_cpumasks(parent, cs, &tmp); + + /* Update CS_SCHED_LOAD_BALANCE and/or sched_domains */ + update_partition_sd_lb(cs, old_prs); } return 0; } @@ -2239,7 +2280,6 @@ static int update_flag(cpuset_flagbits_t bit, struct cpuset *cs, static int update_prstate(struct cpuset *cs, int new_prs) { int err = PERR_NONE, old_prs = cs->partition_root_state; - bool sched_domain_rebuilt = false; struct cpuset *parent = parent_cs(cs); struct tmpmasks tmpmask; @@ -2258,45 +2298,28 @@ static int update_prstate(struct cpuset *cs, int new_prs) if (alloc_cpumasks(NULL, &tmpmask)) return -ENOMEM; + err = update_partition_exclusive(cs, new_prs); + if (err) + goto out; + if (!old_prs) { /* - * Turning on partition root requires setting the - * CS_CPU_EXCLUSIVE bit implicitly as well and cpus_allowed - * cannot be empty. + * cpus_allowed cannot be empty. 
*/ if (cpumask_empty(cs->cpus_allowed)) { err = PERR_CPUSEMPTY; goto out; } - err = update_flag(CS_CPU_EXCLUSIVE, cs, 1); - if (err) { - err = PERR_NOTEXCL; - goto out; - } - err = update_parent_subparts_cpumask(cs, partcmd_enable, NULL, &tmpmask); - if (err) { - update_flag(CS_CPU_EXCLUSIVE, cs, 0); + if (err) goto out; - } - - if (new_prs == PRS_ISOLATED) { - /* - * Disable the load balance flag should not return an - * error unless the system is running out of memory. - */ - update_flag(CS_SCHED_LOAD_BALANCE, cs, 0); - sched_domain_rebuilt = true; - } } else if (old_prs && new_prs) { /* * A change in load balance state only, no change in cpumasks. */ - update_flag(CS_SCHED_LOAD_BALANCE, cs, (new_prs != PRS_ISOLATED)); - sched_domain_rebuilt = true; - goto out; /* Sched domain is rebuilt in update_flag() */ + goto out; } else { /* * Switching back to member is always allowed even if it @@ -2315,15 +2338,6 @@ static int update_prstate(struct cpuset *cs, int new_prs) compute_effective_cpumask(cs->effective_cpus, cs, parent); spin_unlock_irq(&callback_lock); } - - /* Turning off CS_CPU_EXCLUSIVE will not return error */ - update_flag(CS_CPU_EXCLUSIVE, cs, 0); - - if (!is_sched_load_balance(cs)) { - /* Make sure load balance is on */ - update_flag(CS_SCHED_LOAD_BALANCE, cs, 1); - sched_domain_rebuilt = true; - } } update_tasks_cpumask(parent, tmpmask.new_cpus); @@ -2331,18 +2345,24 @@ static int update_prstate(struct cpuset *cs, int new_prs) if (parent->child_ecpus_count) update_sibling_cpumasks(parent, cs, &tmpmask); - if (!sched_domain_rebuilt) - rebuild_sched_domains_locked(); out: /* - * Make partition invalid if an error happen + * Make partition invalid & disable CS_CPU_EXCLUSIVE if an error + * happens. 
*/ - if (err) + if (err) { new_prs = -new_prs; + update_partition_exclusive(cs, new_prs); + } + spin_lock_irq(&callback_lock); cs->partition_root_state = new_prs; WRITE_ONCE(cs->prs_err, err); spin_unlock_irq(&callback_lock); + + /* Update sched domains and load balance flag */ + update_partition_sd_lb(cs, old_prs); + /* * Update child cpusets, if present. * Force update if switching back to member.

From patchwork Wed Apr 12 15:37:55 2023
X-Patchwork-Submitter: Waiman Long
X-Patchwork-Id: 13209306
From: Waiman Long
To: Tejun Heo, Zefan Li, Johannes Weiner, Jonathan Corbet, Shuah Khan
Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org, Juri Lelli, Valentin Schneider, Frederic Weisbecker, Waiman Long
Subject: [RFC PATCH 2/5] cgroup/cpuset: Add a new "isolcpus" partition root state
Date: Wed, 12 Apr 2023 11:37:55 -0400
Message-Id: <20230412153758.3088111-3-longman@redhat.com>

One can use "cpuset.cpus.partition" to create multiple scheduling domains or to produce a set of isolated CPUs where load balancing is disabled. The former use case is less common, but the latter is used frequently, especially for Telco use cases like DPDK.

The existing "isolated" partition can be used to produce isolated CPUs if the applications have full control of a system. However, in a containerized environment where all the apps run in containers, it is hard to distribute isolated CPUs from the root down, given the unified-hierarchy nature of cgroup v2. The container running on isolated CPUs can be several layers down from the root.
The current partition feature requires that all the ancestors of a leaf partition root be partition roots themselves. This can be hard to manage. This patch introduces a new special partition root state called "isolcpus" that serves as a pool of isolated CPUs to be pulled into other "isolated" partitions. At most one instance of the "isolcpus" partition is allowed in a system, preferably as a child of the top cpuset.

In a valid "isolcpus" partition, "cpuset.cpus" contains the set of isolated CPUs and "cpuset.cpus.effective" contains the set of freely available isolated CPUs that have not yet been pulled into other "isolated" cpusets.

The special "isolcpus" partition cannot have normal cpuset children, so enabling a child cpuset via its "cgroup.subtree_control" file is not allowed. Tasks are also not allowed in the "cgroup.procs" file of the "isolcpus" partition. Unlike other partition roots, an empty "cpuset.cpus" is allowed in the "isolcpus" partition, as this special cpuset is not designed to hold tasks. The CPUs in the "isolcpus" partition are not exclusive, so those isolated CPUs can be distributed down sibling hierarchies as usual even though they will not show up in the siblings' "cpuset.cpus.effective".

Right now, an "isolcpus" partition only disables load balancing of the isolated CPUs. In the near future, it may be extended to support additional isolation attributes like those currently supported by the "isolcpus" and related kernel boot command line options.

In a subsequent patch, a privileged user will be able to change a "member" cpuset into an "isolated" partition root by pulling isolated CPUs from the "isolcpus" partition if its parent is not a partition root that can directly satisfy the request.
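Putting the rules above together, a possible administration sequence might look like the following sketch (the partition name "isolpool" and the CPU numbers are made up for illustration; a cgroup v2 mount at /sys/fs/cgroup and root privileges are assumed):

```
# echo +cpuset > /sys/fs/cgroup/cgroup.subtree_control
# mkdir /sys/fs/cgroup/isolpool
# echo 2-5 > /sys/fs/cgroup/isolpool/cpuset.cpus
# echo isolcpus > /sys/fs/cgroup/isolpool/cpuset.cpus.partition
# cat /sys/fs/cgroup/isolpool/cpuset.cpus.partition
isolcpus
# cat /sys/fs/cgroup/isolpool/cpuset.cpus.effective   # isolated CPUs not yet pulled elsewhere
2-5
```

Per the restrictions described above, writing a pid into isolpool/cgroup.procs or creating a child cgroup under isolpool would then fail with the new PERR_ISOLTASK / PERR_ISOLCHILD errors.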
Signed-off-by: Waiman Long --- kernel/cgroup/cpuset.c | 158 ++++++++++++++++++++++++++++++++++------- 1 file changed, 133 insertions(+), 25 deletions(-) diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c index 83a7193e0f2c..444eae3a9a6b 100644 --- a/kernel/cgroup/cpuset.c +++ b/kernel/cgroup/cpuset.c @@ -98,6 +98,9 @@ enum prs_errcode { PERR_NOCPUS, PERR_HOTPLUG, PERR_CPUSEMPTY, + PERR_ISOLCPUS, + PERR_ISOLTASK, + PERR_ISOLCHILD, }; static const char * const perr_strings[] = { @@ -108,6 +111,9 @@ static const char * const perr_strings[] = { [PERR_NOCPUS] = "Parent unable to distribute cpu downstream", [PERR_HOTPLUG] = "No cpu available due to hotplug", [PERR_CPUSEMPTY] = "cpuset.cpus is empty", + [PERR_ISOLCPUS] = "An isolcpus partition is already present", + [PERR_ISOLTASK] = "Isolcpus partition can't have tasks", + [PERR_ISOLCHILD] = "Isolcpus partition can't have children", }; struct cpuset { @@ -198,6 +204,9 @@ struct cpuset { /* Handle for cpuset.cpus.partition */ struct cgroup_file partition_file; + + /* siblings list anchored at isol_children */ + struct list_head isol_sibling; }; /* @@ -206,14 +215,26 @@ struct cpuset { * 0 - member (not a partition root) * 1 - partition root * 2 - partition root without load balancing (isolated) + * 3 - isolated cpu pool (isolcpus) * -1 - invalid partition root * -2 - invalid isolated partition root + * -3 - invalid isolated cpu pool + * + * An isolated cpu pool is a special isolated partition root. At most one + * instance of it is allowed in a system. It provides a pool of isolated + * cpus that a normal isolated partition root can pull from, if privileged, + * in case its parent cannot fulfill its request. 
*/ #define PRS_MEMBER 0 #define PRS_ROOT 1 #define PRS_ISOLATED 2 +#define PRS_ISOLCPUS 3 #define PRS_INVALID_ROOT -1 #define PRS_INVALID_ISOLATED -2 +#define PRS_INVALID_ISOLCPUS -3 + +static struct cpuset *isolcpus_cs; /* System isolcpus partition root */ +static struct list_head isol_children; /* Children that pull isolated cpus */ static inline bool is_prs_invalid(int prs_state) { @@ -335,6 +356,7 @@ static struct cpuset top_cpuset = { .flags = ((1 << CS_ONLINE) | (1 << CS_CPU_EXCLUSIVE) | (1 << CS_MEM_EXCLUSIVE)), .partition_root_state = PRS_ROOT, + .isol_sibling = LIST_HEAD_INIT(top_cpuset.isol_sibling), }; /** @@ -1282,7 +1304,7 @@ static int update_flag(cpuset_flagbits_t bit, struct cpuset *cs, */ static int update_partition_exclusive(struct cpuset *cs, int new_prs) { - bool exclusive = (new_prs > 0); + bool exclusive = (new_prs == PRS_ROOT) || (new_prs == PRS_ISOLATED); if (exclusive && !is_cpu_exclusive(cs)) { if (update_flag(CS_CPU_EXCLUSIVE, cs, 1)) @@ -1303,7 +1325,7 @@ static int update_partition_exclusive(struct cpuset *cs, int new_prs) static void update_partition_sd_lb(struct cpuset *cs, int old_prs) { int new_prs = cs->partition_root_state; - bool new_lb = (new_prs != PRS_ISOLATED); + bool new_lb = (new_prs != PRS_ISOLATED) && (new_prs != PRS_ISOLCPUS); if (new_lb != !!is_sched_load_balance(cs)) update_flag(CS_SCHED_LOAD_BALANCE, cs, new_lb); @@ -1360,18 +1382,20 @@ static int update_parent_subparts_cpumask(struct cpuset *cs, int cmd, int part_error = PERR_NONE; /* Partition error? */ percpu_rwsem_assert_held(&cpuset_rwsem); + old_prs = new_prs = cs->partition_root_state; /* * The parent must be a partition root. * The new cpumask, if present, or the current cpus_allowed must - * not be empty. + * not be empty except for isolcpus partition. */ if (!is_partition_valid(parent)) { return is_partition_invalid(parent) ? 
PERR_INVPARENT : PERR_NOTPART; } - if ((newmask && cpumask_empty(newmask)) || - (!newmask && cpumask_empty(cs->cpus_allowed))) + if ((new_prs != PRS_ISOLCPUS) && + ((newmask && cpumask_empty(newmask)) || + (!newmask && cpumask_empty(cs->cpus_allowed)))) return PERR_CPUSEMPTY; /* @@ -1379,7 +1403,6 @@ static int update_parent_subparts_cpumask(struct cpuset *cs, int cmd, * partcmd_invalidate commands. */ adding = deleting = false; - old_prs = new_prs = cs->partition_root_state; if (cmd == partcmd_enable) { /* * Enabling partition root is not allowed if cpus_allowed @@ -1498,11 +1521,13 @@ static int update_parent_subparts_cpumask(struct cpuset *cs, int cmd, switch (cs->partition_root_state) { case PRS_ROOT: case PRS_ISOLATED: + case PRS_ISOLCPUS: if (part_error) new_prs = -old_prs; break; case PRS_INVALID_ROOT: case PRS_INVALID_ISOLATED: + case PRS_INVALID_ISOLCPUS: if (!part_error) new_prs = -old_prs; break; @@ -1553,6 +1578,9 @@ static int update_parent_subparts_cpumask(struct cpuset *cs, int cmd, spin_unlock_irq(&callback_lock); + if ((isolcpus_cs == cs) && (cs->partition_root_state != PRS_ISOLCPUS)) + isolcpus_cs = NULL; + if (adding || deleting) update_tasks_cpumask(parent, tmp->addmask); @@ -1640,7 +1668,14 @@ static void update_cpumasks_hier(struct cpuset *cs, struct tmpmasks *tmp, */ old_prs = new_prs = cp->partition_root_state; if ((cp != cs) && old_prs) { - switch (parent->partition_root_state) { + int parent_prs = parent->partition_root_state; + + /* + * isolcpus partition parent can't have children + */ + WARN_ON_ONCE(parent_prs == PRS_ISOLCPUS); + + switch (parent_prs) { case PRS_ROOT: case PRS_ISOLATED: update_parent = true; @@ -1735,9 +1770,10 @@ static void update_cpumasks_hier(struct cpuset *cs, struct tmpmasks *tmp, * @parent: Parent cpuset * @cs: Current cpuset * @tmp: Temp variables + * @force: Force update if set */ static void update_sibling_cpumasks(struct cpuset *parent, struct cpuset *cs, - struct tmpmasks *tmp) + struct tmpmasks *tmp, bool 
force) { struct cpuset *sibling; struct cgroup_subsys_state *pos_css; @@ -1756,7 +1792,7 @@ static void update_sibling_cpumasks(struct cpuset *parent, struct cpuset *cs, cpuset_for_each_child(sibling, pos_css, parent) { if (sibling == cs) continue; - if (!sibling->use_parent_ecpus) + if (!sibling->use_parent_ecpus && !force) continue; if (!css_tryget_online(&sibling->css)) continue; @@ -1893,14 +1929,16 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs, update_cpumasks_hier(cs, &tmp, false); if (cs->partition_root_state) { + bool force = (cs->partition_root_state == PRS_ISOLCPUS); struct cpuset *parent = parent_cs(cs); /* * For partition root, update the cpumasks of sibling - * cpusets if they use parent's effective_cpus. + * cpusets if they use parent's effective_cpus or when + * the current cpuset is an isolcpus partition. */ - if (parent->child_ecpus_count) - update_sibling_cpumasks(parent, cs, &tmp); + if (parent->child_ecpus_count || force) + update_sibling_cpumasks(parent, cs, &tmp, force); /* Update CS_SCHED_LOAD_BALANCE and/or sched_domains */ update_partition_sd_lb(cs, old_prs); @@ -2298,6 +2336,41 @@ static int update_prstate(struct cpuset *cs, int new_prs) if (alloc_cpumasks(NULL, &tmpmask)) return -ENOMEM; + /* + * Only one isolcpus partition is allowed and it can't have children + * or tasks in it. The isolcpus partition is also not exclusive so + * that the isolated but unused cpus can be distributed down the + * hierarchy. + */ + if (new_prs == PRS_ISOLCPUS) { + if (isolcpus_cs) + err = PERR_ISOLCPUS; + else if (!list_empty(&cs->css.children)) + err = PERR_ISOLCHILD; + else if (cs->css.cgroup->nr_populated_csets) + err = PERR_ISOLTASK; + + if (err && old_prs) { + /* + * A previous valid partition root is now invalid + */ + goto disable_partition; + } else if (err) { + goto out; + } + + /* + * Unlike other partition types, an isolated cpu pool can + * be empty as it is essentially a place holder for isolated + * CPUs. 
+ */ + if (!old_prs && cpumask_empty(cs->cpus_allowed)) { + /* Force effective_cpus to be empty too */ + cpumask_clear(cs->effective_cpus); + goto out; + } + } + err = update_partition_exclusive(cs, new_prs); if (err) goto out; @@ -2316,11 +2389,9 @@ static int update_prstate(struct cpuset *cs, int new_prs) if (err) goto out; } else if (old_prs && new_prs) { - /* - * A change in load balance state only, no change in cpumasks. - */ - goto out; + goto out; /* Skip cpuset and sibling task update */ } else { +disable_partition: /* * Switching back to member is always allowed even if it * disables child partitions. @@ -2342,8 +2413,13 @@ static int update_prstate(struct cpuset *cs, int new_prs) update_tasks_cpumask(parent, tmpmask.new_cpus); - if (parent->child_ecpus_count) - update_sibling_cpumasks(parent, cs, &tmpmask); + /* + * Since isolcpus partition is not exclusive, we have to update + * sibling hierarchies as well. + */ + if ((new_prs == PRS_ISOLCPUS) || parent->child_ecpus_count) + update_sibling_cpumasks(parent, cs, &tmpmask, + new_prs == PRS_ISOLCPUS); out: /* @@ -2363,6 +2439,14 @@ static int update_prstate(struct cpuset *cs, int new_prs) /* Update sched domains and load balance flag */ update_partition_sd_lb(cs, old_prs); + /* + * Check isolcpus_cs state + */ + if (new_prs == PRS_ISOLCPUS) + isolcpus_cs = cs; + else if (cs == isolcpus_cs) + isolcpus_cs = NULL; + /* * Update child cpusets, if present. * Force update if switching back to member. @@ -2486,7 +2570,12 @@ static struct cpuset *cpuset_attach_old_cs; */ static int cpuset_can_attach_check(struct cpuset *cs) { + /* + * Task cannot be moved to a cpuset with empty effective cpus or + * is an isolcpus partition. 
+ */ if (cpumask_empty(cs->effective_cpus) || + (cs->partition_root_state == PRS_ISOLCPUS) || (!is_in_v2_mode() && nodes_empty(cs->mems_allowed))) return -ENOSPC; return 0; @@ -2902,24 +2991,30 @@ static s64 cpuset_read_s64(struct cgroup_subsys_state *css, struct cftype *cft) static int sched_partition_show(struct seq_file *seq, void *v) { struct cpuset *cs = css_cs(seq_css(seq)); + int prs = cs->partition_root_state; const char *err, *type = NULL; - switch (cs->partition_root_state) { + switch (prs) { case PRS_ROOT: seq_puts(seq, "root\n"); break; case PRS_ISOLATED: seq_puts(seq, "isolated\n"); break; + case PRS_ISOLCPUS: + seq_puts(seq, "isolcpus\n"); + break; case PRS_MEMBER: seq_puts(seq, "member\n"); break; - case PRS_INVALID_ROOT: - type = "root"; - fallthrough; - case PRS_INVALID_ISOLATED: - if (!type) + default: + if (prs == PRS_INVALID_ROOT) + type = "root"; + else if (prs == PRS_INVALID_ISOLATED) type = "isolated"; + else + type = "isolcpus"; + err = perr_strings[READ_ONCE(cs->prs_err)]; if (err) seq_printf(seq, "%s invalid (%s)\n", type, err); @@ -2948,6 +3043,8 @@ static ssize_t sched_partition_write(struct kernfs_open_file *of, char *buf, val = PRS_MEMBER; else if (!strcmp(buf, "isolated")) val = PRS_ISOLATED; + else if (!strcmp(buf, "isolcpus")) + val = PRS_ISOLCPUS; else return -EINVAL; @@ -3157,6 +3254,7 @@ cpuset_css_alloc(struct cgroup_subsys_state *parent_css) nodes_clear(cs->effective_mems); fmeter_init(&cs->fmeter); cs->relax_domain_level = -1; + INIT_LIST_HEAD(&cs->isol_sibling); /* Set CS_MEMORY_MIGRATE for default hierarchy */ if (cgroup_subsys_on_dfl(cpuset_cgrp_subsys)) @@ -3171,6 +3269,7 @@ static int cpuset_css_online(struct cgroup_subsys_state *css) struct cpuset *parent = parent_cs(cs); struct cpuset *tmp_cs; struct cgroup_subsys_state *pos_css; + int err = 0; if (!parent) return 0; @@ -3178,6 +3277,14 @@ static int cpuset_css_online(struct cgroup_subsys_state *css) cpus_read_lock(); percpu_down_write(&cpuset_rwsem); + /* + * An 
isolcpus partition cannot have direct children. + */ + if (parent->partition_root_state == PRS_ISOLCPUS) { + err = -EINVAL; + goto out_unlock; + } + set_bit(CS_ONLINE, &cs->flags); if (is_spread_page(parent)) set_bit(CS_SPREAD_PAGE, &cs->flags); @@ -3229,7 +3336,7 @@ static int cpuset_css_online(struct cgroup_subsys_state *css) out_unlock: percpu_up_write(&cpuset_rwsem); cpus_read_unlock(); - return 0; + return err; } /* @@ -3434,6 +3541,7 @@ int __init cpuset_init(void) fmeter_init(&top_cpuset.fmeter); set_bit(CS_SCHED_LOAD_BALANCE, &top_cpuset.flags); top_cpuset.relax_domain_level = -1; + INIT_LIST_HEAD(&isol_children); BUG_ON(!alloc_cpumask_var(&cpus_attach, GFP_KERNEL));

From patchwork Wed Apr 12 15:37:56 2023
X-Patchwork-Submitter: Waiman Long
X-Patchwork-Id: 13209311
From: Waiman Long
To: Tejun Heo, Zefan Li, Johannes Weiner, Jonathan Corbet, Shuah Khan
Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org, Juri Lelli, Valentin Schneider, Frederic Weisbecker, Waiman Long
Subject: [RFC PATCH 3/5] cgroup/cpuset: Make isolated partition pull CPUs from isolcpus partition
Date: Wed, 12 Apr 2023 11:37:56 -0400
Message-Id: <20230412153758.3088111-4-longman@redhat.com>

With the addition of the new "isolcpus" partition in a previous patch, this patch adds the capability for a privileged user to pull isolated CPUs from the "isolcpus" partition into an "isolated" partition when its parent cannot satisfy the request directly. The following conditions must be true for the pulling of isolated CPUs from the "isolcpus" partition to succeed:
(1) The value of "cpuset.cpus" must still be a subset of its parent's
    "cpuset.cpus" to ensure proper inheritance, even though these CPUs
    cannot be used until the cpuset becomes an "isolated" partition.

(2) All the CPUs in "cpuset.cpus" must be freely available in the
    "isolcpus" partition, i.e. present in its "cpuset.cpus.effective"
    and not yet claimed by other isolated partitions.

With this change, the CPUs in an "isolated" partition can come either from the "isolcpus" partition or from its direct parent, but not both. The parent of an isolated partition no longer needs to be a partition root. Because of the cpu-exclusive nature of an "isolated" partition, these isolated CPUs cannot be distributed to other siblings of that isolated partition.

Changes to "cpuset.cpus" of such an isolated partition are allowed as long as all the newly requested CPUs can be granted from the "isolcpus" partition; otherwise, the partition becomes invalid. This makes the management and distribution of isolated CPUs to the applications that require them much easier.

An "isolated" partition that pulls CPUs from the special "isolcpus" partition effectively has two parents: the "isolcpus" partition, from which it gets its isolated CPUs, and its hierarchical parent, from which it gets all the other resources. However, such an "isolated" partition cannot have subpartitions, as all the CPUs from "isolcpus" must be in the same isolated state.
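As an illustrative sketch (all names and CPU numbers below are hypothetical), assume an "isolcpus" partition "isolpool" already exists under the cgroup v2 root with isolated CPUs 2-5 still free in its "cpuset.cpus.effective", and that the root's "cpuset.cpus" covers CPUs 2-3. A privileged user could then pull isolated CPUs into a new "isolated" partition roughly as follows:

```
# mkdir /sys/fs/cgroup/app
# echo 2-3 > /sys/fs/cgroup/app/cpuset.cpus           # subset of isolpool's free isolated CPUs
# echo isolated > /sys/fs/cgroup/app/cpuset.cpus.partition
# cat /sys/fs/cgroup/app/cpuset.cpus.partition
isolated
# cat /sys/fs/cgroup/isolpool/cpuset.cpus.effective   # CPUs 2-3 are now claimed by "app"
4-5
```

Per condition (2) above, later extending "app"'s "cpuset.cpus" to CPUs that are not free in "isolpool" would invalidate the partition rather than silently succeed.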
Signed-off-by: Waiman Long --- kernel/cgroup/cpuset.c | 282 ++++++++++++++++++++++++++++++++++++++--- 1 file changed, 264 insertions(+), 18 deletions(-) diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c index 444eae3a9a6b..a5bbd43ed46e 100644 --- a/kernel/cgroup/cpuset.c +++ b/kernel/cgroup/cpuset.c @@ -101,6 +101,7 @@ enum prs_errcode { PERR_ISOLCPUS, PERR_ISOLTASK, PERR_ISOLCHILD, + PERR_ISOPARENT, }; static const char * const perr_strings[] = { @@ -114,6 +115,7 @@ static const char * const perr_strings[] = { [PERR_ISOLCPUS] = "An isolcpus partition is already present", [PERR_ISOLTASK] = "Isolcpus partition can't have tasks", [PERR_ISOLCHILD] = "Isolcpus partition can't have children", + [PERR_ISOPARENT] = "Isolated/isolcpus parent can't have subpartition", }; struct cpuset { @@ -1333,6 +1335,195 @@ static void update_partition_sd_lb(struct cpuset *cs, int old_prs) rebuild_sched_domains_locked(); } +/* + * isolcpus_pull - Enable or disable pulling of isolated cpus from isolcpus + * @cs: the cpuset to update + * @cmd: the command code (only partcmd_enable or partcmd_disable) + * Return: 1 if successful, 0 if error + * + * Note that pulling isolated cpus from isolcpus or cpus from parent does + * not require rebuilding sched domains. So we can change the flags directly. + */ +static int isolcpus_pull(struct cpuset *cs, enum subparts_cmd cmd) +{ + struct cpuset *parent = parent_cs(cs); + + if (!isolcpus_cs) + return 0; + + /* + * To enable pulling of isolated CPUs from isolcpus, cpus_allowed + * must be a subset of both its parent's cpus_allowed and isolcpus_cs's + * effective_cpus and the user has sysadmin privilege. + */ + if ((cmd == partcmd_enable) && capable(CAP_SYS_ADMIN) && + cpumask_subset(cs->cpus_allowed, isolcpus_cs->effective_cpus) && + cpumask_subset(cs->cpus_allowed, parent->cpus_allowed)) { + /* + * Move cpus from effective_cpus to subparts_cpus & make + * cs a child of isolcpus partition. 
+ */ + spin_lock_irq(&callback_lock); + cpumask_andnot(isolcpus_cs->effective_cpus, + isolcpus_cs->effective_cpus, cs->cpus_allowed); + cpumask_or(isolcpus_cs->subparts_cpus, + isolcpus_cs->subparts_cpus, cs->cpus_allowed); + cpumask_copy(cs->effective_cpus, cs->cpus_allowed); + isolcpus_cs->nr_subparts_cpus + = cpumask_weight(isolcpus_cs->subparts_cpus); + + if (cs->use_parent_ecpus) { + cs->use_parent_ecpus = false; + parent->child_ecpus_count--; + } + list_add(&cs->isol_sibling, &isol_children); + clear_bit(CS_SCHED_LOAD_BALANCE, &cs->flags); + spin_unlock_irq(&callback_lock); + return 1; + } + + if ((cmd == partcmd_disable) && !list_empty(&cs->isol_sibling)) { + /* + * This can be called after isolcpus shrinks its cpu list. + * So not all the cpus should be returned back to isolcpus. + */ + WARN_ON_ONCE(cs->partition_root_state != PRS_ISOLATED); + spin_lock_irq(&callback_lock); + cpumask_andnot(isolcpus_cs->subparts_cpus, + isolcpus_cs->subparts_cpus, cs->cpus_allowed); + cpumask_or(isolcpus_cs->effective_cpus, + isolcpus_cs->effective_cpus, cs->effective_cpus); + cpumask_and(isolcpus_cs->effective_cpus, + isolcpus_cs->effective_cpus, + isolcpus_cs->cpus_allowed); + cpumask_and(isolcpus_cs->effective_cpus, + isolcpus_cs->effective_cpus, cpu_active_mask); + isolcpus_cs->nr_subparts_cpus + = cpumask_weight(isolcpus_cs->subparts_cpus); + + if (!cpumask_and(cs->effective_cpus, parent->effective_cpus, + cs->cpus_allowed)) { + cs->use_parent_ecpus = true; + parent->child_ecpus_count++; + cpumask_copy(cs->effective_cpus, + parent->effective_cpus); + } + list_del_init(&cs->isol_sibling); + cs->partition_root_state = PRS_INVALID_ISOLATED; + cs->prs_err = PERR_INVCPUS; + + set_bit(CS_SCHED_LOAD_BALANCE, &cs->flags); + clear_bit(CS_CPU_EXCLUSIVE, &cs->flags); + spin_unlock_irq(&callback_lock); + return 1; + } + return 0; +} + +static void isolcpus_disable(void) +{ + struct cpuset *child, *next; + + list_for_each_entry_safe(child, next, &isol_children, isol_sibling) + 
WARN_ON_ONCE(isolcpus_pull(child, partcmd_disable)); + + isolcpus_cs = NULL; +} + +/* + * isolcpus_cpus_update - cpuset.cpus change in isolcpus partition + */ +static void isolcpus_cpus_update(struct cpuset *cs) +{ + struct cpuset *child, *next; + + if (WARN_ON_ONCE(isolcpus_cs != cs)) + return; + + if (list_empty(&isol_children)) + return; + + /* + * Remove child isolated partitions that are not fully covered by + * subparts_cpus. + */ + list_for_each_entry_safe(child, next, &isol_children, + isol_sibling) { + if (cpumask_subset(child->cpus_allowed, + cs->subparts_cpus)) + continue; + + isolcpus_pull(child, partcmd_disable); + } +} + +/* + * isolated_cpus_update - cpuset.cpus change in isolated partition + * + * Return: 1 if no further action needs, 0 otherwise + */ +static int isolated_cpus_update(struct cpuset *cs, struct cpumask *newmask, + struct tmpmasks *tmp) +{ + struct cpumask *addmask = tmp->addmask; + struct cpumask *delmask = tmp->delmask; + + if (WARN_ON_ONCE(cs->partition_root_state != PRS_ISOLATED) || + list_empty(&cs->isol_sibling)) + return 0; + + if (WARN_ON_ONCE(!isolcpus_cs) || cpumask_empty(newmask)) { + isolcpus_pull(cs, partcmd_disable); + return 0; + } + + if (cpumask_andnot(addmask, newmask, cs->cpus_allowed)) { + /* + * Check if isolcpus partition can provide the new CPUs + */ + if (!cpumask_subset(addmask, isolcpus_cs->cpus_allowed) || + cpumask_intersects(addmask, isolcpus_cs->subparts_cpus)) { + isolcpus_pull(cs, partcmd_disable); + return 0; + } + + /* + * Pull addmask isolated CPUs from isolcpus partition + */ + spin_lock_irq(&callback_lock); + cpumask_andnot(isolcpus_cs->subparts_cpus, + isolcpus_cs->subparts_cpus, addmask); + cpumask_andnot(isolcpus_cs->effective_cpus, + isolcpus_cs->effective_cpus, addmask); + isolcpus_cs->nr_subparts_cpus + = cpumask_weight(isolcpus_cs->subparts_cpus); + spin_unlock_irq(&callback_lock); + } + + if (cpumask_andnot(tmp->delmask, cs->cpus_allowed, newmask)) { + /* + * Return isolated CPUs back to 
isolcpus partition + */ + spin_lock_irq(&callback_lock); + cpumask_or(isolcpus_cs->subparts_cpus, + isolcpus_cs->subparts_cpus, delmask); + cpumask_or(isolcpus_cs->effective_cpus, + isolcpus_cs->effective_cpus, delmask); + cpumask_and(isolcpus_cs->effective_cpus, + isolcpus_cs->effective_cpus, cpu_active_mask); + isolcpus_cs->nr_subparts_cpus + = cpumask_weight(isolcpus_cs->subparts_cpus); + spin_unlock_irq(&callback_lock); + } + + spin_lock_irq(&callback_lock); + cpumask_copy(cs->cpus_allowed, newmask); + cpumask_andnot(cs->effective_cpus, newmask, cs->subparts_cpus); + cpumask_and(cs->effective_cpus, cs->effective_cpus, cpu_active_mask); + spin_unlock_irq(&callback_lock); + return 1; +} + /** * update_parent_subparts_cpumask - update subparts_cpus mask of parent cpuset * @cs: The cpuset that requests change in partition root state @@ -1579,7 +1770,7 @@ static int update_parent_subparts_cpumask(struct cpuset *cs, int cmd, spin_unlock_irq(&callback_lock); if ((isolcpus_cs == cs) && (cs->partition_root_state != PRS_ISOLCPUS)) - isolcpus_cs = NULL; + isolcpus_disable(); if (adding || deleting) update_tasks_cpumask(parent, tmp->addmask); @@ -1625,6 +1816,12 @@ static void update_cpumasks_hier(struct cpuset *cs, struct tmpmasks *tmp, struct cpuset *parent = parent_cs(cp); bool update_parent = false; + /* + * Skip isolated cpuset that pull isolated CPUs from isolcpus + */ + if (!list_empty(&cp->isol_sibling)) + continue; + compute_effective_cpumask(tmp->new_cpus, cp, parent); /* @@ -1742,7 +1939,7 @@ static void update_cpumasks_hier(struct cpuset *cs, struct tmpmasks *tmp, WARN_ON(!is_in_v2_mode() && !cpumask_equal(cp->cpus_allowed, cp->effective_cpus)); - update_tasks_cpumask(cp, tmp->new_cpus); + update_tasks_cpumask(cp, cp->effective_cpus); /* * On legacy hierarchy, if the effective cpumask of any non- @@ -1888,6 +2085,10 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs, return retval; if (cs->partition_root_state) { + if 
(!list_empty(&cs->isol_sibling) && + isolated_cpus_update(cs, trialcs->cpus_allowed, &tmp)) + goto update_hier; /* CPUs update done */ + if (invalidate) update_parent_subparts_cpumask(cs, partcmd_invalidate, NULL, &tmp); @@ -1920,6 +2121,7 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs, } spin_unlock_irq(&callback_lock); +update_hier: #ifdef CONFIG_CPUMASK_OFFSTACK /* Now trialcs->cpus_allowed is available */ tmp.new_cpus = trialcs->cpus_allowed; @@ -1928,8 +2130,7 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs, /* effective_cpus will be updated here */ update_cpumasks_hier(cs, &tmp, false); - if (cs->partition_root_state) { - bool force = (cs->partition_root_state == PRS_ISOLCPUS); + if (cs->partition_root_state && list_empty(&cs->isol_sibling)) { struct cpuset *parent = parent_cs(cs); /* @@ -1937,8 +2138,12 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs, * cpusets if they use parent's effective_cpus or when * the current cpuset is an isolcpus partition. */ - if (parent->child_ecpus_count || force) - update_sibling_cpumasks(parent, cs, &tmp, force); + if (cs->partition_root_state == PRS_ISOLCPUS) { + update_sibling_cpumasks(parent, cs, &tmp, true); + isolcpus_cpus_update(cs); + } else if (parent->child_ecpus_count) { + update_sibling_cpumasks(parent, cs, &tmp, false); + } /* Update CS_SCHED_LOAD_BALANCE and/or sched_domains */ update_partition_sd_lb(cs, old_prs); @@ -2307,7 +2512,7 @@ static int update_flag(cpuset_flagbits_t bit, struct cpuset *cs, return err; } -/** +/* * update_prstate - update partition_root_state * @cs: the cpuset to update * @new_prs: new partition root state @@ -2325,13 +2530,10 @@ static int update_prstate(struct cpuset *cs, int new_prs) return 0; /* - * For a previously invalid partition root, leave it at being - * invalid if new_prs is not "member". + * For a previously invalid partition root, treat it like a "member". 
*/ - if (new_prs && is_prs_invalid(old_prs)) { - cs->partition_root_state = -new_prs; - return 0; - } + if (new_prs && is_prs_invalid(old_prs)) + old_prs = PRS_MEMBER; if (alloc_cpumasks(NULL, &tmpmask)) return -ENOMEM; @@ -2371,6 +2573,21 @@ static int update_prstate(struct cpuset *cs, int new_prs) } } + /* + * A parent isolated partition that gets its isolated CPUs from + * isolcpus cannot have subpartition. + */ + if (new_prs && !list_empty(&parent->isol_sibling)) { + err = PERR_ISOPARENT; + goto out; + } + + if ((old_prs == PRS_ISOLATED) && !list_empty(&cs->isol_sibling)) { + isolcpus_pull(cs, partcmd_disable); + old_prs = 0; + } + WARN_ON_ONCE(!list_empty(&cs->isol_sibling)); + err = update_partition_exclusive(cs, new_prs); if (err) goto out; @@ -2386,6 +2603,10 @@ static int update_prstate(struct cpuset *cs, int new_prs) err = update_parent_subparts_cpumask(cs, partcmd_enable, NULL, &tmpmask); + if (err && (new_prs == PRS_ISOLATED) && + isolcpus_pull(cs, partcmd_enable)) + err = 0; /* Successful isolcpus pull */ + if (err) goto out; } else if (old_prs && new_prs) { @@ -2445,7 +2666,7 @@ static int update_prstate(struct cpuset *cs, int new_prs) if (new_prs == PRS_ISOLCPUS) isolcpus_cs = cs; else if (cs == isolcpus_cs) - isolcpus_cs = NULL; + isolcpus_disable(); /* * Update child cpusets, if present. @@ -3674,8 +3895,31 @@ static void cpuset_hotplug_update_tasks(struct cpuset *cs, struct tmpmasks *tmp) } parent = parent_cs(cs); - compute_effective_cpumask(&new_cpus, cs, parent); nodes_and(new_mems, cs->mems_allowed, parent->effective_mems); + /* + * In the special case of a valid isolated cpuset pulling isolated + * cpus from isolcpus. We just need to mask offline cpus from + * cpus_allowed unless all the isolated cpus are gone. 
+ */ + if (!list_empty(&cs->isol_sibling)) { + if (!cpumask_and(&new_cpus, cs->cpus_allowed, cpu_active_mask)) + isolcpus_pull(cs, partcmd_disable); + } else if ((cs->partition_root_state == PRS_ISOLCPUS) && + cpumask_empty(cs->cpus_allowed)) { + /* + * For isolcpus with empty cpus_allowed, just update + * effective_mems and be done with it. + */ + spin_lock_irq(&callback_lock); + if (nodes_empty(new_mems)) + cs->effective_mems = parent->effective_mems; + else + cs->effective_mems = new_mems; + spin_unlock_irq(&callback_lock); + goto unlock; + } else { + compute_effective_cpumask(&new_cpus, cs, parent); + } if (cs->nr_subparts_cpus) /* @@ -3707,10 +3951,12 @@ static void cpuset_hotplug_update_tasks(struct cpuset *cs, struct tmpmasks *tmp) * the following conditions hold: * 1) empty effective cpus but not valid empty partition. * 2) parent is invalid or doesn't grant any cpus to child - * partitions. + * partitions and not an isolated cpuset pulling cpus from + * isolcpus. */ - if (is_partition_valid(cs) && (!parent->nr_subparts_cpus || - (cpumask_empty(&new_cpus) && partition_is_populated(cs, NULL)))) { + if (is_partition_valid(cs) && + ((!parent->nr_subparts_cpus && list_empty(&cs->isol_sibling)) || + (cpumask_empty(&new_cpus) && partition_is_populated(cs, NULL)))) { int old_prs, parent_prs; update_parent_subparts_cpumask(cs, partcmd_disable, NULL, tmp); From patchwork Wed Apr 12 15:37:57 2023 X-Patchwork-Submitter: Waiman Long X-Patchwork-Id: 13209309 From: Waiman Long To: Tejun Heo , Zefan Li , Johannes Weiner , Jonathan Corbet , Shuah Khan Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org, Juri Lelli , Valentin Schneider , Frederic Weisbecker , Waiman Long Subject: [RFC PATCH 4/5] cgroup/cpuset: Documentation update for the new
"isolcpus" partition Date: Wed, 12 Apr 2023 11:37:57 -0400 Message-Id: <20230412153758.3088111-5-longman@redhat.com> This patch updates the cgroup-v2.rst file to include information about the new "isolcpus" partition type. Signed-off-by: Waiman Long --- Documentation/admin-guide/cgroup-v2.rst | 89 +++++++++++++++++++------ 1 file changed, 70 insertions(+), 19 deletions(-) diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst index f67c0829350b..352a02849fa7 100644 --- a/Documentation/admin-guide/cgroup-v2.rst +++ b/Documentation/admin-guide/cgroup-v2.rst @@ -2225,7 +2225,8 @@ Cpuset Interface Files ========== ===================================== "member" Non-root member of a partition "root" Partition root - "isolated" Partition root without load balancing + "isolcpus" Partition root for isolated CPUs pool + "isolated" Partition root for isolated CPUs ========== ===================================== The root cgroup is always a partition root and its state @@ -2237,24 +2238,41 @@ Cpuset Interface Files its descendants except those that are separate partition roots themselves and their descendants. + When set to "isolcpus", the CPUs in that partition root will + be in an isolated state without any load balancing from the + scheduler. This partition root is special as there can be at + most one instance of it in a system and no task or child cpuset + is allowed in this cgroup. It acts as a pool of isolated CPUs to + be pulled into other "isolated" partitions. The "cpuset.cpus" + of an "isolcpus" partition root contains the list of isolated + CPUs it holds, while "cpuset.cpus.effective" contains the list + of freely available isolated CPUs that are ready to be pulled + into other "isolated" partitions.
+ When set to "isolated", the CPUs in that partition root will be in an isolated state without any load balancing from the scheduler. Tasks placed in such a partition with multiple CPUs should be carefully distributed and bound to each of the - individual CPUs for optimal performance. - - The value shown in "cpuset.cpus.effective" of a partition root - is the CPUs that the partition root can dedicate to a potential - new child partition root. The new child subtracts available - CPUs from its parent "cpuset.cpus.effective". - - A partition root ("root" or "isolated") can be in one of the - two possible states - valid or invalid. An invalid partition - root is in a degraded state where some state information may - be retained, but behaves more like a "member". - - All possible state transitions among "member", "root" and - "isolated" are allowed. + individual CPUs for optimal performance. The isolated CPUs can + come from either the parent partition root or from an "isolcpus" + partition if the parent cannot satisfy its request. + + The value shown in "cpuset.cpus.effective" of a partition root is + the CPUs that the partition root can dedicate to a potential new + child partition root. The new child partition subtracts available + CPUs from its parent "cpuset.cpus.effective". An exception is + an "isolated" partition that pulls its isolated CPUs from the + "isolcpus" partition root that is not its direct parent. + + A partition root can be in one of the two possible states - + valid or invalid. An invalid partition root is in a degraded + state where some state information may be retained, but behaves + more like a "member". + + All possible state transitions among "member", "root", "isolcpus" + and "isolated" are allowed. However, the partition root may + not be valid if the corresponding prerequisite conditions are + not met. On read, the "cpuset.cpus.partition" file can show the following values. 
@@ -2262,16 +2280,18 @@ Cpuset Interface Files ============================= ===================================== "member" Non-root member of a partition "root" Partition root - "isolated" Partition root without load balancing + "isolcpus" Partition root for isolated CPUs pool + "isolated" Partition root for isolated CPUs "root invalid ()" Invalid partition root + "isolcpus invalid ()" Invalid isolcpus partition root "isolated invalid ()" Invalid isolated partition root ============================= ===================================== In the case of an invalid partition root, a descriptive string on - why the partition is invalid is included within parentheses. + why the partition is invalid may be included within parentheses. - For a partition root to become valid, the following conditions - must be met. + For a "root" partition root to become valid, the following + conditions must be met. 1) The "cpuset.cpus" is exclusive with its siblings , i.e. they are not shared by any of its siblings (exclusivity rule). @@ -2281,6 +2301,37 @@ Cpuset Interface Files 4) The "cpuset.cpus.effective" cannot be empty unless there is no task associated with this partition. + A valid "isolcpus" partition root requires the following + conditions. + + 1) The parent cgroup is a valid partition root. + 2) The "cpuset.cpus" must be a subset of parent's "cpuset.cpus" + including an empty cpu list. + 3) There can be no more than one valid "isolcpus" partition. + 4) No task or child cpuset is allowed. + + Note that an "isolcpus" partition is not exclusive and its + isolated CPUs can be distributed down sibling cgroups even + though they may not appear in their "cpuset.cpus.effective". + + A valid "isolated" partition root can pull isolated CPUs from + either its parent partition or from the "isolcpus" partition. + It also requires the following conditions to be met. + + 1) The "cpuset.cpus" is exclusive with its siblings , i.e. 
they + are not shared by any of its siblings (exclusivity rule). + 2) The "cpuset.cpus" is not empty and must be a subset of + parent's "cpuset.cpus". + 3) The "cpuset.cpus.effective" cannot be empty unless there is + no task associated with this partition. + + If pulling isolated CPUs from the "isolcpus" partition, + the "cpuset.cpus" must also be a subset of "isolcpus" + partition's "cpuset.cpus" and all the requested CPUs must + be available for pulling, i.e. in "isolcpus" partition's + "cpuset.cpus.effective". In this case, its hierarchical parent + does not need to be a valid partition root. + External events like hotplug or changes to "cpuset.cpus" can cause a valid partition root to become invalid and vice versa. Note that a task cannot be moved to a cgroup with empty From patchwork Wed Apr 12 15:37:58 2023 X-Patchwork-Submitter: Waiman Long X-Patchwork-Id: 13209310 From: Waiman Long To: Tejun Heo , Zefan Li , Johannes Weiner , Jonathan Corbet , Shuah Khan Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org, Juri Lelli , Valentin Schneider , Frederic Weisbecker , Waiman Long Subject: [RFC PATCH 5/5] cgroup/cpuset: Extend test_cpuset_prs.sh to test isolcpus partition Date: Wed, 12 Apr 2023 11:37:58 -0400 Message-Id: <20230412153758.3088111-6-longman@redhat.com> This patch extends the test_cpuset_prs.sh test script to support testing the new isolcpus partition by adding new tests specifically for the isolcpus partition. In addition, the following changes are also made: 1) Remove the first column of the TEST_MATRIX as it is always the same and so is redundant.
2) Add a new C1 cgroup directory for testing and add that column to the TEST_MATRIX. 3) Add support for the .__DEBUG__.cpuset.cpus.subpartitions file if the "cgroup_debug" kernel boot option is specified, and add a new column to the TEST_MATRIX for testing against this cgroup control file. 4) Add another column for the list of expected isolated CPUs and compare it with the actual value by looking at the state of /sys/kernel/debug/sched/domains. Signed-off-by: Waiman Long --- .../selftests/cgroup/test_cpuset_prs.sh | 376 ++++++++++++------ 1 file changed, 258 insertions(+), 118 deletions(-) diff --git a/tools/testing/selftests/cgroup/test_cpuset_prs.sh b/tools/testing/selftests/cgroup/test_cpuset_prs.sh index 2b5215cc599f..7fa2bfe6c1c0 100755 --- a/tools/testing/selftests/cgroup/test_cpuset_prs.sh +++ b/tools/testing/selftests/cgroup/test_cpuset_prs.sh @@ -23,18 +23,18 @@ WAIT_INOTIFY=$(cd $(dirname $0); pwd)/wait_inotify CGROUP2=$(mount -t cgroup2 | head -1 | awk -e '{print $3}') [[ -n "$CGROUP2" ]] || skip_test "Cgroup v2 mount point not found!" -CPUS=$(lscpu | grep "^CPU(s):" | sed -e "s/.*:[[:space:]]*//") -[[ $CPUS -lt 8 ]] && skip_test "Test needs at least 8 cpus available!" +NR_CPUS=$(lscpu | grep "^CPU(s):" | sed -e "s/.*:[[:space:]]*//") +[[ $NR_CPUS -lt 8 ]] && skip_test "Test needs at least 8 cpus available!" # Set verbose flag and delay factor PROG=$1 -VERBOSE= +VERBOSE=0 DELAY_FACTOR=1 SCHED_DEBUG= while [[ "$1" = -* ]] do case "$1" in - -v) VERBOSE=1 + -v) ((VERBOSE++)) # Enable sched/verbose can slow thing down [[ $DELAY_FACTOR -eq 1 ]] && DELAY_FACTOR=2 @@ -52,7 +52,7 @@ do done # Set sched verbose flag if available when "-v" option is specified -if [[ -n "$VERBOSE" && -d /sys/kernel/debug/sched ]] +if [[ $VERBOSE -gt 0 && -d /sys/kernel/debug/sched ]] then # Used to restore the original setting during cleanup SCHED_DEBUG=$(cat /sys/kernel/debug/sched/verbose) @@ -103,7 +103,7 @@ test_partition() [[ $?
-eq 0 ]] || exit 1 ACTUAL_VAL=$(cat cpuset.cpus.partition) [[ $ACTUAL_VAL != $EXPECTED_VAL ]] && { - echo "cpuset.cpus.partition: expect $EXPECTED_VAL, found $EXPECTED_VAL" + echo "cpuset.cpus.partition: expect $EXPECTED_VAL, found $ACTUAL_VAL" echo "Test FAILED" exit 1 } @@ -114,7 +114,7 @@ test_effective_cpus() EXPECTED_VAL=$1 ACTUAL_VAL=$(cat cpuset.cpus.effective) [[ "$ACTUAL_VAL" != "$EXPECTED_VAL" ]] && { - echo "cpuset.cpus.effective: expect '$EXPECTED_VAL', found '$EXPECTED_VAL'" + echo "cpuset.cpus.effective: expect '$EXPECTED_VAL', found '$ACTUAL_VAL'" echo "Test FAILED" exit 1 } @@ -204,124 +204,175 @@ test_isolated() # Cgroup test hierarchy # # test -- A1 -- A2 -- A3 -# \- B1 +# +- B1 +# +- C1 # -# P = set cpus.partition (0:member, 1:root, 2:isolated, -1:root invalid) +# P = set cpus.partition (0:member, 1:root, 2:isolated, 3: isolcpus) # C = add cpu-list # S
= use prefix in subtree_control # T = put a task into cgroup -# O- = Write to CPU online file of +# O= = Write to CPU online file of # SETUP_A123_PARTITIONS="C1-3:P1:S+ C2-3:P1:S+ C3:P1" TEST_MATRIX=( - # test old-A1 old-A2 old-A3 old-B1 new-A1 new-A2 new-A3 new-B1 fail ECPUs Pstate - # ---- ------ ------ ------ ------ ------ ------ ------ ------ ---- ----- ------ - " S+ C0-1 . . C2-3 S+ C4-5 . . 0 A2:0-1" - " S+ C0-1 . . C2-3 P1 . . . 0 " - " S+ C0-1 . . C2-3 P1:S+ C0-1:P1 . . 0 " - " S+ C0-1 . . C2-3 P1:S+ C1:P1 . . 0 " - " S+ C0-1:S+ . . C2-3 . . . P1 0 " - " S+ C0-1:P1 . . C2-3 S+ C1 . . 0 " - " S+ C0-1:P1 . . C2-3 S+ C1:P1 . . 0 " - " S+ C0-1:P1 . . C2-3 S+ C1:P1 . P1 0 " - " S+ C0-1:P1 . . C2-3 C4-5 . . . 0 A1:4-5" - " S+ C0-1:P1 . . C2-3 S+:C4-5 . . . 0 A1:4-5" - " S+ C0-1 . . C2-3:P1 . . . C2 0 " - " S+ C0-1 . . C2-3:P1 . . . C4-5 0 B1:4-5" - " S+ C0-3:P1:S+ C2-3:P1 . . . . . . 0 A1:0-1,A2:2-3" - " S+ C0-3:P1:S+ C2-3:P1 . . C1-3 . . . 0 A1:1,A2:2-3" - " S+ C2-3:P1:S+ C3:P1 . . C3 . . . 0 A1:,A2:3 A1:P1,A2:P1" - " S+ C2-3:P1:S+ C3:P1 . . C3 P0 . . 0 A1:3,A2:3 A1:P1,A2:P0" - " S+ C2-3:P1:S+ C2:P1 . . C2-4 . . . 0 A1:3-4,A2:2" - " S+ C2-3:P1:S+ C3:P1 . . C3 . . C0-2 0 A1:,B1:0-2 A1:P1,A2:P1" - " S+ $SETUP_A123_PARTITIONS . C2-3 . . . 0 A1:,A2:2,A3:3 A1:P1,A2:P1,A3:P1" + # old-A1 old-A2 old-A3 old-B1 new-A1 new-A2 new-A3 new-B1 new-C1 fail ECPUs Pstate PCPUS ISOLCPUS + # ------ ------ ------ ------ ------ ------ ------ ------ ------ ---- ----- ------ ----- -------- + " C0-1 . . C2-3 S+ C4-5 . . . 0 A2:0-1" + " C0-1 . . C2-3 P1 . . . . 0 " + " C0-1 . . C2-3 P1:S+ C0-1:P1 . . . 0 " + " C0-1 . . C2-3 P1:S+ C1:P1 . . . 0 " + " C0-1:S+ . . C2-3 . . . P1 . 0 " + " C0-1:P1 . . C2-3 S+ C1 . . . 0 " + " C0-1:P1 . . C2-3 S+ C1:P1 . . . 0 " + " C0-1:P1 . . C2-3 S+ C1:P1 . P1 . 0 " + " C0-1:P1 . . C2-3 C4-5 . . . . 0 A1:4-5" + " C0-1:P1 . . C2-3 S+:C4-5 . . . . 0 A1:4-5" + " C0-1 . . C2-3:P1 . . . C2 . 0 " + " C0-1 . . C2-3:P1 . . . C4-5 . 0 B1:4-5" + "C0-3:P1:S+ C2-3:P1 . 
. . . . . . 0 A1:0-1,A2:2-3"
+	"C0-3:P1:S+ C2-3:P1 . . C1-3 . . . . 0 A1:1,A2:2-3"
+	"C2-3:P1:S+ C3:P1 . . C3 . . . . 0 A1:,A2:3 A1:P1,A2:P1"
+	"C2-3:P1:S+ C3:P1 . . C3 P0 . . . 0 A1:3,A2:3 A1:P1,A2:P0"
+	"C2-3:P1:S+ C2:P1 . . C2-4 . . . . 0 A1:3-4,A2:2"
+	"C2-3:P1:S+ C3:P1 . . C3 . . C0-2 . 0 A1:,B1:0-2 A1:P1,A2:P1"
+	"$SETUP_A123_PARTITIONS . C2-3 . . . . 0 A1:,A2:2,A3:3 A1:P1,A2:P1,A3:P1"

 	# CPU offlining cases:
-	" S+ C0-1 . . C2-3 S+ C4-5 . O2-0 0 A1:0-1,B1:3"
-	" S+ C0-3:P1:S+ C2-3:P1 . . O2-0 . . . 0 A1:0-1,A2:3"
-	" S+ C0-3:P1:S+ C2-3:P1 . . O2-0 O2-1 . . 0 A1:0-1,A2:2-3"
-	" S+ C0-3:P1:S+ C2-3:P1 . . O1-0 . . . 0 A1:0,A2:2-3"
-	" S+ C0-3:P1:S+ C2-3:P1 . . O1-0 O1-1 . . 0 A1:0-1,A2:2-3"
-	" S+ C2-3:P1:S+ C3:P1 . . O3-0 O3-1 . . 0 A1:2,A2:3 A1:P1,A2:P1"
-	" S+ C2-3:P1:S+ C3:P2 . . O3-0 O3-1 . . 0 A1:2,A2:3 A1:P1,A2:P2"
-	" S+ C2-3:P1:S+ C3:P1 . . O2-0 O2-1 . . 0 A1:2,A2:3 A1:P1,A2:P1"
-	" S+ C2-3:P1:S+ C3:P2 . . O2-0 O2-1 . . 0 A1:2,A2:3 A1:P1,A2:P2"
-	" S+ C2-3:P1:S+ C3:P1 . . O2-0 . . . 0 A1:,A2:3 A1:P1,A2:P1"
-	" S+ C2-3:P1:S+ C3:P1 . . O3-0 . . . 0 A1:2,A2: A1:P1,A2:P1"
-	" S+ C2-3:P1:S+ C3:P1 . . T:O2-0 . . . 0 A1:3,A2:3 A1:P1,A2:P-1"
-	" S+ C2-3:P1:S+ C3:P1 . . . T:O3-0 . . 0 A1:2,A2:2 A1:P1,A2:P-1"
-	" S+ $SETUP_A123_PARTITIONS . O1-0 . . . 0 A1:,A2:2,A3:3 A1:P1,A2:P1,A3:P1"
-	" S+ $SETUP_A123_PARTITIONS . O2-0 . . . 0 A1:1,A2:,A3:3 A1:P1,A2:P1,A3:P1"
-	" S+ $SETUP_A123_PARTITIONS . O3-0 . . . 0 A1:1,A2:2,A3: A1:P1,A2:P1,A3:P1"
-	" S+ $SETUP_A123_PARTITIONS . T:O1-0 . . . 0 A1:2-3,A2:2-3,A3:3 A1:P1,A2:P-1,A3:P-1"
-	" S+ $SETUP_A123_PARTITIONS . . T:O2-0 . . 0 A1:1,A2:3,A3:3 A1:P1,A2:P1,A3:P-1"
-	" S+ $SETUP_A123_PARTITIONS . . . T:O3-0 . 0 A1:1,A2:2,A3:2 A1:P1,A2:P1,A3:P-1"
-	" S+ $SETUP_A123_PARTITIONS . T:O1-0 O1-1 . . 0 A1:1,A2:2,A3:3 A1:P1,A2:P1,A3:P1"
-	" S+ $SETUP_A123_PARTITIONS . . T:O2-0 O2-1 . 0 A1:1,A2:2,A3:3 A1:P1,A2:P1,A3:P1"
-	" S+ $SETUP_A123_PARTITIONS . . . T:O3-0 O3-1 0 A1:1,A2:2,A3:3 A1:P1,A2:P1,A3:P1"
-	" S+ $SETUP_A123_PARTITIONS . T:O1-0 O2-0 O1-1 . 0 A1:1,A2:,A3:3 A1:P1,A2:P1,A3:P1"
-	" S+ $SETUP_A123_PARTITIONS . T:O1-0 O2-0 O2-1 . 0 A1:2-3,A2:2-3,A3:3 A1:P1,A2:P-1,A3:P-1"
-
-	# test old-A1 old-A2 old-A3 old-B1 new-A1 new-A2 new-A3 new-B1 fail ECPUs Pstate
-	# ---- ------ ------ ------ ------ ------ ------ ------ ------ ---- ----- ------
+	" C0-1 . . C2-3 S+ C4-5 . O2=0 . 0 A1:0-1,B1:3"
+	"C0-3:P1:S+ C2-3:P1 . . O2=0 . . . . 0 A1:0-1,A2:3"
+	"C0-3:P1:S+ C2-3:P1 . . O2=0 O2=1 . . . 0 A1:0-1,A2:2-3"
+	"C0-3:P1:S+ C2-3:P1 . . O1=0 . . . . 0 A1:0,A2:2-3"
+	"C0-3:P1:S+ C2-3:P1 . . O1=0 O1=1 . . . 0 A1:0-1,A2:2-3"
+	"C2-3:P1:S+ C3:P1 . . O3=0 O3=1 . . . 0 A1:2,A2:3 A1:P1,A2:P1"
+	"C2-3:P1:S+ C3:P2 . . O3=0 O3=1 . . . 0 A1:2,A2:3 A1:P1,A2:P2"
+	"C2-3:P1:S+ C3:P1 . . O2=0 O2=1 . . . 0 A1:2,A2:3 A1:P1,A2:P1"
+	"C2-3:P1:S+ C3:P2 . . O2=0 O2=1 . . . 0 A1:2,A2:3 A1:P1,A2:P2"
+	"C2-3:P1:S+ C3:P1 . . O2=0 . . . . 0 A1:,A2:3 A1:P1,A2:P1"
+	"C2-3:P1:S+ C3:P1 . . O3=0 . . . . 0 A1:2,A2: A1:P1,A2:P1"
+	"C2-3:P1:S+ C3:P1 . . T:O2=0 . . . . 0 A1:3,A2:3 A1:P1,A2:P-1"
+	"C2-3:P1:S+ C3:P1 . . . T:O3=0 . . . 0 A1:2,A2:2 A1:P1,A2:P-1"
+	"$SETUP_A123_PARTITIONS . O1=0 . . . . 0 A1:,A2:2,A3:3 A1:P1,A2:P1,A3:P1"
+	"$SETUP_A123_PARTITIONS . O2=0 . . . . 0 A1:1,A2:,A3:3 A1:P1,A2:P1,A3:P1"
+	"$SETUP_A123_PARTITIONS . O3=0 . . . . 0 A1:1,A2:2,A3: A1:P1,A2:P1,A3:P1"
+	"$SETUP_A123_PARTITIONS . T:O1=0 . . . . 0 A1:2-3,A2:2-3,A3:3 A1:P1,A2:P-1,A3:P-1"
+	"$SETUP_A123_PARTITIONS . . T:O2=0 . . . 0 A1:1,A2:3,A3:3 A1:P1,A2:P1,A3:P-1"
+	"$SETUP_A123_PARTITIONS . . . T:O3=0 . . 0 A1:1,A2:2,A3:2 A1:P1,A2:P1,A3:P-1"
+	"$SETUP_A123_PARTITIONS . T:O1=0 O1=1 . . . 0 A1:1,A2:2,A3:3 A1:P1,A2:P1,A3:P1"
+	"$SETUP_A123_PARTITIONS . . T:O2=0 O2=1 . . 0 A1:1,A2:2,A3:3 A1:P1,A2:P1,A3:P1"
+	"$SETUP_A123_PARTITIONS . . . T:O3=0 O3=1 . 0 A1:1,A2:2,A3:3 A1:P1,A2:P1,A3:P1"
+	"$SETUP_A123_PARTITIONS . T:O1=0 O2=0 O1=1 . . 0 A1:1,A2:,A3:3 A1:P1,A2:P1,A3:P1"
+	"$SETUP_A123_PARTITIONS . T:O1=0 O2=0 O2=1 . . 0 A1:2-3,A2:2-3,A3:3 A1:P1,A2:P-1,A3:P-1"
+
+	# old-A1 old-A2 old-A3 old-B1 new-A1 new-A2 new-A3 new-B1 new-C1 fail ECPUs Pstate PCPUS ISOLCPUS
+	# ------ ------ ------ ------ ------ ------ ------ ------ ------ ---- ----- ------ ----- --------
+	#
+	# isolcpus partition tests
+	#
+
+	# isolcpus partition can have empty cpuset.cpus & effective cpus
+	" . . . P3 . . . . . 0 B1: B1:P3"
+
+	# isolcpus partition is not exclusive
+	" C1-2 . . C3:P3 C1-3:S+ C3 . . . 0 A1:1-2,A2:1-2,B1:3 B1:P3"
+	" C1-3 . . C3 . . . P3 . 0 A1:1-2,B1:3 B1:P3"
+
+	# Only 1 isolcpus partition is allowed
+	" . . . C3:P3 C1:P3 . . . . 0 A1:1,B1:3 A1:P-3,B1:P3"
+
+	# Isolated partition can pull isolated cpus from isolcpus partition
+	" C1-3:S+ C3 . C3:P3 . P2 . . . 0 A1:1-2,A2:3,B1: A2:P2,B1:P3 .:3,B1:3 3"
+	" C1-3:S+ C3 . C3:P3 . P2 . C2-3 . 0 A1:1,A2:3,B1:2 A2:P2,B1:P3 .:2-3,B1:3 2-3"
+
+	# Isolated partition becomes invalid if cpu update fails pulling
+	" C1-3:S+ C3 . C3:P3 . P2:C2-3 . . . 0 A1:1-2,A2:2,B1:3 A2:P-2,B1:P3 .:3,B1: 3"
+	" C1-3:S+ C3 . C3:P3 . P2 . C1 . 0 A1:2-3,A2:3,B1:1 A2:P-2,B1:P3 .:1,B1: 1"
+
+	# Once isolated partition pulls cpus from isolcpus, parent can shrink cpu list
+	" C1-3:S+ C3:P2 . C3:P3 C1-2 . . . . 0 A1:1-2,A2:3,B1: A2:P2,B1:P3 . 3"
+	" C1-3:S+ C3:P2 . C3:P3 C1 . . . . 0 A1:1,A2:3,B1: A2:P2,B1:P3 . 3"
+
+	# Isolated partition can't be enabled if it can't pull all isolated cpus from parent or isolcpus
+	" C1-3:S+ C2 . C3:P3 . P2 . . . 0 A1:1-2,A2:2,B1:3 A2:P-2,B1:P3"
+
+	# Isolated/isolcpus partition online/offline tests
+	" C1-3:S+ C3 . C2-3:P3 . P2 O2=0 . . 0 A1:1,A2:3,B1: A2:P2,B1:P3 .:2-3,B1:3 2-3"
+	" C1-3:S+ C3 . C2-3:P3 . P2 O2=0 O2=1 . 0 A1:1,A2:3,B1:2 A2:P2,B1:P3 .:2-3,B1:3 2-3"
+	" C1-3:S+ C2-3 . C2-3:P3 . P2 O2=0 . . 0 A1:1,A2:3,B1: A2:P2,B1:P3 .:2-3,B1:2-3 2-3"
+	" C1-3:S+ C2-3 . C2-3:P3 . P2 O2=0 O2=1 . 0 A1:1,A2:2-3,B1: A2:P2,B1:P3 .:2-3,B1:2-3 2-3"
+
+	# Isolated partition pulling from isolcpus become invalid if all isolated cpus gone
+	" C1-3:S+ C3 . C2-3:P3 . P2 O3=0 . . 0 A1:1,A2:1,B1:2 A2:P-2,B1:P3 .:2-3,B1:"
+	" C1-3:S+ C3 . C2-3:P3 . P2 O3=0 O3=1 . 0 A1:1,A2:1,B1:2-3 A2:P-2,B1:P3 .:2-3,B1:"
+
+	# Hotplug won't affect isolcpus partition with empty cpus_allowed
+	" C1-3 . . P3 . . O1=0 . . 0 A1:2-3,B1: B1:P3"
+
+	# old-A1 old-A2 old-A3 old-B1 new-A1 new-A2 new-A3 new-B1 new-C1 fail ECPUs Pstate PCPUS ISOLCPUS
+	# ------ ------ ------ ------ ------ ------ ------ ------ ------ ---- ----- ------ ----- --------
 	#
 	# Incorrect change to cpuset.cpus invalidates partition root
 	#
 	# Adding CPUs to partition root that are not in parent's
 	# cpuset.cpus is allowed, but those extra CPUs are ignored.
-	" S+ C2-3:P1:S+ C3:P1 . . . C2-4 . . 0 A1:,A2:2-3 A1:P1,A2:P1"
+	"C2-3:P1:S+ C3:P1 . . . C2-4 . . . 0 A1:,A2:2-3 A1:P1,A2:P1"

 	# Taking away all CPUs from parent or itself if there are tasks
 	# will make the partition invalid.
-	" S+ C2-3:P1:S+ C3:P1 . . T C2-3 . . 0 A1:2-3,A2:2-3 A1:P1,A2:P-1"
-	" S+ C3:P1:S+ C3 . . T P1 . . 0 A1:3,A2:3 A1:P1,A2:P-1"
-	" S+ $SETUP_A123_PARTITIONS . T:C2-3 . . . 0 A1:2-3,A2:2-3,A3:3 A1:P1,A2:P-1,A3:P-1"
-	" S+ $SETUP_A123_PARTITIONS . T:C2-3:C1-3 . . . 0 A1:1,A2:2,A3:3 A1:P1,A2:P1,A3:P1"
+	"C2-3:P1:S+ C3:P1 . . T C2-3 . . . 0 A1:2-3,A2:2-3 A1:P1,A2:P-1"
+	" C3:P1:S+ C3 . . T P1 . . . 0 A1:3,A2:3 A1:P1,A2:P-1"
+	"$SETUP_A123_PARTITIONS . T:C2-3 . . . . 0 A1:2-3,A2:2-3,A3:3 A1:P1,A2:P-1,A3:P-1"
+	"$SETUP_A123_PARTITIONS . T:C2-3:C1-3 . . . . 0 A1:1,A2:2,A3:3 A1:P1,A2:P1,A3:P1"

 	# Changing a partition root to member makes child partitions invalid
-	" S+ C2-3:P1:S+ C3:P1 . . P0 . . . 0 A1:2-3,A2:3 A1:P0,A2:P-1"
-	" S+ $SETUP_A123_PARTITIONS . C2-3 P0 . . 0 A1:2-3,A2:2-3,A3:3 A1:P1,A2:P0,A3:P-1"
+	"C2-3:P1:S+ C3:P1 . . P0 . . . . 0 A1:2-3,A2:3 A1:P0,A2:P-1"
+	"$SETUP_A123_PARTITIONS . C2-3 P0 . . . 0 A1:2-3,A2:2-3,A3:3 A1:P1,A2:P0,A3:P-1"

 	# cpuset.cpus can contains cpus not in parent's cpuset.cpus as long
 	# as they overlap.
-	" S+ C2-3:P1:S+ . . . . C3-4:P1 . . 0 A1:2,A2:3 A1:P1,A2:P1"
+	"C2-3:P1:S+ . . . . C3-4:P1 . . . 0 A1:2,A2:3 A1:P1,A2:P1"

 	# Deletion of CPUs distributed to child cgroup is allowed.
-	" S+ C0-1:P1:S+ C1 . C2-3 C4-5 . . . 0 A1:4-5,A2:4-5"
+	"C0-1:P1:S+ C1 . C2-3 C4-5 . . . . 0 A1:4-5,A2:4-5"

 	# To become a valid partition root, cpuset.cpus must overlap parent's
 	# cpuset.cpus.
-	" S+ C0-1:P1 . . C2-3 S+ C4-5:P1 . . 0 A1:0-1,A2:0-1 A1:P1,A2:P-1"
+	" C0-1:P1 . . C2-3 S+ C4-5:P1 . . . 0 A1:0-1,A2:0-1 A1:P1,A2:P-1"

 	# Enabling partition with child cpusets is allowed
-	" S+ C0-1:S+ C1 . C2-3 P1 . . . 0 A1:0-1,A2:1 A1:P1"
+	" C0-1:S+ C1 . C2-3 P1 . . . . 0 A1:0-1,A2:1 A1:P1"

 	# A partition root with non-partition root parent is invalid, but it
 	# can be made valid if its parent becomes a partition root too.
-	" S+ C0-1:S+ C1 . C2-3 . P2 . . 0 A1:0-1,A2:1 A1:P0,A2:P-2"
-	" S+ C0-1:S+ C1:P2 . C2-3 P1 . . . 0 A1:0,A2:1 A1:P1,A2:P2"
+	" C0-1:S+ C1 . C2-3 . P2 . . . 0 A1:0-1,A2:1 A1:P0,A2:P-2"
+	" C0-1:S+ C1:P2 . C2-3 P1 . . . . 0 A1:0,A2:1 A1:P1,A2:P2"

 	# A non-exclusive cpuset.cpus change will invalidate partition and its siblings
-	" S+ C0-1:P1 . . C2-3 C0-2 . . . 0 A1:0-2,B1:2-3 A1:P-1,B1:P0"
-	" S+ C0-1:P1 . . P1:C2-3 C0-2 . . . 0 A1:0-2,B1:2-3 A1:P-1,B1:P-1"
-	" S+ C0-1 . . P1:C2-3 C0-2 . . . 0 A1:0-2,B1:2-3 A1:P0,B1:P-1"
+	" C0-1:P1 . . C2-3 C0-2 . . . . 0 A1:0-2,B1:2-3 A1:P-1,B1:P0"
+	" C0-1:P1 . . P1:C2-3 C0-2 . . . . 0 A1:0-2,B1:2-3 A1:P-1,B1:P-1"
+	" C0-1 . . P1:C2-3 C0-2 . . . . 0 A1:0-2,B1:2-3 A1:P0,B1:P-1"

-	# test old-A1 old-A2 old-A3 old-B1 new-A1 new-A2 new-A3 new-B1 fail ECPUs Pstate
-	# ---- ------ ------ ------ ------ ------ ------ ------ ------ ---- ----- ------
+	# old-A1 old-A2 old-A3 old-B1 new-A1 new-A2 new-A3 new-B1 new-C1 fail ECPUs Pstate PCPUS ISOLCPUS
+	# ------ ------ ------ ------ ------ ------ ------ ------ ------ ---- ----- ------ ----- --------

 	# Failure cases:

 	# A task cannot be added to a partition with no cpu
-	" S+ C2-3:P1:S+ C3:P1 . . O2-0:T . . . 1 A1:,A2:3 A1:P1,A2:P1"
+	"C2-3:P1:S+ C3:P1 . . O2=0:T . . . . 1 A1:,A2:3 A1:P1,A2:P1"
+
+	# Task is not allowed in an isolcpus partition
+	" . . . C3:P3 . . . T . 1"
+
+	# Child cpuset is not allowed under an isolcpus partition
+	" C1:P3 . . . S+ . . . . 1"
 )

 #
 # Write to the cpu online file
-# $1 - <cpu>-<val> where <cpu> = cpu number, <val> = value to be written
+# $1 - <cpu>=<val> where <cpu> = cpu number, <val> = value to be written
 #
 write_cpu_online()
 {
-	CPU=${1%-*}
-	VAL=${1#*-}
+	CPU=${1%=*}
+	VAL=${1#*=}
 	CPUFILE=//sys/devices/system/cpu/cpu${CPU}/online
 	if [[ $VAL -eq 0 ]]
 	then
@@ -349,11 +400,12 @@ set_ctrl_state()
 	TMPMSG=/tmp/.msg_$$
 	CGRP=$1
 	STATE=$2
-	SHOWERR=${3}${VERBOSE}
+	SHOWERR=${3}
 	CTRL=${CTRL:=$CONTROLLER}
 	HASERR=0
 	REDIRECT="2> $TMPMSG"
 	[[ -z "$STATE" || "$STATE" = '.' ]] && return 0
+	[[ $VERBOSE -gt 0 ]] && SHOWERR=1

 	rm -f $TMPMSG
 	for CMD in $(echo $STATE | sed -e "s/:/ /g")
@@ -383,6 +435,9 @@ set_ctrl_state()
 			;;
 		    2) VAL=isolated
 			;;
+		    3)
+			VAL=isolcpus
+			;;
 		    *)
 			echo "Invalid partition state - $VAL"
 			exit 1
@@ -430,7 +485,7 @@ online_cpus()
 	[[ -n "$OFFLINE_CPUS" ]] && {
 		for C in $OFFLINE_CPUS
 		do
-			write_cpu_online ${C}-1
+			write_cpu_online ${C}=1
 		done
 	}
 }
@@ -442,19 +497,23 @@ reset_cgroup_states()
 {
 	echo 0 > $CGROUP2/cgroup.procs
 	online_cpus
-	rmdir A1/A2/A3 A1/A2 A1 B1 > /dev/null 2>&1
+	rmdir A1/A2/A3 A1/A2 A1 B1 C1 > /dev/null 2>&1
 	set_ctrl_state . S-
 	pause 0.01
 }

 dump_states()
 {
-	for DIR in A1 A1/A2 A1/A2/A3 B1
+	for DIR in . A1 A1/A2 A1/A2/A3 B1 C1
 	do
 		ECPUS=$DIR/cpuset.cpus.effective
 		PRS=$DIR/cpuset.cpus.partition
+		PCPUS=$DIR/cpuset.cpus.subpartitions
+		[[ -e $PCPUS ]] ||
+			PCPUS=$DIR/.__DEBUG__.cpuset.cpus.subpartitions
 		[[ -e $ECPUS ]] && echo "$ECPUS: $(cat $ECPUS)"
 		[[ -e $PRS ]] && echo "$PRS: $(cat $PRS)"
+		[[ -e $PCPUS ]] && echo "$PCPUS: $(cat $PCPUS)"
 	done
 }
@@ -478,6 +537,26 @@ check_effective_cpus()
 	done
 }

+#
+# Check subparts cpus
+# $1 - check string, format: <cgroup>:<cpu-list>[,<cgroup>:<cpu-list>]*
+#
+check_subparts_cpus()
+{
+	CHK_STR=$1
+	for CHK in $(echo $CHK_STR | sed -e "s/,/ /g")
+	do
+		set -- $(echo $CHK | sed -e "s/:/ /g")
+		CGRP=$1
+		CPUS=$2
+		[[ $CGRP = A2 ]] && CGRP=A1/A2
+		[[ $CGRP = A3 ]] && CGRP=A1/A2/A3
+		FILE=$CGRP/.__DEBUG__.cpuset.cpus.subpartitions
+		[[ -e $FILE ]] || return 0	# Skip test
+		[[ $CPUS = $(cat $FILE) ]] || return 1
+	done
+}
+
 #
 # Check cgroup states
 # $1 - check string, format: <cgroup>:<state>[,<cgroup>:<state>]*
@@ -512,18 +591,80 @@ check_cgroup_states()
 			isolated) VAL=2
 				;;
+			isolcpus)
+				VAL=3
+				;;
 			"root invalid"*) VAL=-1
 				;;
 			"isolated invalid"*) VAL=-2
 				;;
+			"isolcpus invalid"*)
+				VAL=-3
+				;;
 		esac
 		[[ $EVAL != $VAL ]] && return 1
 	done
 	return 0
 }

+#
+# Get isolated (including offline) CPUs by looking at
+# /sys/kernel/debug/sched/domains and compare that with the expected value.
+#
+# $1 - expected isolated cpu list
+#
+check_isolcpus()
+{
+	EXPECT_VAL=$1
+	ISOLCPUS=
+	LASTISOLCPU=
+	SCHED_DOMAINS=/sys/kernel/debug/sched/domains
+	[[ -d $SCHED_DOMAINS ]] || return 0	# Skip check
+
+	for ((CPU=0; CPU < $NR_CPUS; CPU++))
+	do
+		[[ -n "$(ls ${SCHED_DOMAINS}/cpu$CPU)" ]] && continue
+
+		if [[ -z "$LASTISOLCPU" ]]
+		then
+			ISOLCPUS=$CPU
+			LASTISOLCPU=$CPU
+		elif [[ "$LASTISOLCPU" -eq $((CPU - 1)) ]]
+		then
+			echo $ISOLCPUS | grep -q "\<$LASTISOLCPU\$"
+			if [[ $? -eq 0 ]]
+			then
+				ISOLCPUS=${ISOLCPUS}-
+			fi
+			LASTISOLCPU=$CPU
+		else
+			if [[ $ISOLCPUS = *- ]]
+			then
+				ISOLCPUS=${ISOLCPUS}$LASTISOLCPU
+			fi
+			ISOLCPUS=${ISOLCPUS},$CPU
+			LASTISOLCPU=$CPU
+		fi
+	done
+	[[ "$ISOLCPUS" = *- ]] && ISOLCPUS=${ISOLCPUS}$LASTISOLCPU
+	[[ $EXPECT_VAL = $ISOLCPUS ]]
+}
+
+test_fail()
+{
+	TESTNUM=$1
+	TESTTYPE=$2
+	ADDINFO=$3
+	echo "Test $TEST[$TESTNUM] failed $TESTTYPE check!"
+	[[ -n "$ADDINFO" ]] && echo "*** $ADDINFO ***"
+	eval echo \"\${$TEST[$I]}\"
+	echo
	dump_states
+	exit 1
+}
+
 #
 # Run cpuset state transition test
 # $1 - test matrix name
@@ -548,60 +689,59 @@ run_state_test()
 	while [[ $I -lt $CNT ]]
 	do
 		echo "Running test $I ..." > /dev/console
+		[[ $VERBOSE -gt 1 ]] && eval echo \"\${$TEST[$I]}\"
 		eval set -- "\${$TEST[$I]}"
-		ROOT=$1
-		OLD_A1=$2
-		OLD_A2=$3
-		OLD_A3=$4
-		OLD_B1=$5
-		NEW_A1=$6
-		NEW_A2=$7
-		NEW_A3=$8
-		NEW_B1=$9
+		OLD_A1=$1
+		OLD_A2=$2
+		OLD_A3=$3
+		OLD_B1=$4
+		NEW_A1=$5
+		NEW_A2=$6
+		NEW_A3=$7
+		NEW_B1=$8
+		NEW_C1=$9
 		RESULT=${10}
 		ECPUS=${11}
 		STATES=${12}
+		PCPUS=${13}
+		ICPUS=${14}

-		set_ctrl_state_noerr . $ROOT
+		set_ctrl_state_noerr . "S+"
+		set_ctrl_state_noerr B1 $OLD_B1
 		set_ctrl_state_noerr A1 $OLD_A1
 		set_ctrl_state_noerr A1/A2 $OLD_A2
 		set_ctrl_state_noerr A1/A2/A3 $OLD_A3
-		set_ctrl_state_noerr B1 $OLD_B1
 		RETVAL=0
 		set_ctrl_state A1 $NEW_A1; ((RETVAL += $?))
 		set_ctrl_state A1/A2 $NEW_A2; ((RETVAL += $?))
 		set_ctrl_state A1/A2/A3 $NEW_A3; ((RETVAL += $?))
 		set_ctrl_state B1 $NEW_B1; ((RETVAL += $?))
+		set_ctrl_state C1 $NEW_C1; ((RETVAL += $?))

-		[[ $RETVAL -ne $RESULT ]] && {
-			echo "Test $TEST[$I] failed result check!"
-			eval echo \"\${$TEST[$I]}\"
-			dump_states
-			exit 1
-		}
+		[[ $RETVAL -ne $RESULT ]] && test_fail $I result

 		[[ -n "$ECPUS" && "$ECPUS" != . ]] && {
 			check_effective_cpus $ECPUS
-			[[ $? -ne 0 ]] && {
-				echo "Test $TEST[$I] failed effective CPU check!"
-				eval echo \"\${$TEST[$I]}\"
-				echo
-				dump_states
-				exit 1
-			}
+			[[ $? -ne 0 ]] && test_fail $I "effective CPU"
 		}

-		[[ -n "$STATES" ]] && {
+		[[ -n "$STATES" && "$STATES" != . ]] && {
 			check_cgroup_states $STATES
-			[[ $? -ne 0 ]] && {
-				echo "FAILED: Test $TEST[$I] failed states check!"
-				eval echo \"\${$TEST[$I]}\"
-				echo
-				dump_states
-				exit 1
-			}
+			[[ $? -ne 0 ]] && test_fail $I states
 		}

+		[[ -n "$PCPUS" && "$PCPUS" != . ]] && {
+			check_subparts_cpus $PCPUS
+			[[ $? -ne 0 ]] && test_fail $I "subpartitions CPU"
+		}
+
+		# Compare the expected isolated CPUs with the actual ones,
+		# if available
+		[[ -n "$ICPUS" ]] && {
+			check_isolcpus $ICPUS
+			[[ $? -ne 0 ]] && test_fail $I "isolated CPU" \
+				"Expect $ICPUS, get $ISOLCPUS instead"
+		}
 		reset_cgroup_states
 		#
 		# Check to see if effective cpu list changes
@@ -612,7 +752,7 @@ run_state_test()
 			echo "Effective cpus changed to $NEWLIST after test $I!"
 			exit 1
 		}
-		[[ -n "$VERBOSE" ]] && echo "Test $I done."
+		[[ $VERBOSE -gt 0 ]] && echo "Test $I done."
 		((I++))
 	done
 	echo "All $I tests of $TEST PASSED."
@@ -655,7 +795,7 @@ test_inotify()
 	rm -f $PRS
 	wait_inotify $PWD/cpuset.cpus.partition $PRS &
 	pause 0.01
-	set_ctrl_state . "O1-0"
+	set_ctrl_state . "O1=0"
 	pause 0.01
 	check_cgroup_states ".:P-1"
 	if [[ $? -ne 0 ]]
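Two pieces of the patch above are easy to exercise outside the harness: the new `=`-separated argument parsing in write_cpu_online(), and the cpulist range folding performed by check_isolcpus(). The sketch below is illustrative only; the helper names `parse_cpu`, `parse_val`, and `cpulist` are not part of the patch, and `cpulist` mirrors the ISOLCPUS/LASTISOLCPU loop of check_isolcpus() with a `case` test for an open range in place of the grep check:

```shell
#!/bin/sh
# Split "<cpu>=<val>" the way the updated write_cpu_online() does,
# using suffix/prefix pattern removal instead of the old '-' separator.
parse_cpu() { echo "${1%=*}"; }		# strip "=<val>" -> cpu number
parse_val() { echo "${1#*=}"; }		# strip "<cpu>=" -> 0 or 1

# Fold an ascending list of CPU numbers into a kernel-style cpulist,
# e.g. "0 1 2 5" -> "0-2,5", the format check_isolcpus() builds while
# scanning /sys/kernel/debug/sched/domains.
cpulist() {
	LIST=
	LAST=
	for CPU in "$@"
	do
		if [ -z "$LAST" ]
		then
			LIST=$CPU		# first CPU opens the list
		elif [ "$LAST" -eq $((CPU - 1)) ]
		then
			case $LIST in		# contiguous: open a range
			*-) ;;			# range already open
			*)  LIST=${LIST}- ;;
			esac
		else
			case $LIST in		# gap: close any open range
			*-) LIST=${LIST}${LAST} ;;
			esac
			LIST=${LIST},${CPU}	# start a new element
		fi
		LAST=$CPU
	done
	case $LIST in				# close a trailing open range
	*-) LIST=${LIST}${LAST} ;;
	esac
	echo "$LIST"
}

echo "$(parse_cpu 2=0) $(parse_val 2=0)"	# 2 0
echo "$(cpulist 0 1 2 5)"			# 0-2,5
```

The `${1%=*}`/`${1#*=}` pair is why the patch switches the hotplug token separator from `-` to `=`: CPU range strings such as `2-3` already use `-`, so `=` keeps the split unambiguous.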