
[v4,8/9] cgroup/cpuset: Documentation update for partition

Message ID 20230627143508.1576882-9-longman@redhat.com (mailing list archive)
State New
Series cgroup/cpuset: Support remote partitions

Commit Message

Waiman Long June 27, 2023, 2:35 p.m. UTC
This patch updates the cgroup-v2.rst file to include information about
the new "cpuset.cpus.exclusive" control file as well as the new remote
partition.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 Documentation/admin-guide/cgroup-v2.rst | 100 ++++++++++++++++++------
 1 file changed, 74 insertions(+), 26 deletions(-)

Comments

Tejun Heo July 10, 2023, 9:30 p.m. UTC | #1
Hello,

On Tue, Jun 27, 2023 at 10:35:07AM -0400, Waiman Long wrote:
...
> +	There are two types of partitions - local and remote.  A local
> +	partition is one whose parent cgroup is also a valid partition
> +	root.  A remote partition is one whose parent cgroup is not a
> +	valid partition root itself.  Writing to "cpuset.cpus.exclusive"
> +	is not mandatory for the creation of a local partition as its
> +	"cpuset.cpus.exclusive" file will be filled in automatically if
> +	it is not set.	The automatically set value will be based on its
> +	"cpuset.cpus" value.  Writing the proper "cpuset.cpus.exclusive"
> +	values down the cgroup hierarchy is mandatory for the creation
> +	of a remote partition.

Wouldn't a partition root's cpus.exclusive always contain all of the CPUs in
its cpus? Would it make sense for cpus.exclusive to be different from .cpus?

Thanks.
Waiman Long July 11, 2023, 12:21 a.m. UTC | #2
On 7/10/23 17:30, Tejun Heo wrote:
> Hello,
>
> On Tue, Jun 27, 2023 at 10:35:07AM -0400, Waiman Long wrote:
> ...
>> +	There are two types of partitions - local and remote.  A local
>> +	partition is one whose parent cgroup is also a valid partition
>> +	root.  A remote partition is one whose parent cgroup is not a
>> +	valid partition root itself.  Writing to "cpuset.cpus.exclusive"
>> +	is not mandatory for the creation of a local partition as its
>> +	"cpuset.cpus.exclusive" file will be filled in automatically if
>> +	it is not set.	The automatically set value will be based on its
>> +	"cpuset.cpus" value.  Writing the proper "cpuset.cpus.exclusive"
>> +	values down the cgroup hierarchy is mandatory for the creation
>> +	of a remote partition.
> Wouldn't a partition root's cpus.exclusive always contain all of the CPUs in
> its cpus? Would it make sense for cpus.exclusive to be different from .cpus?
>
> Thanks.

In the auto-filled case, it should be the same as cpuset.cpus. I will 
clarify that in the documentation. Thanks for catching that.

Cheers,
Longman
Tejun Heo July 11, 2023, 12:42 a.m. UTC | #3
Hello,

On Mon, Jul 10, 2023 at 08:21:43PM -0400, Waiman Long wrote:
> > Wouldn't a partition root's cpus.exclusive always contain all of the CPUs in
> > its cpus? Would it make sense for cpus.exclusive to be different from .cpus?
> 
> In the auto-filled case, it should be the same as cpuset.cpus. I will clarify
> that in the documentation. Thanks for catching that.

When the user writes something to the file, what would it mean if the
content differs from the cgroup's cpuset.cpus?

Thanks.
Waiman Long July 11, 2023, 12:53 a.m. UTC | #4
On 7/10/23 20:42, Tejun Heo wrote:
> Hello,
>
> On Mon, Jul 10, 2023 at 08:21:43PM -0400, Waiman Long wrote:
>>> Wouldn't a partition root's cpus.exclusive always contain all of the CPUs in
>>> its cpus? Would it make sense for cpus.exclusive to be different from .cpus?
>> In the auto-filled case, it should be the same as cpuset.cpus. I will clarify
>> that in the documentation. Thanks for catching that.
> When the user writes something to the file, what would it mean if the
> content differs from the cgroup's cpuset.cpus?

For a local partition, it doesn't make sense to have a 
cpuset.cpus.exclusive that is not the same as cpuset.cpus, as it 
artificially reduces the set of CPUs that can be used in a partition. 
In the case of a remote partition, the ancestor cgroups of a remote 
partition should have cpuset.cpus.exclusive smaller than cpuset.cpus so 
that when the remote partition is enabled, there are still CPUs left to 
be used by those cgroups. In essence, cpuset.cpus.exclusive represents 
the CPUs that may not be usable anymore if they are taken by a remote 
partition downstream.

Cheers,
Longman
Tejun Heo July 11, 2023, 1:07 a.m. UTC | #5
Hello,

On Mon, Jul 10, 2023 at 08:53:18PM -0400, Waiman Long wrote:
> For a local partition, it doesn't make sense to have a cpuset.cpus.exclusive
> that is not the same as cpuset.cpus, as it artificially reduces the set of
> CPUs that can be used in a partition. In the case of a remote partition, the

Yeah, I was wondering about local partitions. "Automatic but can be
overridden" behavior becomes confusing if it's difficult for the user to
easily tell which part is automatic when. I wonder whether it'd be better to
make the condition static - e.g. for a partition cgroup, cpus.exclusive
always contains all bits in cpus no matter what value is written to it. Or,
if we separate out cpus.exclusive and cpus.exclusive.effective, no matter
what cpus.exclusive is set, a partition root's cpus.exclusive.effective
always includes all bits in cpus.effective.

Thanks.
Waiman Long July 11, 2023, 3:24 a.m. UTC | #6
On 7/10/23 21:07, Tejun Heo wrote:
> Hello,
>
> On Mon, Jul 10, 2023 at 08:53:18PM -0400, Waiman Long wrote:
>> For a local partition, it doesn't make sense to have a cpuset.cpus.exclusive
>> that is not the same as cpuset.cpus, as it artificially reduces the set of
>> CPUs that can be used in a partition. In the case of a remote partition, the
> Yeah, I was wondering about local partitions. "Automatic but can be
> overridden" behavior becomes confusing if it's difficult for the user to
> easily tell which part is automatic when. I wonder whether it'd be better to
> make the condition static - e.g. for a partition cgroup, cpus.exclusive
> always contains all bits in cpus no matter what value is written to it. Or,
> if we separate out cpus.exclusive and cpus.exclusive.effective, no matter
> what cpus.exclusive is set, a partition root's cpus.exclusive.effective
> always includes all bits in cpus.effective.

With no offline CPUs, cpus.effective should be the same as 
cpus.exclusive.effective for a valid partition root. Here 
cpus.exclusive.effective is a bit different from cpus.effective in that 
it can contain offline CPUs. It also means that adding 
cpus.exclusive.effective could be redundant.

As said before, I try to avoid adding new cpuset control files unless 
absolutely necessary. I now have a slightly different proposal. Once 
manually set, I can keep cpuset.cpus.exclusive invariant. I do need to 
do a bit more work when enabling a partition root to find out the 
effective set of exclusive CPUs to be used, or to make the partition 
invalid if no exclusive CPU is available. I still want to do an initial 
check when setting cpuset.cpus.exclusive to make sure that the value is 
at least valid at the beginning.

Do you think this is an acceptable compromise?

Thanks,
Longman
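The proposal above could be modeled like this. It is a toy sketch of the idea only, with a made-up helper name, not the eventual kernel implementation:

```python
def effective_exclusive(requested, parent_available, online):
    """Model the proposal: cpuset.cpus.exclusive stays invariant once
    written; the effective set of exclusive CPUs is computed when the
    partition is enabled.  Only CPUs that are still granted by the
    parent and online can actually be used; an empty result would make
    the partition invalid."""
    usable = requested & parent_available & online
    return usable if usable else None  # None -> partition becomes invalid

requested = {4, 5, 6, 7}       # user-written cpuset.cpus.exclusive
parent_available = {4, 5, 6}   # parent no longer grants CPU 7
online = set(range(8))

print(effective_exclusive(requested, parent_available, online))  # {4, 5, 6}
print(effective_exclusive({7}, parent_available, online))        # None
```

The point of the design is visible in the second call: the user-written value is preserved, but the partition degrades to invalid when none of its requested exclusive CPUs remain usable.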

Patch

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index d9f3768a10db..8dd7464f93dc 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -2215,6 +2215,27 @@  Cpuset Interface Files
 
 	Its value will be affected by memory nodes hotplug events.
 
+  cpuset.cpus.exclusive
+	A read-write multiple values file which exists on non-root
+	cpuset-enabled cgroups.
+
+	It lists all the exclusive CPUs that can be used to create a
+	new cpuset partition.  Its value is not used unless the cgroup
+	becomes a valid partition root.  See the next section below
+	for a description of what a cpuset partition is.
+
+	The root cgroup is a partition root and all its available CPUs
+	are in its exclusive CPU set.
+
+	There are constraints on what values are acceptable
+	to this control file.  Its value must be a subset of
+	the cgroup's "cpuset.cpus" value and the parent cgroup's
+	"cpuset.cpus.exclusive" value.	For a parent cgroup, any one
+	of its exclusive CPUs can be distributed to at most one of
+	its child cgroups.  Having an exclusive CPU appearing in two or
+	more of its child cgroups is not allowed (the exclusivity rule).
+	An invalid value will be rejected with a write error.
+
   cpuset.cpus.partition
 	A read-write single value file which exists on non-root
 	cpuset-enabled cgroups.  This flag is owned by the parent cgroup
@@ -2228,26 +2249,41 @@  Cpuset Interface Files
 	  "isolated"	Partition root without load balancing
 	  ==========	=====================================
 
-	The root cgroup is always a partition root and its state
-	cannot be changed.  All other non-root cgroups start out as
-	"member".
+	A cpuset partition is a collection of cpuset-enabled cgroups with
+	a partition root at the top of the hierarchy and its descendants
+	except those that are separate partition roots themselves and
+	their descendants.  A partition has exclusive access to the
+	set of exclusive CPUs allocated to it.	Other cgroups outside
+	of that partition cannot use any CPUs in that set.
+
+	There are two types of partitions - local and remote.  A local
+	partition is one whose parent cgroup is also a valid partition
+	root.  A remote partition is one whose parent cgroup is not a
+	valid partition root itself.  Writing to "cpuset.cpus.exclusive"
+	is not mandatory for the creation of a local partition as its
+	"cpuset.cpus.exclusive" file will be filled in automatically if
+	it is not set.	The automatically set value will be based on its
+	"cpuset.cpus" value.  Writing the proper "cpuset.cpus.exclusive"
+	values down the cgroup hierarchy is mandatory for the creation
+	of a remote partition.
+
+	Currently, a remote partition cannot be created under a local
+	partition.  None of the ancestors of a remote partition root
+	except the root cgroup can be a partition root.
+
+	The root cgroup is always a partition root and its state cannot
+	be changed.  All other non-root cgroups start out as "member".
 
 	When set to "root", the current cgroup is the root of a new
-	partition or scheduling domain that comprises itself and all
-	its descendants except those that are separate partition roots
-	themselves and their descendants.
+	partition or scheduling domain.  The set of exclusive CPUs is
+	determined by the value of its "cpuset.cpus.exclusive".
 
-	When set to "isolated", the CPUs in that partition root will
+	When set to "isolated", the CPUs in that partition will
 	be in an isolated state without any load balancing from the
 	scheduler.  Tasks placed in such a partition with multiple
 	CPUs should be carefully distributed and bound to each of the
 	individual CPUs for optimal performance.
 
-	The value shown in "cpuset.cpus.effective" of a partition root
-	is the CPUs that the partition root can dedicate to a potential
-	new child partition root. The new child subtracts available
-	CPUs from its parent "cpuset.cpus.effective".
-
 	A partition root ("root" or "isolated") can be in one of the
 	two possible states - valid or invalid.  An invalid partition
 	root is in a degraded state where some state information may
@@ -2270,33 +2306,40 @@  Cpuset Interface Files
 	In the case of an invalid partition root, a descriptive string on
 	why the partition is invalid is included within parentheses.
 
-	For a partition root to become valid, the following conditions
+	For a local partition root to be valid, the following conditions
 	must be met.
 
-	1) The "cpuset.cpus" is exclusive with its siblings , i.e. they
-	   are not shared by any of its siblings (exclusivity rule).
-	2) The parent cgroup is a valid partition root.
-	3) The "cpuset.cpus" is not empty and must contain at least
-	   one of the CPUs from parent's "cpuset.cpus", i.e. they overlap.
+	1) The parent cgroup is a valid partition root.
+	2) The "cpuset.cpus.exclusive" is exclusive with its siblings,
+	   i.e. they are not shared by any of its siblings (exclusivity
+	   rule).
+	3) The "cpuset.cpus.exclusive" is not empty, but it may contain
+	   offline CPUs.
 	4) The "cpuset.cpus.effective" cannot be empty unless there is
 	   no task associated with this partition.
 
-	External events like hotplug or changes to "cpuset.cpus" can
-	cause a valid partition root to become invalid and vice versa.
-	Note that a task cannot be moved to a cgroup with empty
-	"cpuset.cpus.effective".
+	For a remote partition root to be valid, all the above conditions
+	except the first one must be met.
+
+	External events like hotplug or changes to "cpuset.cpus" or
+	"cpuset.cpus.exclusive" can cause a valid partition root to
+	become invalid and vice versa.	Note that a task cannot be
+	moved to a cgroup with empty "cpuset.cpus.effective".
 
 	For a valid partition root with the sibling cpu exclusivity
 	rule enabled, changes made to "cpuset.cpus" that violate the
 	exclusivity rule will invalidate the partition as well as its
 	sibling partitions with conflicting cpuset.cpus values. So
-	care must be taking in changing "cpuset.cpus".
+	care must be taken in changing "cpuset.cpus".	Changes to
+	"cpuset.cpus.exclusive" that violate the exclusivity rule will
+	not be allowed.
 
 	A valid non-root parent partition may distribute out all its CPUs
-	to its child partitions when there is no task associated with it.
+	to its child local partitions when there is no task associated
+	with it.
 
-	Care must be taken to change a valid partition root to
-	"member" as all its child partitions, if present, will become
+	Care must be taken when changing a valid partition root to
+	"member" as all its child local partitions, if present, will become
 	invalid causing disruption to tasks running in those child
 	partitions. These inactivated partitions could be recovered if
 	their parent is switched back to a partition root with a proper
@@ -2310,6 +2353,11 @@  Cpuset Interface Files
 	to "cpuset.cpus.partition" without the need to do continuous
 	polling.
 
+	A user can pre-configure certain CPUs to an isolated state at
+	boot time with the "isolcpus" kernel boot command line option.
+	If those CPUs are to be put into a partition, they have to
+	be placed in an isolated partition.
+
 
 Device controller
 -----------------
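For reference, the interface documented by this patch could be exercised roughly as follows. This is a configuration sketch under stated assumptions (a cgroup v2 mount at /sys/fs/cgroup, root privileges, at least 6 CPUs, and the cpuset controller delegated to the relevant cgroups); cgroup names are illustrative:

```shell
# Assumed prerequisite: cpuset enabled for children of the root cgroup,
# e.g.  echo "+cpuset" > /sys/fs/cgroup/cgroup.subtree_control

# Local partition: the parent (the root cgroup) is already a partition
# root, so writing "cpuset.cpus" suffices; "cpuset.cpus.exclusive" is
# auto-filled from it.
mkdir /sys/fs/cgroup/p1
echo "2-3" > /sys/fs/cgroup/p1/cpuset.cpus
echo root > /sys/fs/cgroup/p1/cpuset.cpus.partition

# Remote partition: the intermediate cgroup "a" is not a partition
# root, so the exclusive CPUs must be written down the hierarchy first.
mkdir -p /sys/fs/cgroup/a/b
echo "4-5" > /sys/fs/cgroup/a/cpuset.cpus
echo "4-5" > /sys/fs/cgroup/a/cpuset.cpus.exclusive
echo "+cpuset" > /sys/fs/cgroup/a/cgroup.subtree_control
echo "4-5" > /sys/fs/cgroup/a/b/cpuset.cpus
echo "4-5" > /sys/fs/cgroup/a/b/cpuset.cpus.exclusive
echo isolated > /sys/fs/cgroup/a/b/cpuset.cpus.partition

# Reading the file back reports validity, with a reason in parentheses
# if the partition is invalid.
cat /sys/fs/cgroup/a/b/cpuset.cpus.partition
```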