Message ID | 20211018143619.205065-1-longman@redhat.com (mailing list archive) |
---|---|
Series | cgroup/cpuset: Add new cpuset partition type & empty effecitve cpus |
On 10/18/21 10:36 AM, Waiman Long wrote:
> v8:
>  - Reorganize the patch series and rationalize the features and
>    constraints of a partition.
>  - Update patch descriptions and documentation accordingly.
>
> v7:
>  - Simplify the documentation patch (patch 5) as suggested by Tejun.
>  - Fix a typo in patch 2 and improper commit log in patch 3.
>
> v6:
>  - Remove duplicated tmpmask from update_prstate() which should fix the
>    frame size too large problem reported by kernel test robot.
>
> This patchset makes four enhancements to the cpuset v2 code.
>
>  Patch 1: Enable partition with no task to have empty cpuset.cpus.effective.
>
>  Patch 2: Refining the features and constraints of a cpuset partition
>  clarifying what changes are allowed.
>
>  Patch 3: Add a new partition state "isolated" to create a partition
>  root without load balancing. This is for handling intermittent workloads
>  that have a strict low latency requirement.
>
>  Patch 4: Enable the "cpuset.cpus.partition" file to show the reason
>  that causes invalid partition like "root invalid (No cpu available
>  due to hotplug)".
>
> Patch 5 updates the cgroup-v2.rst file accordingly. Patch 6 adds a new
> cpuset test to test the new cpuset partition code.
>
> Waiman Long (6):
>   cgroup/cpuset: Allow no-task partition to have empty
>     cpuset.cpus.effective
>   cgroup/cpuset: Refining features and constraints of a partition
>   cgroup/cpuset: Add a new isolated cpus.partition type
>   cgroup/cpuset: Show invalid partition reason string
>   cgroup/cpuset: Update description of cpuset.cpus.partition in
>     cgroup-v2.rst
>   kselftest/cgroup: Add cpuset v2 partition root state test
>
>  Documentation/admin-guide/cgroup-v2.rst       | 153 ++--
>  kernel/cgroup/cpuset.c                        | 393 +++++++----
>  tools/testing/selftests/cgroup/Makefile       |   5 +-
>  .../selftests/cgroup/test_cpuset_prs.sh       | 664 ++++++++++++++++++
>  tools/testing/selftests/cgroup/wait_inotify.c |  87 +++
>  5 files changed, 1115 insertions(+), 187 deletions(-)
>  create mode 100755 tools/testing/selftests/cgroup/test_cpuset_prs.sh
>  create mode 100644 tools/testing/selftests/cgroup/wait_inotify.c

Any feedback on this patch series?

Thanks,
Longman
Hi Waiman,

> v8:
>  - Reorganize the patch series and rationalize the features and
>    constraints of a partition.
>  - Update patch descriptions and documentation accordingly.
>
> v7:
>  - Simplify the documentation patch (patch 5) as suggested by Tejun.
>  - Fix a typo in patch 2 and improper commit log in patch 3.
>
> v6:
>  - Remove duplicated tmpmask from update_prstate() which should fix the
>    frame size too large problem reported by kernel test robot.
>
> This patchset makes four enhancements to the cpuset v2 code.
>
>  Patch 1: Enable partition with no task to have empty cpuset.cpus.effective.
>
>  Patch 2: Refining the features and constraints of a cpuset partition
>  clarifying what changes are allowed.
>
>  Patch 3: Add a new partition state "isolated" to create a partition
>  root without load balancing. This is for handling intermittent workloads
>  that have a strict low latency requirement.

I just tested this patch series and can confirm that it works on 5.15.0-rc7-rt15 (PREEMPT_RT).

However, I was not able to see any latency improvements when using
cpuset.cpus.partition=isolated.
The test was performed with jitterdebugger on CPUs 1-3 and the following cmdline:
rcu_nocbs=1-4 nohz_full=1-4 irqaffinity=0,5-6,11 intel_pstate=disable
On the other CPUs, stress-ng was executed to generate load.

Just some more general notes:

Even with this new "isolated" type, it is still very tricky to get behavior similar
to isolcpus (unless I am missing something here):

Consider an RT application that consists of a non-rt thread that should be floating
and an rt thread that should be placed in the isolated domain.
This requires cgroup.type=threaded on both cgroups and changes to the application
(threads have to be born in the non-rt group and moved to the rt group).

Theoretically, this could be done externally, but in case the application sets the
affinity mask manually, you run into a timing issue (setting affinities to CPUs
outside the current cpuset.cpus results in EINVAL).

Best regards,
Felix Moessbauer
Siemens AG

>  Patch 4: Enable the "cpuset.cpus.partition" file to show the reason
>  that causes invalid partition like "root invalid (No cpu available
>  due to hotplug)".
>
> Patch 5 updates the cgroup-v2.rst file accordingly. Patch 6 adds a new
> cpuset test to test the new cpuset partition code.
On Wed, Nov 10, 2021 at 12:13:57PM +0100, Felix Moessbauer wrote:
> Hi Waiman,
>
> > v8:
> >  - Reorganize the patch series and rationalize the features and
> >    constraints of a partition.
> >  - Update patch descriptions and documentation accordingly.
> >
> > v7:
> >  - Simplify the documentation patch (patch 5) as suggested by Tejun.
> >  - Fix a typo in patch 2 and improper commit log in patch 3.
> >
> > v6:
> >  - Remove duplicated tmpmask from update_prstate() which should fix the
> >    frame size too large problem reported by kernel test robot.
> >
> > This patchset makes four enhancements to the cpuset v2 code.
> >
> >  Patch 1: Enable partition with no task to have empty cpuset.cpus.effective.
> >
> >  Patch 2: Refining the features and constraints of a cpuset partition
> >  clarifying what changes are allowed.
> >
> >  Patch 3: Add a new partition state "isolated" to create a partition
> >  root without load balancing. This is for handling intermittent workloads
> >  that have a strict low latency requirement.
>
>
> I just tested this patch-series and can confirm that it works on 5.15.0-rc7-rt15 (PREEMPT_RT).
>
> However, I was not able to see any latency improvements when using
> cpuset.cpus.partition=isolated.
> The test was performed with jitterdebugger on CPUs 1-3 and the following cmdline:
> rcu_nocbs=1-4 nohz_full=1-4 irqaffinity=0,5-6,11 intel_pstate=disable
> On the other cpus, stress-ng was executed to generate load.

enum hk_flags {
        HK_FLAG_TIMER           = 1,
        HK_FLAG_RCU             = (1 << 1),
        HK_FLAG_MISC            = (1 << 2),
        HK_FLAG_SCHED           = (1 << 3),
        HK_FLAG_TICK            = (1 << 4),
        HK_FLAG_DOMAIN          = (1 << 5),
        HK_FLAG_WQ              = (1 << 6),
        HK_FLAG_MANAGED_IRQ     = (1 << 7),
        HK_FLAG_KTHREAD         = (1 << 8),
};

static int __init housekeeping_nohz_full_setup(char *str)
{
        unsigned int flags;

        flags = HK_FLAG_TICK | HK_FLAG_WQ | HK_FLAG_TIMER | HK_FLAG_RCU |
                HK_FLAG_MISC | HK_FLAG_KTHREAD;

        return housekeeping_setup(str, flags);
}
__setup("nohz_full=", housekeeping_nohz_full_setup);

So HK_FLAG_SCHED and HK_FLAG_MANAGED_IRQ are unset in your configuration.
Perhaps they are affecting your latency numbers?

This tool might be handy for finding the source of the latency:

https://github.com/xzpeter/rt-trace-bpf

./rt-trace-bcc.py -c isolated-cpu

> Just some more general notes:
>
> Even with this new "isolated" type, it is still very tricky to get a similar
> behavior as with isolcpus (as long as I don't miss something here):
>
> Consider an RT application that consists of a non-rt thread that should be floating
> and a rt-thread that should be placed in the isolated domain.
> This requires cgroup.type=threaded on both cgroups and changes to the application
> (threads have to be born in non-rt group and moved to rt-group).
>
> Theoretically, this could be done externally, but in case the application sets the
> affinity mask manually, you run into a timing issue (setting affinities to CPUs
> outside the current cpuset.cpus results in EINVAL).
>
> Best regards,
> Felix Moessbauer
> Siemens AG
>
> >  Patch 4: Enable the "cpuset.cpus.partition" file to show the reason
> >  that causes invalid partition like "root invalid (No cpu available
> >  due to hotplug)".
> >
> > Patch 5 updates the cgroup-v2.rst file accordingly. Patch 6 adds a new
> > cpuset test to test the new cpuset partition code.
> >
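The flag arithmetic in the reply above can be checked in user space. The following is a minimal sketch, assuming the enum values exactly as quoted in the mail; the helper names (`HK_FLAG_ALL`, `nohz_full_flags`, `nohz_full_unset_flags`, `print_unset_flags`) are made up for this illustration and are not kernel symbols.

```c
#include <stdio.h>

/* Values copied from the enum quoted in the mail above. */
enum hk_flags {
    HK_FLAG_TIMER       = 1,
    HK_FLAG_RCU         = (1 << 1),
    HK_FLAG_MISC        = (1 << 2),
    HK_FLAG_SCHED       = (1 << 3),
    HK_FLAG_TICK        = (1 << 4),
    HK_FLAG_DOMAIN      = (1 << 5),
    HK_FLAG_WQ          = (1 << 6),
    HK_FLAG_MANAGED_IRQ = (1 << 7),
    HK_FLAG_KTHREAD     = (1 << 8),
};

/* Hypothetical helper: mask covering all nine flags listed above. */
#define HK_FLAG_ALL ((1u << 9) - 1)

/* Flags that the quoted housekeeping_nohz_full_setup() passes on. */
unsigned int nohz_full_flags(void)
{
    return HK_FLAG_TICK | HK_FLAG_WQ | HK_FLAG_TIMER | HK_FLAG_RCU |
           HK_FLAG_MISC | HK_FLAG_KTHREAD;
}

/* Flags that a nohz_full= parameter alone leaves untouched. */
unsigned int nohz_full_unset_flags(void)
{
    return HK_FLAG_ALL & ~nohz_full_flags();
}

/* Print the names of the flags that remain unset. */
void print_unset_flags(void)
{
    unsigned int unset = nohz_full_unset_flags();

    if (unset & HK_FLAG_SCHED)
        puts("HK_FLAG_SCHED");
    if (unset & HK_FLAG_DOMAIN)
        puts("HK_FLAG_DOMAIN");
    if (unset & HK_FLAG_MANAGED_IRQ)
        puts("HK_FLAG_MANAGED_IRQ");
}
```

Calling `print_unset_flags()` from a `main()` lists the two flags named in the reply plus HK_FLAG_DOMAIN, which the quoted nohz_full= handler does not set either.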
Hello.

On Wed, Nov 10, 2021 at 12:13:57PM +0100, Felix Moessbauer <felix.moessbauer@siemens.com> wrote:
> However, I was not able to see any latency improvements when using
> cpuset.cpus.partition=isolated.

Interesting. What was the baseline against which you compared it
(isolcpus, no cpusets,...)?

> The test was performed with jitterdebugger on CPUs 1-3 and the following cmdline:
> rcu_nocbs=1-4 nohz_full=1-4 irqaffinity=0,5-6,11 intel_pstate=disable
> On the other cpus, stress-ng was executed to generate load.
> [...]

> This requires cgroup.type=threaded on both cgroups and changes to the application
> (threads have to be born in non-rt group and moved to rt-group).

But even with isolcpus the application would need to set affinity of
threads to the selected CPUs (cf cgroup migrating). Do I miss anything?

Thanks,
Michal
On 11/10/21 06:13, Felix Moessbauer wrote: > Hi Weiman, > >> v8: >> - Reorganize the patch series and rationalize the features and >> constraints of a partition. >> - Update patch descriptions and documentation accordingly. >> >> v7: >> - Simplify the documentation patch (patch 5) as suggested by Tejun. >> - Fix a typo in patch 2 and improper commit log in patch 3. >> >> v6: >> - Remove duplicated tmpmask from update_prstate() which should fix the >> frame size too large problem reported by kernel test robot. >> >> This patchset makes four enhancements to the cpuset v2 code. >> >> Patch 1: Enable partition with no task to have empty cpuset.cpus.effective. >> >> Patch 2: Refining the features and constraints of a cpuset partition >> clarifying what changes are allowed. >> >> Patch 3: Add a new partition state "isolated" to create a partition >> root without load balancing. This is for handling intermitten workloads >> that have a strict low latency requirement. > > I just tested this patch-series and can confirm that it works on 5.15.0-rc7-rt15 (PREEMT_RT). > > However, I was not able to see any latency improvements when using > cpuset.cpus.partition=isolated. > The test was performed with jitterdebugger on CPUs 1-3 and the following cmdline: > rcu_nocbs=1-4 nohz_full=1-4 irqaffinity=0,5-6,11 intel_pstate=disable > On the other cpus, stress-ng was executed to generate load. > > Just some more general notes: > > Even with this new "isolated" type, it is still very tricky to get a similar > behavior as with isolcpus (as long as I don't miss something here): > > Consider an RT application that consists of a non-rt thread that should be floating > and a rt-thread that should be placed in the isolated domain. > This requires cgroup.type=threaded on both cgroups and changes to the application > (threads have to be born in non-rt group and moved to rt-group). 
>
> Theoretically, this could be done externally, but in case the application sets the
> affinity mask manually, you run into a timing issue (setting affinities to CPUs
> outside the current cpuset.cpus results in EINVAL).

I believe the "isolated" type will have more benefit on a non-PREEMPT_RT
kernel. Anyway, having the "isolated" type is just the first step. It
should be equivalent to "isolcpus=domain". There are other patches
floating around that attempt to move some of the isolcpus=nohz features
into cpuset as well. It is not there yet, but we should be able to have
better dynamic cpu isolation down the road.

Cheers,
Longman
> -----Original Message-----
> From: Michal Koutný <mkoutny@suse.com>
> Sent: Wednesday, November 10, 2021 2:57 PM
> To: Moessbauer, Felix (T RDA IOT SES-DE) <felix.moessbauer@siemens.com>
> Cc: longman@redhat.com; akpm@linux-foundation.org;
> cgroups@vger.kernel.org; corbet@lwn.net; frederic@kernel.org; guro@fb.com;
> hannes@cmpxchg.org; juri.lelli@redhat.com; linux-doc@vger.kernel.org;
> linux-kernel@vger.kernel.org; linux-kselftest@vger.kernel.org;
> lizefan.x@bytedance.com; mtosatti@redhat.com; pauld@redhat.com;
> peterz@infradead.org; shuah@kernel.org; tj@kernel.org; Kiszka, Jan (T RDA
> IOT) <jan.kiszka@siemens.com>; Schild, Henning (T RDA IOT SES-DE)
> <henning.schild@siemens.com>
> Subject: Re: [PATCH v8 0/6] cgroup/cpuset: Add new cpuset partition type &
> empty effecitve cpus
>
> Hello.
>
> On Wed, Nov 10, 2021 at 12:13:57PM +0100, Felix Moessbauer
> <felix.moessbauer@siemens.com> wrote:
> > However, I was not able to see any latency improvements when using
> > cpuset.cpus.partition=isolated.
>
> Interesting. What was the baseline against which you compared it (isolcpus, no
> cpusets,...)?

For this test, I just compared both settings cpuset.cpus.partition=isolated|root.
There, I did not see a significant difference (but I know, RT tuning depends on a ton of things).

>
> > The test was performed with jitterdebugger on CPUs 1-3 and the following
> > cmdline:
> > rcu_nocbs=1-4 nohz_full=1-4 irqaffinity=0,5-6,11 intel_pstate=disable
> > On the other cpus, stress-ng was executed to generate load.
> > [...]
>
> > This requires cgroup.type=threaded on both cgroups and changes to the
> > application (threads have to be born in non-rt group and moved to rt-group).
>
> But even with isolcpus the application would need to set affinity of threads to
> the selected CPUs (cf cgroup migrating). Do I miss anything?

Yes, that's true. But there are two differences (given that you use isolcpus):
1. The application only has to set the affinity for rt threads.
   Threads that do not explicitly set the affinity are automatically excluded from the isolated cores.
   Even common rt test applications like jitterdebugger do not pin their non-rt threads.
2. Threads can be started on non-rt CPUs and then bound to a specific rt CPU.
   This binding can be specified before thread creation via pthread_create.
   By that, you can make sure that at no point in time a thread has a "forbidden" CPU in its affinities.

With cgroup2, you cannot guarantee the second aspect, as thread creation and moving to a cgroup is not an atomic operation.
Also - please correct me if I'm wrong - you first have to create a thread before moving it into a group.
At creation time, you cannot set the final affinity mask (as you create it in the non-rt group and there the CPU is not in the cpuset.cpus).
Once you move the thread to the rt cgroup, it has a default mask and by that can be executed on other rt cores.

Best regards,
Felix

>
> Thanks,
> Michal
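The second point above, choosing a thread's CPU before the thread starts running application code, can be sketched with thread attributes. This is a minimal illustration, assuming glibc's non-portable `pthread_attr_setaffinity_np()`; the helper names `first_allowed_cpu` and `spawn_pinned` are hypothetical, and the sketch does not examine how early the mask takes effect relative to the underlying clone call.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sched.h>

/* Thread body: record the CPU this thread is currently running on. */
static void *record_cpu(void *arg)
{
    *(int *)arg = sched_getcpu();
    return NULL;
}

/* Lowest-numbered CPU in the caller's affinity mask, or -1 on error. */
int first_allowed_cpu(void)
{
    cpu_set_t set;

    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &set))
            return cpu;
    return -1;
}

/*
 * Create a thread whose affinity mask is given through its attributes,
 * wait for it, and report the CPU it observed. Returns 0 on success or a
 * pthread error number (for example EINVAL when `cpu` is not permitted).
 */
int spawn_pinned(int cpu, int *observed_cpu)
{
    pthread_attr_t attr;
    pthread_t tid;
    cpu_set_t set;
    int rc;

    assert(cpu >= 0 && cpu < CPU_SETSIZE);
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);

    rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_attr_setaffinity_np(&attr, sizeof(set), &set);
    if (rc == 0)
        rc = pthread_create(&tid, &attr, record_cpu, observed_cpu);
    pthread_attr_destroy(&attr);
    if (rc != 0)
        return rc;
    return pthread_join(tid, NULL);
}
```

With isolcpus, the CPU passed to `spawn_pinned()` may be an isolated one even though the creating thread runs elsewhere. Under a cpuset that excludes that CPU, the same call is expected to fail, which is the limitation described above.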
On Wed, Nov 10, 2021 at 03:21:54PM +0000, Moessbauer, Felix wrote: > > > > -----Original Message----- > > From: Michal Koutný <mkoutny@suse.com> > > Sent: Wednesday, November 10, 2021 2:57 PM > > To: Moessbauer, Felix (T RDA IOT SES-DE) <felix.moessbauer@siemens.com> > > Cc: longman@redhat.com; akpm@linux-foundation.org; > > cgroups@vger.kernel.org; corbet@lwn.net; frederic@kernel.org; guro@fb.com; > > hannes@cmpxchg.org; juri.lelli@redhat.com; linux-doc@vger.kernel.org; linux- > > kernel@vger.kernel.org; linux-kselftest@vger.kernel.org; > > lizefan.x@bytedance.com; mtosatti@redhat.com; pauld@redhat.com; > > peterz@infradead.org; shuah@kernel.org; tj@kernel.org; Kiszka, Jan (T RDA > > IOT) <jan.kiszka@siemens.com>; Schild, Henning (T RDA IOT SES-DE) > > <henning.schild@siemens.com> > > Subject: Re: [PATCH v8 0/6] cgroup/cpuset: Add new cpuset partition type & > > empty effecitve cpus > > > > Hello. > > > > On Wed, Nov 10, 2021 at 12:13:57PM +0100, Felix Moessbauer > > <felix.moessbauer@siemens.com> wrote: > > > However, I was not able to see any latency improvements when using > > > cpuset.cpus.partition=isolated. > > > > Interesting. What was the baseline against which you compared it (isolcpus, no > > cpusets,...)? > > For this test, I just compared both settings cpuset.cpus.partition=isolated|root. > There, I did not see a significant difference (but I know, RT tuning depends on a ton of things). > > > > > > The test was performed with jitterdebugger on CPUs 1-3 and the following > > cmdline: > > > rcu_nocbs=1-4 nohz_full=1-4 irqaffinity=0,5-6,11 intel_pstate=disable > > > On the other cpus, stress-ng was executed to generate load. > > > [...] > > > > > This requires cgroup.type=threaded on both cgroups and changes to the > > > application (threads have to be born in non-rt group and moved to rt-group). > > > > But even with isolcpus the application would need to set affinity of threads to > > the selected CPUs (cf cgroup migrating). Do I miss anything? 
>
> Yes, that's true. But there are two differences (given that you use isolcpus):
> 1. the application only has to set the affinity for rt threads.
>    Threads that do not explicitly set the affinity are automatically excluded from the isolated cores.
>    Even common rt test applications like jitterdebugger do not pin their non-rt threads.
> 2. Threads can be started on non-rt CPUs and then bound to a specific rt CPU.
>    This binding can be specified before thread creation via pthread_create.
>    By that, you can make sure that at no point in time a thread has a "forbidden" CPU in its affinities.
>
> With cgroup2, you cannot guarantee the second aspect, as thread creation and moving to a cgroup is not an atomic operation.
> Also - please correct me if I'm wrong - you first have to create a thread before moving it into a group.
> At creation time, you cannot set the final affinity mask (as you create it in the non-rt group and there the CPU is not in the cpuset.cpus).
> Once you move the thread to the rt cgroup, it has a default mask and by that can be executed on other rt cores.

man clone3:

    CLONE_NEWCGROUP (since Linux 4.6)
        Create the process in a new cgroup namespace. If this flag is not set, then (as with fork(2)) the
        process is created in the same cgroup namespaces as the calling process.

        For further information on cgroup namespaces, see cgroup_namespaces(7).

        Only a privileged process (CAP_SYS_ADMIN) can employ CLONE_NEWCGROUP.
On Wed, Nov 10, 2021 at 01:10:20PM -0300, Marcelo Tosatti wrote: > On Wed, Nov 10, 2021 at 03:21:54PM +0000, Moessbauer, Felix wrote: > > > > > > > -----Original Message----- > > > From: Michal Koutný <mkoutny@suse.com> > > > Sent: Wednesday, November 10, 2021 2:57 PM > > > To: Moessbauer, Felix (T RDA IOT SES-DE) <felix.moessbauer@siemens.com> > > > Cc: longman@redhat.com; akpm@linux-foundation.org; > > > cgroups@vger.kernel.org; corbet@lwn.net; frederic@kernel.org; guro@fb.com; > > > hannes@cmpxchg.org; juri.lelli@redhat.com; linux-doc@vger.kernel.org; linux- > > > kernel@vger.kernel.org; linux-kselftest@vger.kernel.org; > > > lizefan.x@bytedance.com; mtosatti@redhat.com; pauld@redhat.com; > > > peterz@infradead.org; shuah@kernel.org; tj@kernel.org; Kiszka, Jan (T RDA > > > IOT) <jan.kiszka@siemens.com>; Schild, Henning (T RDA IOT SES-DE) > > > <henning.schild@siemens.com> > > > Subject: Re: [PATCH v8 0/6] cgroup/cpuset: Add new cpuset partition type & > > > empty effecitve cpus > > > > > > Hello. > > > > > > On Wed, Nov 10, 2021 at 12:13:57PM +0100, Felix Moessbauer > > > <felix.moessbauer@siemens.com> wrote: > > > > However, I was not able to see any latency improvements when using > > > > cpuset.cpus.partition=isolated. > > > > > > Interesting. What was the baseline against which you compared it (isolcpus, no > > > cpusets,...)? > > > > For this test, I just compared both settings cpuset.cpus.partition=isolated|root. > > There, I did not see a significant difference (but I know, RT tuning depends on a ton of things). > > > > > > > > > The test was performed with jitterdebugger on CPUs 1-3 and the following > > > cmdline: > > > > rcu_nocbs=1-4 nohz_full=1-4 irqaffinity=0,5-6,11 intel_pstate=disable > > > > On the other cpus, stress-ng was executed to generate load. > > > > [...] > > > > > > > This requires cgroup.type=threaded on both cgroups and changes to the > > > > application (threads have to be born in non-rt group and moved to rt-group). 
> > >
> > > But even with isolcpus the application would need to set affinity of threads to
> > > the selected CPUs (cf cgroup migrating). Do I miss anything?
> >
> > Yes, that's true. But there are two differences (given that you use isolcpus):
> > 1. the application only has to set the affinity for rt threads.
> >    Threads that do not explicitly set the affinity are automatically excluded from the isolated cores.
> >    Even common rt test applications like jitterdebugger do not pin their non-rt threads.
> > 2. Threads can be started on non-rt CPUs and then bound to a specific rt CPU.
> >    This binding can be specified before thread creation via pthread_create.
> >    By that, you can make sure that at no point in time a thread has a "forbidden" CPU in its affinities.
> >
> > With cgroup2, you cannot guarantee the second aspect, as thread creation and moving to a cgroup is not an atomic operation.
> > Also - please correct me if I'm wrong - you first have to create a thread before moving it into a group.
> > At creation time, you cannot set the final affinity mask (as you create it in the non-rt group and there the CPU is not in the cpuset.cpus).
> > Once you move the thread to the rt cgroup, it has a default mask and by that can be executed on other rt cores.
>
> man clone3:
>
>     CLONE_NEWCGROUP (since Linux 4.6)
>         Create the process in a new cgroup namespace. If this flag is not set, then (as with fork(2)) the
>         process is created in the same cgroup namespaces as the calling process.
>
>         For further information on cgroup namespaces, see cgroup_namespaces(7).
>
>         Only a privileged process (CAP_SYS_ADMIN) can employ CLONE_NEWCGROUP.
>

Err, CLONE_INTO_CGROUP.
On 10.11.21 17:10, Marcelo Tosatti wrote: > On Wed, Nov 10, 2021 at 03:21:54PM +0000, Moessbauer, Felix wrote: >> >> >>> -----Original Message----- >>> From: Michal Koutný <mkoutny@suse.com> >>> Sent: Wednesday, November 10, 2021 2:57 PM >>> To: Moessbauer, Felix (T RDA IOT SES-DE) <felix.moessbauer@siemens.com> >>> Cc: longman@redhat.com; akpm@linux-foundation.org; >>> cgroups@vger.kernel.org; corbet@lwn.net; frederic@kernel.org; guro@fb.com; >>> hannes@cmpxchg.org; juri.lelli@redhat.com; linux-doc@vger.kernel.org; linux- >>> kernel@vger.kernel.org; linux-kselftest@vger.kernel.org; >>> lizefan.x@bytedance.com; mtosatti@redhat.com; pauld@redhat.com; >>> peterz@infradead.org; shuah@kernel.org; tj@kernel.org; Kiszka, Jan (T RDA >>> IOT) <jan.kiszka@siemens.com>; Schild, Henning (T RDA IOT SES-DE) >>> <henning.schild@siemens.com> >>> Subject: Re: [PATCH v8 0/6] cgroup/cpuset: Add new cpuset partition type & >>> empty effecitve cpus >>> >>> Hello. >>> >>> On Wed, Nov 10, 2021 at 12:13:57PM +0100, Felix Moessbauer >>> <felix.moessbauer@siemens.com> wrote: >>>> However, I was not able to see any latency improvements when using >>>> cpuset.cpus.partition=isolated. >>> >>> Interesting. What was the baseline against which you compared it (isolcpus, no >>> cpusets,...)? >> >> For this test, I just compared both settings cpuset.cpus.partition=isolated|root. >> There, I did not see a significant difference (but I know, RT tuning depends on a ton of things). >> >>> >>>> The test was performed with jitterdebugger on CPUs 1-3 and the following >>> cmdline: >>>> rcu_nocbs=1-4 nohz_full=1-4 irqaffinity=0,5-6,11 intel_pstate=disable >>>> On the other cpus, stress-ng was executed to generate load. >>>> [...] >>> >>>> This requires cgroup.type=threaded on both cgroups and changes to the >>>> application (threads have to be born in non-rt group and moved to rt-group). 
>>>
>>> But even with isolcpus the application would need to set affinity of threads to
>>> the selected CPUs (cf cgroup migrating). Do I miss anything?
>>
>> Yes, that's true. But there are two differences (given that you use isolcpus):
>> 1. the application only has to set the affinity for rt threads.
>>    Threads that do not explicitly set the affinity are automatically excluded from the isolated cores.
>>    Even common rt test applications like jitterdebugger do not pin their non-rt threads.
>> 2. Threads can be started on non-rt CPUs and then bound to a specific rt CPU.
>>    This binding can be specified before thread creation via pthread_create.
>>    By that, you can make sure that at no point in time a thread has a "forbidden" CPU in its affinities.
>>
>> With cgroup2, you cannot guarantee the second aspect, as thread creation and moving to a cgroup is not an atomic operation.
>> Also - please correct me if I'm wrong - you first have to create a thread before moving it into a group.
>> At creation time, you cannot set the final affinity mask (as you create it in the non-rt group and there the CPU is not in the cpuset.cpus).
>> Once you move the thread to the rt cgroup, it has a default mask and by that can be executed on other rt cores.
>
> man clone3:
>
>     CLONE_NEWCGROUP (since Linux 4.6)
>         Create the process in a new cgroup namespace. If this flag is not set, then (as with fork(2)) the
>         process is created in the same cgroup namespaces as the calling process.
>
>         For further information on cgroup namespaces, see cgroup_namespaces(7).
>
>         Only a privileged process (CAP_SYS_ADMIN) can employ CLONE_NEWCGROUP.
>

Is there pthread_attr_setcgroup_np()?

Jan
On Wed, Nov 10, 2021 at 05:15:41PM +0100, Jan Kiszka wrote: > On 10.11.21 17:10, Marcelo Tosatti wrote: > > On Wed, Nov 10, 2021 at 03:21:54PM +0000, Moessbauer, Felix wrote: > >> > >> > >>> -----Original Message----- > >>> From: Michal Koutný <mkoutny@suse.com> > >>> Sent: Wednesday, November 10, 2021 2:57 PM > >>> To: Moessbauer, Felix (T RDA IOT SES-DE) <felix.moessbauer@siemens.com> > >>> Cc: longman@redhat.com; akpm@linux-foundation.org; > >>> cgroups@vger.kernel.org; corbet@lwn.net; frederic@kernel.org; guro@fb.com; > >>> hannes@cmpxchg.org; juri.lelli@redhat.com; linux-doc@vger.kernel.org; linux- > >>> kernel@vger.kernel.org; linux-kselftest@vger.kernel.org; > >>> lizefan.x@bytedance.com; mtosatti@redhat.com; pauld@redhat.com; > >>> peterz@infradead.org; shuah@kernel.org; tj@kernel.org; Kiszka, Jan (T RDA > >>> IOT) <jan.kiszka@siemens.com>; Schild, Henning (T RDA IOT SES-DE) > >>> <henning.schild@siemens.com> > >>> Subject: Re: [PATCH v8 0/6] cgroup/cpuset: Add new cpuset partition type & > >>> empty effecitve cpus > >>> > >>> Hello. > >>> > >>> On Wed, Nov 10, 2021 at 12:13:57PM +0100, Felix Moessbauer > >>> <felix.moessbauer@siemens.com> wrote: > >>>> However, I was not able to see any latency improvements when using > >>>> cpuset.cpus.partition=isolated. > >>> > >>> Interesting. What was the baseline against which you compared it (isolcpus, no > >>> cpusets,...)? > >> > >> For this test, I just compared both settings cpuset.cpus.partition=isolated|root. > >> There, I did not see a significant difference (but I know, RT tuning depends on a ton of things). > >> > >>> > >>>> The test was performed with jitterdebugger on CPUs 1-3 and the following > >>> cmdline: > >>>> rcu_nocbs=1-4 nohz_full=1-4 irqaffinity=0,5-6,11 intel_pstate=disable > >>>> On the other cpus, stress-ng was executed to generate load. > >>>> [...] 
> >>>
> >>>> This requires cgroup.type=threaded on both cgroups and changes to the
> >>>> application (threads have to be born in non-rt group and moved to rt-group).
> >>>
> >>> But even with isolcpus the application would need to set affinity of threads to
> >>> the selected CPUs (cf cgroup migrating). Do I miss anything?
> >>
> >> Yes, that's true. But there are two differences (given that you use isolcpus):
> >> 1. the application only has to set the affinity for rt threads.
> >>    Threads that do not explicitly set the affinity are automatically excluded from the isolated cores.
> >>    Even common rt test applications like jitterdebugger do not pin their non-rt threads.
> >> 2. Threads can be started on non-rt CPUs and then bound to a specific rt CPU.
> >>    This binding can be specified before thread creation via pthread_create.
> >>    By that, you can make sure that at no point in time a thread has a "forbidden" CPU in its affinities.
> >>
> >> With cgroup2, you cannot guarantee the second aspect, as thread creation and moving to a cgroup is not an atomic operation.
> >> Also - please correct me if I'm wrong - you first have to create a thread before moving it into a group.
> >> At creation time, you cannot set the final affinity mask (as you create it in the non-rt group and there the CPU is not in the cpuset.cpus).
> >> Once you move the thread to the rt cgroup, it has a default mask and by that can be executed on other rt cores.
> >
> > man clone3:
> >
> >     CLONE_NEWCGROUP (since Linux 4.6)
> >         Create the process in a new cgroup namespace. If this flag is not set, then (as with fork(2)) the
> >         process is created in the same cgroup namespaces as the calling process.
> >
> >         For further information on cgroup namespaces, see cgroup_namespaces(7).
> >
> >         Only a privileged process (CAP_SYS_ADMIN) can employ CLONE_NEWCGROUP.
> >
>
> Is there pthread_attr_setcgroup_np()?
>
> Jan

Don't know... Waiman?
On Wed, Nov 10, 2021 at 05:15:41PM +0100, Jan Kiszka <jan.kiszka@siemens.com> wrote:
> Is there pthread_attr_setcgroup_np()?
If I'm not mistaken the 'p' in pthreads stands for POSIX and cgroups are
Linux specific so you won't find that (unless you implement that
yourself). ¯\_(ツ)_/¯
Michal
On 10.11.21 18:52, Michal Koutný wrote:
> On Wed, Nov 10, 2021 at 05:15:41PM +0100, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>> Is there pthread_attr_setcgroup_np()?
>
> If I'm not mistaken the 'p' in pthreads stands for POSIX and cgroups are
> Linux specific so you won't find that (unless you implement that
> yourself). ¯\_(ツ)_/¯
>

I know what it stands for :). But I don't want to re-implement pthreads
just to have a single creation-time configurable injected. Neither would
developers of standard applications, e.g. libvirt for the rt-kvm special
case, while most of their use cases are fine with regular pthread APIs.

I think there is also a demand for a programming model that fits into
existing ones.

Jan
On Wed, Nov 10, 2021 at 03:21:54PM +0000, "Moessbauer, Felix" <felix.moessbauer@siemens.com> wrote:
> 2. Threads can be started on non-rt CPUs and then bound to a specific rt CPU.
> This binding can be specified before thread creation via pthread_create.
> By that, you can make sure that at no point in time a thread has a
> "forbidden" CPU in its affinities.

It should boil down to some clone$version(2) and sched_setaffinity(2)
calls, so strictly speaking even with pthread_create(3) the thread is
briefly running with the parent's affinity.

> With cgroup2, you cannot guarantee the second aspect, as thread
> creation and moving to a cgroup is not an atomic operation.

As suggested by others, CLONE_INTO_CGROUP (into cpuset cgroup) can
actually "hide" the migration into the clone3() call.

> At creation time, you cannot set the final affinity mask (as you
> create it in the non-rt group and there the CPU is not in the
> cpuset.cpus).
> Once you move the thread to the rt cgroup, it has a default mask and
> by that can be executed on other rt cores.

Good point. Perhaps you could work around this by having another level of
(non-root partition) cpuset cgroups for individual CPUs? (Maybe there's
a more clever approach, this is just the first one to come to my mind.)

Michal
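The CLONE_INTO_CGROUP suggestion can be sketched with a raw clone3() call. This is a minimal illustration, assuming a kernel and UAPI headers that provide CLONE_INTO_CGROUP (Linux 5.7 or later); the helper name `spawn_into_cgroup` is hypothetical. For simplicity it creates a child process rather than a thread, so it shows the mechanism, not a drop-in replacement for pthread_create().

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/sched.h>   /* struct clone_args, CLONE_INTO_CGROUP */
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Fork-like helper: create a child that starts its life inside the cgroup
 * referred to by cgroup_fd (a file descriptor for a cgroup v2 directory).
 * Returns the child's PID in the parent, 0 in the child, and -1 with errno
 * set on failure.
 */
pid_t spawn_into_cgroup(int cgroup_fd)
{
    struct clone_args args;

    if (cgroup_fd < 0) {
        errno = EBADF;
        return -1;
    }

    memset(&args, 0, sizeof(args));
    args.flags = CLONE_INTO_CGROUP;     /* place the child in args.cgroup */
    args.exit_signal = SIGCHLD;         /* behave like fork() for wait() */
    args.cgroup = (uint64_t)cgroup_fd;

    return (pid_t)syscall(SYS_clone3, &args, sizeof(args));
}
```

A caller would open the target directory first, for example with `open("/sys/fs/cgroup/rt", O_RDONLY | O_DIRECTORY)` (the path is an assumption for illustration), pass the descriptor to `spawn_into_cgroup()`, have the child call `_exit()` when done, and reap it with `waitpid()`. The child never runs in the parent's cgroup, but this alone does not address the per-CPU affinity question raised above.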
On 11/10/21 12:29, Marcelo Tosatti wrote: > On Wed, Nov 10, 2021 at 05:15:41PM +0100, Jan Kiszka wrote: >> On 10.11.21 17:10, Marcelo Tosatti wrote: >>> On Wed, Nov 10, 2021 at 03:21:54PM +0000, Moessbauer, Felix wrote: >>>> >>>>> -----Original Message----- >>>>> From: Michal Koutný <mkoutny@suse.com> >>>>> Sent: Wednesday, November 10, 2021 2:57 PM >>>>> To: Moessbauer, Felix (T RDA IOT SES-DE) <felix.moessbauer@siemens.com> >>>>> Cc: longman@redhat.com; akpm@linux-foundation.org; >>>>> cgroups@vger.kernel.org; corbet@lwn.net; frederic@kernel.org; guro@fb.com; >>>>> hannes@cmpxchg.org; juri.lelli@redhat.com; linux-doc@vger.kernel.org; linux- >>>>> kernel@vger.kernel.org; linux-kselftest@vger.kernel.org; >>>>> lizefan.x@bytedance.com; mtosatti@redhat.com; pauld@redhat.com; >>>>> peterz@infradead.org; shuah@kernel.org; tj@kernel.org; Kiszka, Jan (T RDA >>>>> IOT) <jan.kiszka@siemens.com>; Schild, Henning (T RDA IOT SES-DE) >>>>> <henning.schild@siemens.com> >>>>> Subject: Re: [PATCH v8 0/6] cgroup/cpuset: Add new cpuset partition type & >>>>> empty effecitve cpus >>>>> >>>>> Hello. >>>>> >>>>> On Wed, Nov 10, 2021 at 12:13:57PM +0100, Felix Moessbauer >>>>> <felix.moessbauer@siemens.com> wrote: >>>>>> However, I was not able to see any latency improvements when using >>>>>> cpuset.cpus.partition=isolated. >>>>> Interesting. What was the baseline against which you compared it (isolcpus, no >>>>> cpusets,...)? >>>> For this test, I just compared both settings cpuset.cpus.partition=isolated|root. >>>> There, I did not see a significant difference (but I know, RT tuning depends on a ton of things). >>>> >>>>>> The test was performed with jitterdebugger on CPUs 1-3 and the following >>>>> cmdline: >>>>>> rcu_nocbs=1-4 nohz_full=1-4 irqaffinity=0,5-6,11 intel_pstate=disable >>>>>> On the other cpus, stress-ng was executed to generate load. >>>>>> [...] 
>>>>>> This requires cgroup.type=threaded on both cgroups and changes to the
>>>>>> application (threads have to be born in non-rt group and moved to rt-group).
>>>>> But even with isolcpus the application would need to set affinity of threads to
>>>>> the selected CPUs (cf cgroup migrating). Do I miss anything?
>>>> Yes, that's true. But there are two differences (given that you use isolcpus):
>>>> 1. the application only has to set the affinity for rt threads.
>>>>    Threads that do not explicitly set the affinity are automatically excluded from the isolated cores.
>>>>    Even common rt test applications like jitterdebugger do not pin their non-rt threads.
>>>> 2. Threads can be started on non-rt CPUs and then bound to a specific rt CPU.
>>>>    This binding can be specified before thread creation via pthread_create.
>>>>    By that, you can make sure that at no point in time a thread has a "forbidden" CPU in its affinities.
>>>>
>>>> With cgroup2, you cannot guarantee the second aspect, as thread creation and moving to a cgroup is not an atomic operation.
>>>> Also - please correct me if I'm wrong - you first have to create a thread before moving it into a group.
>>>> At creation time, you cannot set the final affinity mask (as you create it in the non-rt group and there the CPU is not in the cpuset.cpus).
>>>> Once you move the thread to the rt cgroup, it has a default mask and by that can be executed on other rt cores.
>>> man clone3:
>>>
>>>     CLONE_NEWCGROUP (since Linux 4.6)
>>>         Create the process in a new cgroup namespace. If this flag is not set, then (as with fork(2)) the
>>>         process is created in the same cgroup namespaces as the calling process.
>>>
>>>         For further information on cgroup namespaces, see cgroup_namespaces(7).
>>>
>>>         Only a privileged process (CAP_SYS_ADMIN) can employ CLONE_NEWCGROUP.
>>>
>> Is there pthread_attr_setcgroup_np()?
>>
>> Jan
> Don't know... Waiman?

I don't think there is such a libpthread call yet.

-Longman