Message ID: 20190605133650.28545-1-daniel.m.jordan@oracle.com (mailing list archive)
Series: cgroup-aware unbound workqueues
Hello, Daniel.

On Wed, Jun 05, 2019 at 09:36:45AM -0400, Daniel Jordan wrote:
> My use case for this work is kernel multithreading, the series formerly known
> as ktask[2] that I'm now trying to combine with padata according to feedback
> from the last post. Helper threads in a multithreaded job may consume lots of
> resources that aren't properly accounted to the cgroup of the task that started
> the job.

Can you please go into more details on the use cases?

For memory and io, we're generally going for remote charging, where a
kthread explicitly says who the specific io or allocation is for,
combined with selective back-charging, where the resource is charged
and consumed unconditionally even if that would put the usage above
the current limits temporarily. From what I've been seeing recently,
the combination of the two gives us really good control quality without
being too invasive across the stack.

CPU doesn't have a back-charging mechanism yet and, depending on the use
case, we *might* need to put kthreads in different cgroups. However,
such use cases might not be that abundant, and there may be gotchas
which require them to be force-executed and back-charged (e.g. fs
compression from global reclaim).

Thanks.
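For readers unfamiliar with remote charging, here is a minimal sketch of the memory side, using the memalloc_use_memcg()/memalloc_unuse_memcg() helpers the kernel had at the time of this thread. The helper function and the target memcg are illustrative, not code from the series.

```c
#include <linux/memcontrol.h>
#include <linux/sched/mm.h>
#include <linux/slab.h>

/*
 * Illustrative: allocate on behalf of another cgroup.  __GFP_ACCOUNT
 * allocations made between use/unuse are charged to @memcg rather than
 * to the memcg of the current (e.g. kthread) task.
 */
static void *alloc_on_behalf_of(struct mem_cgroup *memcg, size_t size)
{
	void *p;

	memalloc_use_memcg(memcg);	/* redirect charging to @memcg */
	p = kmalloc(size, GFP_KERNEL | __GFP_ACCOUNT);
	memalloc_unuse_memcg();		/* back to charging current */

	return p;
}
```

This is the same mechanism patch 3 of the series uses via the worker's current->active_memcg field, as Daniel notes below.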
Hi Tejun,

On Wed, Jun 05, 2019 at 06:53:19AM -0700, Tejun Heo wrote:
> On Wed, Jun 05, 2019 at 09:36:45AM -0400, Daniel Jordan wrote:
> > My use case for this work is kernel multithreading, the series formerly known
> > as ktask[2] that I'm now trying to combine with padata according to feedback
> > from the last post. Helper threads in a multithreaded job may consume lots of
> > resources that aren't properly accounted to the cgroup of the task that started
> > the job.
>
> Can you please go into more details on the use cases?

Sure, quoting from the last ktask post:

  A single CPU can spend an excessive amount of time in the kernel operating
  on large amounts of data. Often these situations arise during initialization-
  and destruction-related tasks, where the data involved scales with system
  size. These long-running jobs can slow startup and shutdown of applications
  and the system itself while extra CPUs sit idle.

  To ensure that applications and the kernel continue to perform well as core
  counts and memory sizes increase, harness these idle CPUs to complete such
  jobs more quickly.

  ktask is a generic framework for parallelizing CPU-intensive work in the
  kernel. The API is generic enough to add concurrency to many different kinds
  of tasks--for example, zeroing a range of pages or evicting a list of
  inodes--and aims to save its clients the trouble of splitting up the work,
  choosing the number of threads to use, maintaining an efficient concurrency
  level, starting these threads, and load balancing the work between them.

So far the users of the framework primarily consume CPU and memory.

> For memory and io, we're generally going for remote charging, where a
> kthread explicitly says who the specific io or allocation is for,
> combined with selective back-charging, where the resource is charged
> and consumed unconditionally even if that would put the usage above
> the current limits temporarily. From what I've been seeing recently,
> the combination of the two gives us really good control quality without
> being too invasive across the stack.

Yes, for memory I actually use remote charging. In patch 3 the worker's
current->active_memcg field is changed to match that of the cgroup associated
with the work.

Cc Shakeel, since we're talking about it.

> CPU doesn't have a back-charging mechanism yet and, depending on the use
> case, we *might* need to put kthreads in different cgroups. However,
> such use cases might not be that abundant, and there may be gotchas
> which require them to be force-executed and back-charged (e.g. fs
> compression from global reclaim).

The CPU-intensiveness of these jobs is one of the reasons for actually putting
the workers through the migration path. I don't know of a way to get the
workers to respect the cpu controller (and even cpuset for that matter) without
doing that.

Thanks for the quick feedback.

Daniel
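As a rough illustration of the interface described in the quoted text: ktask was never merged, so the names and signatures below are paraphrased from the posted series and should not be read as a real kernel API.

```c
#include <linux/string.h>
#include <linux/types.h>

/*
 * Paraphrased ktask-style interface (illustrative only): the caller
 * describes a range of work and a per-chunk function, and the
 * framework splits the range across helper threads.
 */
typedef int (*ktask_thread_func)(void *start, void *end, void *arg);

struct ktask_ctl {
	ktask_thread_func	kc_func;	/* runs on each chunk */
	void			*kc_arg;	/* shared argument */
	size_t			kc_min_chunk;	/* smallest splittable unit */
};

/* Example per-chunk function: zero one piece of a large buffer. */
static int zero_chunk(void *start, void *end, void *arg)
{
	memset(start, 0, end - start);
	return 0;
}
```

The helper threads running kc_func are exactly the threads whose CPU and memory consumption this series tries to attribute to the originating cgroup.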
Hi Tejun,

On Wed, Jun 05, 2019 at 06:53:19AM -0700, Tejun Heo wrote:
> Hello, Daniel.
>
> On Wed, Jun 05, 2019 at 09:36:45AM -0400, Daniel Jordan wrote:
> > My use case for this work is kernel multithreading, the series formerly known
> > as ktask[2] that I'm now trying to combine with padata according to feedback
> > from the last post. Helper threads in a multithreaded job may consume lots of
> > resources that aren't properly accounted to the cgroup of the task that started
> > the job.
>
> Can you please go into more details on the use cases?

If I remember correctly, Bandan's original work was about using
workqueues instead of kthreads in vhost.

> For memory and io, we're generally going for remote charging, where a
> kthread explicitly says who the specific io or allocation is for,
> combined with selective back-charging, where the resource is charged
> and consumed unconditionally even if that would put the usage above
> the current limits temporarily. From what I've been seeing recently,
> the combination of the two gives us really good control quality without
> being too invasive across the stack.
>
> CPU doesn't have a back-charging mechanism yet and, depending on the use
> case, we *might* need to put kthreads in different cgroups. However,
> such use cases might not be that abundant, and there may be gotchas
> which require them to be force-executed and back-charged (e.g. fs
> compression from global reclaim).
>
> Thanks.
>
> --
> tejun
Hello,

On Thu, Jun 06, 2019 at 09:15:26AM +0300, Mike Rapoport wrote:
> > Can you please go into more details on the use cases?
>
> If I remember correctly, Bandan's original work was about using
> workqueues instead of kthreads in vhost.

For vhost, I think it might be better to stick with kthread or
kthread_worker, given that those threads can consume lots of cpu cycles
over a long period of time and we want to keep persistent track of
their scheduling state.

Thanks.
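For context, the kthread_worker pattern Tejun refers to looks roughly like this. The API calls are real kernel functions, but the function names and worker name here are illustrative, and error handling is trimmed to a minimum.

```c
#include <linux/err.h>
#include <linux/kthread.h>

static struct kthread_worker *worker;
static struct kthread_work poll_work;

static void vhost_poll_fn(struct kthread_work *work)
{
	/*
	 * Long-running, cpu-heavy work executes in one persistent
	 * kthread, so scheduler state is tracked across work items,
	 * unlike anonymous workqueue workers shared by many users.
	 */
}

static int vhost_worker_start(void)
{
	worker = kthread_create_worker(0, "vhost-worker");
	if (IS_ERR(worker))
		return PTR_ERR(worker);

	kthread_init_work(&poll_work, vhost_poll_fn);
	kthread_queue_work(worker, &poll_work);
	return 0;
}

static void vhost_worker_stop(void)
{
	kthread_flush_worker(worker);	/* wait for pending work */
	kthread_destroy_worker(worker);
}
```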
Hello, Daniel.

On Wed, Jun 05, 2019 at 11:32:29AM -0400, Daniel Jordan wrote:
> Sure, quoting from the last ktask post:
>
>   A single CPU can spend an excessive amount of time in the kernel operating
>   on large amounts of data. Often these situations arise during
>   initialization- and destruction-related tasks, where the data involved
>   scales with system size. These long-running jobs can slow startup and
>   shutdown of applications and the system itself while extra CPUs sit idle.
>
>   To ensure that applications and the kernel continue to perform well as core
>   counts and memory sizes increase, harness these idle CPUs to complete such
>   jobs more quickly.
>
>   ktask is a generic framework for parallelizing CPU-intensive work in the
>   kernel. The API is generic enough to add concurrency to many different kinds
>   of tasks--for example, zeroing a range of pages or evicting a list of
>   inodes--and aims to save its clients the trouble of splitting up the work,
>   choosing the number of threads to use, maintaining an efficient concurrency
>   level, starting these threads, and load balancing the work between them.

Yeah, that rings a bell.

> > For memory and io, we're generally going for remote charging, where a
> > kthread explicitly says who the specific io or allocation is for,
> > combined with selective back-charging, where the resource is charged
> > and consumed unconditionally even if that would put the usage above
> > the current limits temporarily. From what I've been seeing recently,
> > the combination of the two gives us really good control quality without
> > being too invasive across the stack.
>
> Yes, for memory I actually use remote charging. In patch 3 the worker's
> current->active_memcg field is changed to match that of the cgroup associated
> with the work.

I see.

> > CPU doesn't have a back-charging mechanism yet and, depending on the use
> > case, we *might* need to put kthreads in different cgroups. However,
> > such use cases might not be that abundant, and there may be gotchas
> > which require them to be force-executed and back-charged (e.g. fs
> > compression from global reclaim).
>
> The CPU-intensiveness of these jobs is one of the reasons for actually putting
> the workers through the migration path. I don't know of a way to get the
> workers to respect the cpu controller (and even cpuset for that matter) without
> doing that.

So, I still think it'd likely be better to go the back-charging route than
actually putting kworkers in non-root cgroups. That's gonna be way
cheaper, simpler, and makes avoiding inadvertent priority inversions
trivial.

Thanks.
On Tue, Jun 11, 2019 at 12:55:49PM -0700, Tejun Heo wrote:
> > > CPU doesn't have a back-charging mechanism yet and, depending on the use
> > > case, we *might* need to put kthreads in different cgroups. However,
> > > such use cases might not be that abundant, and there may be gotchas
> > > which require them to be force-executed and back-charged (e.g. fs
> > > compression from global reclaim).
> >
> > The CPU-intensiveness of these jobs is one of the reasons for actually putting
> > the workers through the migration path. I don't know of a way to get the
> > workers to respect the cpu controller (and even cpuset for that matter) without
> > doing that.
>
> So, I still think it'd likely be better to go the back-charging route than
> actually putting kworkers in non-root cgroups. That's gonna be way
> cheaper, simpler, and makes avoiding inadvertent priority inversions
> trivial.

Ok, I'll experiment with back-charging in the cpu controller. The initial
plan is to smooth out resource usage by back-charging after each chunk of
work that each helper thread does rather than doing one giant back-charge
after the multithreaded job is over. It may turn out better
performance-wise to do it less often than that.

I'll also experiment with getting workqueue workers to respect cpuset
without migrating. It seems to make sense to use the intersection of an
unbound worker's cpumask and the cpuset's cpumask, and make some
compromises if the result is empty; see the sketch below.

Daniel
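A minimal sketch of that intersection idea, using standard cpumask helpers. This is illustrative only, not code from the series; worker_effective_cpus() is a hypothetical helper, and the fallback policy is just one possible compromise.

```c
#include <linux/cpumask.h>

/*
 * Run the unbound worker on the cpus allowed by both its own mask and
 * the cpuset's mask.  If the two don't overlap, fall back to the
 * worker's own mask rather than leave it with nowhere to run.
 */
static void worker_effective_cpus(const struct cpumask *worker_mask,
				  const struct cpumask *cpuset_mask,
				  struct cpumask *effective)
{
	cpumask_and(effective, worker_mask, cpuset_mask);
	if (cpumask_empty(effective))
		cpumask_copy(effective, worker_mask);
}
```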