| Message ID | 20240723181025.187413-1-longman@redhat.com (mailing list archive) |
|---|---|
| State | Accepted |
| Commit | 04225c21b25698dd73b76a684289ad951cb27e15 |
| Series | rcu: Use system_unbound_wq to avoid disturbing isolated CPUs |
On Tue, Jul 23, 2024 at 02:10:25PM -0400, Waiman Long wrote:
> It was discovered that isolated CPUs could sometimes be disturbed by
> kworkers processing kfree_rcu() works causing higher than expected
> latency. It is because the RCU core uses "system_wq" which doesn't have
> the WQ_UNBOUND flag to handle all its work items. Fix this violation of
> latency limits by using "system_unbound_wq" in the RCU core instead.
> This will ensure that those work items will not be run on CPUs marked
> as isolated.
>
> Beside the WQ_UNBOUND flag, the other major difference between system_wq
> and system_unbound_wq is their max_active count. The system_unbound_wq
> has a max_active of WQ_MAX_ACTIVE (512) while system_wq's max_active
> is WQ_DFL_ACTIVE (256) which is half of WQ_MAX_ACTIVE.
>
> Reported-by: Vratislav Bendel <vbendel@redhat.com>
> Closes: https://issues.redhat.com/browse/RHEL-50220
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
>  kernel/rcu/tasks.h | 4 ++--
>  kernel/rcu/tree.c  | 8 ++++----
>  2 files changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index e641cc681901..494aa9513d0b 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -3539,10 +3539,10 @@ schedule_delayed_monitor_work(struct kfree_rcu_cpu *krcp)
>  	if (delayed_work_pending(&krcp->monitor_work)) {
>  		delay_left = krcp->monitor_work.timer.expires - jiffies;
>  		if (delay < delay_left)
> -			mod_delayed_work(system_wq, &krcp->monitor_work, delay);
> +			mod_delayed_work(system_unbound_wq, &krcp->monitor_work, delay);
>  		return;
>  	}
> -	queue_delayed_work(system_wq, &krcp->monitor_work, delay);
> +	queue_delayed_work(system_unbound_wq, &krcp->monitor_work, delay);
>  }
>
>  static void
> @@ -3634,7 +3634,7 @@ static void kfree_rcu_monitor(struct work_struct *work)
>  			// be that the work is in the pending state when
>  			// channels have been detached following by each
>  			// other.
> -			queue_rcu_work(system_wq, &krwp->rcu_work);
> +			queue_rcu_work(system_unbound_wq, &krwp->rcu_work);
>  		}
>  	}
>
> @@ -3704,7 +3704,7 @@ run_page_cache_worker(struct kfree_rcu_cpu *krcp)
>  	if (rcu_scheduler_active == RCU_SCHEDULER_RUNNING &&
>  	    !atomic_xchg(&krcp->work_in_progress, 1)) {
>  		if (atomic_read(&krcp->backoff_page_cache_fill)) {
> -			queue_delayed_work(system_wq,
> +			queue_delayed_work(system_unbound_wq,
>  					&krcp->page_cache_work,
>  					msecs_to_jiffies(rcu_delay_page_cache_fill_msec));
>  		} else {
> --
> 2.43.5
>

Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>

Thanks!

--
Uladzislau Rezki
On Tue, Jul 23, 2024 at 02:10:25PM -0400, Waiman Long wrote:
> It was discovered that isolated CPUs could sometimes be disturbed by
> kworkers processing kfree_rcu() works causing higher than expected
> latency. It is because the RCU core uses "system_wq" which doesn't have
> the WQ_UNBOUND flag to handle all its work items. Fix this violation of
> latency limits by using "system_unbound_wq" in the RCU core instead.
> This will ensure that those work items will not be run on CPUs marked
> as isolated.
>
> Beside the WQ_UNBOUND flag, the other major difference between system_wq
> and system_unbound_wq is their max_active count. The system_unbound_wq
> has a max_active of WQ_MAX_ACTIVE (512) while system_wq's max_active
> is WQ_DFL_ACTIVE (256) which is half of WQ_MAX_ACTIVE.
>
> Reported-by: Vratislav Bendel <vbendel@redhat.com>

I've seen this problem a while ago and reported it to the list:

https://lore.kernel.org/all/Zp906X7VJGNKl5fW@gmail.com/

I've just applied this patch and ran my workload for 2 hours without
hitting this issue. Thanks for solving it.

Tested-by: Breno Leitao <leitao@debian.org>
On Wed, Jul 24, 2024 at 06:30:29AM -0700, Breno Leitao wrote:
> On Tue, Jul 23, 2024 at 02:10:25PM -0400, Waiman Long wrote:
> > It was discovered that isolated CPUs could sometimes be disturbed by
> > kworkers processing kfree_rcu() works causing higher than expected
> > latency. It is because the RCU core uses "system_wq" which doesn't have
> > the WQ_UNBOUND flag to handle all its work items. Fix this violation of
> > latency limits by using "system_unbound_wq" in the RCU core instead.
> > This will ensure that those work items will not be run on CPUs marked
> > as isolated.
> >
> > Beside the WQ_UNBOUND flag, the other major difference between system_wq
> > and system_unbound_wq is their max_active count. The system_unbound_wq
> > has a max_active of WQ_MAX_ACTIVE (512) while system_wq's max_active
> > is WQ_DFL_ACTIVE (256) which is half of WQ_MAX_ACTIVE.
> >
> > Reported-by: Vratislav Bendel <vbendel@redhat.com>
>
> I've seen this problem a while ago and reported to the list:
>
> https://lore.kernel.org/all/Zp906X7VJGNKl5fW@gmail.com/
>
> I've just applied this test, and run my workload for 2 hours without
> hitting this issue. Thanks for solving it.
>
> Tested-by: Breno Leitao <leitao@debian.org>
>

Thank you for testing this! I saw your recent email about that :)

--
Uladzislau Rezki
On Tue, Jul 23, 2024 at 02:10:25PM -0400, Waiman Long wrote:
> It was discovered that isolated CPUs could sometimes be disturbed by
> kworkers processing kfree_rcu() works causing higher than expected
> latency. It is because the RCU core uses "system_wq" which doesn't have
> the WQ_UNBOUND flag to handle all its work items. Fix this violation of
> latency limits by using "system_unbound_wq" in the RCU core instead.
> This will ensure that those work items will not be run on CPUs marked
> as isolated.
>

An alternative approach here, in case we want to keep per-CPU worker
pools, could be to define a wq with the WQ_CPU_INTENSIVE flag. Are there
cases where a WQ_CPU_INTENSIVE wq won't be sufficient for the problem
this patch is fixing?

- Neeraj

> Beside the WQ_UNBOUND flag, the other major difference between system_wq
> and system_unbound_wq is their max_active count. The system_unbound_wq
> has a max_active of WQ_MAX_ACTIVE (512) while system_wq's max_active
> is WQ_DFL_ACTIVE (256) which is half of WQ_MAX_ACTIVE.
>
> Reported-by: Vratislav Bendel <vbendel@redhat.com>
> Closes: https://issues.redhat.com/browse/RHEL-50220
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
>  kernel/rcu/tasks.h | 4 ++--
>  kernel/rcu/tree.c  | 8 ++++----
>  2 files changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index e641cc681901..494aa9513d0b 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -3539,10 +3539,10 @@ schedule_delayed_monitor_work(struct kfree_rcu_cpu *krcp)
>  	if (delayed_work_pending(&krcp->monitor_work)) {
>  		delay_left = krcp->monitor_work.timer.expires - jiffies;
>  		if (delay < delay_left)
> -			mod_delayed_work(system_wq, &krcp->monitor_work, delay);
> +			mod_delayed_work(system_unbound_wq, &krcp->monitor_work, delay);
>  		return;
>  	}
> -	queue_delayed_work(system_wq, &krcp->monitor_work, delay);
> +	queue_delayed_work(system_unbound_wq, &krcp->monitor_work, delay);
>  }
>
>  static void
> @@ -3634,7 +3634,7 @@ static void kfree_rcu_monitor(struct work_struct *work)
>  			// be that the work is in the pending state when
>  			// channels have been detached following by each
>  			// other.
> -			queue_rcu_work(system_wq, &krwp->rcu_work);
> +			queue_rcu_work(system_unbound_wq, &krwp->rcu_work);
>  		}
>  	}
>
> @@ -3704,7 +3704,7 @@ run_page_cache_worker(struct kfree_rcu_cpu *krcp)
>  	if (rcu_scheduler_active == RCU_SCHEDULER_RUNNING &&
>  	    !atomic_xchg(&krcp->work_in_progress, 1)) {
>  		if (atomic_read(&krcp->backoff_page_cache_fill)) {
> -			queue_delayed_work(system_wq,
> +			queue_delayed_work(system_unbound_wq,
>  					&krcp->page_cache_work,
>  					msecs_to_jiffies(rcu_delay_page_cache_fill_msec));
>  		} else {
> --
> 2.43.5
>
On 7/25/24 11:35, Neeraj Upadhyay wrote:
> On Tue, Jul 23, 2024 at 02:10:25PM -0400, Waiman Long wrote:
>> It was discovered that isolated CPUs could sometimes be disturbed by
>> kworkers processing kfree_rcu() works causing higher than expected
>> latency. It is because the RCU core uses "system_wq" which doesn't have
>> the WQ_UNBOUND flag to handle all its work items. Fix this violation of
>> latency limits by using "system_unbound_wq" in the RCU core instead.
>> This will ensure that those work items will not be run on CPUs marked
>> as isolated.
>>
> Alternative approach here could be, in case we want to keep per CPU worker
> pools, define a wq with WQ_CPU_INTENSIVE flag. Are there cases where
> WQ_CPU_INTENSIVE wq won't be sufficient for the problem this patch
> is fixing?

What exactly will we gain by defining a WQ_CPU_INTENSIVE workqueue? Or
what will we lose by using system_unbound_wq? All the calls that are
modified to use system_unbound_wq are using WORK_CPU_UNBOUND as their
cpu. IOW, they don't care which CPUs are used to run the work items. The
only downside I can see is the possible loss of some cache locality.

In fact, WQ_CPU_INTENSIVE can be considered a subset of WQ_UNBOUND. A
WQ_UNBOUND workqueue will avoid using isolated CPUs, but not a
WQ_CPU_INTENSIVE workqueue.

Cheers,
Longman
On Thu, Jul 25, 2024 at 01:02:01PM -0400, Waiman Long wrote:
> On 7/25/24 11:35, Neeraj Upadhyay wrote:
> > On Tue, Jul 23, 2024 at 02:10:25PM -0400, Waiman Long wrote:
> > > It was discovered that isolated CPUs could sometimes be disturbed by
> > > kworkers processing kfree_rcu() works causing higher than expected
> > > latency. It is because the RCU core uses "system_wq" which doesn't have
> > > the WQ_UNBOUND flag to handle all its work items. Fix this violation of
> > > latency limits by using "system_unbound_wq" in the RCU core instead.
> > > This will ensure that those work items will not be run on CPUs marked
> > > as isolated.
> > >
> > Alternative approach here could be, in case we want to keep per CPU worker
> > pools, define a wq with WQ_CPU_INTENSIVE flag. Are there cases where
> > WQ_CPU_INTENSIVE wq won't be sufficient for the problem this patch
> > is fixing?
>
> What exactly will we gain by defining a WQ_CPU_INTENSIVE workqueue? Or what
> will we lose by using system_unbound_wq? All the calls that are modified to
> use system_unbound_wq are using WORK_CPU_UNBOUND as their cpu. IOW, they
> doesn't care which CPUs are used to run the work items. The only downside I
> can see is the possible loss of some cache locality.
>

For the nohz_full case, where unbound pool workers run only on the
housekeeping CPU (cpu0), if multiple other CPUs are queuing work, the
execution of those works could get delayed. However, this should not
generally happen, as the other CPUs would mostly be running in user mode.

> In fact, WQ_CPU_INTENSIVE can be considered a subset of WQ_UNBOUND. An
> WQ_UNBOUND workqueue will avoid using isolated CPUs, but not a
> WQ_CPU_INTENSIVE workqueue.

Got it, thanks!

I have picked the patch for further review and testing [1]

[1] https://git.kernel.org/pub/scm/linux/kernel/git/neeraj.upadhyay/linux-rcu.git/log/?h=next

- Neeraj

>
> Cheers,
> Longman
>
>
On 7/25/24 15:33, Neeraj Upadhyay wrote:
> On Thu, Jul 25, 2024 at 01:02:01PM -0400, Waiman Long wrote:
>> On 7/25/24 11:35, Neeraj Upadhyay wrote:
>>> On Tue, Jul 23, 2024 at 02:10:25PM -0400, Waiman Long wrote:
>>>> It was discovered that isolated CPUs could sometimes be disturbed by
>>>> kworkers processing kfree_rcu() works causing higher than expected
>>>> latency. It is because the RCU core uses "system_wq" which doesn't have
>>>> the WQ_UNBOUND flag to handle all its work items. Fix this violation of
>>>> latency limits by using "system_unbound_wq" in the RCU core instead.
>>>> This will ensure that those work items will not be run on CPUs marked
>>>> as isolated.
>>>>
>>> Alternative approach here could be, in case we want to keep per CPU worker
>>> pools, define a wq with WQ_CPU_INTENSIVE flag. Are there cases where
>>> WQ_CPU_INTENSIVE wq won't be sufficient for the problem this patch
>>> is fixing?
>> What exactly will we gain by defining a WQ_CPU_INTENSIVE workqueue? Or what
>> will we lose by using system_unbound_wq? All the calls that are modified to
>> use system_unbound_wq are using WORK_CPU_UNBOUND as their cpu. IOW, they
>> doesn't care which CPUs are used to run the work items. The only downside I
>> can see is the possible loss of some cache locality.
>>
> For the nohz_full case, where unbounded pool workers run only on housekeeping CPU
> (cpu0), if multiple other CPUs are queuing work, the execution of those
> works could get delayed. However, this should not generally happen as
> other CPUs would be mostly running in user mode.

Well, if there is only one housekeeping CPU, a lot of background kernel
tasks will be slowed down. Users should be careful to keep a proper
balance between the number of housekeeping and nohz-full CPUs.

>
>> In fact, WQ_CPU_INTENSIVE can be considered a subset of WQ_UNBOUND. An
>> WQ_UNBOUND workqueue will avoid using isolated CPUs, but not a
>> WQ_CPU_INTENSIVE workqueue.
> Got it, thanks!
>
> I have picked the patch for further review and testing [1]
>
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/neeraj.upadhyay/linux-rcu.git/log/?h=next

Thanks, let me know if you see any problem.

Cheers,
Longman
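For reference, the CPU-isolation setup being discussed is configured at boot time. A hypothetical kernel command line isolating CPUs 2-7 might look like the following (CPU ranges and the exact set of isolcpus flags are illustrative; consult Documentation/admin-guide/kernel-parameters.txt for the kernel in use):

```
isolcpus=domain,managed_irq,2-7 nohz_full=2-7 rcu_nocbs=2-7
```

With a setup along these lines, CPUs 0-1 remain for housekeeping, and workers of a WQ_UNBOUND workqueue are confined to them, which is what makes the switch to system_unbound_wq effective for the isolated CPUs.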
On 7/24/24 09:30, Breno Leitao wrote:
> On Tue, Jul 23, 2024 at 02:10:25PM -0400, Waiman Long wrote:
>> It was discovered that isolated CPUs could sometimes be disturbed by
>> kworkers processing kfree_rcu() works causing higher than expected
>> latency. It is because the RCU core uses "system_wq" which doesn't have
>> the WQ_UNBOUND flag to handle all its work items. Fix this violation of
>> latency limits by using "system_unbound_wq" in the RCU core instead.
>> This will ensure that those work items will not be run on CPUs marked
>> as isolated.
>>
>> Beside the WQ_UNBOUND flag, the other major difference between system_wq
>> and system_unbound_wq is their max_active count. The system_unbound_wq
>> has a max_active of WQ_MAX_ACTIVE (512) while system_wq's max_active
>> is WQ_DFL_ACTIVE (256) which is half of WQ_MAX_ACTIVE.
>>
>> Reported-by: Vratislav Bendel <vbendel@redhat.com>
> I've seen this problem a while ago and reported to the list:
>
> https://lore.kernel.org/all/Zp906X7VJGNKl5fW@gmail.com/
>
> I've just applied this test, and run my workload for 2 hours without
> hitting this issue. Thanks for solving it.
>
> Tested-by: Breno Leitao <leitao@debian.org>

Thanks for testing this patch. So it is not just us who saw this
problem.

Cheers,
Longman
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index e641cc681901..494aa9513d0b 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3539,10 +3539,10 @@ schedule_delayed_monitor_work(struct kfree_rcu_cpu *krcp)
 	if (delayed_work_pending(&krcp->monitor_work)) {
 		delay_left = krcp->monitor_work.timer.expires - jiffies;
 		if (delay < delay_left)
-			mod_delayed_work(system_wq, &krcp->monitor_work, delay);
+			mod_delayed_work(system_unbound_wq, &krcp->monitor_work, delay);
 		return;
 	}
-	queue_delayed_work(system_wq, &krcp->monitor_work, delay);
+	queue_delayed_work(system_unbound_wq, &krcp->monitor_work, delay);
 }

 static void
@@ -3634,7 +3634,7 @@ static void kfree_rcu_monitor(struct work_struct *work)
 			// be that the work is in the pending state when
 			// channels have been detached following by each
 			// other.
-			queue_rcu_work(system_wq, &krwp->rcu_work);
+			queue_rcu_work(system_unbound_wq, &krwp->rcu_work);
 		}
 	}

 @@ -3704,7 +3704,7 @@ run_page_cache_worker(struct kfree_rcu_cpu *krcp)
 	if (rcu_scheduler_active == RCU_SCHEDULER_RUNNING &&
 	    !atomic_xchg(&krcp->work_in_progress, 1)) {
 		if (atomic_read(&krcp->backoff_page_cache_fill)) {
-			queue_delayed_work(system_wq,
+			queue_delayed_work(system_unbound_wq,
 					&krcp->page_cache_work,
 					msecs_to_jiffies(rcu_delay_page_cache_fill_msec));
 		} else {
It was discovered that isolated CPUs could sometimes be disturbed by
kworkers processing kfree_rcu() works, causing higher than expected
latency. This is because the RCU core uses "system_wq", which doesn't
have the WQ_UNBOUND flag, to handle all its work items. Fix this
violation of latency limits by using "system_unbound_wq" in the RCU
core instead. This will ensure that those work items will not be run
on CPUs marked as isolated.

Besides the WQ_UNBOUND flag, the other major difference between
system_wq and system_unbound_wq is their max_active count. The
system_unbound_wq has a max_active of WQ_MAX_ACTIVE (512) while
system_wq's max_active is WQ_DFL_ACTIVE (256), which is half of
WQ_MAX_ACTIVE.

Reported-by: Vratislav Bendel <vbendel@redhat.com>
Closes: https://issues.redhat.com/browse/RHEL-50220
Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/rcu/tasks.h | 4 ++--
 kernel/rcu/tree.c  | 8 ++++----
 2 files changed, 6 insertions(+), 6 deletions(-)