Message ID | 20220627195353.1575285-1-urezki@gmail.com (mailing list archive)
---|---
State | Superseded
Series | [v2,1/1] rcu/kvfree: Update KFREE_DRAIN_JIFFIES interval
On Mon, Jun 27, 2022 at 09:53:53PM +0200, Uladzislau Rezki (Sony) wrote:
> Currently the monitor work is scheduled with a fixed interval that
> is HZ/20 or each 50 milliseconds. The drawback of such approach is
> a low utilization of page slot in some scenarios. The page can store
> up to 512 records. For example on Android system it can look like:

I was looking at queuing this one, but we need a bit more data.  In
the meantime, here is my wordsmithing of the above paragraph:

	Currently the monitor work is scheduled with a fixed interval of
	HZ/20, which is roughly 50 milliseconds. The drawback of this
	approach is low utilization of the 512 page slots in scenarios
	with infrequent kvfree_rcu() calls. For example on an Android
	system:

> <snip>
> kworker/3:0-13872 [003] .... 11286.007048: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000026522604 nr_records=1
> kworker/3:0-13872 [003] .... 11286.015638: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000095ed6fca nr_records=2
> kworker/1:2-20434 [001] .... 11286.051230: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000044872ffd nr_records=1
> kworker/1:2-20434 [001] .... 11286.059322: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000026522604 nr_records=2
> kworker/0:1-20052 [000] .... 11286.095295: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000044872ffd nr_records=2
> kworker/0:1-20052 [000] .... 11286.103418: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000cbcf05db nr_records=1
> kworker/2:3-14372 [002] .... 11286.135155: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000095ed6fca nr_records=2
> kworker/2:3-14372 [002] .... 11286.135198: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000044872ffd nr_records=1
> kworker/1:2-20434 [001] .... 11286.155377: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000cbcf05db nr_records=5
> kworker/2:3-14372 [002] .... 11286.167181: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000026522604 nr_records=5
> kworker/1:2-20434 [001] .... 11286.179202: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x000000008ef95e14 nr_records=1
> kworker/2:3-14372 [002] .... 11286.187398: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000c597d297 nr_records=6
> kworker/3:0-13872 [003] .... 11286.187445: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000050bf92e2 nr_records=3
> kworker/1:2-20434 [001] .... 11286.198975: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000cbcf05db nr_records=4
> kworker/1:2-20434 [001] .... 11286.207203: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000095ed6fca nr_records=4
> <snip>
>
> where a page only carries few records to reclaim a memory. In order
> to improve batching and make utilization more efficient the patch sets
> a drain interval to 1 second as default one. When a flood is detected
> an interval is adjusted in a way that a reclaim work is re-scheduled
> on a next timer jiffy.

And of the above paragraph:

	Out of 512 slots, in all cases, fewer than 10 were actually used.
	In order to improve batching and make utilization more efficient
	this commit sets a drain interval to a fixed 1-second interval.
	Floods are detected when a page fills quickly, and in that case,
	the reclaim work is re-scheduled for the next scheduling-clock
	tick (jiffy).

---

But what we need now is a trace like the one above showing higher
utilization of the pages.  Could you please supply this?
> - default -
> Total time taken by all kfree'ers: 11510245312 ns, loops: 10000, batches: 1553, memory footprint: 70MB
> Total time taken by all kfree'ers: 9813329636 ns, loops: 10000, batches: 1544, memory footprint: 67MB
> Total time taken by all kfree'ers: 10085206318 ns, loops: 10000, batches: 1499, memory footprint: 156MB
> Total time taken by all kfree'ers: 9582207782 ns, loops: 10000, batches: 1456, memory footprint: 103MB
> Total time taken by all kfree'ers: 9872195750 ns, loops: 10000, batches: 1519, memory footprint: 105MB
> Total time taken by all kfree'ers: 9574438300 ns, loops: 10000, batches: 1480, memory footprint: 101MB
> Total time taken by all kfree'ers: 9990015265 ns, loops: 10000, batches: 1489, memory footprint: 127MB
> Total time taken by all kfree'ers: 9978971689 ns, loops: 10000, batches: 1455, memory footprint: 94MB
> Total time taken by all kfree'ers: 10357628090 ns, loops: 10000, batches: 1456, memory footprint: 64MB
> Total time taken by all kfree'ers: 9838469975 ns, loops: 10000, batches: 1448, memory footprint: 131MB
>
> - patch -
> Total time taken by all kfree'ers: 8488575321 ns, loops: 10000, batches: 1735, memory footprint: 82MB
> Total time taken by all kfree'ers: 9256401034 ns, loops: 10000, batches: 1762, memory footprint: 76MB
> Total time taken by all kfree'ers: 9198011994 ns, loops: 10000, batches: 1751, memory footprint: 82MB
> Total time taken by all kfree'ers: 10590540622 ns, loops: 10000, batches: 1540, memory footprint: 94MB
> Total time taken by all kfree'ers: 9953902918 ns, loops: 10000, batches: 1638, memory footprint: 89MB
> Total time taken by all kfree'ers: 10176669464 ns, loops: 10000, batches: 1613, memory footprint: 84MB
> Total time taken by all kfree'ers: 9387490978 ns, loops: 10000, batches: 1762, memory footprint: 85MB
> Total time taken by all kfree'ers: 9530535341 ns, loops: 10000, batches: 1781, memory footprint: 66MB
> Total time taken by all kfree'ers: 9945442340 ns, loops: 10000, batches: 1758, memory footprint: 68MB
> Total time taken by all kfree'ers: 9188848287 ns, loops: 10000, batches: 1781, memory footprint: 67MB
>
> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> ---
>  kernel/rcu/tree.c | 28 ++++++++++++++++++++++++----
>  1 file changed, 24 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index fd16c0b46d9e..c11670ba008e 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -3249,7 +3249,7 @@ EXPORT_SYMBOL_GPL(call_rcu);
>
>
>  /* Maximum number of jiffies to wait before draining a batch. */
> -#define KFREE_DRAIN_JIFFIES (HZ / 50)
> +#define KFREE_DRAIN_JIFFIES (HZ)
>  #define KFREE_N_BATCHES 2
>  #define FREE_N_CHANNELS 2
>
> @@ -3510,6 +3510,26 @@ need_offload_krc(struct kfree_rcu_cpu *krcp)
>  	return !!krcp->head;
>  }
>
> +static void
> +schedule_delayed_monitor_work(struct kfree_rcu_cpu *krcp)
> +{
> +	long delay, delay_left;
> +
> +	delay = READ_ONCE(krcp->count) >= KVFREE_BULK_MAX_ENTR ?
> +		1:KFREE_DRAIN_JIFFIES;

The above fits as one line in 100 characters, like this:

	delay = READ_ONCE(krcp->count) >= KVFREE_BULK_MAX_ENTR ? 1 : KFREE_DRAIN_JIFFIES;

Many of the blank lines below this point in this function are better
removed.

Other than that, looks good!  Could you please resend with the added
trace with the patch applied?

							Thanx, Paul

> +
> +	if (delayed_work_pending(&krcp->monitor_work)) {
> +		delay_left = krcp->monitor_work.timer.expires - jiffies;
> +
> +		if (delay < delay_left)
> +			mod_delayed_work(system_wq, &krcp->monitor_work, delay);
> +
> +		return;
> +	}
> +
> +	queue_delayed_work(system_wq, &krcp->monitor_work, delay);
> +}
> +
>  /*
>   * This function is invoked after the KFREE_DRAIN_JIFFIES timeout.
>   */
> @@ -3567,7 +3587,7 @@ static void kfree_rcu_monitor(struct work_struct *work)
>  	// work to repeat an attempt. Because previous batches are
>  	// still in progress.
>  	if (need_offload_krc(krcp))
> -		schedule_delayed_work(&krcp->monitor_work, KFREE_DRAIN_JIFFIES);
> +		schedule_delayed_monitor_work(krcp);
>
>  	raw_spin_unlock_irqrestore(&krcp->lock, flags);
>  }
> @@ -3755,7 +3775,7 @@ void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
>
>  	// Set timer to drain after KFREE_DRAIN_JIFFIES.
>  	if (rcu_scheduler_active == RCU_SCHEDULER_RUNNING)
> -		schedule_delayed_work(&krcp->monitor_work, KFREE_DRAIN_JIFFIES);
> +		schedule_delayed_monitor_work(krcp);
>
>  unlock_return:
>  	krc_this_cpu_unlock(krcp, flags);
> @@ -3831,7 +3851,7 @@ void __init kfree_rcu_scheduler_running(void)
>
>  		raw_spin_lock_irqsave(&krcp->lock, flags);
>  		if (need_offload_krc(krcp))
> -			schedule_delayed_work_on(cpu, &krcp->monitor_work, KFREE_DRAIN_JIFFIES);
> +			schedule_delayed_monitor_work(krcp);
>  		raw_spin_unlock_irqrestore(&krcp->lock, flags);
>  	}
> }
> --
> 2.30.2
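The helper under review reduces to two small decisions: pick a 1-jiffy delay when the page is flooding (count at or above capacity), otherwise use the 1-second default; and pull already-pending work in earlier, never push it out. A minimal userspace C sketch of just that decision logic follows — the `HZ` value, function names, and harness are illustrative stand-ins for this discussion, not the kernel's implementation:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins: HZ is build-dependent in a real kernel;
 * KVFREE_BULK_MAX_ENTR is the per-page record capacity from the patch. */
#define HZ 1000
#define KVFREE_BULK_MAX_ENTR 512
#define KFREE_DRAIN_JIFFIES (HZ)	/* 1-second default drain interval */

/* Flood detected (page at capacity): drain on the very next jiffy.
 * Otherwise wait the full default interval to improve batching. */
static long pick_drain_delay(long count)
{
	return count >= KVFREE_BULK_MAX_ENTR ? 1 : KFREE_DRAIN_JIFFIES;
}

/* Pending monitor work is only ever pulled in earlier (the
 * mod_delayed_work() branch), never pushed out: re-arm only when the
 * newly computed delay is shorter than the time remaining. */
static bool should_rearm_earlier(long new_delay, long delay_left)
{
	return new_delay < delay_left;
}
```

This mirrors why the `delayed_work_pending()` branch returns without touching the timer when the new delay is not shorter: lengthening a pending drain would defeat the flood response.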
> On Mon, Jun 27, 2022 at 09:53:53PM +0200, Uladzislau Rezki (Sony) wrote:
> > Currently the monitor work is scheduled with a fixed interval that
> > is HZ/20 or each 50 milliseconds. The drawback of such approach is
> > a low utilization of page slot in some scenarios. The page can store
> > up to 512 records. For example on Android system it can look like:
>
> I was looking at queuing this one, but we need a bit more data.  In
> the meantime, here is my wordsmithing of the above paragraph:
>
> 	Currently the monitor work is scheduled with a fixed interval of
> 	HZ/20, which is roughly 50 milliseconds. The drawback of this
> 	approach is low utilization of the 512 page slots in scenarios
> 	with infrequent kvfree_rcu() calls. For example on an Android
> 	system:
>
Good i will update with your changes :)

> > <snip>
> > kworker/3:0-13872 [003] .... 11286.007048: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000026522604 nr_records=1
> > kworker/3:0-13872 [003] .... 11286.015638: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000095ed6fca nr_records=2
> > kworker/1:2-20434 [001] .... 11286.051230: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000044872ffd nr_records=1
> > kworker/1:2-20434 [001] .... 11286.059322: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000026522604 nr_records=2
> > kworker/0:1-20052 [000] .... 11286.095295: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000044872ffd nr_records=2
> > kworker/0:1-20052 [000] .... 11286.103418: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000cbcf05db nr_records=1
> > kworker/2:3-14372 [002] .... 11286.135155: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000095ed6fca nr_records=2
> > kworker/2:3-14372 [002] .... 11286.135198: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000044872ffd nr_records=1
> > kworker/1:2-20434 [001] .... 11286.155377: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000cbcf05db nr_records=5
> > kworker/2:3-14372 [002] .... 11286.167181: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000026522604 nr_records=5
> > kworker/1:2-20434 [001] .... 11286.179202: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x000000008ef95e14 nr_records=1
> > kworker/2:3-14372 [002] .... 11286.187398: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000c597d297 nr_records=6
> > kworker/3:0-13872 [003] .... 11286.187445: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000050bf92e2 nr_records=3
> > kworker/1:2-20434 [001] .... 11286.198975: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000cbcf05db nr_records=4
> > kworker/1:2-20434 [001] .... 11286.207203: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000095ed6fca nr_records=4
> > <snip>
> >
> > where a page only carries few records to reclaim a memory. In order
> > to improve batching and make utilization more efficient the patch sets
> > a drain interval to 1 second as default one. When a flood is detected
> > an interval is adjusted in a way that a reclaim work is re-scheduled
> > on a next timer jiffy.
>
> And of the above paragraph:
>
> 	Out of 512 slots, in all cases, fewer than 10 were actually used.
> 	In order to improve batching and make utilization more efficient
> 	this commit sets a drain interval to a fixed 1-second interval.
> 	Floods are detected when a page fills quickly, and in that case,
> 	the reclaim work is re-scheduled for the next scheduling-clock
> 	tick (jiffy).
>
Same here, will apply this.

> ---
>
> But what we need now is a trace like the one above showing higher utilization
> of the pages. Could you please supply this?
>
Yep, i will add traces to show that utilization becomes better what is
actually expectable.

--
Uladzislau Rezki
On Tue, Jun 28, 2022 at 01:45:31PM +0200, Uladzislau Rezki wrote: > > On Mon, Jun 27, 2022 at 09:53:53PM +0200, Uladzislau Rezki (Sony) wrote: > > > Currently the monitor work is scheduled with a fixed interval that > > > is HZ/20 or each 50 milliseconds. The drawback of such approach is > > > a low utilization of page slot in some scenarios. The page can store > > > up to 512 records. For example on Android system it can look like: > > > > I was looking at queuing this one, but we need a bit more data. In > > the meantime, here is my wordsmithing of the above paragraph: > > > > Currently the monitor work is scheduled with a fixed interval of HZ/20, > > which is roughly 50 milliseconds. The drawback of this approach is > > low utilization of the 512 page slots in scenarios with infrequence > > kvfree_rcu() calls. For example on an Android system: > > > Good i will update with your changes :) > > > > <snip> > > > kworker/3:0-13872 [003] .... 11286.007048: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000026522604 nr_records=1 > > > kworker/3:0-13872 [003] .... 11286.015638: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000095ed6fca nr_records=2 > > > kworker/1:2-20434 [001] .... 11286.051230: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000044872ffd nr_records=1 > > > kworker/1:2-20434 [001] .... 11286.059322: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000026522604 nr_records=2 > > > kworker/0:1-20052 [000] .... 11286.095295: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000044872ffd nr_records=2 > > > kworker/0:1-20052 [000] .... 11286.103418: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000cbcf05db nr_records=1 > > > kworker/2:3-14372 [002] .... 11286.135155: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000095ed6fca nr_records=2 > > > kworker/2:3-14372 [002] .... 11286.135198: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000044872ffd nr_records=1 > > > kworker/1:2-20434 [001] .... 
11286.155377: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000cbcf05db nr_records=5 > > > kworker/2:3-14372 [002] .... 11286.167181: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000026522604 nr_records=5 > > > kworker/1:2-20434 [001] .... 11286.179202: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x000000008ef95e14 nr_records=1 > > > kworker/2:3-14372 [002] .... 11286.187398: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000c597d297 nr_records=6 > > > kworker/3:0-13872 [003] .... 11286.187445: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000050bf92e2 nr_records=3 > > > kworker/1:2-20434 [001] .... 11286.198975: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000cbcf05db nr_records=4 > > > kworker/1:2-20434 [001] .... 11286.207203: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000095ed6fca nr_records=4 > > > <snip> > > > > > > where a page only carries few records to reclaim a memory. In order > > > to improve batching and make utilization more efficient the patch sets > > > a drain interval to 1 second as default one. When a flood is detected > > > an interval is adjusted in a way that a reclaim work is re-scheduled > > > on a next timer jiffy. > > > > And of the above paragraph: > > > > Out of 512 slots, in all cases, fewer than 10 were actually used. > > In order to improve batching and make utilization more efficient this > > commit sets a drain interval to a fixed 1-second interval. Floods are > > detected when a page fills quickly, and in that case, the reclaim work > > is re-scheduled for the next scheduling-clock tick (jiffy). > > > Same here, will apply this. > > > --- > > > > But what we need now is a trace like the one above showing higher utilization > > of the pages. Could you please supply this? > > > Yep, i will add traces to show that utilization becomes better what is > actually expectable. 
>
> --
> Uladzislau Rezki

Paul, see below the v3:

From ff2a402a59442ba575f13f660ee1b140d44483b4 Mon Sep 17 00:00:00 2001
From: "Uladzislau Rezki (Sony)" <urezki@gmail.com>
Date: Wed, 1 Jun 2022 11:28:53 +0200
Subject: [PATCH v3 1/1] rcu/kvfree: Update KFREE_DRAIN_JIFFIES interval

Currently the monitor work is scheduled with a fixed interval of HZ/20,
which is roughly 50 milliseconds. The drawback of this approach is
low utilization of the 512 page slots in scenarios with infrequent
kvfree_rcu() calls. For example on an Android system:

<snip>
 kworker/3:3-507   [003] .... 470.286305: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000d0f0dde5 nr_records=6
 kworker/6:1-76    [006] .... 470.416613: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000ea0d6556 nr_records=1
 kworker/6:1-76    [006] .... 470.416625: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x000000003e025849 nr_records=9
 kworker/1:1-73    [001] .... 470.438895: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000091a38dc1 nr_records=1
 kworker/3:3-507   [003] .... 471.390000: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000815a8713 nr_records=48
 kworker/1:1-73    [001] .... 471.725785: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000fda9bf20 nr_records=3
 kworker/1:1-73    [001] .... 471.725833: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000a425b67b nr_records=76
 kworker/3:3-507   [003] .... 471.958960: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000dce5bdbd nr_records=78
 kworker/0:4-1411  [000] .... 472.085673: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x000000007996be9d nr_records=1
 kworker/0:4-1411  [000] .... 472.085728: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000d0f0dde5 nr_records=5
 kworker/6:1-76    [006] .... 472.260340: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000065630ee4 nr_records=102
 kworker/2:1-78    [002] .... 472.527362: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000815a8713 nr_records=1
<snip>

Out of 512 slots, in all cases, fewer than 10 were actually used.
In order to improve batching and make utilization more efficient this
commit sets a drain interval to a fixed 1-second interval. Floods are
detected when a page fills quickly, and in that case, the reclaim work
is re-scheduled for the next scheduling-clock tick (jiffy).

After this change:

<snip>
 kworker/0:5-1415  [000] .... 239.852727: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x000000005d2561ea nr_records=178
 kworker/3:0-38    [003] .... 239.855113: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000039c44d27 nr_records=93
 kworker/1:4-1747  [001] .... 239.872647: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x000000004abb2f7e nr_records=109
 kworker/4:59-288  [004] .... 239.884829: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000049e7a308 nr_records=182
 kworker/0:5-1415  [000] .... 240.028762: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000037eae6d6 nr_records=86
 kworker/3:0-38    [003] .... 240.040769: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x000000003be0ae55 nr_records=244
 kworker/3:0-38    [003] .... 240.068639: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000021905efe nr_records=45
 <...>-1415        [000] .... 240.868830: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x000000007c97e065 nr_records=461
 <...>-1415        [000] .... 241.068646: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x000000000725886e nr_records=25
 kworker/0:5-1415  [000] .... 241.100931: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000021905efe nr_records=300
 worker/4:59-2880  [004] .... 241.801181: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x000000004abb2f7e nr_records=180
 worker/6:60-2554  [006] .... 242.000565: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x000000006d3084af nr_records=40
<snip>

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
---
 kernel/rcu/tree.c | 27 +++++++++++++++++++++++----
 1 file changed, 23 insertions(+), 4 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index fd16c0b46d9e..37f89bedf45e 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3249,7 +3249,7 @@ EXPORT_SYMBOL_GPL(call_rcu);


 /* Maximum number of jiffies to wait before draining a batch. */
-#define KFREE_DRAIN_JIFFIES (HZ / 50)
+#define KFREE_DRAIN_JIFFIES (HZ)
 #define KFREE_N_BATCHES 2
 #define FREE_N_CHANNELS 2

@@ -3510,6 +3510,25 @@ need_offload_krc(struct kfree_rcu_cpu *krcp)
 	return !!krcp->head;
 }

+static void
+schedule_delayed_monitor_work(struct kfree_rcu_cpu *krcp)
+{
+	long delay, delay_left;
+
+	delay = READ_ONCE(krcp->count) >= KVFREE_BULK_MAX_ENTR ? 1:KFREE_DRAIN_JIFFIES;
+
+	if (delayed_work_pending(&krcp->monitor_work)) {
+		delay_left = krcp->monitor_work.timer.expires - jiffies;
+
+		if (delay < delay_left)
+			mod_delayed_work(system_wq, &krcp->monitor_work, delay);
+
+		return;
+	}
+
+	queue_delayed_work(system_wq, &krcp->monitor_work, delay);
+}
+
 /*
  * This function is invoked after the KFREE_DRAIN_JIFFIES timeout.
  */
@@ -3567,7 +3586,7 @@ static void kfree_rcu_monitor(struct work_struct *work)
 	// work to repeat an attempt. Because previous batches are
 	// still in progress.
 	if (need_offload_krc(krcp))
-		schedule_delayed_work(&krcp->monitor_work, KFREE_DRAIN_JIFFIES);
+		schedule_delayed_monitor_work(krcp);

 	raw_spin_unlock_irqrestore(&krcp->lock, flags);
 }
@@ -3755,7 +3774,7 @@ void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)

 	// Set timer to drain after KFREE_DRAIN_JIFFIES.
 	if (rcu_scheduler_active == RCU_SCHEDULER_RUNNING)
-		schedule_delayed_work(&krcp->monitor_work, KFREE_DRAIN_JIFFIES);
+		schedule_delayed_monitor_work(krcp);

 unlock_return:
 	krc_this_cpu_unlock(krcp, flags);
@@ -3831,7 +3850,7 @@ void __init kfree_rcu_scheduler_running(void)

 		raw_spin_lock_irqsave(&krcp->lock, flags);
 		if (need_offload_krc(krcp))
-			schedule_delayed_work_on(cpu, &krcp->monitor_work, KFREE_DRAIN_JIFFIES);
+			schedule_delayed_monitor_work(krcp);
 		raw_spin_unlock_irqrestore(&krcp->lock, flags);
 	}
 }
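The improvement Paul asked to see can be quantified directly from the `nr_records` column of the traces above: average records per reclaimed page divided by the 512-slot capacity. A hypothetical userspace helper for that arithmetic — the sample values in the usage below are rounded stand-ins taken from the traces, not an exact transcription:

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SLOTS 512	/* per-page record capacity from the commit log */

/* Average page-slot utilization, in whole percent (truncating integer
 * math), over a set of nr_records samples from the
 * rcu_invoke_kfree_bulk_callback trace. */
static unsigned int avg_utilization_pct(const unsigned long *nr_records,
					size_t n)
{
	unsigned long long total = 0;
	size_t i;

	for (i = 0; i < n; i++)
		total += nr_records[i];

	return (unsigned int)(total * 100 /
			      ((unsigned long long)n * PAGE_SLOTS));
}
```

Feeding in the single-digit `nr_records` values from the v2 trace versus the post-patch values (tens to hundreds of records per page) makes the batching gain explicit as a percentage rather than eyeballed from raw trace lines.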
On Tue, Jun 28, 2022 at 02:02:18PM +0200, Uladzislau Rezki wrote: > On Tue, Jun 28, 2022 at 01:45:31PM +0200, Uladzislau Rezki wrote: > > > On Mon, Jun 27, 2022 at 09:53:53PM +0200, Uladzislau Rezki (Sony) wrote: > > > > Currently the monitor work is scheduled with a fixed interval that > > > > is HZ/20 or each 50 milliseconds. The drawback of such approach is > > > > a low utilization of page slot in some scenarios. The page can store > > > > up to 512 records. For example on Android system it can look like: > > > > > > I was looking at queuing this one, but we need a bit more data. In > > > the meantime, here is my wordsmithing of the above paragraph: > > > > > > Currently the monitor work is scheduled with a fixed interval of HZ/20, > > > which is roughly 50 milliseconds. The drawback of this approach is > > > low utilization of the 512 page slots in scenarios with infrequence > > > kvfree_rcu() calls. For example on an Android system: > > > > > Good i will update with your changes :) > > > > > > <snip> > > > > kworker/3:0-13872 [003] .... 11286.007048: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000026522604 nr_records=1 > > > > kworker/3:0-13872 [003] .... 11286.015638: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000095ed6fca nr_records=2 > > > > kworker/1:2-20434 [001] .... 11286.051230: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000044872ffd nr_records=1 > > > > kworker/1:2-20434 [001] .... 11286.059322: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000026522604 nr_records=2 > > > > kworker/0:1-20052 [000] .... 11286.095295: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000044872ffd nr_records=2 > > > > kworker/0:1-20052 [000] .... 11286.103418: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000cbcf05db nr_records=1 > > > > kworker/2:3-14372 [002] .... 11286.135155: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000095ed6fca nr_records=2 > > > > kworker/2:3-14372 [002] .... 
11286.135198: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000044872ffd nr_records=1 > > > > kworker/1:2-20434 [001] .... 11286.155377: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000cbcf05db nr_records=5 > > > > kworker/2:3-14372 [002] .... 11286.167181: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000026522604 nr_records=5 > > > > kworker/1:2-20434 [001] .... 11286.179202: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x000000008ef95e14 nr_records=1 > > > > kworker/2:3-14372 [002] .... 11286.187398: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000c597d297 nr_records=6 > > > > kworker/3:0-13872 [003] .... 11286.187445: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000050bf92e2 nr_records=3 > > > > kworker/1:2-20434 [001] .... 11286.198975: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000cbcf05db nr_records=4 > > > > kworker/1:2-20434 [001] .... 11286.207203: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000095ed6fca nr_records=4 > > > > <snip> > > > > > > > > where a page only carries few records to reclaim a memory. In order > > > > to improve batching and make utilization more efficient the patch sets > > > > a drain interval to 1 second as default one. When a flood is detected > > > > an interval is adjusted in a way that a reclaim work is re-scheduled > > > > on a next timer jiffy. > > > > > > And of the above paragraph: > > > > > > Out of 512 slots, in all cases, fewer than 10 were actually used. > > > In order to improve batching and make utilization more efficient this > > > commit sets a drain interval to a fixed 1-second interval. Floods are > > > detected when a page fills quickly, and in that case, the reclaim work > > > is re-scheduled for the next scheduling-clock tick (jiffy). > > > > > Same here, will apply this. > > > > > --- > > > > > > But what we need now is a trace like the one above showing higher utilization > > > of the pages. Could you please supply this? 
> > > > > Yep, i will add traces to show that utilization becomes better what is > > actually expectable. > > > > -- > > Uladzislau Rezki > Paul, see below the v3: > > >From ff2a402a59442ba575f13f660ee1b140d44483b4 Mon Sep 17 00:00:00 2001 > From: "Uladzislau Rezki (Sony)" <urezki@gmail.com> > Date: Wed, 1 Jun 2022 11:28:53 +0200 > Subject: [PATCH v3 1/1] rcu/kvfree: Update KFREE_DRAIN_JIFFIES interval > > Currently the monitor work is scheduled with a fixed interval of HZ/20, > which is roughly 50 milliseconds. The drawback of this approach is > low utilization of the 512 page slots in scenarios with infrequence > kvfree_rcu() calls. For example on an Android system: > > <snip> > kworker/3:3-507 [003] .... 470.286305: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000d0f0dde5 nr_records=6 > kworker/6:1-76 [006] .... 470.416613: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000ea0d6556 nr_records=1 > kworker/6:1-76 [006] .... 470.416625: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x000000003e025849 nr_records=9 > kworker/1:1-73 [001] .... 470.438895: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000091a38dc1 nr_records=1 > kworker/3:3-507 [003] .... 471.390000: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000815a8713 nr_records=48 > kworker/1:1-73 [001] .... 471.725785: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000fda9bf20 nr_records=3 > kworker/1:1-73 [001] .... 471.725833: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000a425b67b nr_records=76 > kworker/3:3-507 [003] .... 471.958960: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000dce5bdbd nr_records=78 > kworker/0:4-1411 [000] .... 472.085673: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x000000007996be9d nr_records=1 > kworker/0:4-1411 [000] .... 472.085728: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000d0f0dde5 nr_records=5 > kworker/6:1-76 [006] .... 
472.260340: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000065630ee4 nr_records=102 > kworker/2:1-78 [002] .... 472.527362: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000815a8713 nr_records=1 > <snip> > > Out of 512 slots, in all cases, fewer than 10 were actually used. Though with the re-run, this now needs to be something like "In many cases, out of 512 slots, fewer than 10 were actually used." > In order to improve batching and make utilization more efficient this > commit sets a drain interval to a fixed 1-second interval. Floods are > detected when a page fills quickly, and in that case, the reclaim work > is re-scheduled for the next scheduling-clock tick (jiffy). > > After this change: > > <snip> > kworker/0:5-1415 [000] .... 239.852727: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x000000005d2561ea nr_records=178 > kworker/3:0-38 [003] .... 239.855113: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000039c44d27 nr_records=93 > kworker/1:4-1747 [001] .... 239.872647: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x000000004abb2f7e nr_records=109 > kworker/4:59-288 [004] .... 239.884829: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000049e7a308 nr_records=182 > kworker/0:5-1415 [000] .... 240.028762: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000037eae6d6 nr_records=86 > kworker/3:0-38 [003] .... 240.040769: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x000000003be0ae55 nr_records=244 > kworker/3:0-38 [003] .... 240.068639: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000021905efe nr_records=45 > <...>-1415 [000] .... 240.868830: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x000000007c97e065 nr_records=461 > <...>-1415 [000] .... 241.068646: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x000000000725886e nr_records=25 > kworker/0:5-1415 [000] .... 241.100931: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000021905efe nr_records=300 > worker/4:59-2880 [004] .... 
241.801181: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x000000004abb2f7e nr_records=180 > worker/6:60-2554 [006] .... 242.000565: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x000000006d3084af nr_records=40 > <snip> Much better! Looking forward to the updated patch. Thanx, Paul > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> > --- > kernel/rcu/tree.c | 27 +++++++++++++++++++++++---- > 1 file changed, 23 insertions(+), 4 deletions(-) > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > index fd16c0b46d9e..37f89bedf45e 100644 > --- a/kernel/rcu/tree.c > +++ b/kernel/rcu/tree.c > @@ -3249,7 +3249,7 @@ EXPORT_SYMBOL_GPL(call_rcu); > > > /* Maximum number of jiffies to wait before draining a batch. */ > -#define KFREE_DRAIN_JIFFIES (HZ / 50) > +#define KFREE_DRAIN_JIFFIES (HZ) > #define KFREE_N_BATCHES 2 > #define FREE_N_CHANNELS 2 > > @@ -3510,6 +3510,25 @@ need_offload_krc(struct kfree_rcu_cpu *krcp) > return !!krcp->head; > } > > +static void > +schedule_delayed_monitor_work(struct kfree_rcu_cpu *krcp) > +{ > + long delay, delay_left; > + > + delay = READ_ONCE(krcp->count) >= KVFREE_BULK_MAX_ENTR ? 1:KFREE_DRAIN_JIFFIES; > + > + if (delayed_work_pending(&krcp->monitor_work)) { > + delay_left = krcp->monitor_work.timer.expires - jiffies; > + > + if (delay < delay_left) > + mod_delayed_work(system_wq, &krcp->monitor_work, delay); > + > + return; > + } > + > + queue_delayed_work(system_wq, &krcp->monitor_work, delay); > +} > + > /* > * This function is invoked after the KFREE_DRAIN_JIFFIES timeout. > */ > @@ -3567,7 +3586,7 @@ static void kfree_rcu_monitor(struct work_struct *work) > // work to repeat an attempt. Because previous batches are > // still in progress. 
> if (need_offload_krc(krcp)) > - schedule_delayed_work(&krcp->monitor_work, KFREE_DRAIN_JIFFIES); > + schedule_delayed_monitor_work(krcp); > > raw_spin_unlock_irqrestore(&krcp->lock, flags); > } > @@ -3755,7 +3774,7 @@ void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func) > > // Set timer to drain after KFREE_DRAIN_JIFFIES. > if (rcu_scheduler_active == RCU_SCHEDULER_RUNNING) > - schedule_delayed_work(&krcp->monitor_work, KFREE_DRAIN_JIFFIES); > + schedule_delayed_monitor_work(krcp); > > unlock_return: > krc_this_cpu_unlock(krcp, flags); > @@ -3831,7 +3850,7 @@ void __init kfree_rcu_scheduler_running(void) > > raw_spin_lock_irqsave(&krcp->lock, flags); > if (need_offload_krc(krcp)) > - schedule_delayed_work_on(cpu, &krcp->monitor_work, KFREE_DRAIN_JIFFIES); > + schedule_delayed_monitor_work(krcp); > raw_spin_unlock_irqrestore(&krcp->lock, flags); > } > } > -- > 2.30.2
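For reference, the interval arithmetic running through this thread is a plain jiffies-to-milliseconds conversion: `HZ` jiffies is one second regardless of the configured tick rate, and `HZ / 20` jiffies is 50 ms whenever `HZ` is a multiple of 20. A small sketch (the `HZ` value here is an assumption for illustration; in the kernel it is a build-time configuration):

```c
#include <assert.h>

#define HZ 1000	/* assumed scheduler tick rate; build-dependent in the kernel */

/* Convert a delay expressed in jiffies to milliseconds.
 * Exact when HZ divides 1000 evenly; truncates otherwise. */
static unsigned long jiffies_to_ms(unsigned long j)
{
	return j * 1000UL / HZ;
}
```

So the patch moves the default drain from a few tens of milliseconds to a full second, while the flood path drops to a single jiffy (one tick).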
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index fd16c0b46d9e..c11670ba008e 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3249,7 +3249,7 @@ EXPORT_SYMBOL_GPL(call_rcu);
 
 
 /* Maximum number of jiffies to wait before draining a batch. */
-#define KFREE_DRAIN_JIFFIES (HZ / 50)
+#define KFREE_DRAIN_JIFFIES (HZ)
 #define KFREE_N_BATCHES 2
 #define FREE_N_CHANNELS 2
 
@@ -3510,6 +3510,26 @@ need_offload_krc(struct kfree_rcu_cpu *krcp)
 	return !!krcp->head;
 }
 
+static void
+schedule_delayed_monitor_work(struct kfree_rcu_cpu *krcp)
+{
+	long delay, delay_left;
+
+	delay = READ_ONCE(krcp->count) >= KVFREE_BULK_MAX_ENTR ?
+		1:KFREE_DRAIN_JIFFIES;
+
+	if (delayed_work_pending(&krcp->monitor_work)) {
+		delay_left = krcp->monitor_work.timer.expires - jiffies;
+
+		if (delay < delay_left)
+			mod_delayed_work(system_wq, &krcp->monitor_work, delay);
+
+		return;
+	}
+
+	queue_delayed_work(system_wq, &krcp->monitor_work, delay);
+}
+
 /*
  * This function is invoked after the KFREE_DRAIN_JIFFIES timeout.
  */
@@ -3567,7 +3587,7 @@ static void kfree_rcu_monitor(struct work_struct *work)
 	// work to repeat an attempt. Because previous batches are
 	// still in progress.
 	if (need_offload_krc(krcp))
-		schedule_delayed_work(&krcp->monitor_work, KFREE_DRAIN_JIFFIES);
+		schedule_delayed_monitor_work(krcp);
 
 	raw_spin_unlock_irqrestore(&krcp->lock, flags);
 }
@@ -3755,7 +3775,7 @@ void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
 
 	// Set timer to drain after KFREE_DRAIN_JIFFIES.
 	if (rcu_scheduler_active == RCU_SCHEDULER_RUNNING)
-		schedule_delayed_work(&krcp->monitor_work, KFREE_DRAIN_JIFFIES);
+		schedule_delayed_monitor_work(krcp);
 
 unlock_return:
 	krc_this_cpu_unlock(krcp, flags);
@@ -3831,7 +3851,7 @@ void __init kfree_rcu_scheduler_running(void)
 
 		raw_spin_lock_irqsave(&krcp->lock, flags);
 		if (need_offload_krc(krcp))
-			schedule_delayed_work_on(cpu, &krcp->monitor_work, KFREE_DRAIN_JIFFIES);
+			schedule_delayed_monitor_work(krcp);
 		raw_spin_unlock_irqrestore(&krcp->lock, flags);
 	}
 }
Currently the monitor work is scheduled with a fixed interval of HZ/20,
which is 50 milliseconds. The drawback of such an approach is low
utilization of page slots in some scenarios: a page can store up to 512
records, but is often flushed holding only a few. For example on an
Android system it can look like:

<snip>
  kworker/3:0-13872   [003] ....  11286.007048: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000026522604 nr_records=1
  kworker/3:0-13872   [003] ....  11286.015638: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000095ed6fca nr_records=2
  kworker/1:2-20434   [001] ....  11286.051230: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000044872ffd nr_records=1
  kworker/1:2-20434   [001] ....  11286.059322: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000026522604 nr_records=2
  kworker/0:1-20052   [000] ....  11286.095295: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000044872ffd nr_records=2
  kworker/0:1-20052   [000] ....  11286.103418: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000cbcf05db nr_records=1
  kworker/2:3-14372   [002] ....  11286.135155: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000095ed6fca nr_records=2
  kworker/2:3-14372   [002] ....  11286.135198: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000044872ffd nr_records=1
  kworker/1:2-20434   [001] ....  11286.155377: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000cbcf05db nr_records=5
  kworker/2:3-14372   [002] ....  11286.167181: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000026522604 nr_records=5
  kworker/1:2-20434   [001] ....  11286.179202: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x000000008ef95e14 nr_records=1
  kworker/2:3-14372   [002] ....  11286.187398: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000c597d297 nr_records=6
  kworker/3:0-13872   [003] ....  11286.187445: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000050bf92e2 nr_records=3
  kworker/1:2-20434   [001] ....  11286.198975: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000cbcf05db nr_records=4
  kworker/1:2-20434   [001] ....  11286.207203: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000095ed6fca nr_records=4
<snip>

where a page carries only a few records by the time its memory is
reclaimed. In order to improve batching and make utilization more
efficient, this patch sets the default drain interval to 1 second.
When a flood is detected, the interval is shortened so that the
reclaim work is re-scheduled on the next timer jiffy.

- default -
Total time taken by all kfree'ers: 11510245312 ns, loops: 10000, batches: 1553, memory footprint: 70MB
Total time taken by all kfree'ers: 9813329636 ns, loops: 10000, batches: 1544, memory footprint: 67MB
Total time taken by all kfree'ers: 10085206318 ns, loops: 10000, batches: 1499, memory footprint: 156MB
Total time taken by all kfree'ers: 9582207782 ns, loops: 10000, batches: 1456, memory footprint: 103MB
Total time taken by all kfree'ers: 9872195750 ns, loops: 10000, batches: 1519, memory footprint: 105MB
Total time taken by all kfree'ers: 9574438300 ns, loops: 10000, batches: 1480, memory footprint: 101MB
Total time taken by all kfree'ers: 9990015265 ns, loops: 10000, batches: 1489, memory footprint: 127MB
Total time taken by all kfree'ers: 9978971689 ns, loops: 10000, batches: 1455, memory footprint: 94MB
Total time taken by all kfree'ers: 10357628090 ns, loops: 10000, batches: 1456, memory footprint: 64MB
Total time taken by all kfree'ers: 9838469975 ns, loops: 10000, batches: 1448, memory footprint: 131MB

- patch -
Total time taken by all kfree'ers: 8488575321 ns, loops: 10000, batches: 1735, memory footprint: 82MB
Total time taken by all kfree'ers: 9256401034 ns, loops: 10000, batches: 1762, memory footprint: 76MB
Total time taken by all kfree'ers: 9198011994 ns, loops: 10000, batches: 1751, memory footprint: 82MB
Total time taken by all kfree'ers: 10590540622 ns, loops: 10000, batches: 1540, memory footprint: 94MB
Total time taken by all kfree'ers: 9953902918 ns, loops: 10000, batches: 1638, memory footprint: 89MB
Total time taken by all kfree'ers: 10176669464 ns, loops: 10000, batches: 1613, memory footprint: 84MB
Total time taken by all kfree'ers: 9387490978 ns, loops: 10000, batches: 1762, memory footprint: 85MB
Total time taken by all kfree'ers: 9530535341 ns, loops: 10000, batches: 1781, memory footprint: 66MB
Total time taken by all kfree'ers: 9945442340 ns, loops: 10000, batches: 1758, memory footprint: 68MB
Total time taken by all kfree'ers: 9188848287 ns, loops: 10000, batches: 1781, memory footprint: 67MB

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
---
 kernel/rcu/tree.c | 28 ++++++++++++++++++++++++----
 1 file changed, 24 insertions(+), 4 deletions(-)