Message ID | 20220817191524.201253713@redhat.com |
---|---|
State | New |
Series | tick/sched: Ensure quiet_vmstat() is called when the idle tick was stopped too |
On Wed, 17 Aug 2022 16:13:48 -0300 Marcelo Tosatti <mtosatti@redhat.com> wrote:

> From: Aaron Tomlin <atomlin@redhat.com>
>
> In the context of the idle task and an adaptive-tick mode/or a nohz_full
> CPU, quiet_vmstat() can be called: before stopping the idle tick,
> entering an idle state and on exit. In particular, for the latter case,
> when the idle task is required to reschedule, the idle tick can remain
> stopped and the timer expiration time endless i.e., KTIME_MAX. Now,
> indeed before a nohz_full CPU enters an idle state, CPU-specific vmstat
> counters should be processed to ensure the respective values have been
> reset and folded into the zone specific 'vm_stat[]'. That being said, it
> can only occur when: the idle tick was previously stopped, and
> reprogramming of the timer is not required.

I'd like to see input from tick/sched maintainers before touching this
one, please.

> --- linux-2.6.orig/kernel/time/tick-sched.c
> +++ linux-2.6/kernel/time/tick-sched.c
> @@ -26,6 +26,7 @@
>  #include <linux/posix-timers.h>
>  #include <linux/context_tracking.h>
>  #include <linux/mm.h>
> +#include <linux/rcupdate.h>
>
>  #include <asm/irq_regs.h>
>
> @@ -519,6 +520,20 @@ void __tick_nohz_task_switch(void)
>  	}
>  }
>
> +void __tick_nohz_user_enter_prepare(void)
> +{
> +	struct tick_sched *ts;
> +
> +	if (tick_nohz_full_cpu(smp_processor_id())) {
> +		ts = this_cpu_ptr(&tick_cpu_sched);
> +
> +		if (ts->tick_stopped)
> +			quiet_vmstat();
> +		rcu_nocb_flush_deferred_wakeup();
> +	}
> +}
> +EXPORT_SYMBOL_GPL(__tick_nohz_user_enter_prepare);
> +
>  /* Get the boot-time nohz CPU list from the kernel parameters. */
>  void __init tick_nohz_full_setup(cpumask_var_t cpumask)
>  {
> @@ -890,6 +905,9 @@ static void tick_nohz_stop_tick(struct t
>  		ts->do_timer_last = 0;
>  	}
>
> +	/* Attempt to fold when the idle tick is stopped or not */
> +	quiet_vmstat();
> +
>  	/* Skip reprogram of event if its not changed */
>  	if (ts->tick_stopped && (expires == ts->next_tick)) {
>  		/* Sanity check: make sure clockevent is actually programmed */
> @@ -911,7 +929,6 @@ static void tick_nohz_stop_tick(struct t
>  	 */
>  	if (!ts->tick_stopped) {
>  		calc_load_nohz_start();
> -		quiet_vmstat();
>
>  		ts->last_tick = hrtimer_get_expires(&ts->sched_timer);
>  		ts->tick_stopped = 1;

Putting vmstat stuff inside core timer code is unattractive, to say the
least!
On Wed, Aug 17, 2022 at 04:13:48PM -0300, Marcelo Tosatti wrote:
> From: Aaron Tomlin <atomlin@redhat.com>
>
> In the context of the idle task and an adaptive-tick mode/or a nohz_full
> CPU, quiet_vmstat() can be called: before stopping the idle tick,
> entering an idle state and on exit. In particular, for the latter case,
> when the idle task is required to reschedule, the idle tick can remain
> stopped

Since quiet_vmstat() is only called when ts->tick_stopped = false, this
can only happen if the idle loop did not enter into dynticks idle mode
but the exiting idle task eventually stops the tick
(tick_nohz_idle_update_tick()).

This can happen for example if we enter the idle loop with a timer
callback pending in one jiffy, then once that timer fires, which wakes
up a task, we exit the idle loop and then tick_nohz_idle_update_tick()
doesn't see any timer callback pending left and the tick can be stopped.

Or am I missing something?

> and the timer expiration time endless i.e., KTIME_MAX. Now,
> indeed before a nohz_full CPU enters an idle state, CPU-specific vmstat
> counters should be processed to ensure the respective values have been
> reset and folded into the zone specific 'vm_stat[]'. That being said, it
> can only occur when: the idle tick was previously stopped, and
> reprogramming of the timer is not required.
>
> A customer provided some evidence which indicates that the idle tick was
> stopped; albeit, CPU-specific vmstat counters still remained populated.
> Thus one can only assume quiet_vmstat() was not invoked on return to the
> idle loop.
>
> If I understand correctly, I suspect this divergence might erroneously
> prevent a reclaim attempt by kswapd. If the number of zone specific free
> pages are below their per-cpu drift value then
> zone_page_state_snapshot() is used to compute a more accurate view of
> the aforementioned statistic. Thus any task blocked on the NUMA node
> specific pfmemalloc_wait queue will be unable to make significant
> progress via direct reclaim unless it is killed after being woken up by
> kswapd (see throttle_direct_reclaim()).
>
> Consider the following theoretical scenario:
>
>	1.	CPU Y migrated running task A to CPU X that was
>		in an idle state i.e. waiting for an IRQ - not
>		polling; marked the current task on CPU X to
>		need/or require a reschedule i.e., set
>		TIF_NEED_RESCHED and invoked a reschedule IPI to
>		CPU X (see sched_move_task())

CPU Y is nohz_full right?

>
>	2.	CPU X acknowledged the reschedule IPI from CPU Y;
>		generic idle loop code noticed the
>		TIF_NEED_RESCHED flag against the idle task and
>		attempts to exit of the loop and calls the main
>		scheduler function i.e. __schedule().
>
>		Since the idle tick was previously stopped no
>		scheduling-clock tick would occur.
>		So, no deferred timers would be handled
>
>	3.	Post transition to kernel execution Task A
>		running on CPU Y, indirectly released a few pages
>		(e.g. see __free_one_page()); CPU Y's
>		'vm_stat_diff[NR_FREE_PAGES]' was updated and zone
>		specific 'vm_stat[]' update was deferred as per the
>		CPU-specific stat threshold
>
>	4.	Task A does invoke exit(2) and the kernel does
>		remove the task from the run-queue; the idle task
>		was selected to execute next since there are no
>		other runnable tasks assigned to the given CPU
>		(see pick_next_task() and pick_next_task_idle())

This happens on CPU X, right?

>
>	5.	On return to the idle loop since the idle tick
>		was already stopped and can remain so (see [1]
>		below) e.g. no pending soft IRQs, no attempt is
>		made to zero and fold CPU Y's vmstat counters
>		since reprogramming of the scheduling-clock tick
>		is not required/or needed (see [2])

And now back to CPU Y, confused...

[...]

> Index: linux-2.6/kernel/time/tick-sched.c
> ===================================================================
> --- linux-2.6.orig/kernel/time/tick-sched.c
> +++ linux-2.6/kernel/time/tick-sched.c
> @@ -26,6 +26,7 @@
>  #include <linux/posix-timers.h>
>  #include <linux/context_tracking.h>
>  #include <linux/mm.h>
> +#include <linux/rcupdate.h>
>
>  #include <asm/irq_regs.h>
>
> @@ -519,6 +520,20 @@ void __tick_nohz_task_switch(void)
>  	}
>  }
>
> +void __tick_nohz_user_enter_prepare(void)
> +{
> +	struct tick_sched *ts;
> +
> +	if (tick_nohz_full_cpu(smp_processor_id())) {
> +		ts = this_cpu_ptr(&tick_cpu_sched);
> +
> +		if (ts->tick_stopped)
> +			quiet_vmstat();

Wasn't it supposed to be part of the quiescing in task isolation
mode?

Because currently vmstat is a deferrable timer but that deferrability
may not apply to nohz_full anymore (outside idle). And quiet_vmstat()
doesn't cancel the timer so you'll still get the disturbance.

See this patch: https://lore.kernel.org/lkml/20220725104356.GA2950296@lothringen/

> +		rcu_nocb_flush_deferred_wakeup();
> +	}
> +}
> +EXPORT_SYMBOL_GPL(__tick_nohz_user_enter_prepare);
> +
>  /* Get the boot-time nohz CPU list from the kernel parameters. */
>  void __init tick_nohz_full_setup(cpumask_var_t cpumask)
>  {
> @@ -890,6 +905,9 @@ static void tick_nohz_stop_tick(struct t
>  		ts->do_timer_last = 0;
>  	}
>
> +	/* Attempt to fold when the idle tick is stopped or not */
> +	quiet_vmstat();
> +
>  	/* Skip reprogram of event if its not changed */
>  	if (ts->tick_stopped && (expires == ts->next_tick)) {
>  		/* Sanity check: make sure clockevent is actually programmed */

But that chunk looks good.

Thanks.

> @@ -911,7 +929,6 @@ static void tick_nohz_stop_tick(struct t
>  	 */
>  	if (!ts->tick_stopped) {
>  		calc_load_nohz_start();
> -		quiet_vmstat();
>
>  		ts->last_tick = hrtimer_get_expires(&ts->sched_timer);
>  		ts->tick_stopped = 1;
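[For reference on the point above that quiet_vmstat() doesn't cancel the
timer: at the time, mm/vmstat.c:quiet_vmstat() looked roughly like the
sketch below (paraphrased, not necessarily the exact revision under
discussion). It folds the per-CPU differentials but deliberately leaves
the queued vmstat_update work pending, relying on vmstat_shepherd to deal
with it later — which is exactly the disturbance being discussed.]

void quiet_vmstat(void)
{
	if (system_state != SYSTEM_RUNNING)
		return;

	if (!delayed_work_pending(this_cpu_ptr(&vmstat_work)))
		return;

	if (!need_update(smp_processor_id()))
		return;

	/*
	 * Only refresh the counters; do not cancel the pending delayed
	 * vmstat_update work, as that would be too expensive from this
	 * path. vmstat_shepherd is expected to take care of it.
	 */
	refresh_cpu_vm_stats(false);
}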
On Fri, Sep 09, 2022 at 02:12:24PM +0200, Frederic Weisbecker wrote:
> On Wed, Aug 17, 2022 at 04:13:48PM -0300, Marcelo Tosatti wrote:
> > From: Aaron Tomlin <atomlin@redhat.com>
> >
> > In the context of the idle task and an adaptive-tick mode/or a nohz_full
> > CPU, quiet_vmstat() can be called: before stopping the idle tick,
> > entering an idle state and on exit. In particular, for the latter case,
> > when the idle task is required to reschedule, the idle tick can remain
> > stopped
>
> Since quiet_vmstat() is only called when ts->tick_stopped = false, this
> can only happen if the idle loop did not enter into dynticks idle mode
> but the exiting idle task eventually stops the tick
> (tick_nohz_idle_update_tick()).
>
> This can happen for example if we enter the idle loop with a timer
> callback pending in one jiffy, then once that timer fires, which wakes
> up a task, we exit the idle loop and then tick_nohz_idle_update_tick()
> doesn't see any timer callback pending left and the tick can be stopped.
>
> Or am I missing something?

For the scenario where we re-enter idle without calling quiet_vmstat:

CPU-0					CPU-1

0) vmstat_shepherd notices it is necessary to queue vmstat work
to remote CPU, queues deferrable timer into timer wheel, and calls
trigger_dyntick_cpu (target_cpu == cpu-1).

1) Stop the tick (get_next_timer_interrupt will not take deferrable
timers into account), calls quiet_vmstat, which keeps the vmstat work
(vmstat_update function) queued.
2) Idle
3) Idle exit
4) Run thread on CPU, some activity marks vmstat dirty
5) Idle
6) Goto 3

At 5, since the tick is already stopped, the deferrable timer for the
delayed work item will not execute, and vmstat_shepherd will consider
the work still pending and skip re-queueing it:

static void vmstat_shepherd(struct work_struct *w)
{
	int cpu;

	cpus_read_lock();
	/* Check processors whose vmstat worker threads have been disabled */
	for_each_online_cpu(cpu) {
		struct delayed_work *dw = &per_cpu(vmstat_work, cpu);

		if (!delayed_work_pending(dw) && need_update(cpu))
			queue_delayed_work_on(cpu, mm_percpu_wq, dw, 0);

		cond_resched();
	}
	cpus_read_unlock();

	schedule_delayed_work(&shepherd,
		round_jiffies_relative(sysctl_stat_interval));
}

As far as I can tell...

> > and the timer expiration time endless i.e., KTIME_MAX. Now,
> > indeed before a nohz_full CPU enters an idle state, CPU-specific vmstat
> > counters should be processed to ensure the respective values have been
> > reset and folded into the zone specific 'vm_stat[]'. That being said, it
> > can only occur when: the idle tick was previously stopped, and
> > reprogramming of the timer is not required.
> >
> > A customer provided some evidence which indicates that the idle tick was
> > stopped; albeit, CPU-specific vmstat counters still remained populated.
> > Thus one can only assume quiet_vmstat() was not invoked on return to the
> > idle loop.
> >
> > If I understand correctly, I suspect this divergence might erroneously
> > prevent a reclaim attempt by kswapd. If the number of zone specific free
> > pages are below their per-cpu drift value then
> > zone_page_state_snapshot() is used to compute a more accurate view of
> > the aforementioned statistic. Thus any task blocked on the NUMA node
> > specific pfmemalloc_wait queue will be unable to make significant
> > progress via direct reclaim unless it is killed after being woken up by
> > kswapd (see throttle_direct_reclaim()).
> >
> > Consider the following theoretical scenario:
> >
> >	1.	CPU Y migrated running task A to CPU X that was
> >		in an idle state i.e. waiting for an IRQ - not
> >		polling; marked the current task on CPU X to
> >		need/or require a reschedule i.e., set
> >		TIF_NEED_RESCHED and invoked a reschedule IPI to
> >		CPU X (see sched_move_task())
>
> CPU Y is nohz_full right?
>
> >
> >	2.	CPU X acknowledged the reschedule IPI from CPU Y;
> >		generic idle loop code noticed the
> >		TIF_NEED_RESCHED flag against the idle task and
> >		attempts to exit of the loop and calls the main
> >		scheduler function i.e. __schedule().
> >
> >		Since the idle tick was previously stopped no
> >		scheduling-clock tick would occur.
> >		So, no deferred timers would be handled
> >
> >	3.	Post transition to kernel execution Task A
> >		running on CPU Y, indirectly released a few pages
> >		(e.g. see __free_one_page()); CPU Y's
> >		'vm_stat_diff[NR_FREE_PAGES]' was updated and zone
> >		specific 'vm_stat[]' update was deferred as per the
> >		CPU-specific stat threshold
> >
> >	4.	Task A does invoke exit(2) and the kernel does
> >		remove the task from the run-queue; the idle task
> >		was selected to execute next since there are no
> >		other runnable tasks assigned to the given CPU
> >		(see pick_next_task() and pick_next_task_idle())
>
> This happens on CPU X, right?
>
> >
> >	5.	On return to the idle loop since the idle tick
> >		was already stopped and can remain so (see [1]
> >		below) e.g. no pending soft IRQs, no attempt is
> >		made to zero and fold CPU Y's vmstat counters
> >		since reprogramming of the scheduling-clock tick
> >		is not required/or needed (see [2])
>
> And now back to CPU Y, confused...

Aaron, can you explain the diagram above? AFAIU the problem can also be
understood with the simpler explanation above.

> [...]
> > Index: linux-2.6/kernel/time/tick-sched.c
> > ===================================================================
> > --- linux-2.6.orig/kernel/time/tick-sched.c
> > +++ linux-2.6/kernel/time/tick-sched.c
> > @@ -26,6 +26,7 @@
> >  #include <linux/posix-timers.h>
> >  #include <linux/context_tracking.h>
> >  #include <linux/mm.h>
> > +#include <linux/rcupdate.h>
> >
> >  #include <asm/irq_regs.h>
> >
> > @@ -519,6 +520,20 @@ void __tick_nohz_task_switch(void)
> >  	}
> >  }
> >
> > +void __tick_nohz_user_enter_prepare(void)
> > +{
> > +	struct tick_sched *ts;
> > +
> > +	if (tick_nohz_full_cpu(smp_processor_id())) {
> > +		ts = this_cpu_ptr(&tick_cpu_sched);
> > +
> > +		if (ts->tick_stopped)
> > +			quiet_vmstat();
>
> Wasn't it supposed to be part of the quiescing in task isolation
> mode?

Not requiring application changes seems useful (so if we can drop the
need for task isolation ioctls from userspace, that is better).

> Because currently vmstat is a deferrable timer but that deferrability
> may not apply to nohz_full anymore (outside idle). And quiet_vmstat()
> doesn't cancel the timer so you'll still get the disturbance.
>
> See this patch: https://lore.kernel.org/lkml/20220725104356.GA2950296@lothringen/

Right. But I think the nohz_full applications prefer not to be
interrupted with the vmstat_work in the first place.

> > +		rcu_nocb_flush_deferred_wakeup();
> > +	}
> > +}
> > +EXPORT_SYMBOL_GPL(__tick_nohz_user_enter_prepare);
> > +
> >  /* Get the boot-time nohz CPU list from the kernel parameters. */
> >  void __init tick_nohz_full_setup(cpumask_var_t cpumask)
> >  {
> > @@ -890,6 +905,9 @@ static void tick_nohz_stop_tick(struct t
> >  		ts->do_timer_last = 0;
> >  	}
> >
> > +	/* Attempt to fold when the idle tick is stopped or not */
> > +	quiet_vmstat();
> > +
> >  	/* Skip reprogram of event if its not changed */
> >  	if (ts->tick_stopped && (expires == ts->next_tick)) {
> >  		/* Sanity check: make sure clockevent is actually programmed */
>
> But that chunk looks good.
>
> Thanks.

Do you see any issue with syncing the vmstat on return to userspace as
well, if nohz_full and tick is disabled? Note the syscall entry/exit
numbers: the overhead is minimal.

> > @@ -911,7 +929,6 @@ static void tick_nohz_stop_tick(struct t
> >  	 */
> >  	if (!ts->tick_stopped) {
> >  		calc_load_nohz_start();
> > -		quiet_vmstat();
> >
> >  		ts->last_tick = hrtimer_get_expires(&ts->sched_timer);
> >  		ts->tick_stopped = 1;
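[For reference, need_update() — the dirtiness test used by
vmstat_shepherd above — is roughly the following sketch (field names as
of ~v5.19; older trees keep the diffs in struct per_cpu_pageset
instead). It scans the per-CPU diff arrays for any non-zero byte:]

static bool need_update(int cpu)
{
	pg_data_t *last_pgdat = NULL;
	struct zone *zone;

	for_each_populated_zone(zone) {
		struct per_cpu_zonestat *pzstats =
			per_cpu_ptr(zone->per_cpu_zonestats, cpu);
		struct per_cpu_nodestat *n;

		/* Fast check: any non-zero diff means unfolded counters */
		if (memchr_inv(pzstats->vm_stat_diff, 0,
			       sizeof(pzstats->vm_stat_diff)))
			return true;

		if (last_pgdat == zone->zone_pgdat)
			continue;
		last_pgdat = zone->zone_pgdat;
		n = per_cpu_ptr(zone->zone_pgdat->per_cpu_nodestats, cpu);
		if (memchr_inv(n->vm_node_stat_diff, 0,
			       sizeof(n->vm_node_stat_diff)))
			return true;
	}
	return false;
}

[In the scenario above, need_update(cpu) would indeed return true for
the dirty CPU, but the !delayed_work_pending(dw) guard in
vmstat_shepherd() short-circuits first, so the CPU is never re-queued.]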
On Fri 2022-09-09 16:35 -0300, Marcelo Tosatti wrote:
> For the scenario where we re-enter idle without calling quiet_vmstat:
>
> CPU-0					CPU-1
>
> 0) vmstat_shepherd notices it is necessary to queue vmstat work
> to remote CPU, queues deferrable timer into timer wheel, and calls
> trigger_dyntick_cpu (target_cpu == cpu-1).
>
> 1) Stop the tick (get_next_timer_interrupt will not take deferrable
> timers into account), calls quiet_vmstat, which keeps the vmstat work
> (vmstat_update function) queued.
> 2) Idle
> 3) Idle exit
> 4) Run thread on CPU, some activity marks vmstat dirty
> 5) Idle
> 6) Goto 3
>
> At 5, since the tick is already stopped, the deferrable timer for the
> delayed work item will not execute, and vmstat_shepherd will consider
> the work still pending and skip re-queueing it:
>
> static void vmstat_shepherd(struct work_struct *w)
> {
> 	int cpu;
>
> 	cpus_read_lock();
> 	/* Check processors whose vmstat worker threads have been disabled */
> 	for_each_online_cpu(cpu) {
> 		struct delayed_work *dw = &per_cpu(vmstat_work, cpu);
>
> 		if (!delayed_work_pending(dw) && need_update(cpu))
> 			queue_delayed_work_on(cpu, mm_percpu_wq, dw, 0);
>
> 		cond_resched();
> 	}
> 	cpus_read_unlock();
>
> 	schedule_delayed_work(&shepherd,
> 		round_jiffies_relative(sysctl_stat_interval));
> }
>
> As far as I can tell...

Hi Marcelo,

Yes, I agree with the scenario above.

> > > Consider the following theoretical scenario:
> > >
> > >	1.	CPU Y migrated running task A to CPU X that was
> > >		in an idle state i.e. waiting for an IRQ - not
> > >		polling; marked the current task on CPU X to
> > >		need/or require a reschedule i.e., set
> > >		TIF_NEED_RESCHED and invoked a reschedule IPI to
> > >		CPU X (see sched_move_task())
> >
> > CPU Y is nohz_full right?
> >
> > >
> > >	2.	CPU X acknowledged the reschedule IPI from CPU Y;
> > >		generic idle loop code noticed the
> > >		TIF_NEED_RESCHED flag against the idle task and
> > >		attempts to exit of the loop and calls the main
> > >		scheduler function i.e. __schedule().
> > >
> > >		Since the idle tick was previously stopped no
> > >		scheduling-clock tick would occur.
> > >		So, no deferred timers would be handled
> > >
> > >	3.	Post transition to kernel execution Task A
> > >		running on CPU Y, indirectly released a few pages
> > >		(e.g. see __free_one_page()); CPU Y's
> > >		'vm_stat_diff[NR_FREE_PAGES]' was updated and zone
> > >		specific 'vm_stat[]' update was deferred as per the
> > >		CPU-specific stat threshold
> > >
> > >	4.	Task A does invoke exit(2) and the kernel does
> > >		remove the task from the run-queue; the idle task
> > >		was selected to execute next since there are no
> > >		other runnable tasks assigned to the given CPU
> > >		(see pick_next_task() and pick_next_task_idle())
> >
> > This happens on CPU X, right?
> >
> > >
> > >	5.	On return to the idle loop since the idle tick
> > >		was already stopped and can remain so (see [1]
> > >		below) e.g. no pending soft IRQs, no attempt is
> > >		made to zero and fold CPU Y's vmstat counters
> > >		since reprogramming of the scheduling-clock tick
> > >		is not required/or needed (see [2])
> >
> > And now back to CPU Y, confused...
>
> Aaron, can you explain the diagram above?

Hi Frederic,

Sorry about that. How about the following:

  - Note: CPU X is part of 'tick_nohz_full_mask'

	1.	CPU Y migrated running task A to CPU X that
		was in an idle state i.e. waiting for an IRQ;
		marked the current task on CPU X to need/or
		require a reschedule i.e., set TIF_NEED_RESCHED
		and invoked a reschedule IPI to CPU X
		(see sched_move_task())

	2.	CPU X acknowledged the reschedule IPI. Generic
		idle loop code noticed the TIF_NEED_RESCHED flag
		against the idle task and attempts to exit the
		loop and calls the main scheduler function i.e.
		__schedule().

		Since the idle tick was previously stopped no
		scheduling-clock tick would occur.
		So, no deferred timers would be handled

	3.	Post transition to kernel execution Task A
		running on CPU X, indirectly released a few pages
		(e.g. see __free_one_page()); CPU X's
		'vm_stat_diff[NR_FREE_PAGES]' was updated and zone
		specific 'vm_stat[]' update was deferred as per the
		CPU-specific stat threshold

	4.	Task A does invoke exit(2) and the kernel does
		remove the task from the run-queue; the idle task
		was selected to execute next since there are no
		other runnable tasks assigned to the given CPU
		(see pick_next_task() and pick_next_task_idle())

	5.	On return to the idle loop since the idle tick
		was already stopped and can remain so (see [1]
		below) e.g. no pending soft IRQs, no attempt is
		made to zero and fold CPU X's vmstat counters
		since reprogramming of the scheduling-clock tick
		is not required/or needed (see [2])

Kind regards,
On Mon, Sep 12, 2022 at 03:38:23PM +0100, Aaron Tomlin wrote:
> On Fri 2022-09-09 16:35 -0300, Marcelo Tosatti wrote:
> Hi Frederic,
>
> Sorry about that. How about the following:
>
>   - Note: CPU X is part of 'tick_nohz_full_mask'
>
>	1.	CPU Y migrated running task A to CPU X that
>		was in an idle state i.e. waiting for an IRQ;
>		marked the current task on CPU X to need/or
>		require a reschedule i.e., set TIF_NEED_RESCHED
>		and invoked a reschedule IPI to CPU X
>		(see sched_move_task())
>
>	2.	CPU X acknowledged the reschedule IPI. Generic
>		idle loop code noticed the TIF_NEED_RESCHED flag
>		against the idle task and attempts to exit the
>		loop and calls the main scheduler function i.e.
>		__schedule().
>
>		Since the idle tick was previously stopped no
>		scheduling-clock tick would occur.
>		So, no deferred timers would be handled
>
>	3.	Post transition to kernel execution Task A
>		running on CPU X, indirectly released a few pages
>		(e.g. see __free_one_page()); CPU X's
>		'vm_stat_diff[NR_FREE_PAGES]' was updated and zone
>		specific 'vm_stat[]' update was deferred as per the
>		CPU-specific stat threshold
>
>	4.	Task A does invoke exit(2) and the kernel does
>		remove the task from the run-queue; the idle task
>		was selected to execute next since there are no
>		other runnable tasks assigned to the given CPU
>		(see pick_next_task() and pick_next_task_idle())
>
>	5.	On return to the idle loop since the idle tick
>		was already stopped and can remain so (see [1]
>		below) e.g. no pending soft IRQs, no attempt is
>		made to zero and fold CPU X's vmstat counters
>		since reprogramming of the scheduling-clock tick
>		is not required/or needed (see [2])

Much better, thanks.

Please cut the patch in two: one that fixes the stuff in the idle path
and another one that fixes the return to user path.

The first one is definitely a fix; the second one is rather a feature
that is definitely wanted as well, but I need to think it through
further.

>
> Kind regards,
>
> --
> Aaron Tomlin
Index: linux-2.6/include/linux/tick.h
===================================================================
--- linux-2.6.orig/include/linux/tick.h
+++ linux-2.6/include/linux/tick.h
@@ -11,7 +11,6 @@
 #include <linux/context_tracking_state.h>
 #include <linux/cpumask.h>
 #include <linux/sched.h>
-#include <linux/rcupdate.h>
 
 #ifdef CONFIG_GENERIC_CLOCKEVENTS
 extern void __init tick_init(void);
@@ -272,6 +271,7 @@ static inline void tick_dep_clear_signal
 extern void tick_nohz_full_kick_cpu(int cpu);
 extern void __tick_nohz_task_switch(void);
+void __tick_nohz_user_enter_prepare(void);
 extern void __init tick_nohz_full_setup(cpumask_var_t cpumask);
 #else
 static inline bool tick_nohz_full_enabled(void) { return false; }
@@ -296,6 +296,7 @@ static inline void tick_dep_clear_signal
 static inline void tick_nohz_full_kick_cpu(int cpu) { }
 static inline void __tick_nohz_task_switch(void) { }
+static inline void __tick_nohz_user_enter_prepare(void) { }
 static inline void tick_nohz_full_setup(cpumask_var_t cpumask) { }
 #endif
@@ -308,7 +309,7 @@ static inline void tick_nohz_task_switch
 static inline void tick_nohz_user_enter_prepare(void)
 {
 	if (tick_nohz_full_cpu(smp_processor_id()))
-		rcu_nocb_flush_deferred_wakeup();
+		__tick_nohz_user_enter_prepare();
 }
 #endif
Index: linux-2.6/kernel/time/tick-sched.c
===================================================================
--- linux-2.6.orig/kernel/time/tick-sched.c
+++ linux-2.6/kernel/time/tick-sched.c
@@ -26,6 +26,7 @@
 #include <linux/posix-timers.h>
 #include <linux/context_tracking.h>
 #include <linux/mm.h>
+#include <linux/rcupdate.h>
 
 #include <asm/irq_regs.h>
 
@@ -519,6 +520,20 @@ void __tick_nohz_task_switch(void)
 	}
 }
 
+void __tick_nohz_user_enter_prepare(void)
+{
+	struct tick_sched *ts;
+
+	if (tick_nohz_full_cpu(smp_processor_id())) {
+		ts = this_cpu_ptr(&tick_cpu_sched);
+
+		if (ts->tick_stopped)
+			quiet_vmstat();
+		rcu_nocb_flush_deferred_wakeup();
+	}
+}
+EXPORT_SYMBOL_GPL(__tick_nohz_user_enter_prepare);
+
 /* Get the boot-time nohz CPU list from the kernel parameters. */
 void __init tick_nohz_full_setup(cpumask_var_t cpumask)
 {
@@ -890,6 +905,9 @@ static void tick_nohz_stop_tick(struct t
 		ts->do_timer_last = 0;
 	}
 
+	/* Attempt to fold when the idle tick is stopped or not */
+	quiet_vmstat();
+
 	/* Skip reprogram of event if its not changed */
 	if (ts->tick_stopped && (expires == ts->next_tick)) {
 		/* Sanity check: make sure clockevent is actually programmed */
@@ -911,7 +929,6 @@ static void tick_nohz_stop_tick(struct t
 	 */
 	if (!ts->tick_stopped) {
 		calc_load_nohz_start();
-		quiet_vmstat();
 
 		ts->last_tick = hrtimer_get_expires(&ts->sched_timer);
 		ts->tick_stopped = 1;
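[For completeness: the return-to-user hook that the include/linux/tick.h
hunk extends is invoked from the generic entry code. As of the kernels
under discussion, kernel/entry/common.c calls it roughly as in the sketch
below (exact surroundings may differ by version); this is how
__tick_nohz_user_enter_prepare() ends up running before the final return
to userspace:]

static void exit_to_user_mode_prepare(struct pt_regs *regs)
{
	unsigned long ti_work = read_thread_flags();

	lockdep_assert_irqs_disabled();

	/* Flush pending rcuog wakeup before the last need_resched() check */
	tick_nohz_user_enter_prepare();

	if (unlikely(ti_work & EXIT_TO_USER_MODE_WORK))
		ti_work = exit_to_user_mode_loop(regs, ti_work);

	arch_exit_to_user_mode_prepare(regs, ti_work);
}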