Message ID | 20231107215742.363031-55-ankur.a.arora@oracle.com (mailing list archive)
State | New |
Series | Make the kernel preemptible
On Tue, Nov 07 2023 at 13:57, Ankur Arora wrote:

> The kernel has a lot of instances of cond_resched() where it is used
> as an alternative to spinning in a tight loop while waiting to
> retry an operation, or while waiting for a device state to change.
>
> Unfortunately, because the scheduler is unlikely to have an
> interminable supply of runnable tasks on the runqueue, this just
> amounts to spinning in a tight loop with a cond_resched().
> (When running in a fully preemptible kernel, cond_resched()
> calls are stubbed out so it amounts to even less.)
>
> In sum, cond_resched() in error handling/retry contexts might
> be useful in avoiding softlockup splats, but not very good at
> error handling. Ideally, these should be replaced with some kind
> of timed or event wait.
>
> For now add cond_resched_stall(), which tries to schedule if
> possible, and failing that executes a cpu_relax().

What's the point of this new variant of cond_resched()? We really do
not want it at all.

> +int __cond_resched_stall(void)
> +{
> +	if (tif_need_resched(RESCHED_eager)) {
> +		__preempt_schedule();

Under the new model TIF_NEED_RESCHED is going to reschedule if the
preemption counter goes to zero. So the typical

    while (readl(mmio) & BUSY)
        cpu_relax();

will just be preempted like any other loop, no?

Confused.
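To make the objection concrete: under the proposed model a pending
NEED_RESCHED is acted on at interrupt exit once the preempt count is
zero, so the timer tick alone is enough to preempt such a loop. A
simplified sketch of that path, modeled on
raw_irqentry_exit_cond_resched() in kernel/entry/common.c (the RCU and
debug-entry checks are elided here):

	/* Sketch only: runs on return from interrupt to kernel mode. */
	void irqentry_exit_cond_resched_sketch(void)
	{
		/* Only preempt when nothing holds the preempt count. */
		if (!preempt_count()) {
			if (need_resched())
				preempt_schedule_irq();
		}
	}

No explicit call in the loop body is required; the readl()/cpu_relax()
loop above gets preempted like any other kernel loop.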
Thomas Gleixner <tglx@linutronix.de> writes:

> On Tue, Nov 07 2023 at 13:57, Ankur Arora wrote:
>> The kernel has a lot of instances of cond_resched() where it is used
>> as an alternative to spinning in a tight loop while waiting to
>> retry an operation, or while waiting for a device state to change.
>>
>> Unfortunately, because the scheduler is unlikely to have an
>> interminable supply of runnable tasks on the runqueue, this just
>> amounts to spinning in a tight loop with a cond_resched().
>> (When running in a fully preemptible kernel, cond_resched()
>> calls are stubbed out so it amounts to even less.)
>>
>> In sum, cond_resched() in error handling/retry contexts might
>> be useful in avoiding softlockup splats, but not very good at
>> error handling. Ideally, these should be replaced with some kind
>> of timed or event wait.
>>
>> For now add cond_resched_stall(), which tries to schedule if
>> possible, and failing that executes a cpu_relax().
>
> What's the point of this new variant of cond_resched()? We really do
> not want it at all.
>
>> +int __cond_resched_stall(void)
>> +{
>> +	if (tif_need_resched(RESCHED_eager)) {
>> +		__preempt_schedule();
>
> Under the new model TIF_NEED_RESCHED is going to reschedule if the
> preemption counter goes to zero.

Yes, agreed. cond_resched_stall() was just meant to be window dressing.

> So the typical
>
>     while (readl(mmio) & BUSY)
>         cpu_relax();
>
> will just be preempted like any other loop, no?

Yeah. But drivers could be using that right now as well. I suspect
people don't like the idea of spinning in a loop, and that's why they
use cond_resched(). Which, in loops like this, is pretty much:

    while (readl(mmio) & BUSY)
        ;

The reason I added cond_resched_stall() was as an analogue to
cond_resched_lock() etc., here explicitly giving up the CPU. Though
someone pointed out a much better interface for that sort of thing:
readb_poll_timeout(). Not all, but a fair number of sites could be
converted to that.

Ankur
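A sketch of the conversion Ankur describes, using readb_poll_timeout()
from <linux/iopoll.h>. The status register, BUSY bit, and timeout
values here are hypothetical, chosen only for illustration:

	#include <linux/io.h>
	#include <linux/iopoll.h>

	#define STATUS_BUSY	BIT(0)	/* hypothetical status bit */

	/*
	 * Before: open-coded busy-wait, paced only by cond_resched():
	 *
	 *	while (readb(status_reg) & STATUS_BUSY)
	 *		cond_resched();
	 */

	/* After: a bounded, timed poll that sleeps between reads. */
	static int wait_device_ready(void __iomem *status_reg)
	{
		u8 val;

		/*
		 * Poll every 10us, give up after 10ms. Returns 0 on
		 * success, -ETIMEDOUT if BUSY never clears.
		 */
		return readb_poll_timeout(status_reg, val,
					  !(val & STATUS_BUSY),
					  10, 10000);
	}

Unlike the open-coded loop, this version both yields the CPU between
polls and turns a hung device into a clean error return.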
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 6ba4371761c4..199f8f7211f2 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2100,6 +2100,7 @@ static inline int _cond_resched(void) { return 0; }
 extern int __cond_resched_lock(spinlock_t *lock);
 extern int __cond_resched_rwlock_read(rwlock_t *lock);
 extern int __cond_resched_rwlock_write(rwlock_t *lock);
+extern int __cond_resched_stall(void);
 
 #define MIGHT_RESCHED_RCU_SHIFT		8
 #define MIGHT_RESCHED_PREEMPT_MASK	((1U << MIGHT_RESCHED_RCU_SHIFT) - 1)
@@ -2135,6 +2136,11 @@ extern int __cond_resched_rwlock_write(rwlock_t *lock);
 	__cond_resched_rwlock_write(lock);				\
 })
 
+#define cond_resched_stall() ({					\
+	__might_resched(__FILE__, __LINE__, 0);			\
+	__cond_resched_stall();					\
+})
+
 static inline void cond_resched_rcu(void)
 {
 #if defined(CONFIG_DEBUG_ATOMIC_SLEEP) || !defined(CONFIG_PREEMPT_RCU)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index e1b0759ed3ab..ea00e8489ebb 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -8652,6 +8652,18 @@ int __cond_resched_rwlock_write(rwlock_t *lock)
 }
 EXPORT_SYMBOL(__cond_resched_rwlock_write);
 
+int __cond_resched_stall(void)
+{
+	if (tif_need_resched(RESCHED_eager)) {
+		__preempt_schedule();
+		return 1;
+	} else {
+		cpu_relax();
+		return 0;
+	}
+}
+EXPORT_SYMBOL(__cond_resched_stall);
+
 /**
  * yield - yield the current processor to other threads.
  *
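For illustration, a caller of the helper added above might look like
the following. The device, BUSY flag, and retry bound are hypothetical
and only meant to show the intended calling pattern:

	#include <linux/errno.h>
	#include <linux/io.h>

	#define BUSY	BIT(0)	/* hypothetical device-busy flag */

	static int wait_for_idle(void __iomem *mmio)
	{
		unsigned int retries = 1000;	/* arbitrary bound */

		while (readl(mmio) & BUSY) {
			if (!--retries)
				return -ETIMEDOUT;
			/*
			 * Schedules if a reschedule is pending;
			 * otherwise falls back to cpu_relax().
			 */
			cond_resched_stall();
		}
		return 0;
	}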
The kernel has a lot of instances of cond_resched() where it is used
as an alternative to spinning in a tight loop while waiting to
retry an operation, or while waiting for a device state to change.

Unfortunately, because the scheduler is unlikely to have an
interminable supply of runnable tasks on the runqueue, this just
amounts to spinning in a tight loop with a cond_resched().
(When running in a fully preemptible kernel, cond_resched()
calls are stubbed out so it amounts to even less.)

In sum, cond_resched() in error handling/retry contexts might
be useful in avoiding softlockup splats, but not very good at
error handling. Ideally, these should be replaced with some kind
of timed or event wait.

For now add cond_resched_stall(), which tries to schedule if
possible, and failing that executes a cpu_relax().

Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
---
 include/linux/sched.h | 6 ++++++
 kernel/sched/core.c   | 12 ++++++++++++
 2 files changed, 18 insertions(+)