Message ID | 6577e1fa-b6ee-f2be-2414-a2b51b1c5e30@I-love.SAKURA.ne.jp (mailing list archive)
---|---
State | New
Series | debugobject: don't wake up kswapd from fill_pool()
On Thu, 11 May 2023 22:47:32 +0900 Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote:

> syzbot is reporting lockdep warning in fill_pool(), for GFP_ATOMIC is
> (__GFP_HIGH | __GFP_KSWAPD_RECLAIM) which wakes up kswapd.
> Since fill_pool() might be called with arbitrary locks held,
> fill_pool() should not assume that holding pgdat->kswapd_wait is safe.

hm.  But many GFP_ATOMIC allocation attempts are made with locks held.
Why aren't all such callers buggy, by trying to wake kswapd with locks
held?  What's special about this one?

> Also, __GFP_NORETRY is pointless for !__GFP_DIRECT_RECLAIM allocation
>
> Reported-by: syzbot <syzbot+fe0c72f0ccbb93786380@syzkaller.appspotmail.com>
> Closes: https://syzkaller.appspot.com/bug?extid=fe0c72f0ccbb93786380
> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Fixes: 3ac7fe5a4aab ("infrastructure to debug (dynamic) objects")
> ---
>  lib/debugobjects.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/lib/debugobjects.c b/lib/debugobjects.c
> index 003edc5ebd67..986adca357b4 100644
> --- a/lib/debugobjects.c
> +++ b/lib/debugobjects.c
> @@ -126,7 +126,7 @@ static const char *obj_states[ODEBUG_STATE_MAX] = {
>
>  static void fill_pool(void)
>  {
> -	gfp_t gfp = GFP_ATOMIC | __GFP_NORETRY | __GFP_NOWARN;
> +	gfp_t gfp = __GFP_HIGH | __GFP_NOWARN;

Does this weaken fill_pool()'s allocation attempt more than necessary?
We can still pass __GFP_HIGH?
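[For context, GFP_ATOMIC is literally the combination of the two bits discussed here. A simplified excerpt along the lines of include/linux/gfp_types.h (assuming a v6.x flag layout; older trees keep these in gfp.h):

	#define __GFP_HIGH		((__force gfp_t)___GFP_HIGH)
	#define __GFP_KSWAPD_RECLAIM	((__force gfp_t)___GFP_KSWAPD_RECLAIM)	/* kswapd can wake */

	#define GFP_ATOMIC	(__GFP_HIGH|__GFP_KSWAPD_RECLAIM)
	#define GFP_NOWAIT	(__GFP_KSWAPD_RECLAIM)

So GFP_ATOMIC minus __GFP_KSWAPD_RECLAIM is exactly __GFP_HIGH, which is what the proposed patch passes.]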
On 2023/05/12 12:44, Andrew Morton wrote:
> On Thu, 11 May 2023 22:47:32 +0900 Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote:
>
>> syzbot is reporting lockdep warning in fill_pool(), for GFP_ATOMIC is
>> (__GFP_HIGH | __GFP_KSWAPD_RECLAIM) which wakes up kswapd.
>> Since fill_pool() might be called with arbitrary locks held,
>> fill_pool() should not assume that holding pgdat->kswapd_wait is safe.
>
> hm.  But many GFP_ATOMIC allocation attempts are made with locks held.
> Why aren't all such callers buggy, by trying to wake kswapd with locks
> held?  What's special about this one?

Because debugobject cannot know what locks are held when fill_pool() does
the GFP_ATOMIC allocation.

syzbot is reporting the base->lock => pgdat->kswapd_wait dependency

  add_timer() {
    __mod_timer() {
      base = lock_timer_base(timer, &flags);
      new_base = get_target_base(base, timer->flags);
      if (base != new_base) {
        raw_spin_unlock(&base->lock);
        base = new_base;
        raw_spin_lock(&base->lock);
      }
      debug_timer_activate(timer) {
        debug_object_activate(timer, &timer_debug_descr) {
          debug_objects_fill_pool() {
            fill_pool() {
              kmem_cache_zalloc(GFP_ATOMIC | __GFP_NORETRY | __GFP_NOWARN) {
                // wakes kswapd
              }
            }
          }
        }
      }
      raw_spin_unlock_irqrestore(&base->lock, flags);
    }
  }

being added while the pgdat->kswapd_wait => p->pi_lock dependency

  __alloc_pages() {
    get_page_from_freelist() {
      rmqueue() {
        wakeup_kswapd() {
          wake_up_interruptible(&pgdat->kswapd_wait) {
            __wake_up_common_lock() {
              spin_lock_irqsave(&pgdat->kswapd_wait.lock, flags);
              __wake_up_common() {
                autoremove_wake_function() {
                  try_to_wake_up() {
                    raw_spin_lock_irqsave(&p->pi_lock, flags);
                    raw_spin_unlock_irqrestore(&p->pi_lock, flags);
                  }
                }
              }
              spin_unlock_irqrestore(&pgdat->kswapd_wait.lock, flags);
            }
          }
        }
      }
    }
  }

and the p->pi_lock => rq->__lock => base->lock dependency

  wake_up_new_task() {
    raw_spin_lock_irqsave(&p->pi_lock, rf.flags);
    rq = __task_rq_lock(p, &rf); // acquires rq->lock
    activate_task(rq, p, ENQUEUE_NOCLOCK) {
      enqueue_task() {
        psi_enqueue() {
          psi_task_change() {
            queue_delayed_work_on() {
              __queue_delayed_work() {
                add_timer() {
                  __mod_timer() {
                    base = lock_timer_base(timer, &flags); // acquires base->lock
                    debug_timer_activate(timer); // possible base->lock => pgdat->kswapd_wait => p->pi_lock dependency
                    raw_spin_unlock_irqrestore(&base->lock, flags);
                  }
                }
              }
            }
          }
        }
      }
    }
    task_rq_unlock(rq, p, &rf);
  }

already exist.

All GFP_ATOMIC allocation users are supposed to be aware of what locks are
held, and are supposed to explicitly remove __GFP_KSWAPD_RECLAIM if waking up
kswapd can cause a deadlock. But the reality is that we cannot be careful
enough to stay error-free. Who would imagine that a GFP_ATOMIC allocation
while base->lock is held can form a circular locking dependency?

>
>> Also, __GFP_NORETRY is pointless for !__GFP_DIRECT_RECLAIM allocation

__GFP_NORETRY is not checked by a !__GFP_DIRECT_RECLAIM allocation.
GFP_ATOMIC - __GFP_KSWAPD_RECLAIM is __GFP_HIGH.

>>
>> @@ -126,7 +126,7 @@ static const char *obj_states[ODEBUG_STATE_MAX] = {
>>
>>  static void fill_pool(void)
>>  {
>> -	gfp_t gfp = GFP_ATOMIC | __GFP_NORETRY | __GFP_NOWARN;
>> +	gfp_t gfp = __GFP_HIGH | __GFP_NOWARN;
>
> Does this weaken fill_pool()'s allocation attempt more than necessary?
> We can still pass __GFP_HIGH?

What do you mean? I think that killing the base->lock => pgdat->kswapd_wait
dependency by removing __GFP_KSWAPD_RECLAIM is the right fix.

This weakening is needed to avoid the base->lock => pgdat->kswapd_wait
dependency from the debugobject code. For locking dependency safety, I wish
that GFP_ATOMIC / GFP_NOWAIT did not imply __GFP_KSWAPD_RECLAIM.

Such allocations should not try to allocate so many pages that even
__GFP_HIGH fails. And if such allocations do try to allocate so many pages
that even __GFP_HIGH fails, they will likely have already failed before
background kswapd reclaim finds some reusable pages....
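[In other words, a GFP_ATOMIC caller that knows its locking context conflicts with the kswapd wakeup is expected to mask the bit out by hand. A minimal illustration of that pattern (hypothetical caller; obj_cache stands in for whatever cache the caller allocates from):

	/* Hypothetical caller holding a lock that the kswapd wakeup path may
	 * also take: keep the atomic, no-direct-reclaim semantics but drop
	 * the wakeup bit explicitly. */
	obj = kmem_cache_zalloc(obj_cache, GFP_ATOMIC & ~__GFP_KSWAPD_RECLAIM);
]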
On Fri, May 12 2023 at 19:57, Tetsuo Handa wrote:
> On 2023/05/12 12:44, Andrew Morton wrote:
>> On Thu, 11 May 2023 22:47:32 +0900 Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote:
>>
>>> syzbot is reporting lockdep warning in fill_pool(), for GFP_ATOMIC is
>>> (__GFP_HIGH | __GFP_KSWAPD_RECLAIM) which wakes up kswapd.
>>> Since fill_pool() might be called with arbitrary locks held,
>>> fill_pool() should not assume that holding pgdat->kswapd_wait is safe.

https://lore.kernel.org/lkml/871qjldbes.ffs@tglx/
On 2023/05/12 21:54, Thomas Gleixner wrote:
> On Fri, May 12 2023 at 19:57, Tetsuo Handa wrote:
>> On 2023/05/12 12:44, Andrew Morton wrote:
>>> On Thu, 11 May 2023 22:47:32 +0900 Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote:
>>>
>>>> syzbot is reporting lockdep warning in fill_pool(), for GFP_ATOMIC is
>>>> (__GFP_HIGH | __GFP_KSWAPD_RECLAIM) which wakes up kswapd.
>>>> Since fill_pool() might be called with arbitrary locks held,
>>>> fill_pool() should not assume that holding pgdat->kswapd_wait is safe.
>
> https://lore.kernel.org/lkml/871qjldbes.ffs@tglx/

.config says IS_ENABLED(CONFIG_PREEMPT_RT) == false, and lockdep complains
about the base->lock => pgdat->kswapd_wait => p->pi_lock => rq->__lock =>
base->lock dependency but says nothing about db->lock.

How can your patch fix this problem?
On Fri, May 12 2023 at 22:09, Tetsuo Handa wrote:
> On 2023/05/12 21:54, Thomas Gleixner wrote:
>> On Fri, May 12 2023 at 19:57, Tetsuo Handa wrote:
>>> On 2023/05/12 12:44, Andrew Morton wrote:
>>>> On Thu, 11 May 2023 22:47:32 +0900 Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote:
>>>>
>>>>> syzbot is reporting lockdep warning in fill_pool(), for GFP_ATOMIC is
>>>>> (__GFP_HIGH | __GFP_KSWAPD_RECLAIM) which wakes up kswapd.
>>>>> Since fill_pool() might be called with arbitrary locks held,
>>>>> fill_pool() should not assume that holding pgdat->kswapd_wait is safe.
>>
>> https://lore.kernel.org/lkml/871qjldbes.ffs@tglx/
>
> .config says IS_ENABLED(CONFIG_PREEMPT_RT) == false, and lockdep complains
> about the base->lock => pgdat->kswapd_wait => p->pi_lock => rq->__lock =>
> base->lock dependency but says nothing about db->lock.
>
> How can your patch fix this problem?

It's described in the changelog, no?

The main change is to make the refill invocation conditional when the
lookup fails. That's how that code has been from day one.

The patch which closed the race recently wrecked those refill
opportunities, and the fix for that introduced this problem.

Thanks,

        tglx
On 2023/05/13 3:07, Thomas Gleixner wrote:
> On Fri, May 12 2023 at 22:09, Tetsuo Handa wrote:
>> On 2023/05/12 21:54, Thomas Gleixner wrote:
>>> On Fri, May 12 2023 at 19:57, Tetsuo Handa wrote:
>>>> On 2023/05/12 12:44, Andrew Morton wrote:
>>>>> On Thu, 11 May 2023 22:47:32 +0900 Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote:
>>>>>
>>>>>> syzbot is reporting lockdep warning in fill_pool(), for GFP_ATOMIC is
>>>>>> (__GFP_HIGH | __GFP_KSWAPD_RECLAIM) which wakes up kswapd.
>>>>>> Since fill_pool() might be called with arbitrary locks held,
>>>>>> fill_pool() should not assume that holding pgdat->kswapd_wait is safe.
>>>
>>> https://lore.kernel.org/lkml/871qjldbes.ffs@tglx/
>>
>> .config says IS_ENABLED(CONFIG_PREEMPT_RT) == false, and lockdep complains
>> about the base->lock => pgdat->kswapd_wait => p->pi_lock => rq->__lock =>
>> base->lock dependency but says nothing about db->lock.
>>
>> How can your patch fix this problem?
>
> It's described in the changelog, no?

I can't find a proof that lookup_object() never returns NULL when
debug_object_activate() is called.

>
> The main change is to make the refill invocation conditional when the
> lookup fails. That's how that code has been from day one.

Making the refill conditional helps reduce the frequency of allocations.
I want a proof that an allocation never happens in the worst scenario.

Are you saying that some debugobject function other than
debug_object_activate() guarantees that memory for that object was already
allocated before debug_object_activate() is called for the first time for
that object, _and_ that such a debugobject function is called without locks
held?

>
> The patch which closed the race recently wrecked those refill
> opportunities, and the fix for that introduced this problem.
>
> Thanks,
>
>         tglx
On Sat, May 13 2023 at 08:13, Tetsuo Handa wrote:
> On 2023/05/13 3:07, Thomas Gleixner wrote:
>> The main change is to make the refill invocation conditional when the
>> lookup fails. That's how that code has been from day one.
>
> Making the refill conditional helps reduce the frequency of allocations.
> I want a proof that an allocation never happens in the worst scenario.
>
> Are you saying that some debugobject function other than
> debug_object_activate() guarantees that memory for that object was already
> allocated before debug_object_activate() is called for the first time for
> that object, _and_ that such a debugobject function is called without locks
> held?

The point is that the allocation in activate() only happens when the tracked
entity was not initialized _before_ activate() is invoked.

That's a bug for dynamically allocated entities, but a valid scenario for
statically initialized entities, as they obviously can be activated without
a prior init().

For dynamically allocated entities the init() function takes care of the
tracking object allocation, and that's where the pool is refilled. So for
those the lookup will never fail.

Now I just stared at __alloc_pages_slowpath() and looked at the condition
for wake_all_kswapds(). ALLOC_KSWAPD is set because debugobject uses
GFP_ATOMIC, which contains __GFP_KSWAPD_RECLAIM.

So debugobjects needs to have s/GFP_ATOMIC/__GFP_HIGH/ to prevent that
wakeup path.

Thanks,

        tglx
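[For reference, the way the gfp bits end up controlling the wakeup looks roughly like this; this is paraphrased from mm/page_alloc.c, not a literal excerpt, and details differ between kernel versions:

	/* ALLOC_KSWAPD follows __GFP_KSWAPD_RECLAIM; in current kernels the
	 * two share the same bit value, so the translation is a masked OR
	 * rather than an explicit branch. */
	if (gfp_mask & __GFP_KSWAPD_RECLAIM)
		alloc_flags |= ALLOC_KSWAPD;

	/* ... later, in __alloc_pages_slowpath(): */
	if (alloc_flags & ALLOC_KSWAPD)
		wake_all_kswapds(order, gfp_mask, ac);

As far as I can tell, the boosted-watermark wakeup in rmqueue() that appears in the lockdep trace above is gated on the same ALLOC_KSWAPD flag, so an allocation mask without __GFP_KSWAPD_RECLAIM (plain __GFP_HIGH instead of GFP_ATOMIC) avoids both wakeup paths.]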
On 2023/05/13 17:33, Thomas Gleixner wrote:
> Now I just stared at __alloc_pages_slowpath() and looked at the condition
> for wake_all_kswapds(). ALLOC_KSWAPD is set because debugobject uses
> GFP_ATOMIC, which contains __GFP_KSWAPD_RECLAIM.
>
> So debugobjects needs to have s/GFP_ATOMIC/__GFP_HIGH/ to prevent that
> wakeup path.

Yes. That is exactly what my patch does.
On Sat, May 13 2023 at 18:33, Tetsuo Handa wrote:
> On 2023/05/13 17:33, Thomas Gleixner wrote:
>> Now I just stared at __alloc_pages_slowpath() and looked at the condition
>> for wake_all_kswapds(). ALLOC_KSWAPD is set because debugobject uses
>> GFP_ATOMIC, which contains __GFP_KSWAPD_RECLAIM.
>>
>> So debugobjects needs to have s/GFP_ATOMIC/__GFP_HIGH/ to prevent that
>> wakeup path.
>
> Yes. That is exactly what my patch does.

Indeed. For some reason your patch (though you cc'ed me) did not show up in
my inbox. I've grabbed it from lore, so no need to resend.

Actually we want both changes:

  - Yours, to fix the underlying ancient problem.

  - The one I did, which restores the performance behaviour.

Thanks,

        tglx
diff --git a/lib/debugobjects.c b/lib/debugobjects.c
index 003edc5ebd67..986adca357b4 100644
--- a/lib/debugobjects.c
+++ b/lib/debugobjects.c
@@ -126,7 +126,7 @@ static const char *obj_states[ODEBUG_STATE_MAX] = {
 
 static void fill_pool(void)
 {
-	gfp_t gfp = GFP_ATOMIC | __GFP_NORETRY | __GFP_NOWARN;
+	gfp_t gfp = __GFP_HIGH | __GFP_NOWARN;
 	struct debug_obj *obj;
 	unsigned long flags;
 
syzbot is reporting lockdep warning in fill_pool(), for GFP_ATOMIC is
(__GFP_HIGH | __GFP_KSWAPD_RECLAIM) which wakes up kswapd.

Since fill_pool() might be called with arbitrary locks held, fill_pool()
should not assume that holding pgdat->kswapd_wait is safe.

Also, __GFP_NORETRY is pointless for !__GFP_DIRECT_RECLAIM allocation.

Reported-by: syzbot <syzbot+fe0c72f0ccbb93786380@syzkaller.appspotmail.com>
Closes: https://syzkaller.appspot.com/bug?extid=fe0c72f0ccbb93786380
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Fixes: 3ac7fe5a4aab ("infrastructure to debug (dynamic) objects")
---
 lib/debugobjects.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)