Message ID | 6577e1fa-b6ee-f2be-2414-a2b51b1c5e30@I-love.SAKURA.ne.jp (mailing list archive)
---|---
State | New
Series | debugobject: don't wake up kswapd from fill_pool()
On Thu, 11 May 2023 22:47:32 +0900 Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote:

> syzbot is reporting lockdep warning in fill_pool(), for GFP_ATOMIC is
> (__GFP_HIGH | __GFP_KSWAPD_RECLAIM) which wakes up kswapd.
> Since fill_pool() might be called with arbitrary locks held,
> fill_pool() should not assume that holding pgdat->kswapd_wait is safe.

hm.  But many GFP_ATOMIC allocation attempts are made with locks held.
Why aren't all such callers buggy, by trying to wake kswapd with locks
held?  What's special about this one?

> Also, __GFP_NORETRY is pointless for !__GFP_DIRECT_RECLAIM allocation
>
> Reported-by: syzbot <syzbot+fe0c72f0ccbb93786380@syzkaller.appspotmail.com>
> Closes: https://syzkaller.appspot.com/bug?extid=fe0c72f0ccbb93786380
> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Fixes: 3ac7fe5a4aab ("infrastructure to debug (dynamic) objects")
> ---
>  lib/debugobjects.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/lib/debugobjects.c b/lib/debugobjects.c
> index 003edc5ebd67..986adca357b4 100644
> --- a/lib/debugobjects.c
> +++ b/lib/debugobjects.c
> @@ -126,7 +126,7 @@ static const char *obj_states[ODEBUG_STATE_MAX] = {
>
>  static void fill_pool(void)
>  {
> -	gfp_t gfp = GFP_ATOMIC | __GFP_NORETRY | __GFP_NOWARN;
> +	gfp_t gfp = __GFP_HIGH | __GFP_NOWARN;

Does this weaken fill_pool()'s allocation attempt more than necessary?
We can still pass __GFP_HIGH?
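[For context, GFP_ATOMIC is literally the combination of the two bits discussed here. A simplified excerpt along the lines of include/linux/gfp_types.h (assuming a v6.x flag layout; older trees keep these in gfp.h):

	#define __GFP_HIGH		((__force gfp_t)___GFP_HIGH)
	#define __GFP_KSWAPD_RECLAIM	((__force gfp_t)___GFP_KSWAPD_RECLAIM)	/* kswapd can wake */

	#define GFP_ATOMIC	(__GFP_HIGH|__GFP_KSWAPD_RECLAIM)
	#define GFP_NOWAIT	(__GFP_KSWAPD_RECLAIM)

So GFP_ATOMIC minus __GFP_KSWAPD_RECLAIM is exactly __GFP_HIGH, which is what the proposed patch passes.]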
On 2023/05/12 12:44, Andrew Morton wrote:
> On Thu, 11 May 2023 22:47:32 +0900 Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote:
>
>> syzbot is reporting lockdep warning in fill_pool(), for GFP_ATOMIC is
>> (__GFP_HIGH | __GFP_KSWAPD_RECLAIM) which wakes up kswapd.
>> Since fill_pool() might be called with arbitrary locks held,
>> fill_pool() should not assume that holding pgdat->kswapd_wait is safe.
>
> hm.  But many GFP_ATOMIC allocation attempts are made with locks held.
> Why aren't all such callers buggy, by trying to wake kswapd with locks
> held?  What's special about this one?

Because debugobject cannot know what locks are held when fill_pool() does
the GFP_ATOMIC allocation.

syzbot is reporting the base->lock => pgdat->kswapd_wait dependency

  add_timer() {
    __mod_timer() {
      base = lock_timer_base(timer, &flags);
      new_base = get_target_base(base, timer->flags);
      if (base != new_base) {
        raw_spin_unlock(&base->lock);
        base = new_base;
        raw_spin_lock(&base->lock);
      }
      debug_timer_activate(timer) {
        debug_object_activate(timer, &timer_debug_descr) {
          debug_objects_fill_pool() {
            fill_pool() {
              kmem_cache_zalloc(GFP_ATOMIC | __GFP_NORETRY | __GFP_NOWARN) {
                // wakes kswapd
              }
            }
          }
        }
      }
      raw_spin_unlock_irqrestore(&base->lock, flags);
    }
  }

being added while the pgdat->kswapd_wait => p->pi_lock dependency

  __alloc_pages() {
    get_page_from_freelist() {
      rmqueue() {
        wakeup_kswapd() {
          wake_up_interruptible(&pgdat->kswapd_wait) {
            __wake_up_common_lock() {
              spin_lock_irqsave(&pgdat->kswapd_wait.lock, flags);
              __wake_up_common() {
                autoremove_wake_function() {
                  try_to_wake_up() {
                    raw_spin_lock_irqsave(&p->pi_lock, flags);
                    raw_spin_unlock_irqrestore(&p->pi_lock, flags);
                  }
                }
              }
              spin_unlock_irqrestore(&pgdat->kswapd_wait.lock, flags);
            }
          }
        }
      }
    }
  }

and the p->pi_lock => rq->__lock => base->lock dependency

  wake_up_new_task() {
    raw_spin_lock_irqsave(&p->pi_lock, rf.flags);
    rq = __task_rq_lock(p, &rf); // acquires rq->lock
    activate_task(rq, p, ENQUEUE_NOCLOCK) {
      enqueue_task() {
        psi_enqueue() {
          psi_task_change() {
            queue_delayed_work_on() {
              __queue_delayed_work() {
                add_timer() {
                  __mod_timer() {
                    base = lock_timer_base(timer, &flags); // acquires base->lock
                    debug_timer_activate(timer); // possible base->lock => pgdat->kswapd_wait => p->pi_lock dependency
                    raw_spin_unlock_irqrestore(&base->lock, flags);
                  }
                }
              }
            }
          }
        }
      }
    }
    task_rq_unlock(rq, p, &rf);
  }

already exist.

All GFP_ATOMIC allocation users are supposed to be aware of what locks are
held, and are supposed to explicitly remove __GFP_KSWAPD_RECLAIM if waking up
kswapd can cause a deadlock. But the reality is that we cannot be careful
enough to stay error-free. Who would imagine that a GFP_ATOMIC allocation
while base->lock is held can form a circular locking dependency?

>
>> Also, __GFP_NORETRY is pointless for !__GFP_DIRECT_RECLAIM allocation

__GFP_NORETRY is not checked by a !__GFP_DIRECT_RECLAIM allocation.
GFP_ATOMIC - __GFP_KSWAPD_RECLAIM is __GFP_HIGH.

>>
>> @@ -126,7 +126,7 @@ static const char *obj_states[ODEBUG_STATE_MAX] = {
>>
>>  static void fill_pool(void)
>>  {
>> -	gfp_t gfp = GFP_ATOMIC | __GFP_NORETRY | __GFP_NOWARN;
>> +	gfp_t gfp = __GFP_HIGH | __GFP_NOWARN;
>
> Does this weaken fill_pool()'s allocation attempt more than necessary?
> We can still pass __GFP_HIGH?

What do you mean? I think that killing the base->lock => pgdat->kswapd_wait
dependency by removing __GFP_KSWAPD_RECLAIM is the right fix.

This weakening is needed to avoid the base->lock => pgdat->kswapd_wait
dependency from the debugobject code. For locking dependency safety, I wish
that GFP_ATOMIC / GFP_NOWAIT did not imply __GFP_KSWAPD_RECLAIM.

Such allocations should not try to allocate so many pages that even
__GFP_HIGH fails. And if such allocations do try to allocate so many pages
that even __GFP_HIGH fails, they will likely have already failed before
background kswapd reclaim finds some reusable pages....
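[In other words, a GFP_ATOMIC caller that knows its locking context conflicts with the kswapd wakeup is expected to mask the bit out by hand. A minimal illustration of that pattern (hypothetical caller; obj_cache stands in for whatever cache the caller allocates from):

	/* Hypothetical caller holding a lock that the kswapd wakeup path may
	 * also take: keep the atomic, no-direct-reclaim semantics but drop
	 * the wakeup bit explicitly. */
	obj = kmem_cache_zalloc(obj_cache, GFP_ATOMIC & ~__GFP_KSWAPD_RECLAIM);
]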
On Fri, May 12 2023 at 19:57, Tetsuo Handa wrote:
> On 2023/05/12 12:44, Andrew Morton wrote:
>> On Thu, 11 May 2023 22:47:32 +0900 Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote:
>>
>>> syzbot is reporting lockdep warning in fill_pool(), for GFP_ATOMIC is
>>> (__GFP_HIGH | __GFP_KSWAPD_RECLAIM) which wakes up kswapd.
>>> Since fill_pool() might be called with arbitrary locks held,
>>> fill_pool() should not assume that holding pgdat->kswapd_wait is safe.

https://lore.kernel.org/lkml/871qjldbes.ffs@tglx/
On 2023/05/12 21:54, Thomas Gleixner wrote:
> On Fri, May 12 2023 at 19:57, Tetsuo Handa wrote:
>> On 2023/05/12 12:44, Andrew Morton wrote:
>>> On Thu, 11 May 2023 22:47:32 +0900 Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote:
>>>
>>>> syzbot is reporting lockdep warning in fill_pool(), for GFP_ATOMIC is
>>>> (__GFP_HIGH | __GFP_KSWAPD_RECLAIM) which wakes up kswapd.
>>>> Since fill_pool() might be called with arbitrary locks held,
>>>> fill_pool() should not assume that holding pgdat->kswapd_wait is safe.
>
> https://lore.kernel.org/lkml/871qjldbes.ffs@tglx/

.config says IS_ENABLED(CONFIG_PREEMPT_RT) == false, and lockdep complains
about the base->lock => pgdat->kswapd_wait => p->pi_lock => rq->__lock =>
base->lock dependency but says nothing about db->lock.

How can your patch fix this problem?
On Fri, May 12 2023 at 22:09, Tetsuo Handa wrote:
> On 2023/05/12 21:54, Thomas Gleixner wrote:
>> On Fri, May 12 2023 at 19:57, Tetsuo Handa wrote:
>>> On 2023/05/12 12:44, Andrew Morton wrote:
>>>> On Thu, 11 May 2023 22:47:32 +0900 Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote:
>>>>
>>>>> syzbot is reporting lockdep warning in fill_pool(), for GFP_ATOMIC is
>>>>> (__GFP_HIGH | __GFP_KSWAPD_RECLAIM) which wakes up kswapd.
>>>>> Since fill_pool() might be called with arbitrary locks held,
>>>>> fill_pool() should not assume that holding pgdat->kswapd_wait is safe.
>>
>> https://lore.kernel.org/lkml/871qjldbes.ffs@tglx/
>
> .config says IS_ENABLED(CONFIG_PREEMPT_RT) == false, and lockdep complains
> about the base->lock => pgdat->kswapd_wait => p->pi_lock => rq->__lock =>
> base->lock dependency but says nothing about db->lock.
>
> How can your patch fix this problem?

It's described in the changelog, no?

The main change is to make the refill invocation conditional when the
lookup fails. That's how that code has been from day one.

The patch which closed the race recently wrecked those refill
opportunities, and the fix for that introduced this problem.

Thanks,

        tglx
On 2023/05/13 3:07, Thomas Gleixner wrote:
> On Fri, May 12 2023 at 22:09, Tetsuo Handa wrote:
>> On 2023/05/12 21:54, Thomas Gleixner wrote:
>>> On Fri, May 12 2023 at 19:57, Tetsuo Handa wrote:
>>>> On 2023/05/12 12:44, Andrew Morton wrote:
>>>>> On Thu, 11 May 2023 22:47:32 +0900 Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote:
>>>>>
>>>>>> syzbot is reporting lockdep warning in fill_pool(), for GFP_ATOMIC is
>>>>>> (__GFP_HIGH | __GFP_KSWAPD_RECLAIM) which wakes up kswapd.
>>>>>> Since fill_pool() might be called with arbitrary locks held,
>>>>>> fill_pool() should not assume that holding pgdat->kswapd_wait is safe.
>>>
>>> https://lore.kernel.org/lkml/871qjldbes.ffs@tglx/
>>
>> .config says IS_ENABLED(CONFIG_PREEMPT_RT) == false, and lockdep complains
>> about the base->lock => pgdat->kswapd_wait => p->pi_lock => rq->__lock =>
>> base->lock dependency but says nothing about db->lock.
>>
>> How can your patch fix this problem?
>
> It's described in the changelog, no?

I can't find a proof that lookup_object() never returns NULL when
debug_object_activate() is called.

>
> The main change is to make the refill invocation conditional when the
> lookup fails. That's how that code has been from day one.

Making the refill conditional helps reduce the frequency of allocations.
I want a proof that an allocation never happens in the worst scenario.

Are you saying that some debugobject function other than
debug_object_activate() guarantees that memory for that object was already
allocated before debug_object_activate() is called for the first time for
that object, _and_ that such a debugobject function is called without locks
held?

>
> The patch which closed the race recently wrecked those refill
> opportunities, and the fix for that introduced this problem.
>
> Thanks,
>
>         tglx
On Sat, May 13 2023 at 08:13, Tetsuo Handa wrote:
> On 2023/05/13 3:07, Thomas Gleixner wrote:
>> The main change is to make the refill invocation conditional when the
>> lookup fails. That's how that code has been from day one.
>
> Making the refill conditional helps reduce the frequency of allocations.
> I want a proof that an allocation never happens in the worst scenario.
>
> Are you saying that some debugobject function other than
> debug_object_activate() guarantees that memory for that object was already
> allocated before debug_object_activate() is called for the first time for
> that object, _and_ that such a debugobject function is called without locks
> held?

The point is that the allocation in activate() only happens when the tracked
entity was not initialized _before_ activate() is invoked.

That's a bug for dynamically allocated entities, but a valid scenario for
statically initialized entities, as they obviously can be activated without
a prior init().

For dynamically allocated entities the init() function takes care of the
tracking object allocation, and that's where the pool is refilled. So for
those the lookup will never fail.

Now I just stared at __alloc_pages_slowpath() and looked at the condition
for wake_all_kswapds(). ALLOC_KSWAPD is set because debugobject uses
GFP_ATOMIC, which contains __GFP_KSWAPD_RECLAIM.

So debugobjects needs to have s/GFP_ATOMIC/__GFP_HIGH/ to prevent that
wakeup path.

Thanks,

        tglx
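[For reference, the way the gfp bits end up controlling the wakeup looks roughly like this; this is paraphrased from mm/page_alloc.c, not a literal excerpt, and details differ between kernel versions:

	/* ALLOC_KSWAPD follows __GFP_KSWAPD_RECLAIM; in current kernels the
	 * two share the same bit value, so the translation is a masked OR
	 * rather than an explicit branch. */
	if (gfp_mask & __GFP_KSWAPD_RECLAIM)
		alloc_flags |= ALLOC_KSWAPD;

	/* ... later, in __alloc_pages_slowpath(): */
	if (alloc_flags & ALLOC_KSWAPD)
		wake_all_kswapds(order, gfp_mask, ac);

As far as I can tell, the boosted-watermark wakeup in rmqueue() that appears in the lockdep trace above is gated on the same ALLOC_KSWAPD flag, so an allocation mask without __GFP_KSWAPD_RECLAIM (plain __GFP_HIGH instead of GFP_ATOMIC) avoids both wakeup paths.]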
On 2023/05/13 17:33, Thomas Gleixner wrote:
> Now I just stared at __alloc_pages_slowpath() and looked at the condition
> for wake_all_kswapds(). ALLOC_KSWAPD is set because debugobject uses
> GFP_ATOMIC, which contains __GFP_KSWAPD_RECLAIM.
>
> So debugobjects needs to have s/GFP_ATOMIC/__GFP_HIGH/ to prevent that
> wakeup path.

Yes. That is exactly what my patch does.
On Sat, May 13 2023 at 18:33, Tetsuo Handa wrote:
> On 2023/05/13 17:33, Thomas Gleixner wrote:
>> Now I just stared at __alloc_pages_slowpath() and looked at the condition
>> for wake_all_kswapds(). ALLOC_KSWAPD is set because debugobject uses
>> GFP_ATOMIC, which contains __GFP_KSWAPD_RECLAIM.
>>
>> So debugobjects needs to have s/GFP_ATOMIC/__GFP_HIGH/ to prevent that
>> wakeup path.
>
> Yes. That is exactly what my patch does.

Indeed. For some reason your patch (though you cc'ed me) did not show up in
my inbox. I've grabbed it from lore, so no need to resend.

Actually we want both changes:

  - Yours, to fix the underlying ancient problem.

  - The one I did, which restores the performance behaviour.

Thanks,

        tglx
diff --git a/lib/debugobjects.c b/lib/debugobjects.c
index 003edc5ebd67..986adca357b4 100644
--- a/lib/debugobjects.c
+++ b/lib/debugobjects.c
@@ -126,7 +126,7 @@ static const char *obj_states[ODEBUG_STATE_MAX] = {
 
 static void fill_pool(void)
 {
-	gfp_t gfp = GFP_ATOMIC | __GFP_NORETRY | __GFP_NOWARN;
+	gfp_t gfp = __GFP_HIGH | __GFP_NOWARN;
 	struct debug_obj *obj;
 	unsigned long flags;
 
syzbot is reporting lockdep warning in fill_pool(), for GFP_ATOMIC is
(__GFP_HIGH | __GFP_KSWAPD_RECLAIM) which wakes up kswapd.

Since fill_pool() might be called with arbitrary locks held, fill_pool()
should not assume that holding pgdat->kswapd_wait is safe.

Also, __GFP_NORETRY is pointless for !__GFP_DIRECT_RECLAIM allocation.

Reported-by: syzbot <syzbot+fe0c72f0ccbb93786380@syzkaller.appspotmail.com>
Closes: https://syzkaller.appspot.com/bug?extid=fe0c72f0ccbb93786380
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Fixes: 3ac7fe5a4aab ("infrastructure to debug (dynamic) objects")
---
 lib/debugobjects.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)