Message ID | c3c3dacf-dd3b-77c9-f96a-d0982b4b2a4f@I-love.SAKURA.ne.jp (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [v2] mm/page_alloc: don't wake kswapd from rmqueue() unless __GFP_KSWAPD_RECLAIM is specified | expand |
Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> writes: > Commit 73444bc4d8f9 ("mm, page_alloc: do not wake kswapd with zone lock > held") moved wakeup_kswapd() from steal_suitable_fallback() to rmqueue() > using ZONE_BOOSTED_WATERMARK flag. > > Only allocation contexts that include ALLOC_KSWAPD (which corresponds to > __GFP_KSWAPD_RECLAIM) should wake kswapd, for callers are supposed to > remove __GFP_KSWAPD_RECLAIM if trying to hold pgdat->kswapd_wait has a > risk of deadlock. But since zone->flags is a shared variable, a thread > doing !__GFP_KSWAPD_RECLAIM allocation request might observe this flag > being set immediately after another thread doing __GFP_KSWAPD_RECLAIM > allocation request set this flag, causing possibility of deadlock. Sorry, I don't understand what is the deadlock here. I checked commit 73444bc4d8f9 ("mm, page_alloc: do not wake kswapd with zone lock held") and the corresponding mail thread. From the below mail, https://lore.kernel.org/all/20190107204627.GA25526@cmpxchg.org/ commit 73444bc4d8f9 fixed a circular locking ordering as follows, pi lock -> rq lock -> timer base lock -> zone lock -> wakeup lock (kswapd_wait, fixed) -> pi lock But I don't know what is the deadlock that your patch fixed. Can you teach me on that? Best Regards, Huang, Ying > Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> > Fixes: 73444bc4d8f9 ("mm, page_alloc: do not wake kswapd with zone lock held") > --- > Changes in v2: > Check ALLOC_KSWAPD before checking ZONE_BOOSTED_WATERMARK and update > description, suggested by Mel Gorman <mgorman@techsingularity.net>. > > mm/page_alloc.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 47421bedc12b..ecad680cec53 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -3052,7 +3052,8 @@ struct page *rmqueue(struct zone *preferred_zone, > > out: > /* Separate test+clear to avoid unnecessary atomics */ > - if (unlikely(test_bit(ZONE_BOOSTED_WATERMARK, &zone->flags))) { > + if ((alloc_flags & ALLOC_KSWAPD) && > + unlikely(test_bit(ZONE_BOOSTED_WATERMARK, &zone->flags))) { > clear_bit(ZONE_BOOSTED_WATERMARK, &zone->flags); > wakeup_kswapd(zone, 0, 0, zone_idx(zone)); > }
Hi, Tetsuo, "Huang, Ying" <ying.huang@intel.com> writes: > Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> writes: > >> Commit 73444bc4d8f9 ("mm, page_alloc: do not wake kswapd with zone lock >> held") moved wakeup_kswapd() from steal_suitable_fallback() to rmqueue() >> using ZONE_BOOSTED_WATERMARK flag. >> >> Only allocation contexts that include ALLOC_KSWAPD (which corresponds to >> __GFP_KSWAPD_RECLAIM) should wake kswapd, for callers are supposed to >> remove __GFP_KSWAPD_RECLAIM if trying to hold pgdat->kswapd_wait has a >> risk of deadlock. But since zone->flags is a shared variable, a thread >> doing !__GFP_KSWAPD_RECLAIM allocation request might observe this flag >> being set immediately after another thread doing __GFP_KSWAPD_RECLAIM >> allocation request set this flag, causing possibility of deadlock. > > Sorry, I don't understand what is the deadlock here. > > I checked commit 73444bc4d8f9 ("mm, page_alloc: do not wake kswapd with > zone lock held") and the corresponding mail thread. From the below > mail, > > https://lore.kernel.org/all/20190107204627.GA25526@cmpxchg.org/ > > commit 73444bc4d8f9 fixed a circular locking ordering as follows, > > pi lock -> rq lock -> timer base lock -> zone lock -> wakeup lock > (kswapd_wait, fixed) -> pi lock > > But I don't know what is the deadlock that your patch fixed. Can you > teach me on that? Just read your email in another thread related to this patch as follow, https://lore.kernel.org/linux-mm/d642e597-cf7d-b410-16ce-22dff483fd8e@I-love.SAKURA.ne.jp/ Is that the deadlock that you tried to fix in this patch? It appears that commit 73444bc4d8f9 didn't fix the deadlock above. It just convert the circular locking ordering to, pi lock -> rq lock -> timer base lock -> wakeup lock (kswapd_wait, fixed) -> pi lock If so, I think that it's better to add corresponding information in patch description to avoid the possible confusion. Best Regards, Huang, Ying >> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> >> Fixes: 73444bc4d8f9 ("mm, page_alloc: do not wake kswapd with zone lock held") >> --- >> Changes in v2: >> Check ALLOC_KSWAPD before checking ZONE_BOOSTED_WATERMARK and update >> description, suggested by Mel Gorman <mgorman@techsingularity.net>. >> >> mm/page_alloc.c | 3 ++- >> 1 file changed, 2 insertions(+), 1 deletion(-) >> >> diff --git a/mm/page_alloc.c b/mm/page_alloc.c >> index 47421bedc12b..ecad680cec53 100644 >> --- a/mm/page_alloc.c >> +++ b/mm/page_alloc.c >> @@ -3052,7 +3052,8 @@ struct page *rmqueue(struct zone *preferred_zone, >> >> out: >> /* Separate test+clear to avoid unnecessary atomics */ >> - if (unlikely(test_bit(ZONE_BOOSTED_WATERMARK, &zone->flags))) { >> + if ((alloc_flags & ALLOC_KSWAPD) && >> + unlikely(test_bit(ZONE_BOOSTED_WATERMARK, &zone->flags))) { >> clear_bit(ZONE_BOOSTED_WATERMARK, &zone->flags); >> wakeup_kswapd(zone, 0, 0, zone_idx(zone)); >> }
On Sun, May 14, 2023 at 09:28:56AM +0900, Tetsuo Handa wrote: > Commit 73444bc4d8f9 ("mm, page_alloc: do not wake kswapd with zone lock > held") moved wakeup_kswapd() from steal_suitable_fallback() to rmqueue() > using ZONE_BOOSTED_WATERMARK flag. > > Only allocation contexts that include ALLOC_KSWAPD (which corresponds to > __GFP_KSWAPD_RECLAIM) should wake kswapd, for callers are supposed to > remove __GFP_KSWAPD_RECLAIM if trying to hold pgdat->kswapd_wait has a > risk of deadlock. kswapd_wait is a waitqueue so what is being held? It's safe for kswapd to try wake itself as the waitqueue will be active when wakeup_kswapd() is called so no wakeup occurs. If there is a deadlock, it needs a better explanation. I believe I already stated why this patch is fixing a bug but it wasn't deadlock related.
On 2023/05/15 16:38, Mel Gorman wrote: > On Sun, May 14, 2023 at 09:28:56AM +0900, Tetsuo Handa wrote: >> Commit 73444bc4d8f9 ("mm, page_alloc: do not wake kswapd with zone lock >> held") moved wakeup_kswapd() from steal_suitable_fallback() to rmqueue() >> using ZONE_BOOSTED_WATERMARK flag. >> >> Only allocation contexts that include ALLOC_KSWAPD (which corresponds to >> __GFP_KSWAPD_RECLAIM) should wake kswapd, for callers are supposed to >> remove __GFP_KSWAPD_RECLAIM if trying to hold pgdat->kswapd_wait has a >> risk of deadlock. > > kswapd_wait is a waitqueue so what is being held? It's safe for kswapd > to try wake itself as the waitqueue will be active when wakeup_kswapd() > is called so no wakeup occurs. If there is a deadlock, it needs a better > explanation. I believe I already stated why this patch is fixing a bug > but it wasn't deadlock related. > I noticed this problem ( pgdat->kswapd_wait might be held without __GFP_KSWAPD_RECLAIM ) when analyzing a different problem ( debugobject code is holding pgdat->kswapd_wait due to __GFP_KSWAPD_RECLAIM ) reported at https://syzkaller.appspot.com/bug?extid=fe0c72f0ccbb93786380 . I'm calling pgdat->kswapd_wait a lock, as lockdep uses it as a lock name. The latter was explained at https://lore.kernel.org/linux-mm/d642e597-cf7d-b410-16ce-22dff483fd8e@I-love.SAKURA.ne.jp/ and we agreed that debugobject code needs to drop __GFP_KSWAPD_RECLAIM at https://lore.kernel.org/linux-mm/87fs809fwk.ffs@tglx/ . This patch is for making sure that debugobject code will not try to hold pgdat->kswapd_wait after debugobject code dropped __GFP_KSWAPD_RECLAIM. Thus, the problem this patch will fix is a deadlock related, isn't it?
Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> writes: > On 2023/05/15 16:38, Mel Gorman wrote: >> On Sun, May 14, 2023 at 09:28:56AM +0900, Tetsuo Handa wrote: >>> Commit 73444bc4d8f9 ("mm, page_alloc: do not wake kswapd with zone lock >>> held") moved wakeup_kswapd() from steal_suitable_fallback() to rmqueue() >>> using ZONE_BOOSTED_WATERMARK flag. >>> >>> Only allocation contexts that include ALLOC_KSWAPD (which corresponds to >>> __GFP_KSWAPD_RECLAIM) should wake kswapd, for callers are supposed to >>> remove __GFP_KSWAPD_RECLAIM if trying to hold pgdat->kswapd_wait has a >>> risk of deadlock. >> >> kswapd_wait is a waitqueue so what is being held? It's safe for kswapd >> to try wake itself as the waitqueue will be active when wakeup_kswapd() >> is called so no wakeup occurs. If there is a deadlock, it needs a better >> explanation. I believe I already stated why this patch is fixing a bug >> but it wasn't deadlock related. >> > > I noticed this problem ( pgdat->kswapd_wait might be held without > __GFP_KSWAPD_RECLAIM ) when analyzing a different problem ( debugobject code > is holding pgdat->kswapd_wait due to __GFP_KSWAPD_RECLAIM ) reported at > https://syzkaller.appspot.com/bug?extid=fe0c72f0ccbb93786380 . > > I'm calling pgdat->kswapd_wait a lock, as lockdep uses it as a lock name. This has confused me much before. IIUC, the deadlock is unrelated with kswapd wakeup itself. pgdat->kswapd_wait is the lock name and the lock in fact is the spinlock: pgdat->kswapd_wait.lock. So the deadlock is, pgdat->kswapd_wait.lock holders take the pi lock pi lock holders take the rq lock rq lock holders take the timer base lock timer base lock holders take the pgdat->kswapd_wait.lock (missing __GFP_KSWAPD_RECLAIM check) The above is based on analysis in https://lore.kernel.org/all/20190107204627.GA25526@cmpxchg.org/ https://lore.kernel.org/linux-mm/d642e597-cf7d-b410-16ce-22dff483fd8e@I-love.SAKURA.ne.jp/ Tetsuo's patch avoids to take pgdat->kswapd_wait.lock when timer base lock is held via adding check for __GFP_KSWAPD_RECLAIM, so breaks the circular dependency chain. > The latter was explained at > https://lore.kernel.org/linux-mm/d642e597-cf7d-b410-16ce-22dff483fd8e@I-love.SAKURA.ne.jp/ > and we agreed that debugobject code needs to drop __GFP_KSWAPD_RECLAIM at > https://lore.kernel.org/linux-mm/87fs809fwk.ffs@tglx/ . > > This patch is for making sure that debugobject code will not try to hold > pgdat->kswapd_wait after debugobject code dropped __GFP_KSWAPD_RECLAIM. > > Thus, the problem this patch will fix is a deadlock related, isn't it? Best Regards, Huang, Ying
On 2023/05/16 10:44, Huang, Ying wrote: > Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> writes: > >> On 2023/05/15 16:38, Mel Gorman wrote: >>> On Sun, May 14, 2023 at 09:28:56AM +0900, Tetsuo Handa wrote: >>>> Commit 73444bc4d8f9 ("mm, page_alloc: do not wake kswapd with zone lock >>>> held") moved wakeup_kswapd() from steal_suitable_fallback() to rmqueue() >>>> using ZONE_BOOSTED_WATERMARK flag. >>>> >>>> Only allocation contexts that include ALLOC_KSWAPD (which corresponds to >>>> __GFP_KSWAPD_RECLAIM) should wake kswapd, for callers are supposed to >>>> remove __GFP_KSWAPD_RECLAIM if trying to hold pgdat->kswapd_wait has a >>>> risk of deadlock. >>> >>> kswapd_wait is a waitqueue so what is being held? It's safe for kswapd >>> to try wake itself as the waitqueue will be active when wakeup_kswapd() >>> is called so no wakeup occurs. If there is a deadlock, it needs a better >>> explanation. I believe I already stated why this patch is fixing a bug >>> but it wasn't deadlock related. >>> >> >> I noticed this problem ( pgdat->kswapd_wait might be held without >> __GFP_KSWAPD_RECLAIM ) when analyzing a different problem ( debugobject code >> is holding pgdat->kswapd_wait due to __GFP_KSWAPD_RECLAIM ) reported at >> https://syzkaller.appspot.com/bug?extid=fe0c72f0ccbb93786380 . >> >> I'm calling pgdat->kswapd_wait a lock, as lockdep uses it as a lock name. > > This has confused me much before. IIUC, the deadlock is unrelated with > kswapd wakeup itself. pgdat->kswapd_wait is the lock name and the lock > in fact is the spinlock: pgdat->kswapd_wait.lock. So the deadlock is, > > pgdat->kswapd_wait.lock holders take the pi lock > pi lock holders take the rq lock > rq lock holders take the timer base lock > timer base lock holders take the pgdat->kswapd_wait.lock (missing __GFP_KSWAPD_RECLAIM check) Yes, thank you for explanation. > > The above is based on analysis in > > https://lore.kernel.org/all/20190107204627.GA25526@cmpxchg.org/ > https://lore.kernel.org/linux-mm/d642e597-cf7d-b410-16ce-22dff483fd8e@I-love.SAKURA.ne.jp/ > > Tetsuo's patch avoids to take pgdat->kswapd_wait.lock when timer base > lock is held via adding check for __GFP_KSWAPD_RECLAIM, so breaks the > circular dependency chain. Yes. Mel, any questions on this patch? Thomas Gleixner described this lock as kswapd_wait::lock at https://lkml.kernel.org/r/168476016890.404.6911447269153588182.tip-bot2@tip-bot2 . Should I resubmit this patch with s/pgdat->kswapd_wait/pgdat->kswapd_wait.lock/ or s/pgdat->kswapd_wait/kswapd_wait::lock/ ? > >> The latter was explained at >> https://lore.kernel.org/linux-mm/d642e597-cf7d-b410-16ce-22dff483fd8e@I-love.SAKURA.ne.jp/ >> and we agreed that debugobject code needs to drop __GFP_KSWAPD_RECLAIM at >> https://lore.kernel.org/linux-mm/87fs809fwk.ffs@tglx/ . >> >> This patch is for making sure that debugobject code will not try to hold >> pgdat->kswapd_wait after debugobject code dropped __GFP_KSWAPD_RECLAIM. >> >> Thus, the problem this patch will fix is a deadlock related, isn't it? > > Best Regards, > Huang, Ying
On Mon, May 22, 2023 at 10:57:03PM +0900, Tetsuo Handa wrote: > On 2023/05/16 10:44, Huang, Ying wrote: > > Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> writes: > > > > > > > The above is based on analysis in > > > > https://lore.kernel.org/all/20190107204627.GA25526@cmpxchg.org/ > > https://lore.kernel.org/linux-mm/d642e597-cf7d-b410-16ce-22dff483fd8e@I-love.SAKURA.ne.jp/ > > > > Tetsuo's patch avoids to take pgdat->kswapd_wait.lock when timer base > > lock is held via adding check for __GFP_KSWAPD_RECLAIM, so breaks the > > circular dependency chain. > > Yes. Mel, any questions on this patch? > No, I do not, the deadlock issue is more clear now. > Thomas Gleixner described this lock as kswapd_wait::lock at > https://lkml.kernel.org/r/168476016890.404.6911447269153588182.tip-bot2@tip-bot2 . > Should I resubmit this patch with s/pgdat->kswapd_wait/pgdat->kswapd_wait.lock/ or > s/pgdat->kswapd_wait/kswapd_wait::lock/ ? > I don't think a revision of what's in Andrew's tree is necessary. It could be improved by outlining the exact deadlock similar based on this thread but it's somewhat overkill as the fix has other obvious justifications on its own. Acked-by: Mel Gorman <mgorman@techsingularity.net> Thanks.
diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 47421bedc12b..ecad680cec53 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -3052,7 +3052,8 @@ struct page *rmqueue(struct zone *preferred_zone, out: /* Separate test+clear to avoid unnecessary atomics */ - if (unlikely(test_bit(ZONE_BOOSTED_WATERMARK, &zone->flags))) { + if ((alloc_flags & ALLOC_KSWAPD) && + unlikely(test_bit(ZONE_BOOSTED_WATERMARK, &zone->flags))) { clear_bit(ZONE_BOOSTED_WATERMARK, &zone->flags); wakeup_kswapd(zone, 0, 0, zone_idx(zone)); }
Commit 73444bc4d8f9 ("mm, page_alloc: do not wake kswapd with zone lock held") moved wakeup_kswapd() from steal_suitable_fallback() to rmqueue() using ZONE_BOOSTED_WATERMARK flag. Only allocation contexts that include ALLOC_KSWAPD (which corresponds to __GFP_KSWAPD_RECLAIM) should wake kswapd, for callers are supposed to remove __GFP_KSWAPD_RECLAIM if trying to hold pgdat->kswapd_wait has a risk of deadlock. But since zone->flags is a shared variable, a thread doing !__GFP_KSWAPD_RECLAIM allocation request might observe this flag being set immediately after another thread doing __GFP_KSWAPD_RECLAIM allocation request set this flag, causing possibility of deadlock. Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Fixes: 73444bc4d8f9 ("mm, page_alloc: do not wake kswapd with zone lock held") --- Changes in v2: Check ALLOC_KSWAPD before checking ZONE_BOOSTED_WATERMARK and update description, suggested by Mel Gorman <mgorman@techsingularity.net>. mm/page_alloc.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)