Message ID | 6d6fb601-6100-92b9-cea3-e7ebacc7693a@I-love.SAKURA.ne.jp (mailing list archive) |
---|---|
State | New |
Series | mm/page_alloc: don't wake up kswapd from rmqueue() unless __GFP_KSWAPD_RECLAIM is specified |
On Thu, 11 May 2023 22:47:36 +0900 Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote:

> Commit 73444bc4d8f9 ("mm, page_alloc: do not wake kswapd with zone lock
> held") moved wakeup_kswapd() from steal_suitable_fallback() to rmqueue()
> using ZONE_BOOSTED_WATERMARK flag. But since zone->flags is a shared
> variable, a thread doing !__GFP_KSWAPD_RECLAIM allocation request might
> observe this flag being set immediately after another thread doing
> __GFP_KSWAPD_RECLAIM allocation request set this flag.

What are the user-visible runtime effects of this flaw?

> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.j
> +++ b/mm/page_alloc.c
> @@ -3052,7 +3052,8 @@ struct page *rmqueue(struct zone *preferred_zone,
>
>  out:
>  	/* Separate test+clear to avoid unnecessary atomics */
> -	if (unlikely(test_bit(ZONE_BOOSTED_WATERMARK, &zone->flags))) {
> +	if (unlikely(test_bit(ZONE_BOOSTED_WATERMARK, &zone->flags))
> +	    && (alloc_flags & ALLOC_KSWAPD)) {
>  		clear_bit(ZONE_BOOSTED_WATERMARK, &zone->flags);
>  		wakeup_kswapd(zone, 0, 0, zone_idx(zone));
>  	}

Thanks, I'll queue this up for some testing while awaiting input from Mel.

On 2023/05/12 12:45, Andrew Morton wrote:

> On Thu, 11 May 2023 22:47:36 +0900 Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote:
>
>> Commit 73444bc4d8f9 ("mm, page_alloc: do not wake kswapd with zone lock
>> held") moved wakeup_kswapd() from steal_suitable_fallback() to rmqueue()
>> using ZONE_BOOSTED_WATERMARK flag. But since zone->flags is a shared
>> variable, a thread doing !__GFP_KSWAPD_RECLAIM allocation request might
>> observe this flag being set immediately after another thread doing
>> __GFP_KSWAPD_RECLAIM allocation request set this flag.
>
> What are the user-visible runtime effects of this flaw?

Potential deadlock upon __GFP_HIGH (I mean, !__GFP_KSWAPD_RECLAIM) allocation requests (like debugobject code is about to start doing).

>
>> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.j
>> +++ b/mm/page_alloc.c
>> @@ -3052,7 +3052,8 @@ struct page *rmqueue(struct zone *preferred_zone,
>>
>>  out:
>>  	/* Separate test+clear to avoid unnecessary atomics */
>> -	if (unlikely(test_bit(ZONE_BOOSTED_WATERMARK, &zone->flags))) {
>> +	if (unlikely(test_bit(ZONE_BOOSTED_WATERMARK, &zone->flags))
>> +	    && (alloc_flags & ALLOC_KSWAPD)) {
>>  		clear_bit(ZONE_BOOSTED_WATERMARK, &zone->flags);
>>  		wakeup_kswapd(zone, 0, 0, zone_idx(zone));
>>  	}
>
> Thanks, I'll queue this up for some testing while awaiting input from
> Mel.
>

On Thu, May 11, 2023 at 10:47:36PM +0900, Tetsuo Handa wrote:

> Commit 73444bc4d8f9 ("mm, page_alloc: do not wake kswapd with zone lock
> held") moved wakeup_kswapd() from steal_suitable_fallback() to rmqueue()
> using ZONE_BOOSTED_WATERMARK flag. But since zone->flags is a shared
> variable, a thread doing !__GFP_KSWAPD_RECLAIM allocation request might
> observe this flag being set immediately after another thread doing
> __GFP_KSWAPD_RECLAIM allocation request set this flag.
>
> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Fixes: 73444bc4d8f9 ("mm, page_alloc: do not wake kswapd with zone lock held")

The issue is real but it needs to be explained why this is a problem. Only allocation contexts that specify ALLOC_KSWAPD should wake kswapd similar to this

	if (alloc_flags & ALLOC_KSWAPD)
		wake_all_kswapds(order, gfp_mask, ac);

The consequences are that kswapd could potentially be woken spuriously for callsites that clear __GFP_KSWAPD_RECLAIM explicitly or implicitly via combinations like GFP_TRANSHUGE_LIGHT. The other side is that kswapd does not get woken to reclaim pages up to the boosted watermark leading to a higher risk of fragmentation that may prevent future hugepage allocations. There is a slight risk this will increase reclaim because the zone flag is not being cleared in as many contexts but the risk is low.

I also suggest as a micro-optimisation that ALLOC_KSWAPD is checked first because it should be cache hot and cheaper than the shared cache line for zone flags.

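The ordering suggested above can be sketched in userspace roughly as follows. This is an illustrative model only, not kernel code: `struct zone_model`, `zone_boosted()`, `rmqueue_tail_model()`, the `*_MODEL` constants and the two counters are hypothetical names introduced here, and plain integer operations stand in for `test_bit()`/`clear_bit()` and `wakeup_kswapd()`. The counter on flag reads is there only to make the short-circuit visible.

```c
#include <stdbool.h>

/* Hypothetical stand-ins; the values are illustrative, not the kernel's. */
#define ALLOC_KSWAPD_MODEL 0x1u
#define ZONE_BOOSTED_MODEL 0x1ul

struct zone_model {
	unsigned long flags;     /* shared by every allocating thread */
	unsigned int flag_reads; /* how often the shared word was read */
	unsigned int wakeups;    /* how often kswapd would have been woken */
};

/* Read the shared flags word and record that it was touched. */
static bool zone_boosted(struct zone_model *zone)
{
	zone->flag_reads++;
	return (zone->flags & ZONE_BOOSTED_MODEL) != 0;
}

/*
 * Model of the tail of rmqueue() with the caller-local alloc_flags
 * tested first: && short-circuits, so the shared zone flags are not
 * read at all when the caller did not ask for kswapd reclaim.
 */
static bool rmqueue_tail_model(struct zone_model *zone,
			       unsigned int alloc_flags)
{
	if ((alloc_flags & ALLOC_KSWAPD_MODEL) && zone_boosted(zone)) {
		zone->flags &= ~ZONE_BOOSTED_MODEL; /* stands in for clear_bit() */
		zone->wakeups++;                    /* stands in for wakeup_kswapd() */
		return true;
	}
	return false;
}
```

In this model, calling `rmqueue_tail_model()` with `alloc_flags == 0` on a boosted zone leaves `flag_reads` at zero and the boosted bit set, so a later caller that does pass `ALLOC_KSWAPD_MODEL` still performs the wakeup.
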
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 47421bedc12b..4283b5916f36 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3052,7 +3052,8 @@ struct page *rmqueue(struct zone *preferred_zone,
 
 out:
 	/* Separate test+clear to avoid unnecessary atomics */
-	if (unlikely(test_bit(ZONE_BOOSTED_WATERMARK, &zone->flags))) {
+	if (unlikely(test_bit(ZONE_BOOSTED_WATERMARK, &zone->flags))
+	    && (alloc_flags & ALLOC_KSWAPD)) {
 		clear_bit(ZONE_BOOSTED_WATERMARK, &zone->flags);
 		wakeup_kswapd(zone, 0, 0, zone_idx(zone));
 	}

Commit 73444bc4d8f9 ("mm, page_alloc: do not wake kswapd with zone lock
held") moved wakeup_kswapd() from steal_suitable_fallback() to rmqueue()
using ZONE_BOOSTED_WATERMARK flag. But since zone->flags is a shared
variable, a thread doing !__GFP_KSWAPD_RECLAIM allocation request might
observe this flag being set immediately after another thread doing
__GFP_KSWAPD_RECLAIM allocation request set this flag.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Fixes: 73444bc4d8f9 ("mm, page_alloc: do not wake kswapd with zone lock held")
---
 mm/page_alloc.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
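
The interleaving described in the changelog can be replayed sequentially in a small userspace sketch. Everything here is an assumption made for illustration: `struct zone_sketch`, `thread_a_sets_boost()`, `wakes_kswapd_before()`, `wakes_kswapd_after()` and the `SKETCH_*` constants are hypothetical names, and the functions return a boolean where the kernel would call `wakeup_kswapd()`. The sketch shows only which caller ends up consuming the shared flag, not real concurrency.

```c
#include <stdbool.h>

/* Hypothetical stand-ins; the values are illustrative, not the kernel's. */
#define SKETCH_ALLOC_KSWAPD 0x1u
#define SKETCH_ZONE_BOOSTED 0x1ul

struct zone_sketch {
	unsigned long flags; /* shared between thread A and thread B */
};

/* Thread A (a __GFP_KSWAPD_RECLAIM request) marks the zone as boosted. */
static void thread_a_sets_boost(struct zone_sketch *zone)
{
	zone->flags |= SKETCH_ZONE_BOOSTED;
}

/* Check before the patch: any caller that sees the flag wakes kswapd. */
static bool wakes_kswapd_before(struct zone_sketch *zone,
				unsigned int alloc_flags)
{
	(void)alloc_flags; /* the old condition ignores the caller's flags */
	if (zone->flags & SKETCH_ZONE_BOOSTED) {
		zone->flags &= ~SKETCH_ZONE_BOOSTED;
		return true;
	}
	return false;
}

/* Check after the patch: only callers with the kswapd bit clear and wake. */
static bool wakes_kswapd_after(struct zone_sketch *zone,
			       unsigned int alloc_flags)
{
	if ((zone->flags & SKETCH_ZONE_BOOSTED) &&
	    (alloc_flags & SKETCH_ALLOC_KSWAPD)) {
		zone->flags &= ~SKETCH_ZONE_BOOSTED;
		return true;
	}
	return false;
}
```

With the old condition, thread B passing `alloc_flags == 0` right after `thread_a_sets_boost()` reports a wakeup; with the patched condition the same call reports none and leaves the flag set for the next caller that does carry `SKETCH_ALLOC_KSWAPD`.
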