mm/page_alloc: don't wake up kswapd from rmqueue() unless __GFP_KSWAPD_RECLAIM is specified

Message ID 6d6fb601-6100-92b9-cea3-e7ebacc7693a@I-love.SAKURA.ne.jp (mailing list archive)
State New

Commit Message

Tetsuo Handa May 11, 2023, 1:47 p.m. UTC
Commit 73444bc4d8f9 ("mm, page_alloc: do not wake kswapd with zone lock
held") moved wakeup_kswapd() from steal_suitable_fallback() to rmqueue()
using ZONE_BOOSTED_WATERMARK flag. But since zone->flags is a shared
variable, a thread doing !__GFP_KSWAPD_RECLAIM allocation request might
observe this flag being set immediately after another thread doing
__GFP_KSWAPD_RECLAIM allocation request set this flag.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Fixes: 73444bc4d8f9 ("mm, page_alloc: do not wake kswapd with zone lock held")
---
 mm/page_alloc.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Andrew Morton May 12, 2023, 3:45 a.m. UTC | #1
On Thu, 11 May 2023 22:47:36 +0900 Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote:

> Commit 73444bc4d8f9 ("mm, page_alloc: do not wake kswapd with zone lock
> held") moved wakeup_kswapd() from steal_suitable_fallback() to rmqueue()
> using ZONE_BOOSTED_WATERMARK flag. But since zone->flags is a shared
> variable, a thread doing !__GFP_KSWAPD_RECLAIM allocation request might
> observe this flag being set immediately after another thread doing
> __GFP_KSWAPD_RECLAIM allocation request set this flag.

What are the user-visible runtime effects of this flaw?

> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> +++ b/mm/page_alloc.c
> @@ -3052,7 +3052,8 @@ struct page *rmqueue(struct zone *preferred_zone,
>  
>  out:
>  	/* Separate test+clear to avoid unnecessary atomics */
> -	if (unlikely(test_bit(ZONE_BOOSTED_WATERMARK, &zone->flags))) {
> +	if (unlikely(test_bit(ZONE_BOOSTED_WATERMARK, &zone->flags))
> +	    && (alloc_flags & ALLOC_KSWAPD)) {
>  		clear_bit(ZONE_BOOSTED_WATERMARK, &zone->flags);
>  		wakeup_kswapd(zone, 0, 0, zone_idx(zone));
>  	}

Thanks, I'll queue this up for some testing while awaiting input from
Mel.

Tetsuo Handa May 13, 2023, 9:38 a.m. UTC | #2
On 2023/05/12 12:45, Andrew Morton wrote:
> On Thu, 11 May 2023 22:47:36 +0900 Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote:
> 
>> Commit 73444bc4d8f9 ("mm, page_alloc: do not wake kswapd with zone lock
>> held") moved wakeup_kswapd() from steal_suitable_fallback() to rmqueue()
>> using ZONE_BOOSTED_WATERMARK flag. But since zone->flags is a shared
>> variable, a thread doing !__GFP_KSWAPD_RECLAIM allocation request might
>> observe this flag being set immediately after another thread doing
>> __GFP_KSWAPD_RECLAIM allocation request set this flag.
> 
> What are the user-visible runtime effects of this flaw?

A potential deadlock on allocation requests that omit __GFP_KSWAPD_RECLAIM
(for example plain __GFP_HIGH), which the debugobject code is about to
start issuing.

> 
>> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
>> +++ b/mm/page_alloc.c
>> @@ -3052,7 +3052,8 @@ struct page *rmqueue(struct zone *preferred_zone,
>>  
>>  out:
>>  	/* Separate test+clear to avoid unnecessary atomics */
>> -	if (unlikely(test_bit(ZONE_BOOSTED_WATERMARK, &zone->flags))) {
>> +	if (unlikely(test_bit(ZONE_BOOSTED_WATERMARK, &zone->flags))
>> +	    && (alloc_flags & ALLOC_KSWAPD)) {
>>  		clear_bit(ZONE_BOOSTED_WATERMARK, &zone->flags);
>>  		wakeup_kswapd(zone, 0, 0, zone_idx(zone));
>>  	}
> 
> Thanks, I'll queue this up for some testing while awaiting input from
> Mel.
>

Mel Gorman May 13, 2023, 10:23 a.m. UTC | #3
On Thu, May 11, 2023 at 10:47:36PM +0900, Tetsuo Handa wrote:
> Commit 73444bc4d8f9 ("mm, page_alloc: do not wake kswapd with zone lock
> held") moved wakeup_kswapd() from steal_suitable_fallback() to rmqueue()
> using ZONE_BOOSTED_WATERMARK flag. But since zone->flags is a shared
> variable, a thread doing !__GFP_KSWAPD_RECLAIM allocation request might
> observe this flag being set immediately after another thread doing
> __GFP_KSWAPD_RECLAIM allocation request set this flag.
> 
> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Fixes: 73444bc4d8f9 ("mm, page_alloc: do not wake kswapd with zone lock held")

The issue is real, but the changelog needs to explain why it is a problem.
Only allocation contexts that specify ALLOC_KSWAPD should wake kswapd,
similar to this:

        if (alloc_flags & ALLOC_KSWAPD)
                wake_all_kswapds(order, gfp_mask, ac);

The consequence is that kswapd could be woken spuriously for callsites
that clear __GFP_KSWAPD_RECLAIM explicitly, or implicitly via combinations
like GFP_TRANSHUGE_LIGHT. The other side is that kswapd does not get woken
to reclaim pages up to the boosted watermark, leading to a higher risk of
fragmentation that may prevent future hugepage allocations.

There is a slight risk that this will increase reclaim, because the zone
flag is no longer cleared in as many contexts, but the risk is low.

I also suggest as a micro-optimisation that ALLOC_KSWAPD is checked first
because it should be cache hot and cheaper than the shared cache line for
zone flags.
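
The gating and ordering suggested above can be sketched in userspace C.
This is a minimal model, not kernel code: struct model_zone, the MODEL_*
constants and both model_check_*() functions are hypothetical stand-ins
for zone->flags, ALLOC_KSWAPD, ZONE_BOOSTED_WATERMARK and the check at the
end of rmqueue(), and the kswapd wakeup is reduced to a counter.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins; the values are illustrative, not the kernel's. */
#define MODEL_ALLOC_KSWAPD  0x1u  /* models ALLOC_KSWAPD in alloc_flags */
#define MODEL_ZONE_BOOSTED  0x1u  /* models the ZONE_BOOSTED_WATERMARK bit */

struct model_zone {
	atomic_uint flags;     /* shared by every allocating thread */
	unsigned int wakeups;  /* counts modelled wakeup_kswapd() calls;
				  plain int, so single-threaded use only */
};

/*
 * Pre-patch shape: any caller that observes the bit clears it and wakes
 * kswapd, even if its own request did not allow kswapd reclaim.
 */
bool model_check_old(struct model_zone *zone, unsigned int alloc_flags)
{
	(void)alloc_flags;  /* ignored, which is the flaw being modelled */
	if (atomic_load(&zone->flags) & MODEL_ZONE_BOOSTED) {
		atomic_fetch_and(&zone->flags, ~MODEL_ZONE_BOOSTED);
		zone->wakeups++;
		return true;
	}
	return false;
}

/*
 * Patched shape with the suggested ordering: test the caller-local
 * alloc_flags first, and only then read the shared zone flags.
 */
bool model_check_new(struct model_zone *zone, unsigned int alloc_flags)
{
	if ((alloc_flags & MODEL_ALLOC_KSWAPD) &&
	    (atomic_load(&zone->flags) & MODEL_ZONE_BOOSTED)) {
		atomic_fetch_and(&zone->flags, ~MODEL_ZONE_BOOSTED);
		zone->wakeups++;
		return true;
	}
	return false;
}
```

In this model, calling model_check_new() without MODEL_ALLOC_KSWAPD while
the bit is set leaves the bit in place and records no wakeup. That is the
"not being cleared in as many contexts" effect noted above: the bit stays
set until a caller that allows kswapd reclaim reaches the check.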
Patch

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 47421bedc12b..4283b5916f36 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3052,7 +3052,8 @@  struct page *rmqueue(struct zone *preferred_zone,
 
 out:
 	/* Separate test+clear to avoid unnecessary atomics */
-	if (unlikely(test_bit(ZONE_BOOSTED_WATERMARK, &zone->flags))) {
+	if (unlikely(test_bit(ZONE_BOOSTED_WATERMARK, &zone->flags))
+	    && (alloc_flags & ALLOC_KSWAPD)) {
 		clear_bit(ZONE_BOOSTED_WATERMARK, &zone->flags);
 		wakeup_kswapd(zone, 0, 0, zone_idx(zone));
 	}