
[v2] mm/page_alloc: fix alloc_pages_bulk/set_page_owner panic on irq disabled

Message ID 20210712022333.1510-1-link@vivo.com (mailing list archive)
State New
Series [v2] mm/page_alloc: fix alloc_pages_bulk/set_page_owner panic on irq disabled

Commit Message

Huan Yang July 12, 2021, 2:23 a.m. UTC
BUG: sleeping function called from invalid context at mm/page_alloc.c:5179
in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 1, name: swapper/0

__dump_stack lib/dump_stack.c:79 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:96
___might_sleep.cold+0x1f1/0x237 kernel/sched/core.c:9153
prepare_alloc_pages+0x3da/0x580 mm/page_alloc.c:5179
__alloc_pages+0x12f/0x500 mm/page_alloc.c:5375
alloc_pages+0x18c/0x2a0 mm/mempolicy.c:2272
stack_depot_save+0x39d/0x4e0 lib/stackdepot.c:303
save_stack+0x15e/0x1e0 mm/page_owner.c:120
__set_page_owner+0x50/0x290 mm/page_owner.c:181
prep_new_page mm/page_alloc.c:2445 [inline]
__alloc_pages_bulk+0x8b9/0x1870 mm/page_alloc.c:5313

The problem is caused by set_page_owner() allocating memory to save the
stack trace with GFP_KERNEL while local IRQs are disabled.
So we cannot assume that the gfp used to allocate memory is the same as
the new page's gfp: prep_new_page() should prep/trace the page's gfp,
but must not use that same gfp to allocate memory; the allocation gfp
has to come from the caller.
Hence there are now two gfp masks: alloc_gfp, supplied by the caller
and used to allocate memory, and page_gfp_mask, the page's own gfp,
used to trace/prep the page itself.
In most situations the two are the same; in alloc_pages_bulk(),
GFP_ATOMIC is fine for alloc_gfp (even if set_page_owner() then fails
to save the backtrace, the impact is limited).
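
Roughly, the failing path is (a simplified sketch, not the exact code;
see the trace above for the real call sites):

  local_irq_save(flags);            /* bulk allocator keeps IRQs off  */
  page = __rmqueue_pcplist(...);    /* grab a page from the pcp lists */
  prep_new_page(page, 0, gfp, 0);   /* gfp is the caller's GFP_KERNEL */
      -> set_page_owner(page, 0, gfp)
         -> save_stack(gfp)
            -> stack_depot_save(entries, nr, gfp)  /* may allocate,
                                                      may sleep: BUG  */
  local_irq_restore(flags);

With the split, stack_depot_save() is called with alloc_gfp (GFP_ATOMIC
in the bulk path) while the page itself is still prepped and traced
with its real gfp.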

v2:
- add more description.

Fixes: 0f87d9d30f21 ("mm/page_alloc: add an array-based interface to the bulk page allocator")
Reported-by: syzbot+b07d8440edb5f8988eea@syzkaller.appspotmail.com
Suggested-by: Wang Qing <wangqing@vivo.com>
Signed-off-by: Yang Huan <link@vivo.com>
---
 include/linux/page_owner.h |  8 ++++----
 mm/compaction.c            |  2 +-
 mm/internal.h              |  2 +-
 mm/page_alloc.c            | 21 +++++++++++----------
 mm/page_owner.c            |  6 +++---
 5 files changed, 20 insertions(+), 19 deletions(-)

Comments

Mel Gorman July 15, 2021, 11:57 a.m. UTC | #1
On Mon, Jul 12, 2021 at 10:23:32AM +0800, Yang Huan wrote:
> BUG: sleeping function called from invalid context at mm/page_alloc.c:5179
> in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 1, name: swapper/0
> 
> __dump_stack lib/dump_stack.c:79 [inline]
> dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:96
> ___might_sleep.cold+0x1f1/0x237 kernel/sched/core.c:9153
> prepare_alloc_pages+0x3da/0x580 mm/page_alloc.c:5179
> __alloc_pages+0x12f/0x500 mm/page_alloc.c:5375
> alloc_pages+0x18c/0x2a0 mm/mempolicy.c:2272
> stack_depot_save+0x39d/0x4e0 lib/stackdepot.c:303
> save_stack+0x15e/0x1e0 mm/page_owner.c:120
> __set_page_owner+0x50/0x290 mm/page_owner.c:181
> prep_new_page mm/page_alloc.c:2445 [inline]
> __alloc_pages_bulk+0x8b9/0x1870 mm/page_alloc.c:5313
> 
> The problem is caused by set_page_owner() allocating memory to save the
> stack trace with GFP_KERNEL while local IRQs are disabled.
> So we cannot assume that the gfp used to allocate memory is the same as
> the new page's gfp: prep_new_page() should prep/trace the page's gfp,
> but must not use that same gfp to allocate memory; the allocation gfp
> has to come from the caller.
> Hence there are now two gfp masks: alloc_gfp, supplied by the caller
> and used to allocate memory, and page_gfp_mask, the page's own gfp,
> used to trace/prep the page itself.
> In most situations the two are the same; in alloc_pages_bulk(),
> GFP_ATOMIC is fine for alloc_gfp (even if set_page_owner() then fails
> to save the backtrace, the impact is limited).
> 
> v2:
> - add more description.
> 
> Fixes: 0f87d9d30f21 ("mm/page_alloc: add an array-based interface to the bulk page allocator")
> Reported-by: syzbot+b07d8440edb5f8988eea@syzkaller.appspotmail.com
> Suggested-by: Wang Qing <wangqing@vivo.com>
> Signed-off-by: Yang Huan <link@vivo.com>

https://lore.kernel.org/lkml/20210713152100.10381-2-mgorman@techsingularity.net/
is now part of a series that has been sent to Linus. Hence, the Fixes
tag is no longer applicable and the patch will no longer be addressing
an atomic sleep bug. This patch should be treated as an enhancement
to allow bulk allocations when PAGE_OWNER is set. For that, it should
include a note on the performance impact if PAGE_OWNER is used with
either NFS or high-speed networking to justify the additional complexity.
Huan Yang July 15, 2021, 1:02 p.m. UTC | #2
>> BUG: sleeping function called from invalid context at mm/page_alloc.c:5179
>> in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 1, name: swapper/0
>> 
>> __dump_stack lib/dump_stack.c:79 [inline]
>> dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:96
>> ___might_sleep.cold+0x1f1/0x237 kernel/sched/core.c:9153
>> prepare_alloc_pages+0x3da/0x580 mm/page_alloc.c:5179
>> __alloc_pages+0x12f/0x500 mm/page_alloc.c:5375
>> alloc_pages+0x18c/0x2a0 mm/mempolicy.c:2272
>> stack_depot_save+0x39d/0x4e0 lib/stackdepot.c:303
>> save_stack+0x15e/0x1e0 mm/page_owner.c:120
>> __set_page_owner+0x50/0x290 mm/page_owner.c:181
>> prep_new_page mm/page_alloc.c:2445 [inline]
>> __alloc_pages_bulk+0x8b9/0x1870 mm/page_alloc.c:5313
>> 
>> The problem is caused by set_page_owner() allocating memory to save the
>> stack trace with GFP_KERNEL while local IRQs are disabled.
>> So we cannot assume that the gfp used to allocate memory is the same as
>> the new page's gfp: prep_new_page() should prep/trace the page's gfp,
>> but must not use that same gfp to allocate memory; the allocation gfp
>> has to come from the caller.
>> Hence there are now two gfp masks: alloc_gfp, supplied by the caller
>> and used to allocate memory, and page_gfp_mask, the page's own gfp,
>> used to trace/prep the page itself.
>> In most situations the two are the same; in alloc_pages_bulk(),
>> GFP_ATOMIC is fine for alloc_gfp (even if set_page_owner() then fails
>> to save the backtrace, the impact is limited).
>> 
>> v2:
>> - add more description.
>> 
>> Fixes: 0f87d9d30f21 ("mm/page_alloc: add an array-based interface to the bulk page allocator")
>> Reported-by: syzbot+b07d8440edb5f8988eea@syzkaller.appspotmail.com
>> Suggested-by: Wang Qing <wangqing@vivo.com>
>> Signed-off-by: Yang Huan <link@vivo.com>
>
>https://lore.kernel.org/lkml/20210713152100.10381-2-mgorman@techsingularity.net/
>is now part of a series that has been sent to Linus. Hence, the Fixes
>tag is no longer applicable and the patch will no longer be addressing
>an atomic sleep bug. This patch should be treated as an enhancement
Hi Mel Gorman, thanks for your reply.
I have seen the fix patch; it fixes this bug by abandoning the bulk allocation feature when page_owner is set.
But in my opinion it doesn't really fix the bug; it's a circumvention plan.
>to allow bulk allocations when PAGE_OWNER is set. For that, it should
>include a note on the performance impact if PAGE_OWNER is used with
>either NFS or high-speed networking to justify the additional complexity.
My patch just splits the gfp passed to prep_new_page() into alloc_gfp
(GFP_ATOMIC for the bulk allocator, unchanged for all other callers) and
the page's own gfp used for tracing, so we no longer allocate memory
with the wrong flags.
So I think this will not affect bulk allocation performance when
page_owner is on (compared with the original patch), but it really fixes
the bug rather than evading it.
And this patch lets the bulk allocation feature and the page_owner
feature work together.
So I will send the patch again, based on the fix patch.
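
Concretely, the only change at the bulk call site is the first gfp
argument to prep_new_page() (quoted from the patch below):

  prep_new_page(page, 0, GFP_ATOMIC, gfp, 0);
  /* was: prep_new_page(page, 0, gfp, 0) */

All other callers simply pass the same mask twice, so their behaviour
is unchanged.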
Thank you
Yang Huan
>
>-- 
>Mel Gorman
>SUSE Labs
Mel Gorman July 15, 2021, 2:05 p.m. UTC | #3
On Thu, Jul 15, 2021 at 09:02:14PM +0800, Huan Yang wrote:
> >> Fixes: 0f87d9d30f21 ("mm/page_alloc: add an array-based interface to the bulk page allocator")
> >> Reported-by: syzbot+b07d8440edb5f8988eea@syzkaller.appspotmail.com
> >> Suggested-by: Wang Qing <wangqing@vivo.com>
> >> Signed-off-by: Yang Huan <link@vivo.com>
> >
> >https://lore.kernel.org/lkml/20210713152100.10381-2-mgorman@techsingularity.net/
> >is now part of a series that has been sent to Linus. Hence, the Fixes
> >tag is no longer applicable and the patch will no longer be addressing
> >an atomic sleep bug. This patch should be treated as an enhancement
>
> Hi Mel Gorman, thanks for your reply.
> I have seen the fix patch; it fixes this bug by abandoning the bulk allocation feature when page_owner is set.
> But in my opinion it doesn't really fix the bug; it's a circumvention plan.

Yes, it's a circumvention plan, for the reasons laid out in the changelog.

> >to allow bulk allocations when PAGE_OWNER is set. For that, it should
> >include a note on the performance impact if PAGE_OWNER is used with
> >either NFS or high-speed networking to justify the additional complexity.
>
> My patch just splits the gfp passed to prep_new_page() into alloc_gfp
> (GFP_ATOMIC for the bulk allocator, unchanged for all other callers) and
> the page's own gfp used for tracing, so we no longer allocate memory
> with the wrong flags.
> So I think this will not affect bulk allocation performance when
> page_owner is on (compared with the original patch), but it really fixes
> the bug rather than evading it.
> And this patch lets the bulk allocation feature and the page_owner
> feature work together.
> So I will send the patch again, based on the fix patch.

Your fix should revert the workaround. Also, your changelog should note
that, in some cases, PAGE_OWNER information will be lost if the
GFP_ATOMIC allocation from the bulk allocation context fails.
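
For reference, the loss is bounded: when stack_depot_save() cannot
allocate, save_stack() in mm/page_owner.c already falls back to a
pre-registered dummy stack, roughly (paraphrased, not the exact code):

  handle = stack_depot_save(entries, nr_entries, flags);
  if (!handle)
          handle = failure_handle;  /* generic "failed to save" stack */

so only the real allocation stack for that page goes missing.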
Huan Yang July 15, 2021, 2:12 p.m. UTC | #4
>> >> Fixes: 0f87d9d30f21 ("mm/page_alloc: add an array-based interface to the bulk page allocator")
>> >> Reported-by: syzbot+b07d8440edb5f8988eea@syzkaller.appspotmail.com
>> >> Suggested-by: Wang Qing <wangqing@vivo.com>
>> >> Signed-off-by: Yang Huan <link@vivo.com>
>> >
>> >https://lore.kernel.org/lkml/20210713152100.10381-2-mgorman@techsingularity.net/
>> >is now part of a series that has been sent to Linus. Hence, the Fixes
>> >tag is no longer applicable and the patch will no longer be addressing
>> >an atomic sleep bug. This patch should be treated as an enhancement
>>
>> Hi Mel Gorman, thanks for your reply.
>> I have seen the fix patch; it fixes this bug by abandoning the bulk allocation feature when page_owner is set.
>> But in my opinion it doesn't really fix the bug; it's a circumvention plan.
>
>Yes, it's a circumvention plan, for the reasons laid out in the changelog.
>
>> >to allow bulk allocations when PAGE_OWNER is set. For that, it should
>> >include a note on the performance impact if PAGE_OWNER is used with
>> >either NFS or high-speed networking to justify the additional complexity.
>>
>> My patch just splits the gfp passed to prep_new_page() into alloc_gfp
>> (GFP_ATOMIC for the bulk allocator, unchanged for all other callers) and
>> the page's own gfp used for tracing, so we no longer allocate memory
>> with the wrong flags.
>> So I think this will not affect bulk allocation performance when
>> page_owner is on (compared with the original patch), but it really fixes
>> the bug rather than evading it.
>> And this patch lets the bulk allocation feature and the page_owner
>> feature work together.
>> So I will send the patch again, based on the fix patch.
>
>Your fix should revert the workaround. Also, your changelog should note
>that, in some cases, PAGE_OWNER information will be lost if the
>GFP_ATOMIC allocation from the bulk allocation context fails.
Thanks, I will note that. 
>
>-- 
>Mel Gorman
>SUSE Labs

Patch

diff --git a/include/linux/page_owner.h b/include/linux/page_owner.h
index 3468794f83d2..c930a63e149b 100644
--- a/include/linux/page_owner.h
+++ b/include/linux/page_owner.h
@@ -10,7 +10,7 @@  extern struct page_ext_operations page_owner_ops;
 
 extern void __reset_page_owner(struct page *page, unsigned int order);
 extern void __set_page_owner(struct page *page,
-			unsigned int order, gfp_t gfp_mask);
+			unsigned int order, gfp_t alloc_gfp, gfp_t page_gfp_mask);
 extern void __split_page_owner(struct page *page, unsigned int nr);
 extern void __copy_page_owner(struct page *oldpage, struct page *newpage);
 extern void __set_page_owner_migrate_reason(struct page *page, int reason);
@@ -25,10 +25,10 @@  static inline void reset_page_owner(struct page *page, unsigned int order)
 }
 
 static inline void set_page_owner(struct page *page,
-			unsigned int order, gfp_t gfp_mask)
+			unsigned int order, gfp_t alloc_gfp, gfp_t page_gfp_mask)
 {
 	if (static_branch_unlikely(&page_owner_inited))
-		__set_page_owner(page, order, gfp_mask);
+		__set_page_owner(page, order, alloc_gfp, page_gfp_mask);
 }
 
 static inline void split_page_owner(struct page *page, unsigned int nr)
@@ -56,7 +56,7 @@  static inline void reset_page_owner(struct page *page, unsigned int order)
 {
 }
 static inline void set_page_owner(struct page *page,
-			unsigned int order, gfp_t gfp_mask)
+			unsigned int order, gfp_t alloc_gfp, gfp_t page_gfp_mask)
 {
 }
 static inline void split_page_owner(struct page *page,
diff --git a/mm/compaction.c b/mm/compaction.c
index 84fde270ae74..a3bc69dceb1d 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -96,7 +96,7 @@  static void split_map_pages(struct list_head *list)
 		order = page_private(page);
 		nr_pages = 1 << order;
 
-		post_alloc_hook(page, order, __GFP_MOVABLE);
+		post_alloc_hook(page, order, __GFP_MOVABLE, __GFP_MOVABLE);
 		if (order)
 			split_page(page, order);
 
diff --git a/mm/internal.h b/mm/internal.h
index e8fdb531f887..9d0cd0840f58 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -195,7 +195,7 @@  extern void memblock_free_pages(struct page *page, unsigned long pfn,
 extern void __free_pages_core(struct page *page, unsigned int order);
 extern void prep_compound_page(struct page *page, unsigned int order);
 extern void post_alloc_hook(struct page *page, unsigned int order,
-					gfp_t gfp_flags);
+					gfp_t alloc_gfp, gfp_t page_gfp_mask);
 extern int user_min_free_kbytes;
 
 extern void free_unref_page(struct page *page);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d1f5de1c1283..bdd057e20376 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2322,7 +2322,7 @@  static bool check_new_pages(struct page *page, unsigned int order)
 }
 
 inline void post_alloc_hook(struct page *page, unsigned int order,
-				gfp_t gfp_flags)
+				gfp_t alloc_gfp, gfp_t page_gfp_mask)
 {
 	bool init;
 
@@ -2344,20 +2344,21 @@  inline void post_alloc_hook(struct page *page, unsigned int order,
 	 * kasan_alloc_pages and kernel_init_free_pages must be
 	 * kept together to avoid discrepancies in behavior.
 	 */
-	init = !want_init_on_free() && want_init_on_alloc(gfp_flags);
+	init = !want_init_on_free() && want_init_on_alloc(page_gfp_mask);
 	kasan_alloc_pages(page, order, init);
 	if (init && !kasan_has_integrated_init())
 		kernel_init_free_pages(page, 1 << order);
 
-	set_page_owner(page, order, gfp_flags);
+	set_page_owner(page, order, alloc_gfp, page_gfp_mask);
 }
 
-static void prep_new_page(struct page *page, unsigned int order, gfp_t gfp_flags,
-							unsigned int alloc_flags)
+static void prep_new_page(struct page *page, unsigned int order,
+			  gfp_t alloc_gfp, gfp_t page_gfp_mask,
+			  unsigned int alloc_flags)
 {
-	post_alloc_hook(page, order, gfp_flags);
+	post_alloc_hook(page, order, alloc_gfp, page_gfp_mask);
 
-	if (order && (gfp_flags & __GFP_COMP))
+	if (order && (page_gfp_mask & __GFP_COMP))
 		prep_compound_page(page, order);
 
 	/*
@@ -3991,7 +3992,7 @@  get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
 		page = rmqueue(ac->preferred_zoneref->zone, zone, order,
 				gfp_mask, alloc_flags, ac->migratetype);
 		if (page) {
-			prep_new_page(page, order, gfp_mask, alloc_flags);
+			prep_new_page(page, order, gfp_mask, gfp_mask, alloc_flags);
 
 			/*
 			 * If this is a high-order atomic allocation then check
@@ -4211,7 +4212,7 @@  __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
 
 	/* Prep a captured page if available */
 	if (page)
-		prep_new_page(page, order, gfp_mask, alloc_flags);
+		prep_new_page(page, order, gfp_mask, gfp_mask, alloc_flags);
 
 	/* Try get a page from the freelist if available */
 	if (!page)
@@ -5127,7 +5128,7 @@  unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
 		__count_zid_vm_events(PGALLOC, zone_idx(zone), 1);
 		zone_statistics(ac.preferred_zoneref->zone, zone);
 
-		prep_new_page(page, 0, gfp, 0);
+		prep_new_page(page, 0, GFP_ATOMIC, gfp, 0);
 		if (page_list)
 			list_add(&page->lru, page_list);
 		else
diff --git a/mm/page_owner.c b/mm/page_owner.c
index adfabb560eb9..22948724ca64 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -170,7 +170,7 @@  static inline void __set_page_owner_handle(struct page_ext *page_ext,
 }
 
 noinline void __set_page_owner(struct page *page, unsigned int order,
-					gfp_t gfp_mask)
+					gfp_t alloc_gfp, gfp_t page_gfp_mask)
 {
 	struct page_ext *page_ext = lookup_page_ext(page);
 	depot_stack_handle_t handle;
@@ -178,8 +178,8 @@  noinline void __set_page_owner(struct page *page, unsigned int order,
 	if (unlikely(!page_ext))
 		return;
 
-	handle = save_stack(gfp_mask);
-	__set_page_owner_handle(page_ext, handle, order, gfp_mask);
+	handle = save_stack(alloc_gfp);
+	__set_page_owner_handle(page_ext, handle, order, page_gfp_mask);
 }
 
 void __set_page_owner_migrate_reason(struct page *page, int reason)