Message ID | 1570090257-25001-1-git-send-email-anshuman.khandual@arm.com (mailing list archive) |
---|---|
State | New, archived |
Series | mm/page_alloc: Add a reason for reserved pages in has_unmovable_pages() |
> On Oct 3, 2019, at 4:10 AM, Anshuman Khandual <Anshuman.Khandual@arm.com> wrote: > > Having unmovable pages on a given pageblock should be reported correctly > when required with REPORT_FAILURE flag. But there can be a scenario where a > reserved page in the page block will get reported as a generic "unmovable" > reason code. Instead this should be changed to a more appropriate reason > code like "Reserved page". Isn’t this redundant as it dumps the flags in dump_page() anyway? > > Cc: Andrew Morton <akpm@linux-foundation.org> > Cc: Michal Hocko <mhocko@suse.com> > Cc: Vlastimil Babka <vbabka@suse.cz> > Cc: Oscar Salvador <osalvador@suse.de> > Cc: Mel Gorman <mgorman@techsingularity.net> > Cc: Mike Rapoport <rppt@linux.ibm.com> > Cc: Dan Williams <dan.j.williams@intel.com> > Cc: Pavel Tatashin <pavel.tatashin@microsoft.com> > Cc: linux-kernel@vger.kernel.org > Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> > --- > mm/page_alloc.c | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 15c2050c629b..fbf93ea119d2 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -8206,8 +8206,10 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count, > > page = pfn_to_page(check); > > - if (PageReserved(page)) > + if (PageReserved(page)) { > + reason = "Reserved page"; > goto unmovable; > + } > > /* > * If the zone is movable and we have ruled out all reserved > -- > 2.20.1 >
On 10/03/2019 02:35 PM, Qian Cai wrote: > > >> On Oct 3, 2019, at 4:10 AM, Anshuman Khandual <Anshuman.Khandual@arm.com> wrote: >> >> Having unmovable pages on a given pageblock should be reported correctly >> when required with REPORT_FAILURE flag. But there can be a scenario where a >> reserved page in the page block will get reported as a generic "unmovable" >> reason code. Instead this should be changed to a more appropriate reason >> code like "Reserved page". > > Isn’t this redundant as it dumps the flags in dump_page() anyway? Even though page flags do contain reserved bit information, the problem is that we are explicitly printing the reason for this page dump. In this case it is caused by the fact that it is a reserved page. page dumped because: <reason> The proposed change makes it explicit that the dump is caused by a non-movable page with the reserved bit set. It also helps in differentiating between the reserved bit condition and the last one "if (found > count)". > >> >> Cc: Andrew Morton <akpm@linux-foundation.org> >> Cc: Michal Hocko <mhocko@suse.com> >> Cc: Vlastimil Babka <vbabka@suse.cz> >> Cc: Oscar Salvador <osalvador@suse.de> >> Cc: Mel Gorman <mgorman@techsingularity.net> >> Cc: Mike Rapoport <rppt@linux.ibm.com> >> Cc: Dan Williams <dan.j.williams@intel.com> >> Cc: Pavel Tatashin <pavel.tatashin@microsoft.com> >> Cc: linux-kernel@vger.kernel.org >> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> >> --- >> mm/page_alloc.c | 4 +++- >> 1 file changed, 3 insertions(+), 1 deletion(-) >> >> diff --git a/mm/page_alloc.c b/mm/page_alloc.c >> index 15c2050c629b..fbf93ea119d2 100644 >> --- a/mm/page_alloc.c >> +++ b/mm/page_alloc.c >> @@ -8206,8 +8206,10 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count, >> >> page = pfn_to_page(check); >> >> - if (PageReserved(page)) >> + if (PageReserved(page)) { >> + reason = "Reserved page"; >> goto unmovable; >> + } >> >> /* >> * If the zone is movable and we have ruled out all reserved >> -- >> 2.20.1 >> >
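For context: the "reason" being discussed here is the string passed to dump_page(), and it only shows up as the last line of the dump. Roughly, the tail of __dump_page() in mm/debug.c of that era ends with the following (paraphrased, details may differ between kernel versions):

	/* refcount, mapcount, mapping, flags and raw words are printed above */
	if (reason)
		pr_warn("page dumped because: %s\n", reason);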
On 10/03/2019 03:02 PM, Anshuman Khandual wrote: > > On 10/03/2019 02:35 PM, Qian Cai wrote: >> >>> On Oct 3, 2019, at 4:10 AM, Anshuman Khandual <Anshuman.Khandual@arm.com> wrote: >>> >>> Having unmovable pages on a given pageblock should be reported correctly >>> when required with REPORT_FAILURE flag. But there can be a scenario where a >>> reserved page in the page block will get reported as a generic "unmovable" >>> reason code. Instead this should be changed to a more appropriate reason >>> code like "Reserved page". >> Isn’t this redundant as it dumps the flags in dump_page() anyway? > Even though page flags does contain reserved bit information, the problem > is that we are explicitly printing the reason for this page dump. In this > case it is caused by the fact that it is a reserved page. > > page dumped because: <reason> > > The proposed change makes it explicit that the dump is caused because a > non movable page with reserved bit set. It also helps in differentiating > between reserved bit condition and the last one "if (found > count)" Instead, will it be better to rename the reason codes as 1. "Unmovable (CMA)" 2. "Unmovable (Reserved)" 3. "Unmovable (Private/non-LRU)"
> On Oct 3, 2019, at 5:32 AM, Anshuman Khandual <Anshuman.Khandual@arm.com> wrote: > > Even though page flags does contain reserved bit information, the problem > is that we are explicitly printing the reason for this page dump. In this > case it is caused by the fact that it is a reserved page. > > page dumped because: <reason> > > The proposed change makes it explicit that the dump is caused because a > non movable page with reserved bit set. It also helps in differentiating > between reserved bit condition and the last one "if (found > count)". Then, someone later would like to add a reason for every different page flags just because they *think* they are special.
On 10/03/2019 04:49 PM, Qian Cai wrote: > > >> On Oct 3, 2019, at 5:32 AM, Anshuman Khandual <Anshuman.Khandual@arm.com> wrote: >> >> Even though page flags does contain reserved bit information, the problem >> is that we are explicitly printing the reason for this page dump. In this >> case it is caused by the fact that it is a reserved page. >> >> page dumped because: <reason> >> >> The proposed change makes it explicit that the dump is caused because a >> non movable page with reserved bit set. It also helps in differentiating >> between reserved bit condition and the last one "if (found > count)". > > Then, someone later would like to add a reason for every different page flags just because they *think* they are special. > Ohh, never meant that the 'Reserved' bit is anything special here but it is a reason to make a page unmovable unlike many other flags.
> On Oct 3, 2019, at 7:31 AM, Anshuman Khandual <Anshuman.Khandual@arm.com> wrote: > > Ohh, never meant that the 'Reserved' bit is anything special here but it > is a reason to make a page unmovable unlike many other flags. But dump_page() is used everywhere, and it is better to reserve “reason” to indicate something more important rather than duplicating the page flags. Especially, it is trivial enough right now for developers to look at the page flags dumped from has_unmovable_pages() and figure out the exact branching in the code.
On 10/03/2019 05:20 PM, Qian Cai wrote: > > >> On Oct 3, 2019, at 7:31 AM, Anshuman Khandual <Anshuman.Khandual@arm.com> wrote: >> >> Ohh, never meant that the 'Reserved' bit is anything special here but it >> is a reason to make a page unmovable unlike many other flags. > > But dump_page() is used everywhere, and it is better to reserve “reason” to indicate something more important rather than duplicating the page flags. > > Especially, it is trivial enough right now for developers look in the page flags dumping from has_unmovable_pages(), and figure out the exact branching in the code. > Will something like this be better ? hugepage_migration_supported() has got uncertainty depending on platform and huge page size. diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 15c2050c629b..8dbc86696515 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -8175,7 +8175,7 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count, unsigned long found; unsigned long iter = 0; unsigned long pfn = page_to_pfn(page); - const char *reason = "unmovable page"; + const char *reason; /* * TODO we could make this much more efficient by not checking every @@ -8194,7 +8194,7 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count, if (is_migrate_cma(migratetype)) return false; - reason = "CMA page"; + reason = "Unmovable CMA page"; goto unmovable; } @@ -8206,8 +8206,10 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count, page = pfn_to_page(check); - if (PageReserved(page)) + if (PageReserved(page)) { + reason = "Unmovable reserved page"; goto unmovable; + } /* * If the zone is movable and we have ruled out all reserved @@ -8226,8 +8228,10 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count, struct page *head = compound_head(page); unsigned int skip_pages; - if (!hugepage_migration_supported(page_hstate(head))) + if (!hugepage_migration_supported(page_hstate(head))) { + reason = "Unmovable HugeTLB page"; goto unmovable; + } skip_pages = compound_nr(head) - (page - head); iter += skip_pages - 1; @@ -8271,8 +8275,10 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count, * is set to both of a memory hole page and a _used_ kernel * page at boot. */ - if (found > count) + if (found > count) { + reason = "Unmovable non-LRU page"; goto unmovable; + } } return false; unmovable:
> On Oct 3, 2019, at 8:01 AM, Anshuman Khandual <Anshuman.Khandual@arm.com> wrote: > > Will something like this be better ? Not really. dump_page() will dump PageCompound information anyway, so it is trivial to figure out if went in that path. > hugepage_migration_supported() has got > uncertainty depending on platform and huge page size. > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 15c2050c629b..8dbc86696515 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -8175,7 +8175,7 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count, > unsigned long found; > unsigned long iter = 0; > unsigned long pfn = page_to_pfn(page); > - const char *reason = "unmovable page"; > + const char *reason; > > /* > * TODO we could make this much more efficient by not checking every > @@ -8194,7 +8194,7 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count, > if (is_migrate_cma(migratetype)) > return false; > > - reason = "CMA page"; > + reason = "Unmovable CMA page"; > goto unmovable; > } > > @@ -8206,8 +8206,10 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count, > > page = pfn_to_page(check); > > - if (PageReserved(page)) > + if (PageReserved(page)) { > + reason = "Unmovable reserved page"; > goto unmovable; > + } > > /* > * If the zone is movable and we have ruled out all reserved > @@ -8226,8 +8228,10 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count, > struct page *head = compound_head(page); > unsigned int skip_pages; > > - if (!hugepage_migration_supported(page_hstate(head))) > + if (!hugepage_migration_supported(page_hstate(head))) { > + reason = "Unmovable HugeTLB page"; > goto unmovable; > + } > > skip_pages = compound_nr(head) - (page - head); > iter += skip_pages - 1; > @@ -8271,8 +8275,10 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count, > * is set to both of a memory hole page and a _used_ kernel > * page at boot. > */ > - if (found > count) > + if (found > count) { > + reason = "Unmovable non-LRU page"; > goto unmovable; > + } > } > return false; > unmovable:
On 03.10.19 14:14, Qian Cai wrote: > > >> On Oct 3, 2019, at 8:01 AM, Anshuman Khandual <Anshuman.Khandual@arm.com> wrote: >> >> Will something like this be better ? > > Not really. dump_page() will dump PageCompound information anyway, so it is trivial to figure out if went in that path. > I agree, I use the dump_page() output frequently to identify PG_reserved pages. No need to duplicate that.
On Thu 03-10-19 13:40:57, Anshuman Khandual wrote: > Having unmovable pages on a given pageblock should be reported correctly > when required with REPORT_FAILURE flag. But there can be a scenario where a > reserved page in the page block will get reported as a generic "unmovable" > reason code. Instead this should be changed to a more appropriate reason > code like "Reserved page". Others have already pointed out this is just redundant but I will have a more generic comment on the changelog. There is essentially no information on why the current state is bad/unhelpful and why the change is needed. All you claim is that something is a certain way and then assert that it should be done differently. That is not how changelogs should look.
On 10/04/2019 04:28 PM, Michal Hocko wrote: > On Thu 03-10-19 13:40:57, Anshuman Khandual wrote: >> Having unmovable pages on a given pageblock should be reported correctly >> when required with REPORT_FAILURE flag. But there can be a scenario where a >> reserved page in the page block will get reported as a generic "unmovable" >> reason code. Instead this should be changed to a more appropriate reason >> code like "Reserved page". > > Others have already pointed out this is just redundant but I will have a Sure. > more generic comment on the changelog. There is essentially no > information why the current state is bad/unhelpful and why the chnage is The current state is not necessarily bad or unhelpful. I just thought that it could be improved upon. Somehow calling out explicitly only the CMA page failure case just felt ad hoc, whereas there are other reasons like HugeTLB immovability which might depend on other factors apart from just page flags (though I did not propose that originally). > needed. All you claim is that something is a certain way and then assert > that it should be done differently. That is not how changelogs should > look like. > Okay, probably I should have explained more on why "unmovable" is less than adequate to capture the exact reason for specific failure cases and how "Reserved Page" instead would have been better. But got the point, will improve.
On Fri, 2019-10-04 at 17:14 +0530, Anshuman Khandual wrote: > > On 10/04/2019 04:28 PM, Michal Hocko wrote: > > On Thu 03-10-19 13:40:57, Anshuman Khandual wrote: > > > Having unmovable pages on a given pageblock should be reported correctly > > > when required with REPORT_FAILURE flag. But there can be a scenario where a > > > reserved page in the page block will get reported as a generic "unmovable" > > > reason code. Instead this should be changed to a more appropriate reason > > > code like "Reserved page". > > > > Others have already pointed out this is just redundant but I will have a > > Sure. > > > more generic comment on the changelog. There is essentially no > > information why the current state is bad/unhelpful and why the chnage is > > The current state is not necessarily bad or unhelpful. I just though that it > could be improved upon. Some how calling out explicitly only the CMA page > failure case just felt adhoc, where as there are other reasons like HugeTLB > immovability which might depend on other factors apart from just page flags > (though I did not propose that originally). > > > needed. All you claim is that something is a certain way and then assert > > that it should be done differently. That is not how changelogs should > > look like. > > > > Okay, probably I should have explained more on why "unmovable" is less than > adequate to capture the exact reason for specific failure cases and how > "Reserved Page" instead would been better. But got the point, will improve. > It might be a good time to rethink if it is really a good idea to dump_page() at all inside has_unmovable_pages(). As it is right now, it is a a potential deadlock between console vs memory offline. More details are in this thread, https://lore.kernel.org/lkml/1568817579.5576.172.camel@lca.pw/ 01: [ 672.875392] WARNING: possible circular locking dependency detected 01: [ 672.875394] 5.4.0-rc1-next-20191004+ #64 Not tainted 01: [ 672.875396] ------------------------------------------------------ 01: [ 672.875398] test.sh/1724 is trying to acquire lock: 01: [ 672.875400] 0000000052059ec0 (console_owner){-...}, at: console_unlock+0x 01: 328/0xa30 01: [ 672.875406] 01: [ 672.875408] but task is already holding lock: 01: [ 672.875409] 000000006ffd89c8 (&(&zone->lock)->rlock){-.-.}, at: start_iso 01: late_page_range+0x216/0x538 01: [ 672.875415] 01: [ 672.875417] which lock already depends on the new lock. 
01: [ 672.875418] 01: [ 672.875419] 01: [ 672.875421] the existing dependency chain (in reverse order) is: 01: [ 672.875423] 01: [ 672.875424] -> #2 (&(&zone->lock)->rlock){-.-.}: 01: [ 672.875430] lock_acquire+0x21a/0x468 01: [ 672.875431] _raw_spin_lock+0x54/0x68 01: [ 672.875433] get_page_from_freelist+0x8b6/0x2d28 01: [ 672.875435] __alloc_pages_nodemask+0x246/0x658 01: [ 672.875437] __get_free_pages+0x34/0x78 01: [ 672.875438] sclp_init+0x106/0x690 01: [ 672.875440] sclp_register+0x2e/0x248 01: [ 672.875442] sclp_rw_init+0x4a/0x70 01: [ 672.875443] sclp_console_init+0x4a/0x1b8 01: [ 672.875445] console_init+0x2c8/0x410 01: [ 672.875447] start_kernel+0x530/0x6a0 01: [ 672.875448] startup_continue+0x70/0xd0 01: [ 672.875449] 01: [ 672.875450] -> #1 (sclp_lock){-.-.}: 01: [ 672.875458] lock_acquire+0x21a/0x468 01: [ 672.875460] _raw_spin_lock_irqsave+0xcc/0xe8 01: [ 672.875462] sclp_add_request+0x34/0x308 01: [ 672.875464] sclp_conbuf_emit+0x100/0x138 01: [ 672.875465] sclp_console_write+0x96/0x3b8 01: [ 672.875467] console_unlock+0x6dc/0xa30 01: [ 672.875469] vprintk_emit+0x184/0x3c8 01: [ 672.875470] vprintk_default+0x44/0x50 01: [ 672.875472] printk+0xa8/0xc0 01: [ 672.875473] iommu_debugfs_setup+0xf2/0x108 01: [ 672.875475] iommu_init+0x6c/0x78 01: [ 672.875477] do_one_initcall+0x162/0x680 01: [ 672.875478] kernel_init_freeable+0x4e8/0x5a8 01: [ 672.875480] kernel_init+0x2a/0x188 01: [ 672.875484] ret_from_fork+0x30/0x34 01: [ 672.875486] kernel_thread_starter+0x0/0xc 01: [ 672.875487] 01: [ 672.875488] -> #0 (console_owner){-...}: 01: [ 672.875495] check_noncircular+0x338/0x3e0 01: [ 672.875496] __lock_acquire+0x1e66/0x2d88 01: [ 672.875498] lock_acquire+0x21a/0x468 01: [ 672.875499] console_unlock+0x3a6/0xa30 01: [ 672.875501] vprintk_emit+0x184/0x3c8 01: [ 672.875503] vprintk_default+0x44/0x50 01: [ 672.875504] printk+0xa8/0xc0 01: [ 672.875506] __dump_page+0x1dc/0x710 01: [ 672.875507] dump_page+0x2e/0x58 01: [ 672.875509] has_unmovable_pages+0x2e8/0x470 01: [ 672.875511] start_isolate_page_range+0x404/0x538 01: [ 672.875513] __offline_pages+0x22c/0x1338 01: [ 672.875514] memory_subsys_offline+0xa6/0xe8 01: [ 672.875516] device_offline+0xe6/0x118 01: [ 672.875517] state_store+0xf0/0x110 01: [ 672.875519] kernfs_fop_write+0x1bc/0x270 01: [ 672.875521] vfs_write+0xce/0x220 01: [ 672.875522] ksys_write+0xea/0x190 01: [ 672.875524] system_call+0xd8/0x2b4 01: [ 672.875525] 01: [ 672.875527] other info that might help us debug this: 01: [ 672.875528] 01: [ 672.875529] Chain exists of: 01: [ 672.875530] console_owner --> sclp_lock --> &(&zone->lock)->rlock 01: [ 672.875538] 01: [ 672.875540] Possible unsafe locking scenario: 01: [ 672.875541] 01: [ 672.875543] CPU0 CPU1 01: [ 672.875544] ---- ---- 01: [ 672.875545] lock(&(&zone->lock)->rlock); 01: [ 672.875549] lock(sclp_lock); 01: [ 672.875553] lock(&(&zone->lock)->rlock); 01: [ 672.875557] lock(console_owner); 01: [ 672.875560] 01: [ 672.875562] *** DEADLOCK *** 01: [ 672.875563] 01: [ 672.875564] 9 locks held by test.sh/1724: 01: [ 672.875565] #0: 000000000e925408 (sb_writers#4){.+.+}, at: vfs_write+0x2 01: 06/0x220 01: [ 672.875574] #1: 0000000050aa4280 (&of->mutex){+.+.}, at: kernfs_fop_writ 01: e+0x154/0x270 01: [ 672.875581] #2: 0000000062e5c628 (kn->count#198){.+.+}, at: kernfs_fop_w 01: rite+0x16a/0x270 01: [ 672.875590] #3: 00000000523236a0 (device_hotplug_lock){+.+.}, at: lock_d 01: evice_hotplug_sysfs+0x30/0x80 01: [ 672.875598] #4: 0000000062e70990 (&dev->mutex){....}, at: device_offline 01: +0x78/0x118 01: [ 672.875605] 
#5: 0000000051fd36b0 (cpu_hotplug_lock.rw_sem){++++}, at: __ 01: offline_pages+0xec/0x1338 01: [ 672.875613] #6: 00000000521ca470 (mem_hotplug_lock.rw_sem){++++}, at: pe 01: rcpu_down_write+0x38/0x210 01: [ 672.875620] #7: 000000006ffd89c8 (&(&zone->lock)->rlock){-.-.}, at: star 01: t_isolate_page_range+0x216/0x538 01: [ 672.875628] #8: 000000005205a100 (console_lock){+.+.}, at: vprintk_emit+ 01: 0x178/0x3c8 01: [ 672.875635] 01: [ 672.875636] stack backtrace: 01: [ 672.875639] CPU: 1 PID: 1724 Comm: test.sh Not tainted 5.4.0-rc1-next-201 01: 91004+ #64 01: [ 672.875640] Hardware name: IBM 2964 N96 400 (z/VM 6.4.0) 01: [ 672.875642] Call Trace: 01: [ 672.875644] ([<00000000512ae218>] show_stack+0x110/0x1b0) 01: [ 672.875645] [<0000000051b6d506>] dump_stack+0x126/0x178 01: [ 672.875648] [<00000000513a4b08>] check_noncircular+0x338/0x3e0 01: [ 672.875650] [<00000000513aaaf6>] __lock_acquire+0x1e66/0x2d88 01: [ 672.875652] [<00000000513a7e12>] lock_acquire+0x21a/0x468 01: [ 672.875654] [<00000000513bb2fe>] console_unlock+0x3a6/0xa30 01: [ 672.875655] [<00000000513bde2c>] vprintk_emit+0x184/0x3c8 01: [ 672.875657] [<00000000513be0b4>] vprintk_default+0x44/0x50 01: [ 672.875659] [<00000000513beb60>] printk+0xa8/0xc0 01: [ 672.875661] [<000000005158c364>] __dump_page+0x1dc/0x710 01: [ 672.875663] [<000000005158c8c6>] dump_page+0x2e/0x58 01: [ 672.875665] [<00000000515d87c8>] has_unmovable_pages+0x2e8/0x470 01: [ 672.875667] [<000000005167072c>] start_isolate_page_range+0x404/0x538 01: [ 672.875669] [<0000000051b96de4>] __offline_pages+0x22c/0x1338 01: [ 672.875671] [<0000000051908586>] memory_subsys_offline+0xa6/0xe8 01: [ 672.875673] [<00000000518e561e>] device_offline+0xe6/0x118 01: [ 672.875675] [<0000000051908170>] state_store+0xf0/0x110 01: [ 672.875677] [<0000000051796384>] kernfs_fop_write+0x1bc/0x270 01: [ 672.875679] [<000000005168972e>] vfs_write+0xce/0x220 01: [ 672.875681] [<0000000051689b9a>] ksys_write+0xea/0x190 01: [ 672.875685] [<0000000051ba9990>] system_call+0xd8/0x2b4 01: [ 672.875687] INFO: lockdep is turned off.
On Fri 04-10-19 08:56:16, Qian Cai wrote: [...] > It might be a good time to rethink if it is really a good idea to dump_page() > at all inside has_unmovable_pages(). As it is right now, it is a a potential > deadlock between console vs memory offline. More details are in this thread, > > https://lore.kernel.org/lkml/1568817579.5576.172.camel@lca.pw/ Huh. That would imply we cannot do any printk from that path, no?
On Fri, 2019-10-04 at 15:07 +0200, Michal Hocko wrote: > On Fri 04-10-19 08:56:16, Qian Cai wrote: > [...] > > It might be a good time to rethink if it is really a good idea to dump_page() > > at all inside has_unmovable_pages(). As it is right now, it is a a potential > > deadlock between console vs memory offline. More details are in this thread, > > > > https://lore.kernel.org/lkml/1568817579.5576.172.camel@lca.pw/ > > Huh. That would imply we cannot do any printk from that path, no? Yes, or use something like printk_deferred(), or it needs a rework of the current console locking, which I have no clue about yet.
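To make "something like printk_deferred()" concrete, here is a minimal hypothetical sketch (report_unmovable() is an invented name, not actual kernel code): printk_deferred() only queues the message and defers console output to a safe context, so console drivers (which may take sclp_lock and allocate memory) are never invoked while zone->lock is still held.

	/* Hypothetical sketch only, not the actual kernel code. */
	static void report_unmovable(struct page *page, const char *reason)
	{
		/* caller (set_migratetype_isolate()) still holds zone->lock here */
		printk_deferred(KERN_WARNING "pfn %#lx unmovable: %s\n",
				page_to_pfn(page), reason);
	}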
On Fri 04-10-19 09:30:39, Qian Cai wrote: > On Fri, 2019-10-04 at 15:07 +0200, Michal Hocko wrote: > > On Fri 04-10-19 08:56:16, Qian Cai wrote: > > [...] > > > It might be a good time to rethink if it is really a good idea to dump_page() > > > at all inside has_unmovable_pages(). As it is right now, it is a a potential > > > deadlock between console vs memory offline. More details are in this thread, > > > > > > https://lore.kernel.org/lkml/1568817579.5576.172.camel@lca.pw/ > > > > Huh. That would imply we cannot do any printk from that path, no? > > Yes, or use something like printk_deferred() This is just insane. The hotplug code is in no way special wrt printk. It is never called from the printk code AFAIK and thus there is no real reason why this particular code should be any special. Not to mention it calls printk indirectly from a code that is shared with other code paths. > or it needs to rework of the current console locking which I have no > clue yet. Yes, if the lockdep is really referring to a real deadlock which I haven't really explored.
On Fri, 2019-10-04 at 15:38 +0200, Michal Hocko wrote: > On Fri 04-10-19 09:30:39, Qian Cai wrote: > > On Fri, 2019-10-04 at 15:07 +0200, Michal Hocko wrote: > > > On Fri 04-10-19 08:56:16, Qian Cai wrote: > > > [...] > > > > It might be a good time to rethink if it is really a good idea to dump_page() > > > > at all inside has_unmovable_pages(). As it is right now, it is a a potential > > > > deadlock between console vs memory offline. More details are in this thread, > > > > > > > > https://lore.kernel.org/lkml/1568817579.5576.172.camel@lca.pw/ > > > > > > Huh. That would imply we cannot do any printk from that path, no? > > > > Yes, or use something like printk_deferred() > > This is just insane. The hotplug code is in no way special wrt printk. > It is never called from the printk code AFAIK and thus there is no real > reason why this particular code should be any special. Not to mention > it calls printk indirectly from a code that is shared with other code > paths. Basically, printk() while holding the zone_lock will be problematic because the console does the opposite, as it always needs to allocate some memory. Then, it will always find some way to form this chain, console_lock -> * -> zone_lock. > > > or it needs to rework of the current console locking which I have no > > clue yet. > > Yes, if the lockdep is really referring to a real deadlock which I > haven't really explored. >
On Fri 04-10-19 09:56:00, Qian Cai wrote: > On Fri, 2019-10-04 at 15:38 +0200, Michal Hocko wrote: > > On Fri 04-10-19 09:30:39, Qian Cai wrote: > > > On Fri, 2019-10-04 at 15:07 +0200, Michal Hocko wrote: > > > > On Fri 04-10-19 08:56:16, Qian Cai wrote: > > > > [...] > > > > > It might be a good time to rethink if it is really a good idea to dump_page() > > > > > at all inside has_unmovable_pages(). As it is right now, it is a a potential > > > > > deadlock between console vs memory offline. More details are in this thread, > > > > > > > > > > https://lore.kernel.org/lkml/1568817579.5576.172.camel@lca.pw/ > > > > > > > > Huh. That would imply we cannot do any printk from that path, no? > > > > > > Yes, or use something like printk_deferred() > > > > This is just insane. The hotplug code is in no way special wrt printk. > > It is never called from the printk code AFAIK and thus there is no real > > reason why this particular code should be any special. Not to mention > > it calls printk indirectly from a code that is shared with other code > > paths. > > Basically, printk() while holding the zone_lock will be problematic as console > is doing the opposite as it always needs to allocate some memory. Then, it will > always find some way to form this chain, > > console_lock -> * -> zone_lock. So this is not as much a hotplug specific problem but zone->lock -> printk -> alloc chain that is a problem, right? Who is doing an allocation from this atomic context? I do not see any atomic allocation in kernel/printk/printk.c.
On Fri, 4 Oct 2019 16:41:50 +0200 Michal Hocko <mhocko@kernel.org> wrote: > > > This is just insane. The hotplug code is in no way special wrt printk. > > > It is never called from the printk code AFAIK and thus there is no real > > > reason why this particular code should be any special. Not to mention > > > it calls printk indirectly from a code that is shared with other code > > > paths. > > > > Basically, printk() while holding the zone_lock will be problematic as console > > is doing the opposite as it always needs to allocate some memory. Then, it will > > always find some way to form this chain, > > > > console_lock -> * -> zone_lock. > > So this is not as much a hotplug specific problem but zone->lock -> > printk -> alloc chain that is a problem, right? Who is doing an > allocation from this atomic context? I do not see any atomic allocation > in kernel/printk/printk.c. Apparently some console drivers can do memory allocation on the printk() path. This behavior is daft, IMO. Have we identified which ones and looked into fixing them?
> On Oct 5, 2019, at 5:22 PM, Andrew Morton <akpm@linux-foundation.org> wrote: > > Apparently some console drivers can do memory allocation on the printk() path. > > This behavior is daft, IMO. Have we identified which ones and looked > into fixing them? Not necessarily that simple. It is more that 2+ CPUs are required to trigger the deadlock. Please see the v2 patch I sent, which has all the information, especially the thread link included there, which contains a few lockdep splat traces and the s390 one in the patch description.
On 10/04/2019 01:55 PM, David Hildenbrand wrote: > On 03.10.19 14:14, Qian Cai wrote: >> >> >>> On Oct 3, 2019, at 8:01 AM, Anshuman Khandual <Anshuman.Khandual@arm.com> wrote: >>> >>> Will something like this be better ? >> >> Not really. dump_page() will dump PageCompound information anyway, so it is trivial to figure out if went in that path. >> > > I agree, I use the dump_page() output frequently to identify PG_reserved > pages. No need to duplicate that. Here in this path there is a reserved page which is preventing offlining a memory section but unfortunately dump_page() does not print page->flags for a reserved page pinned there possibly through memblock_reserve() during boot. __offline_pages() start_isolate_page_range() set_migratetype_isolate() has_unmovable_pages() dump_page() [ 64.920970] ------------[ cut here ]------------ [ 64.921718] WARNING: CPU: 16 PID: 1116 at mm/page_alloc.c:8298 has_unmovable_pages+0x274/0x2a8 [ 64.923110] Modules linked in: [ 64.923634] CPU: 16 PID: 1116 Comm: bash Not tainted 5.5.0-rc6-00006-gca544f2a11ae-dirty #281 [ 64.925102] Hardware name: linux,dummy-virt (DT) [ 64.925905] pstate: 60400085 (nZCv daIf +PAN -UAO) [ 64.926742] pc : has_unmovable_pages+0x274/0x2a8 [ 64.927554] lr : has_unmovable_pages+0x298/0x2a8 [ 64.928359] sp : ffff800014fd3a00 [ 64.928944] x29: ffff800014fd3a00 x28: fffffe0017640000 [ 64.929875] x27: 0000000000000000 x26: ffff0005fcfcda00 [ 64.930810] x25: 0000000000640000 x24: 0000000000000003 [ 64.931736] x23: 0000000019840000 x22: 0000000000001380 [ 64.932667] x21: ffff800011259000 x20: ffff0005fcfcda00 [ 64.933588] x19: 0000000000661000 x18: 0000000000000010 [ 64.934514] x17: 0000000000000000 x16: 0000000000000000 [ 64.935454] x15: ffffffffffffffff x14: ffff8000118498c8 [ 64.936377] x13: ffff800094fd3797 x12: ffff800014fd379f [ 64.937304] x11: ffff800011861000 x10: ffff800014fd3720 [ 64.938226] x9 : 00000000ffffffd0 x8 : ffff8000106a60d0 [ 64.939156] x7 : 0000000000000000 x6 : ffff0005fc6261b0 [ 64.940078] x5 : ffff0005fc6261b0 x4 : 0000000000000000 [ 64.941003] x3 : ffff0005fc62cf80 x2 : ffffffffffffec80 [ 64.941927] x1 : ffff800011141b58 x0 : ffff0005fcfcda00 [ 64.942857] Call trace: [ 64.943298] has_unmovable_pages+0x274/0x2a8 [ 64.944056] start_isolate_page_range+0x258/0x360 [ 64.944879] __offline_pages+0xf4/0x9e8 [ 64.945554] offline_pages+0x10/0x18 [ 64.946189] memory_block_action+0x40/0x1a0 [ 64.946929] memory_subsys_offline+0x4c/0x78 [ 64.947679] device_offline+0x98/0xc8 [ 64.948328] unprobe_store+0xa8/0x158 [ 64.948976] dev_attr_store+0x14/0x28 [ 64.949628] sysfs_kf_write+0x40/0x50 [ 64.950273] kernfs_fop_write+0x108/0x218 [ 64.950983] __vfs_write+0x18/0x40 [ 64.951592] vfs_write+0xb0/0x1d0 [ 64.952175] ksys_write+0x64/0xe8 [ 64.952761] __arm64_sys_write+0x18/0x20 [ 64.953451] el0_svc_common.constprop.2+0x88/0x150 [ 64.954293] el0_svc_handler+0x20/0x80 [ 64.954963] el0_sync_handler+0x118/0x188 [ 64.955669] el0_sync+0x140/0x180 [ 64.956256] ---[ end trace b162b4d1cbea304d ]--- [ 64.957063] page:fffffe0017640000 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 [ 64.958489] raw: 1ffff80000001000 fffffe0017640008 fffffe0017640008 0000000000000000 [ 64.959839] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000 [ 64.961174] page dumped because: unmovable page The reason is dump_page() does not print page->flags universally and only does so for KSM, Anon and File pages while excluding reserved pages at boot. Wondering should not we make printing page->flags universal ? - Anshuman
On 14.01.20 09:19, Anshuman Khandual wrote: > > > On 10/04/2019 01:55 PM, David Hildenbrand wrote: >> On 03.10.19 14:14, Qian Cai wrote: >>> >>> >>>> On Oct 3, 2019, at 8:01 AM, Anshuman Khandual <Anshuman.Khandual@arm.com> wrote: >>>> >>>> Will something like this be better ? >>> >>> Not really. dump_page() will dump PageCompound information anyway, so it is trivial to figure out if went in that path. >>> >> >> I agree, I use the dump_page() output frequently to identify PG_reserved >> pages. No need to duplicate that. > > Here in this path there is a reserved page which is preventing > offlining a memory section but unfortunately dump_page() does > not print page->flags for a reserved page pinned there possibly > through memblock_reserve() during boot. > > __offline_pages() > start_isolate_page_range() > set_migratetype_isolate() > has_unmovable_pages() > dump_page() > > [ 64.920970] ------------[ cut here ]------------ > [ 64.921718] WARNING: CPU: 16 PID: 1116 at mm/page_alloc.c:8298 has_unmovable_pages+0x274/0x2a8 > [ 64.923110] Modules linked in: > [ 64.923634] CPU: 16 PID: 1116 Comm: bash Not tainted 5.5.0-rc6-00006-gca544f2a11ae-dirty #281 > [ 64.925102] Hardware name: linux,dummy-virt (DT) > [ 64.925905] pstate: 60400085 (nZCv daIf +PAN -UAO) > [ 64.926742] pc : has_unmovable_pages+0x274/0x2a8 > [ 64.927554] lr : has_unmovable_pages+0x298/0x2a8 > [ 64.928359] sp : ffff800014fd3a00 > [ 64.928944] x29: ffff800014fd3a00 x28: fffffe0017640000 > [ 64.929875] x27: 0000000000000000 x26: ffff0005fcfcda00 > [ 64.930810] x25: 0000000000640000 x24: 0000000000000003 > [ 64.931736] x23: 0000000019840000 x22: 0000000000001380 > [ 64.932667] x21: ffff800011259000 x20: ffff0005fcfcda00 > [ 64.933588] x19: 0000000000661000 x18: 0000000000000010 > [ 64.934514] x17: 0000000000000000 x16: 0000000000000000 > [ 64.935454] x15: ffffffffffffffff x14: ffff8000118498c8 > [ 64.936377] x13: ffff800094fd3797 x12: ffff800014fd379f > [ 64.937304] x11: ffff800011861000 x10: ffff800014fd3720 > [ 64.938226] x9 : 00000000ffffffd0 x8 : ffff8000106a60d0 > [ 64.939156] x7 : 0000000000000000 x6 : ffff0005fc6261b0 > [ 64.940078] x5 : ffff0005fc6261b0 x4 : 0000000000000000 > [ 64.941003] x3 : ffff0005fc62cf80 x2 : ffffffffffffec80 > [ 64.941927] x1 : ffff800011141b58 x0 : ffff0005fcfcda00 > [ 64.942857] Call trace: > [ 64.943298] has_unmovable_pages+0x274/0x2a8 > [ 64.944056] start_isolate_page_range+0x258/0x360 > [ 64.944879] __offline_pages+0xf4/0x9e8 > [ 64.945554] offline_pages+0x10/0x18 > [ 64.946189] memory_block_action+0x40/0x1a0 > [ 64.946929] memory_subsys_offline+0x4c/0x78 > [ 64.947679] device_offline+0x98/0xc8 > [ 64.948328] unprobe_store+0xa8/0x158 > [ 64.948976] dev_attr_store+0x14/0x28 > [ 64.949628] sysfs_kf_write+0x40/0x50 > [ 64.950273] kernfs_fop_write+0x108/0x218 > [ 64.950983] __vfs_write+0x18/0x40 > [ 64.951592] vfs_write+0xb0/0x1d0 > [ 64.952175] ksys_write+0x64/0xe8 > [ 64.952761] __arm64_sys_write+0x18/0x20 > [ 64.953451] el0_svc_common.constprop.2+0x88/0x150 > [ 64.954293] el0_svc_handler+0x20/0x80 > [ 64.954963] el0_sync_handler+0x118/0x188 > [ 64.955669] el0_sync+0x140/0x180 > [ 64.956256] ---[ end trace b162b4d1cbea304d ]--- > [ 64.957063] page:fffffe0017640000 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 > [ 64.958489] raw: 1ffff80000001000 fffffe0017640008 fffffe0017640008 0000000000000000 > [ 64.959839] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000 > [ 64.961174] page dumped because: unmovable page > > The reason is dump_page() does not print 
page->flags universally > and only does so for KSM, Anon and File pages while excluding > reserved pages at boot. Wondering should not we make printing > page->flags universal ? The thing is that "PageReserved" on a random page tells us that the values in the memmap cannot be trusted (in some scenarios). However, we also expose flags for reserved pages via stable_page_flags() - /proc/kpageflags. As this is just a debugging mechanism, I think it makes sense to also print them.
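As a side note, those stable flags can also be inspected from userspace via /proc/kpageflags; a small sketch, assuming root privileges and KPF_RESERVED being bit 32 as documented in Documentation/admin-guide/mm/pagemap.rst:

	#include <fcntl.h>
	#include <stdint.h>
	#include <stdio.h>
	#include <stdlib.h>
	#include <unistd.h>

	#define KPF_RESERVED	32	/* per pagemap documentation */

	int main(int argc, char **argv)
	{
		uint64_t flags;
		unsigned long pfn;
		int fd;

		if (argc < 2) {
			fprintf(stderr, "usage: %s <pfn>\n", argv[0]);
			return 1;
		}
		pfn = strtoul(argv[1], NULL, 0);

		/* /proc/kpageflags holds one 64-bit flags word per PFN */
		fd = open("/proc/kpageflags", O_RDONLY);
		if (fd < 0 || pread(fd, &flags, sizeof(flags),
				    pfn * sizeof(flags)) != (ssize_t)sizeof(flags)) {
			perror("kpageflags");
			return 1;
		}
		printf("pfn %#lx flags %#jx reserved=%d\n", pfn, (uintmax_t)flags,
		       !!(flags & (1ULL << KPF_RESERVED)));
		close(fd);
		return 0;
	}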
[Cc Ralph] On Tue 14-01-20 13:49:14, Anshuman Khandual wrote: > > > On 10/04/2019 01:55 PM, David Hildenbrand wrote: > > On 03.10.19 14:14, Qian Cai wrote: > >> > >> > >>> On Oct 3, 2019, at 8:01 AM, Anshuman Khandual <Anshuman.Khandual@arm.com> wrote: > >>> > >>> Will something like this be better ? > >> > >> Not really. dump_page() will dump PageCompound information anyway, so it is trivial to figure out if went in that path. > >> > > > > I agree, I use the dump_page() output frequently to identify PG_reserved > > pages. No need to duplicate that. > > Here in this path there is a reserved page which is preventing > offlining a memory section but unfortunately dump_page() does > not print page->flags for a reserved page pinned there possibly > through memblock_reserve() during boot. > > __offline_pages() > start_isolate_page_range() > set_migratetype_isolate() > has_unmovable_pages() > dump_page() > > [ 64.920970] ------------[ cut here ]------------ > [ 64.921718] WARNING: CPU: 16 PID: 1116 at mm/page_alloc.c:8298 has_unmovable_pages+0x274/0x2a8 > [ 64.923110] Modules linked in: > [ 64.923634] CPU: 16 PID: 1116 Comm: bash Not tainted 5.5.0-rc6-00006-gca544f2a11ae-dirty #281 > [ 64.925102] Hardware name: linux,dummy-virt (DT) > [ 64.925905] pstate: 60400085 (nZCv daIf +PAN -UAO) > [ 64.926742] pc : has_unmovable_pages+0x274/0x2a8 > [ 64.927554] lr : has_unmovable_pages+0x298/0x2a8 > [ 64.928359] sp : ffff800014fd3a00 > [ 64.928944] x29: ffff800014fd3a00 x28: fffffe0017640000 > [ 64.929875] x27: 0000000000000000 x26: ffff0005fcfcda00 > [ 64.930810] x25: 0000000000640000 x24: 0000000000000003 > [ 64.931736] x23: 0000000019840000 x22: 0000000000001380 > [ 64.932667] x21: ffff800011259000 x20: ffff0005fcfcda00 > [ 64.933588] x19: 0000000000661000 x18: 0000000000000010 > [ 64.934514] x17: 0000000000000000 x16: 0000000000000000 > [ 64.935454] x15: ffffffffffffffff x14: ffff8000118498c8 > [ 64.936377] x13: ffff800094fd3797 x12: ffff800014fd379f > [ 64.937304] x11: ffff800011861000 x10: ffff800014fd3720 > [ 64.938226] x9 : 00000000ffffffd0 x8 : ffff8000106a60d0 > [ 64.939156] x7 : 0000000000000000 x6 : ffff0005fc6261b0 > [ 64.940078] x5 : ffff0005fc6261b0 x4 : 0000000000000000 > [ 64.941003] x3 : ffff0005fc62cf80 x2 : ffffffffffffec80 > [ 64.941927] x1 : ffff800011141b58 x0 : ffff0005fcfcda00 > [ 64.942857] Call trace: > [ 64.943298] has_unmovable_pages+0x274/0x2a8 > [ 64.944056] start_isolate_page_range+0x258/0x360 > [ 64.944879] __offline_pages+0xf4/0x9e8 > [ 64.945554] offline_pages+0x10/0x18 > [ 64.946189] memory_block_action+0x40/0x1a0 > [ 64.946929] memory_subsys_offline+0x4c/0x78 > [ 64.947679] device_offline+0x98/0xc8 > [ 64.948328] unprobe_store+0xa8/0x158 > [ 64.948976] dev_attr_store+0x14/0x28 > [ 64.949628] sysfs_kf_write+0x40/0x50 > [ 64.950273] kernfs_fop_write+0x108/0x218 > [ 64.950983] __vfs_write+0x18/0x40 > [ 64.951592] vfs_write+0xb0/0x1d0 > [ 64.952175] ksys_write+0x64/0xe8 > [ 64.952761] __arm64_sys_write+0x18/0x20 > [ 64.953451] el0_svc_common.constprop.2+0x88/0x150 > [ 64.954293] el0_svc_handler+0x20/0x80 > [ 64.954963] el0_sync_handler+0x118/0x188 > [ 64.955669] el0_sync+0x140/0x180 > [ 64.956256] ---[ end trace b162b4d1cbea304d ]--- > [ 64.957063] page:fffffe0017640000 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 > [ 64.958489] raw: 1ffff80000001000 fffffe0017640008 fffffe0017640008 0000000000000000 > [ 64.959839] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000 > [ 64.961174] page dumped because: unmovable page > > The reason is 
dump_page() does not print page->flags universally > and only does so for KSM, Anon and File pages while excluding > reserved pages at boot. Wondering should not we make printing > page->flags universal ? We used to do that and this caught me as a surprise when looking again. This is a result of 76a1850e4572 ("mm/debug.c: __dump_page() prints an extra line") which is a cleanup patch and I suspect this result was not anticipated. The following will do the trick but I cannot really say I like the code duplication. pr_cont in this case sounds like a much cleaner solution to me. diff --git a/mm/debug.c b/mm/debug.c index 0461df1207cb..4cbf30d03c88 100644 --- a/mm/debug.c +++ b/mm/debug.c @@ -89,6 +89,8 @@ void __dump_page(struct page *page, const char *reason) } else pr_warn("%ps\n", mapping->a_ops); pr_warn("flags: %#lx(%pGp)\n", page->flags, &page->flags); + } else { + pr_warn("flags: %#lx(%pGp)\n", page->flags, &page->flags); } BUILD_BUG_ON(ARRAY_SIZE(pageflag_names) != __NR_PAGEFLAGS + 1);
On 1/14/20 10:10 AM, Michal Hocko wrote: > [Cc Ralph] >> The reason is dump_page() does not print page->flags universally >> and only does so for KSM, Anon and File pages while excluding >> reserved pages at boot. Wondering should not we make printing >> page->flags universal ? > > We used to do that and this caught me as a surprise when looking again. > This is a result of 76a1850e4572 ("mm/debug.c: __dump_page() prints an > extra line") which is a cleanup patch and I suspect this result was not > anticipated. > > The following will do the trick but I cannot really say I like the code > duplication. pr_cont in this case sounds like a much cleaner solution to > me. How about this then? diff --git mm/debug.c mm/debug.c index 0461df1207cb..6a52316af839 100644 --- mm/debug.c +++ mm/debug.c @@ -47,6 +47,7 @@ void __dump_page(struct page *page, const char *reason) struct address_space *mapping; bool page_poisoned = PagePoisoned(page); int mapcount; + char *type = ""; /* * If struct page is poisoned don't access Page*() functions as that @@ -78,9 +79,9 @@ void __dump_page(struct page *page, const char *reason) page, page_ref_count(page), mapcount, page->mapping, page_to_pgoff(page)); if (PageKsm(page)) - pr_warn("ksm flags: %#lx(%pGp)\n", page->flags, &page->flags); + type = "ksm "; else if (PageAnon(page)) - pr_warn("anon flags: %#lx(%pGp)\n", page->flags, &page->flags); + type = "anon "; else if (mapping) { if (mapping->host && mapping->host->i_dentry.first) { struct dentry *dentry; @@ -88,10 +89,11 @@ void __dump_page(struct page *page, const char *reason) pr_warn("%ps name:\"%pd\"\n", mapping->a_ops, dentry); } else pr_warn("%ps\n", mapping->a_ops); - pr_warn("flags: %#lx(%pGp)\n", page->flags, &page->flags); } BUILD_BUG_ON(ARRAY_SIZE(pageflag_names) != __NR_PAGEFLAGS + 1); + pr_warn("%sflags: %#lx(%pGp)\n", type, page->flags, &page->flags); + hex_only: print_hex_dump(KERN_WARNING, "raw: ", DUMP_PREFIX_NONE, 32, sizeof(unsigned long), page,
On 01/14/2020 03:53 PM, Vlastimil Babka wrote: > On 1/14/20 10:10 AM, Michal Hocko wrote: >> [Cc Ralph] >>> The reason is dump_page() does not print page->flags universally >>> and only does so for KSM, Anon and File pages while excluding >>> reserved pages at boot. Wondering should not we make printing >>> page->flags universal ? >> >> We used to do that and this caught me as a surprise when looking again. >> This is a result of 76a1850e4572 ("mm/debug.c: __dump_page() prints an >> extra line") which is a cleanup patch and I suspect this result was not >> anticipated. >> >> The following will do the trick but I cannot really say I like the code >> duplication. pr_cont in this case sounds like a much cleaner solution to >> me. > > How about this then? This looks better than what we have right now though my initial thought was similar to what Michal had suggested earlier. > > diff --git mm/debug.c mm/debug.c > index 0461df1207cb..6a52316af839 100644 > --- mm/debug.c > +++ mm/debug.c > @@ -47,6 +47,7 @@ void __dump_page(struct page *page, const char *reason) > struct address_space *mapping; > bool page_poisoned = PagePoisoned(page); > int mapcount; > + char *type = ""; > > /* > * If struct page is poisoned don't access Page*() functions as that > @@ -78,9 +79,9 @@ void __dump_page(struct page *page, const char *reason) > page, page_ref_count(page), mapcount, > page->mapping, page_to_pgoff(page)); > if (PageKsm(page)) > - pr_warn("ksm flags: %#lx(%pGp)\n", page->flags, &page->flags); > + type = "ksm "; > else if (PageAnon(page)) > - pr_warn("anon flags: %#lx(%pGp)\n", page->flags, &page->flags); > + type = "anon "; > else if (mapping) { > if (mapping->host && mapping->host->i_dentry.first) { > struct dentry *dentry; > @@ -88,10 +89,11 @@ void __dump_page(struct page *page, const char *reason) > pr_warn("%ps name:\"%pd\"\n", mapping->a_ops, dentry); > } else > pr_warn("%ps\n", mapping->a_ops); > - pr_warn("flags: %#lx(%pGp)\n", page->flags, &page->flags); > } > BUILD_BUG_ON(ARRAY_SIZE(pageflag_names) != __NR_PAGEFLAGS + 1); > > + pr_warn("%sflags: %#lx(%pGp)\n", type, page->flags, &page->flags); > + > hex_only: > print_hex_dump(KERN_WARNING, "raw: ", DUMP_PREFIX_NONE, 32, > sizeof(unsigned long), page, > >
On Tue 14-01-20 11:23:52, Vlastimil Babka wrote: > On 1/14/20 10:10 AM, Michal Hocko wrote: > > [Cc Ralph] > >> The reason is dump_page() does not print page->flags universally > >> and only does so for KSM, Anon and File pages while excluding > >> reserved pages at boot. Wondering should not we make printing > >> page->flags universal ? > > > > We used to do that and this caught me as a surprise when looking again. > > This is a result of 76a1850e4572 ("mm/debug.c: __dump_page() prints an > > extra line") which is a cleanup patch and I suspect this result was not > > anticipated. > > > > The following will do the trick but I cannot really say I like the code > > duplication. pr_cont in this case sounds like a much cleaner solution to > > me. > > How about this then? Yes makes sense as well. > diff --git mm/debug.c mm/debug.c > index 0461df1207cb..6a52316af839 100644 > --- mm/debug.c > +++ mm/debug.c > @@ -47,6 +47,7 @@ void __dump_page(struct page *page, const char *reason) > struct address_space *mapping; > bool page_poisoned = PagePoisoned(page); > int mapcount; > + char *type = ""; > > /* > * If struct page is poisoned don't access Page*() functions as that > @@ -78,9 +79,9 @@ void __dump_page(struct page *page, const char *reason) > page, page_ref_count(page), mapcount, > page->mapping, page_to_pgoff(page)); > if (PageKsm(page)) > - pr_warn("ksm flags: %#lx(%pGp)\n", page->flags, &page->flags); > + type = "ksm "; > else if (PageAnon(page)) > - pr_warn("anon flags: %#lx(%pGp)\n", page->flags, &page->flags); > + type = "anon "; > else if (mapping) { > if (mapping->host && mapping->host->i_dentry.first) { > struct dentry *dentry; > @@ -88,10 +89,11 @@ void __dump_page(struct page *page, const char *reason) > pr_warn("%ps name:\"%pd\"\n", mapping->a_ops, dentry); > } else > pr_warn("%ps\n", mapping->a_ops); > - pr_warn("flags: %#lx(%pGp)\n", page->flags, &page->flags); > } > BUILD_BUG_ON(ARRAY_SIZE(pageflag_names) != __NR_PAGEFLAGS + 1); > > + pr_warn("%sflags: %#lx(%pGp)\n", type, page->flags, &page->flags); > + > hex_only: > print_hex_dump(KERN_WARNING, "raw: ", DUMP_PREFIX_NONE, 32, > sizeof(unsigned long), page, >
On 1/14/20 3:32 AM, Michal Hocko wrote: > On Tue 14-01-20 11:23:52, Vlastimil Babka wrote: >> On 1/14/20 10:10 AM, Michal Hocko wrote: >>> [Cc Ralph] >>>> The reason is dump_page() does not print page->flags universally >>>> and only does so for KSM, Anon and File pages while excluding >>>> reserved pages at boot. Wondering should not we make printing >>>> page->flags universal ? >>> >>> We used to do that and this caught me as a surprise when looking again. >>> This is a result of 76a1850e4572 ("mm/debug.c: __dump_page() prints an >>> extra line") which is a cleanup patch and I suspect this result was not >>> anticipated. >>> >>> The following will do the trick but I cannot really say I like the code >>> duplication. pr_cont in this case sounds like a much cleaner solution to >>> me. >> >> How about this then? > > Yes makes sense as well. > >> diff --git mm/debug.c mm/debug.c >> index 0461df1207cb..6a52316af839 100644 >> --- mm/debug.c >> +++ mm/debug.c >> @@ -47,6 +47,7 @@ void __dump_page(struct page *page, const char *reason) >> struct address_space *mapping; >> bool page_poisoned = PagePoisoned(page); >> int mapcount; >> + char *type = ""; >> >> /* >> * If struct page is poisoned don't access Page*() functions as that >> @@ -78,9 +79,9 @@ void __dump_page(struct page *page, const char *reason) >> page, page_ref_count(page), mapcount, >> page->mapping, page_to_pgoff(page)); >> if (PageKsm(page)) >> - pr_warn("ksm flags: %#lx(%pGp)\n", page->flags, &page->flags); >> + type = "ksm "; >> else if (PageAnon(page)) >> - pr_warn("anon flags: %#lx(%pGp)\n", page->flags, &page->flags); >> + type = "anon "; >> else if (mapping) { >> if (mapping->host && mapping->host->i_dentry.first) { >> struct dentry *dentry; >> @@ -88,10 +89,11 @@ void __dump_page(struct page *page, const char *reason) >> pr_warn("%ps name:\"%pd\"\n", mapping->a_ops, dentry); >> } else >> pr_warn("%ps\n", mapping->a_ops); >> - pr_warn("flags: %#lx(%pGp)\n", page->flags, &page->flags); >> } >> BUILD_BUG_ON(ARRAY_SIZE(pageflag_names) != __NR_PAGEFLAGS + 1); >> >> + pr_warn("%sflags: %#lx(%pGp)\n", type, page->flags, &page->flags); >> + >> hex_only: >> print_hex_dump(KERN_WARNING, "raw: ", DUMP_PREFIX_NONE, 32, >> sizeof(unsigned long), page, >> > Looks good to me. Thanks for the clean up. Reviewed-by: Ralph Campbell <rcampbell@nvidia.com>
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 15c2050c629b..fbf93ea119d2 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8206,8 +8206,10 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
 
 		page = pfn_to_page(check);
 
-		if (PageReserved(page))
+		if (PageReserved(page)) {
+			reason = "Reserved page";
 			goto unmovable;
+		}
 
 		/*
 		 * If the zone is movable and we have ruled out all reserved
Having unmovable pages on a given pageblock should be reported correctly
when required with REPORT_FAILURE flag. But there can be a scenario where a
reserved page in the page block will get reported as a generic "unmovable"
reason code. Instead this should be changed to a more appropriate reason
code like "Reserved page".

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
---
 mm/page_alloc.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
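For context when reading the patch above: the reason string only ever reaches the log when the caller passes REPORT_FAILURE (the memory offline path seen in this thread does, via set_migratetype_isolate()); the exit path of has_unmovable_pages() at the time looked approximately like this (reconstructed, details may vary between kernel versions):

	unmovable:
		WARN_ON_ONCE(zone_idx(zone) == ZONE_MOVABLE);
		if (flags & REPORT_FAILURE)
			dump_page(pfn_to_page(pfn + iter), reason);
		return true;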