Message ID | 20180523081609.GG20441@dhcp22.suse.cz (mailing list archive)
---|---
State | New, archived
On Wed, May 23, 2018 at 10:16:09AM +0200, Michal Hocko wrote:
> On Wed 23-05-18 09:52:39, Michal Hocko wrote:
> [...]
> > Yeah, the current code is far from optimal. We used to have a retry
> > count but that one was removed exactly because of premature failures.
> > There are three things here
> > 1) zone_movable should not contain any bootmem or otherwise
> >    non-migratable pages
> > 2) start_isolate_page_range should fail when seeing such pages - maybe
> >    has_unmovable_pages is overly optimistic and it should check all
> >    pages even in movable zones.
> > 3) migrate_pages should really tell us whether the failure is temporary
> >    or permanent. I am not sure we can do that easily though.
>
> 2) should be the most simple one for now. Could you give it a try? Btw.
> the exact configuration that led to bootmem pages in zone_movable would
> be really appreciated:

I will try it out and I will paste the config.

> ---
> From 6aa144a9b1c01255c89a4592221d706ccc4b4eea Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.com>
> Date: Wed, 23 May 2018 10:04:20 +0200
> Subject: [PATCH] mm, memory_hotplug: make has_unmovable_pages more robust
>
> Oscar has reported:
> : Due to an unfortunate setting with movablecore, memblocks containing bootmem
> : memory (pages marked by get_page_bootmem()) ended up marked in zone_movable.
> : So while trying to remove that memory, the system failed in do_migrate_range
> : and __offline_pages never returned.
>
> This is because we rely on start_isolate_page_range resp. has_unmovable_pages
> to do their job. The first one isolates the whole range to be offlined
> so that we do not allocate from it anymore and the latter makes sure we
> are not stumbling over non-migratable pages.
>
> has_unmovable_pages is overly optimistic, however. It doesn't check all
> the pages if we are within zone_movable because we rely on those pages
> always being migratable. As it turns out, we are still not perfect there.
> While bootmem pages in zone_movable sound like a clear bug which should
> be fixed, let's remove the optimization for now and warn if we encounter
> unmovable pages in zone_movable in the meantime. That should help for
> now at least.
>
> Btw. this wasn't a real problem until 72b39cfc4d75 ("mm, memory_hotplug:
> do not fail offlining too early") because we used to have a small number
> of retries and then failed. This turned out to be too fragile though.
>
> Reported-by: Oscar Salvador <osalvador@techadventures.net>
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>  mm/page_alloc.c | 16 ++++++++++------
>  1 file changed, 10 insertions(+), 6 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 3c6f4008ea55..b9a45753244d 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -7629,11 +7629,12 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
>  	unsigned long pfn, iter, found;
>
>  	/*
> -	 * For avoiding noise data, lru_add_drain_all() should be called
> -	 * If ZONE_MOVABLE, the zone never contains unmovable pages
> +	 * TODO we could make this much more efficient by not checking every
> +	 * page in the range if we know all of them are in MOVABLE_ZONE and
> +	 * that the movable zone guarantees that pages are migratable but
> +	 * the later is not the case right now unfortunatelly. E.g. movablecore
> +	 * can still lead to having bootmem allocations in zone_movable.
>  	 */
> -	if (zone_idx(zone) == ZONE_MOVABLE)
> -		return false;
>
>  	/*
>  	 * CMA allocations (alloc_contig_range) really need to mark isolate
> @@ -7654,7 +7655,7 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
>  		page = pfn_to_page(check);
>
>  		if (PageReserved(page))
> -			return true;
> +			goto unmovable;
>
>  		/*
>  		 * Hugepages are not in LRU lists, but they're movable.
> @@ -7704,9 +7705,12 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
>  		 * page at boot.
>  		 */
>  		if (found > count)
> -			return true;
> +			goto unmovable;
>  	}
>  	return false;
> +unmovable:
> +	WARN_ON_ONCE(zone_idx(zone) == ZONE_MOVABLE);
> +	return true;
>  }
>
>  #if (defined(CONFIG_MEMORY_ISOLATION) && defined(CONFIG_COMPACTION)) || defined(CONFIG_CMA)
> --
> 2.17.0

Thanks
Oscar Salvador
On Wed, May 23, 2018 at 10:16:09AM +0200, Michal Hocko wrote:
> [...]
> Subject: [PATCH] mm, memory_hotplug: make has_unmovable_pages more robust
> [...]

Tested-by: Oscar Salvador <osalvador@techadventures.net>

thanks!
Oscar Salvador
On Wed, May 23, 2018 at 10:16:09AM +0200, Michal Hocko wrote:
> [...]
> 2) should be the most simple one for now. Could you give it a try? Btw.
> the exact configuration that led to bootmem pages in zone_movable would
> be really appreciated:

Here is some information:

** Qemu cmdline:

# qemu-system-x86_64 -enable-kvm -smp 2 -monitor pty -m 6G,slots=8,maxmem=8G -numa node,mem=4096M -numa node,mem=2048M ...
# Option movablecore=4G (cmdline)

** e820 map and some numa information:

linux kernel: BIOS-provided physical RAM map:
linux kernel: BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
linux kernel: BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
linux kernel: BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
linux kernel: BIOS-e820: [mem 0x0000000000100000-0x00000000bffdffff] usable
linux kernel: BIOS-e820: [mem 0x00000000bffe0000-0x00000000bfffffff] reserved
linux kernel: BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved
linux kernel: BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
linux kernel: BIOS-e820: [mem 0x0000000100000000-0x00000001bfffffff] usable
linux kernel: NX (Execute Disable) protection: active
linux kernel: SMBIOS 2.8 present.
linux kernel: DMI: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org
linux kernel: Hypervisor detected: KVM
linux kernel: e820: update [mem 0x00000000-0x00000fff] usable ==> reserved
linux kernel: e820: remove [mem 0x000a0000-0x000fffff] usable
linux kernel: last_pfn = 0x1c0000 max_arch_pfn = 0x400000000

linux kernel: SRAT: PXM 0 -> APIC 0x00 -> Node 0
linux kernel: SRAT: PXM 1 -> APIC 0x01 -> Node 1
linux kernel: ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x0009ffff]
linux kernel: ACPI: SRAT: Node 0 PXM 0 [mem 0x00100000-0xbfffffff]
linux kernel: ACPI: SRAT: Node 0 PXM 0 [mem 0x100000000-0x13fffffff]
linux kernel: ACPI: SRAT: Node 1 PXM 1 [mem 0x140000000-0x1bfffffff]
linux kernel: ACPI: SRAT: Node 0 PXM 0 [mem 0x1c0000000-0x43fffffff] hotplug
linux kernel: NUMA: Node 0 [mem 0x00000000-0x0009ffff] + [mem 0x00100000-0xbfffffff] -> [mem 0x0
linux kernel: NUMA: Node 0 [mem 0x00000000-0xbfffffff] + [mem 0x100000000-0x13fffffff] -> [mem 0
linux kernel: NODE_DATA(0) allocated [mem 0x13ffd6000-0x13fffffff]
linux kernel: NODE_DATA(1) allocated [mem 0x1bffd3000-0x1bfffcfff]

** /proc/zoneinfo

Node 0, zone      DMA
  per-node stats
      nr_inactive_anon 2107
      nr_active_anon 49560
      nr_inactive_file 25375
      nr_active_file 19038
      nr_unevictable 12
      nr_slab_reclaimable 5996
      nr_slab_unreclaimable 7236
      nr_isolated_anon 0
      nr_isolated_file 0
      workingset_refault 0
      workingset_activate 0
      workingset_nodereclaim 0
      nr_anon_pages 48910
      nr_mapped  13780
      nr_file_pages 46676
      nr_dirty   13
      nr_writeback 0
      nr_writeback_temp 0
      nr_shmem   2263
      nr_shmem_hugepages 0
      nr_shmem_pmdmapped 0
      nr_anon_transparent_hugepages 50
      nr_unstable 0
      nr_vmscan_write 0
      nr_vmscan_immediate_reclaim 0
      nr_dirtied 17749
      nr_written 17462
  pages free     3961
        min      29
        low      36
        high     43
        spanned  4095
        present  3998
        managed  3977
        protection: (0, 2939, 2939, 3898, 3898)
      nr_free_pages 3961
      nr_zone_inactive_anon 0
      nr_zone_active_anon 0
      nr_zone_inactive_file 0
      nr_zone_active_file 0
      nr_zone_unevictable 0
      nr_zone_write_pending 0
      nr_mlock   0
      nr_page_table_pages 0
      nr_kernel_stack 0
      nr_bounce  0
      nr_zspages 0
      nr_free_cma 0
      numa_hit   2
      numa_miss  0
      numa_foreign 0
      numa_interleave 0
      numa_local 1
      numa_other 1
  pagesets
    cpu: 0
              count: 0
              high:  0
              batch: 1
  vm stats threshold: 4
    cpu: 1
              count: 0
              high:  0
              batch: 1
  vm stats threshold: 4
  node_unreclaimable:  0
  start_pfn:           1
Node 0, zone    DMA32
  pages free     724414
        min      5583
        low      6978
        high     8373
        spanned  1044480
        present  782304
        managed  758516
        protection: (0, 0, 0, 959, 959)
      nr_free_pages 724414
      nr_zone_inactive_anon 0
      nr_zone_active_anon 0
      nr_zone_inactive_file 1697
      nr_zone_active_file 8915
      nr_zone_unevictable 0
      nr_zone_write_pending 12
      nr_mlock   0
      nr_page_table_pages 2976
      nr_kernel_stack 4000
      nr_bounce  0
      nr_zspages 0
      nr_free_cma 0
      numa_hit   281025
      numa_miss  0
      numa_foreign 0
      numa_interleave 8583
      numa_local 135392
      numa_other 145633
  pagesets
    cpu: 0
              count: 164
              high:  186
              batch: 31
  vm stats threshold: 24
    cpu: 1
              count: 32
              high:  186
              batch: 31
  vm stats threshold: 24
  node_unreclaimable:  0
  start_pfn:           4096
Node 0, zone   Normal
  pages free     0
        min      0
        low      0
        high     0
        spanned  0
        present  0
        managed  0
        protection: (0, 0, 0, 7677, 7677)
Node 0, zone  Movable
  pages free     160140
        min      1823
        low      2278
        high     2733
        spanned  262144
        present  262144
        managed  245670
        protection: (0, 0, 0, 0, 0)
      nr_free_pages 160140
      nr_zone_inactive_anon 2107
      nr_zone_active_anon 49560
      nr_zone_inactive_file 23678
      nr_zone_active_file 10123
      nr_zone_unevictable 12
      nr_zone_write_pending 1
      nr_mlock   12
      nr_page_table_pages 0
      nr_kernel_stack 0
      nr_bounce  0
      nr_zspages 0
      nr_free_cma 0
      numa_hit   214370
      numa_miss  0
      numa_foreign 0
      numa_interleave 0
      numa_local 214344
      numa_other 26
  pagesets
    cpu: 0
              count: 32
              high:  42
              batch: 7
  vm stats threshold: 16
    cpu: 1
              count: 26
              high:  42
              batch: 7
  vm stats threshold: 16
  node_unreclaimable:  0
  start_pfn:           1048576
Node 0, zone   Device
  pages free     0
        min      0
        low      0
        high     0
        spanned  0
        present  0
        managed  0
        protection: (0, 0, 0, 0, 0)
Node 1, zone      DMA
  pages free     0
        min      0
        low      0
        high     0
        spanned  0
        present  0
        managed  0
        protection: (0, 0, 0, 2014, 2014)
Node 1, zone    DMA32
  pages free     0
        min      0
        low      0
        high     0
        spanned  0
        present  0
        managed  0
        protection: (0, 0, 0, 2014, 2014)
Node 1, zone   Normal
  pages free     0
        min      0
        low      0
        high     0
        spanned  0
        present  0
        managed  0
        protection: (0, 0, 0, 16117, 16117)
Node 1, zone  Movable
  per-node stats
      nr_inactive_anon 524
      nr_active_anon 25734
      nr_inactive_file 28733
      nr_active_file 12316
      nr_unevictable 8
      nr_slab_reclaimable 0
      nr_slab_unreclaimable 0
      nr_isolated_anon 0
      nr_isolated_file 0
      workingset_refault 0
      workingset_activate 0
      workingset_nodereclaim 0
      nr_anon_pages 24656
      nr_mapped  16871
      nr_file_pages 41647
      nr_dirty   1
      nr_writeback 0
      nr_writeback_temp 0
      nr_shmem   598
      nr_shmem_hugepages 0
      nr_shmem_pmdmapped 0
      nr_anon_transparent_hugepages 8
      nr_unstable 0
      nr_vmscan_write 0
      nr_vmscan_immediate_reclaim 0
      nr_dirtied 125
      nr_written 98
  pages free     448427
        min      3827
        low      4783
        high     5739
        spanned  524288
        present  524288
        managed  515766
        protection: (0, 0, 0, 0, 0)
      nr_free_pages 448427
      nr_zone_inactive_anon 524
      nr_zone_active_anon 25734
      nr_zone_inactive_file 28733
      nr_zone_active_file 12316
      nr_zone_unevictable 8
      nr_zone_write_pending 1
      nr_mlock   8
      nr_page_table_pages 0
      nr_kernel_stack 0
      nr_bounce  0
      nr_zspages 0
      nr_free_cma 0
      numa_hit   199599
      numa_miss  0
      numa_foreign 0
      numa_interleave 0
      numa_local 199599
      numa_other 0
  pagesets
    cpu: 0
              count: 9
              high:  42
              batch: 7
  vm stats threshold: 20
    cpu: 1
              count: 2
              high:  42
              batch: 7
  vm stats threshold: 20
  node_unreclaimable:  0
  start_pfn:           1310720
Node 1, zone   Device
  pages free     0
        min      0
        low      0
        high     0
        spanned  0
        present  0
        managed  0
        protection: (0, 0, 0, 0, 0)

I hope this is enough.

Thanks
Oscar Salvador
On Wed 23-05-18 12:26:43, Oscar Salvador wrote:
> On Wed, May 23, 2018 at 10:16:09AM +0200, Michal Hocko wrote:
> > [...]
> > 2) should be the most simple one for now. Could you give it a try? Btw.
> > the exact configuration that led to bootmem pages in zone_movable would
> > be really appreciated:
>
> Here is some information:
>
> ** Qemu cmdline:
>
> # qemu-system-x86_64 -enable-kvm -smp 2 -monitor pty -m 6G,slots=8,maxmem=8G -numa node,mem=4096M -numa node,mem=2048M ...
> # Option movablecore=4G (cmdline)
>
> [...]

Could you also paste
"Zone ranges:"
and the follow up messages? From the zoneinfo it seems the movable zone
got placed to both nodes. And only Node0 is marked as hotpluggable so
early allocations can be placed to Node1.

> ** /proc/zoneinfo
[...]
> Node 0, zone  Movable
>   pages free     160140
>         min      1823
>         low      2278
>         high     2733
>         spanned  262144
>         present  262144
>         managed  245670

it seems that 1G went to Node0

> Node 1, zone  Movable
[...]
>   pages free     448427
>         min      3827
>         low      4783
>         high     5739
>         spanned  524288
>         present  524288
>         managed  515766

and the rest to Node1. Guessing from spanned-managed it seems that the
used memory is for memmaps (struct page arrays).
On Wed, May 23, 2018 at 01:38:57PM +0200, Michal Hocko wrote:
> On Wed 23-05-18 12:26:43, Oscar Salvador wrote:
> > [...]
>
> Could you also paste
> "Zone ranges:"
> and the follow up messages?

Michal, here is the output about "Zone ranges:":

linux kernel: Zone ranges:
linux kernel:   DMA      [mem 0x0000000000001000-0x0000000000ffffff]
linux kernel:   DMA32    [mem 0x0000000001000000-0x00000000ffffffff]
linux kernel:   Normal   [mem 0x0000000100000000-0x00000001bfffffff]
linux kernel:   Device   empty
linux kernel: Movable zone start for each node
linux kernel:   Node 0: 0x0000000100000000
linux kernel:   Node 1: 0x0000000140000000
linux kernel: Early memory node ranges
linux kernel:   node   0: [mem 0x0000000000001000-0x000000000009efff]
linux kernel:   node   0: [mem 0x0000000000100000-0x00000000bffdffff]
linux kernel:   node   0: [mem 0x0000000100000000-0x000000013fffffff]
linux kernel:   node   1: [mem 0x0000000140000000-0x00000001bfffffff]
linux kernel: Initmem setup node 0 [mem 0x0000000000001000-0x000000013fffffff]
linux kernel: On node 0 totalpages: 1048446
linux kernel:   DMA zone: 64 pages used for memmap
linux kernel:   DMA zone: 21 pages reserved
linux kernel:   DMA zone: 3998 pages, LIFO batch:0
linux kernel:   DMA32 zone: 12224 pages used for memmap
linux kernel:   DMA32 zone: 782304 pages, LIFO batch:31
linux kernel:   Movable zone: 4096 pages used for memmap
linux kernel:   Movable zone: 262144 pages, LIFO batch:31
linux kernel: Initmem setup node 1 [mem 0x0000000140000000-0x00000001bfffffff]
linux kernel: On node 1 totalpages: 524288
linux kernel:   Movable zone: 8192 pages used for memmap
linux kernel:   Movable zone: 524288 pages, LIFO batch:31
linux kernel: Reserved but unavailable: 98 pages

Oscar Salvador
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3c6f4008ea55..b9a45753244d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7629,11 +7629,12 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
 	unsigned long pfn, iter, found;
 
 	/*
-	 * For avoiding noise data, lru_add_drain_all() should be called
-	 * If ZONE_MOVABLE, the zone never contains unmovable pages
+	 * TODO we could make this much more efficient by not checking every
+	 * page in the range if we know all of them are in MOVABLE_ZONE and
+	 * that the movable zone guarantees that pages are migratable but
+	 * the later is not the case right now unfortunatelly. E.g. movablecore
+	 * can still lead to having bootmem allocations in zone_movable.
 	 */
-	if (zone_idx(zone) == ZONE_MOVABLE)
-		return false;
 
 	/*
 	 * CMA allocations (alloc_contig_range) really need to mark isolate
@@ -7654,7 +7655,7 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
 		page = pfn_to_page(check);
 
 		if (PageReserved(page))
-			return true;
+			goto unmovable;
 
 		/*
 		 * Hugepages are not in LRU lists, but they're movable.
@@ -7704,9 +7705,12 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
 		 * page at boot.
 		 */
 		if (found > count)
-			return true;
+			goto unmovable;
 	}
 	return false;
+unmovable:
+	WARN_ON_ONCE(zone_idx(zone) == ZONE_MOVABLE);
+	return true;
 }
 
 #if (defined(CONFIG_MEMORY_ISOLATION) && defined(CONFIG_COMPACTION)) || defined(CONFIG_CMA)