Message ID | 20210222135137.25717-1-osalvador@suse.de |
---|---|
Series | Make alloc_contig_range handle Hugetlb pages |
On 22.02.21 14:51, Oscar Salvador wrote:
> v2 -> v3:
>  - Drop usage of high-level generic helpers in favour of
>    low-level approach (per Michal)
>  - Check for the page to be marked as PageHugeFreed
>  - Add a one-time retry in case someone grabbed the free huge page
>    from under us
>
> v1 -> v2:
>  - Addressed feedback by Michal
>  - Restrict the allocation to a node with __GFP_THISNODE
>  - Drop PageHuge check in alloc_and_dissolve_huge_page
>  - Re-order comments in isolate_or_dissolve_huge_page
>  - Extend comment in isolate_migratepages_block
>  - Place put_page right after we got the page, otherwise
>    dissolve_free_huge_page will fail
>
> RFC -> v1:
>  - Drop RFC
>  - Addressed feedback from David and Mike
>  - Fence off gigantic pages as there is a cyclic dependency between
>    them and alloc_contig_range
>  - Re-organize the code to make the race window smaller and to put
>    all details in hugetlb code
>  - Drop nodemask initialization. First a node will be tried and then we
>    will fall back to other nodes containing memory (N_MEMORY). Details
>    in patch#1's changelog
>  - Count the new page as surplus in case we failed to dissolve the old
>    page and the new one. Details in patch#1.
>
> Cover letter:
>
> alloc_contig_range lacks the ability to handle HugeTLB pages. This can
> be problematic for some users, e.g. CMA and virtio-mem, whose calls
> will fail if alloc_contig_range ever sees a HugeTLB page, even when
> those pages lie in ZONE_MOVABLE and are free. That problem can be
> easily solved by replacing the page in the free hugepage pool.
>
> In-use HugeTLB pages are no exception though, as those can be isolated
> and migrated like any other LRU or Movable page.
>
> This patchset aims to improve alloc_contig_range->isolate_migratepages_block
> so HugeTLB pages can be recognized and handled.
>
> Below is an insight from David (thanks), where the problem can clearly
> be seen:
>
> "Start a VM with 4G. Hotplug 1G via virtio-mem and online it to
> ZONE_MOVABLE. Allocate 512 huge pages.
>
> [root@localhost ~]# cat /proc/meminfo
> MemTotal:        5061512 kB
> MemFree:         3319396 kB
> MemAvailable:    3457144 kB
> ...
> HugePages_Total:     512
> HugePages_Free:      512
> HugePages_Rsvd:        0
> HugePages_Surp:        0
> Hugepagesize:       2048 kB
>
> The huge pages get partially allocated from ZONE_MOVABLE. Try
> unplugging 1G via virtio-mem (remember, all ZONE_MOVABLE). Inside the
> guest:
>
> [  180.058992] alloc_contig_range: [1b8000, 1c0000) PFNs busy
> [  180.060531] alloc_contig_range: [1b8000, 1c0000) PFNs busy
> [  180.061972] alloc_contig_range: [1b8000, 1c0000) PFNs busy
> [  180.063413] alloc_contig_range: [1b8000, 1c0000) PFNs busy
> [  180.064838] alloc_contig_range: [1b8000, 1c0000) PFNs busy
> [  180.065848] alloc_contig_range: [1bfc00, 1c0000) PFNs busy
> [  180.066794] alloc_contig_range: [1bfc00, 1c0000) PFNs busy
> [  180.067738] alloc_contig_range: [1bfc00, 1c0000) PFNs busy
> [  180.068669] alloc_contig_range: [1bfc00, 1c0000) PFNs busy
> [  180.069598] alloc_contig_range: [1bfc00, 1c0000) PFNs busy"

Same experiment with ZONE_MOVABLE:

a) Free huge pages: all memory can get unplugged again.

b) Allocated/populated but idle huge pages: all memory can get unplugged
   again.

c) Allocated/populated but all 512 huge pages are read/written in a loop:
   all memory can get unplugged again, but I get a single

   [  121.192345] alloc_contig_range: [180000, 188000) PFNs busy

   Most probably because it happened to try migrating a huge page while
   it was busy.

As virtio-mem retries on ZONE_MOVABLE a couple of times, it can deal with
this temporary failure.

Last but not least, I did something extreme:

]# cat /proc/meminfo
MemTotal:        5061568 kB
MemFree:          186560 kB
MemAvailable:     354524 kB
...
HugePages_Total:    2048
HugePages_Free:     2048
HugePages_Rsvd:        0
HugePages_Surp:        0

Triggering unplug would require dissolve+alloc - which now fails when
trying to allocate an additional ~512 huge pages (1G).

As expected, I can properly see memory unplug not fully succeeding. In
addition, I get a fairly continuous stream of

[  226.611584] alloc_contig_range: [19f400, 19f800) PFNs busy
...

But more importantly, the hugepage count remains stable, as configured by
the admin (me):

HugePages_Total:    2048
HugePages_Free:     2048
HugePages_Rsvd:        0
HugePages_Surp:        0
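To make the mechanism under discussion concrete: the series replaces a free huge page in place and retries once if the page is grabbed from under us (see the v2 -> v3 changelog above). Below is a condensed, illustrative C sketch of that flow, not the code from the series itself; alloc_replacement_page(), discard_replacement_page(), remove_pool_page() and add_pool_page() are hypothetical stand-ins for hugetlb internals, and the real locking, accounting, and surplus fallback are more involved.

```c
/*
 * Illustrative sketch only -- not the actual patch. Helpers marked
 * "hypothetical" stand in for hugetlb-internal details; real code must
 * also count the new page as surplus if the old one cannot be dissolved.
 */
static int alloc_and_dissolve_huge_page(struct hstate *h, struct page *old_page)
{
	gfp_t gfp_mask = htlb_alloc_mask(h) | __GFP_THISNODE;
	int nid = page_to_nid(old_page);
	struct page *new_page;
	bool retried = false;

	/*
	 * Allocate the replacement up front, restricted to the same node
	 * so the per-node free counts stay balanced.
	 */
	new_page = alloc_replacement_page(h, gfp_mask, nid);	/* hypothetical */
	if (!new_page)
		return -ENOMEM;

retry:
	spin_lock(&hugetlb_lock);
	if (!PageHuge(old_page)) {
		/* The old page dissolved under us; nothing to replace. */
		spin_unlock(&hugetlb_lock);
		discard_replacement_page(h, new_page);		/* hypothetical */
		return 0;
	}

	if (page_count(old_page) || !PageHugeFreed(old_page)) {
		/*
		 * Someone grabbed the free huge page from under us, or it
		 * is not (yet) marked as sitting in the free pool. Retry
		 * once, then give up with -EBUSY.
		 */
		spin_unlock(&hugetlb_lock);
		if (!retried) {
			retried = true;
			cond_resched();
			goto retry;
		}
		discard_replacement_page(h, new_page);
		return -EBUSY;
	}

	/*
	 * Swap the pages: the old page leaves the free pool and can be
	 * dissolved on behalf of alloc_contig_range, while the new page
	 * takes its place, so HugePages_Total/HugePages_Free remain what
	 * the admin configured.
	 */
	remove_pool_page(h, old_page, nid);			/* hypothetical */
	add_pool_page(h, new_page, nid);			/* hypothetical */
	spin_unlock(&hugetlb_lock);

	return 0;
}
```

That last invariant is exactly what David's extreme experiment checks: whatever alloc_contig_range does, the pool size reported in /proc/meminfo must not change behind the admin's back.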
On Mon, Mar 01, 2021 at 01:43:00PM +0100, David Hildenbrand wrote:
> Same experiment with ZONE_MOVABLE:
>
> a) Free huge pages: all memory can get unplugged again.
>
> b) Allocated/populated but idle huge pages: all memory can get unplugged
>    again.
>
> c) Allocated/populated but all 512 huge pages are read/written in a loop:
>    all memory can get unplugged again, but I get a single
>
>    [  121.192345] alloc_contig_range: [180000, 188000) PFNs busy
>
>    Most probably because it happened to try migrating a huge page while
>    it was busy. As virtio-mem retries on ZONE_MOVABLE a couple of times,
>    it can deal with this temporary failure.
>
> Last but not least, I did something extreme:
>
> ]# cat /proc/meminfo
> MemTotal:        5061568 kB
> MemFree:          186560 kB
> MemAvailable:     354524 kB
> ...
> HugePages_Total:    2048
> HugePages_Free:     2048
> HugePages_Rsvd:        0
> HugePages_Surp:        0
>
> Triggering unplug would require dissolve+alloc - which now fails when
> trying to allocate an additional ~512 huge pages (1G).
>
> As expected, I can properly see memory unplug not fully succeeding. In
> addition, I get a fairly continuous stream of
>
> [  226.611584] alloc_contig_range: [19f400, 19f800) PFNs busy
> ...
>
> But more importantly, the hugepage count remains stable, as configured
> by the admin (me):
>
> HugePages_Total:    2048
> HugePages_Free:     2048
> HugePages_Rsvd:        0
> HugePages_Surp:        0

Thanks for giving it a spin David, that is highly appreciated ;-)!

I will add the above information to the next version's changelog if you do
not mind, so the before-and-after can be seen clearly.

I shall send v4 in the course of the next few days.
On 01.03.21 13:57, Oscar Salvador wrote:
> On Mon, Mar 01, 2021 at 01:43:00PM +0100, David Hildenbrand wrote:
>> Same experiment with ZONE_MOVABLE:
>>
>> a) Free huge pages: all memory can get unplugged again.
>>
>> b) Allocated/populated but idle huge pages: all memory can get
>>    unplugged again.
>>
>> c) Allocated/populated but all 512 huge pages are read/written in a
>>    loop: all memory can get unplugged again, but I get a single
>>
>>    [  121.192345] alloc_contig_range: [180000, 188000) PFNs busy
>>
>>    Most probably because it happened to try migrating a huge page
>>    while it was busy. As virtio-mem retries on ZONE_MOVABLE a couple
>>    of times, it can deal with this temporary failure.
>>
>> Last but not least, I did something extreme:
>>
>> ]# cat /proc/meminfo
>> MemTotal:        5061568 kB
>> MemFree:          186560 kB
>> MemAvailable:     354524 kB
>> ...
>> HugePages_Total:    2048
>> HugePages_Free:     2048
>> HugePages_Rsvd:        0
>> HugePages_Surp:        0
>>
>> Triggering unplug would require dissolve+alloc - which now fails when
>> trying to allocate an additional ~512 huge pages (1G).
>>
>> As expected, I can properly see memory unplug not fully succeeding. In
>> addition, I get a fairly continuous stream of
>>
>> [  226.611584] alloc_contig_range: [19f400, 19f800) PFNs busy
>> ...
>>
>> But more importantly, the hugepage count remains stable, as configured
>> by the admin (me):
>>
>> HugePages_Total:    2048
>> HugePages_Free:     2048
>> HugePages_Rsvd:        0
>> HugePages_Surp:        0
>
> Thanks for giving it a spin David, that is highly appreciated ;-)!
>
> I will add the above information to the next version's changelog if you
> do not mind, so the before-and-after can be seen clearly.
>
> I shall send v4 in the course of the next few days.

I'll have some review feedback on error handling that might be improved;
I'll share that shortly.