[v3,0/2] Make alloc_contig_range handle Hugetlb pages

Message ID 20210222135137.25717-1-osalvador@suse.de (mailing list archive)

Message

Oscar Salvador Feb. 22, 2021, 1:51 p.m. UTC
v2 -> v3:
 - Drop usage of high-level generic helpers in favour of a
   low-level approach (per Michal)
 - Check that the page is marked as PageHugeFreed
 - Add a one-time retry in case someone grabbed the free huge page
   from under us (see the sketch after this changelog)

v1 -> v2:
 - Addressed feedback by Michal
 - Restrict the allocation to a node with __GFP_THISNODE
 - Drop PageHuge check in alloc_and_dissolve_huge_page
 - Re-order comments in isolate_or_dissolve_huge_page
 - Extend comment in isolate_migratepages_block
 - Place put_page right after we got the page, otherwise
   dissolve_free_huge_page will fail

RFC -> v1:
 - Drop RFC
 - Addressed feedback from David and Mike
 - Fence off gigantic pages as there is a cyclic dependency between
   them and alloc_contig_range
 - Re-organize the code to make the race window smaller and to put
   all details in hugetlb code
 - Drop nodemask initialization. First the page's node will be tried and
   then we will fall back to other nodes containing memory (N_MEMORY).
   Details in patch#1's changelog
 - Count the new page as surplus in case we failed to dissolve both the
   old page and the new one. Details in patch#1.
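
 As a rough editor's sketch (not the code from patch #1), the free-page
 replacement described by the items above boils down to something like the
 following. alloc_replacement_huge_page() and replace_in_free_pool() are
 hypothetical placeholders for the helpers added by the series, hugetlb_lock
 is the hugetlb-internal spinlock, and the real reference handling differs
 (see the put_page note under v1 -> v2):

/* Editor's sketch only; helper names are hypothetical. */
static int alloc_and_dissolve_sketch(struct hstate *h, struct page *old_page)
{
        int nid = page_to_nid(old_page);
        struct page *new_page;
        bool retried = false;

        /*
         * Allocate the replacement on the same node first (__GFP_THISNODE);
         * the series falls back to the remaining N_MEMORY nodes if that fails.
         */
        new_page = alloc_replacement_huge_page(h, nid);
        if (!new_page)
                return -ENOMEM;

retry:
        spin_lock(&hugetlb_lock);
        if (!PageHugeFreed(old_page)) {
                /*
                 * Someone grabbed the free huge page from under us:
                 * retry exactly once before giving up.
                 */
                spin_unlock(&hugetlb_lock);
                if (!retried) {
                        retried = true;
                        cond_resched();
                        goto retry;
                }
                put_page(new_page);     /* drop the unused replacement */
                return -EBUSY;
        }

        /* Enqueue new_page in the free pool and dissolve old_page. */
        replace_in_free_pool(h, old_page, new_page);
        spin_unlock(&hugetlb_lock);
        return 0;
}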

Cover letter:

 alloc_contig_range lacks the ability to handle HugeTLB pages.
 This can be problematic for some users, e.g. CMA and virtio-mem, as those
 users will fail the call if alloc_contig_range ever sees a HugeTLB page, even
 when those pages lie in ZONE_MOVABLE and are free.
 That problem can be easily solved by replacing the page in the free hugepage
 pool.

 In-use HugeTLB pages are no exception though, as they can be isolated and
 migrated like any other LRU or Movable page.

 This patchset improves alloc_contig_range->isolate_migratepages_block,
 so that HugeTLB pages can be recognized and handled.
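
 As a rough illustration (editor's sketch, not the actual hunk from these
 patches; the list argument to isolate_or_dissolve_huge_page() only comes in
 with patch #2, and the usual isolation accounting is omitted), the scan loop
 in isolate_migratepages_block() ends up doing something along these lines:

        if (PageHuge(page) && cc->alloc_contig) {
                /*
                 * Let hugetlb code decide: a free huge page is replaced in
                 * the pool and dissolved, an in-use one is isolated onto
                 * cc->migratepages for migration, and gigantic pages are
                 * fenced off with an error.
                 */
                if (isolate_or_dissolve_huge_page(page, &cc->migratepages))
                        goto isolate_fail;

                if (PageHuge(page)) {
                        /*
                         * Still a huge page, so it was in use and is now
                         * queued for migration: skip over its tail pages.
                         */
                        low_pfn += compound_nr(page) - 1;
                        continue;
                }

                /*
                 * The free huge page was dissolved; its pages are now buddy
                 * pages and fall through to the existing checks below.
                 */
        }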

 Below is an insight from David (thanks), where the problem can clearly be seen:

 "Start a VM with 4G. Hotplug 1G via virtio-mem and online it to
 ZONE_MOVABLE. Allocate 512 huge pages.

 [root@localhost ~]# cat /proc/meminfo
 MemTotal:        5061512 kB
 MemFree:         3319396 kB
 MemAvailable:    3457144 kB
 ...
 HugePages_Total:     512
 HugePages_Free:      512
 HugePages_Rsvd:        0
 HugePages_Surp:        0
 Hugepagesize:       2048 kB


 The huge pages get partially allocated from ZONE_MOVABLE. Try unplugging
 1G via virtio-mem (remember, all ZONE_MOVABLE). Inside the guest:

 [  180.058992] alloc_contig_range: [1b8000, 1c0000) PFNs busy
 [  180.060531] alloc_contig_range: [1b8000, 1c0000) PFNs busy
 [  180.061972] alloc_contig_range: [1b8000, 1c0000) PFNs busy
 [  180.063413] alloc_contig_range: [1b8000, 1c0000) PFNs busy
 [  180.064838] alloc_contig_range: [1b8000, 1c0000) PFNs busy
 [  180.065848] alloc_contig_range: [1bfc00, 1c0000) PFNs busy
 [  180.066794] alloc_contig_range: [1bfc00, 1c0000) PFNs busy
 [  180.067738] alloc_contig_range: [1bfc00, 1c0000) PFNs busy
 [  180.068669] alloc_contig_range: [1bfc00, 1c0000) PFNs busy
 [  180.069598] alloc_contig_range: [1bfc00, 1c0000) PFNs busy"

Oscar Salvador (2):
  mm: Make alloc_contig_range handle free hugetlb pages
  mm: Make alloc_contig_range handle in-use hugetlb pages

 include/linux/hugetlb.h |   7 +++
 mm/compaction.c         |  22 +++++++++
 mm/hugetlb.c            | 124 +++++++++++++++++++++++++++++++++++++++++++++++-
 mm/vmscan.c             |   5 +-
 4 files changed, 154 insertions(+), 4 deletions(-)

Comments

David Hildenbrand March 1, 2021, 12:43 p.m. UTC | #1
On 22.02.21 14:51, Oscar Salvador wrote:
> [...]
> 
>   "Start a VM with 4G. Hotplug 1G via virtio-mem and online it to
>   ZONE_MOVABLE. Allocate 512 huge pages.
> [...]
>   [  180.069598] alloc_contig_range: [1bfc00, 1c0000) PFNs busy"

Same experiment with ZONE_MOVABLE:

a) Free huge pages: all memory can get unplugged again.

b) Allocated/populated but idle huge pages: all memory can get unplugged 
again.

c) Allocated/populated but all 512 huge pages are read/written in a 
loop: all memory can get unplugged again, but I get a single

[  121.192345] alloc_contig_range: [180000, 188000) PFNs busy

Most probably because it happened to try migrating a huge page while it 
was busy. As virtio-mem retries on ZONE_MOVABLE a couple of times, it 
can deal with this temporary failure.



Last but not least, I did something extreme:

]# cat /proc/meminfo
MemTotal:        5061568 kB
MemFree:          186560 kB
MemAvailable:     354524 kB
...
HugePages_Total:    2048
HugePages_Free:     2048
HugePages_Rsvd:        0
HugePages_Surp:        0


Triggering unplug would require dissolve+alloc - which now fails when 
trying to allocate an additional ~512 huge pages (1G).


As expected, I can properly see memory unplug not fully succeeding, and I 
get a fairly continuous stream of

[  226.611584] alloc_contig_range: [19f400, 19f800) PFNs busy
...

But more importantly, the hugepage count remains stable, as configured 
by the admin (me):

HugePages_Total:    2048
HugePages_Free:     2048
HugePages_Rsvd:        0
HugePages_Surp:        0
Oscar Salvador March 1, 2021, 12:57 p.m. UTC | #2
On Mon, Mar 01, 2021 at 01:43:00PM +0100, David Hildenbrand wrote:
> Same experiment with ZONE_MOVABLE:
> [...]
> But more importantly, the hugepage count remains stable, as configured by
> the admin (me):
> [...]

Thanks for giving it a spin, David, that is highly appreciated ;-)!

I will add the above information to the next version's changelog if you do not
mind, so the before-and-after can be seen clearly.

I shall send v4 in the course of the next few days.
David Hildenbrand March 1, 2021, 12:59 p.m. UTC | #3
On 01.03.21 13:57, Oscar Salvador wrote:
> On Mon, Mar 01, 2021 at 01:43:00PM +0100, David Hildenbrand wrote:
>> [...]
> 
> Thanks for giving it a spin, David, that is highly appreciated ;-)!
> 
> I will add the above information to the next version's changelog if you do not
> mind, so the before-and-after can be seen clearly.
> 
> I shall send v4 in the course of the next few days.
> 

I'll have some review feedback on error handling that might be improved; 
I'll share that shortly.