Message ID | 20230905214412.89152-9-mike.kravetz@oracle.com
---|---
State | New
Series | Batch hugetlb vmemmap modification operations
On 2023/9/6 05:44, Mike Kravetz wrote:
> Now that batching of hugetlb vmemmap optimization processing is possible,
> batch the freeing of vmemmap pages. When freeing vmemmap pages for a
> hugetlb page, we add them to a list that is freed after the entire batch
> has been processed.
>
> This enhances the ability to return contiguous ranges of memory to the
> low level allocators.
>
> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
> ---
>  mm/hugetlb_vmemmap.c | 60 ++++++++++++++++++++++++++++----------------
>  1 file changed, 38 insertions(+), 22 deletions(-)
>
> [...]
>
>  static int vmemmap_remap_free(unsigned long start, unsigned long end,
> -			      unsigned long reuse)
> +			      unsigned long reuse,
> +			      struct list_head *vmemmap_pages)
>  {
>  	int ret;
> -	LIST_HEAD(vmemmap_pages);
> +	LIST_HEAD(freed_pages);

IIUC, we could reuse the parameter of @vmemmap_pages directly instead of
a temporary variable, could it be dropped?

> [...]
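To make the suggestion concrete, the simplification being proposed would look roughly like the sketch below. This is only a sketch of the idea under discussion, not the posted patch or its follow-up, and it elides the locking, reuse-page setup and error handling that the real function performs: the walk would deposit freed pages straight onto the caller-supplied list, so both the local freed_pages list and the final list_splice() would go away.

	static int vmemmap_remap_free(unsigned long start, unsigned long end,
				      unsigned long reuse,
				      struct list_head *vmemmap_pages)
	{
		struct vmemmap_remap_walk walk = {
			.remap_pte	= vmemmap_remap_pte,
			.reuse_addr	= reuse,
			/* deposit freed pages directly on the caller's list */
			.vmemmap_pages	= vmemmap_pages,
		};
		int ret;

		/* ... allocate the reuse page and walk the range as in the patch ... */
		ret = vmemmap_remap_range(reuse, end, &walk);
		/* ... error handling as in the patch ... */

		/* no local freed_pages list and no list_splice() on return */
		return ret;
	}

Whether this is safe hinges on what the error path does with the (now shared) list, which is exactly the point raised in the reply below.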
On 09/06/23 15:38, Muchun Song wrote:
> On 2023/9/6 05:44, Mike Kravetz wrote:
> [...]
> >  static int vmemmap_remap_free(unsigned long start, unsigned long end,
> > -			      unsigned long reuse)
> > +			      unsigned long reuse,
> > +			      struct list_head *vmemmap_pages)
> >  {
> >  	int ret;
> > -	LIST_HEAD(vmemmap_pages);
> > +	LIST_HEAD(freed_pages);
>
> IIUC, we could reuse the parameter of @vmemmap_pages directly instead of
> a temporary variable, could it be dropped?

I was concerned about the error case where we call vmemmap_remap_range a
second time.  In the first call to vmemmap_remap_range with vmemmap_remap_pte,
vmemmap pages to be freed are added to the end of the list (list_add_tail).
In the call to vmemmap_remap_range after error with vmemmap_restore_pte,
pages are taken off the head of the list (list_first_entry).  So, it seems
that it would be possible to use a different set of pages in the restore
operation.  This would be an issue if pages had different characteristics
such as being on different nodes.  Is that a real concern?

I suppose we could change vmemmap_remap_pte to add pages to the head of
the list?  I do not recall the reasoning behind adding to tail.
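To make the asymmetry concrete, the two callbacks look roughly like this. This is abridged for illustration, based on the list handling described above; the pte manipulation and data copying are omitted:

	static void vmemmap_remap_pte(pte_t *pte, unsigned long addr,
				      struct vmemmap_remap_walk *walk)
	{
		struct page *page = pte_page(*pte);

		/* Pages freed by the remap are queued at the TAIL of the list. */
		list_add_tail(&page->lru, walk->vmemmap_pages);
		/* ... remap this pte to walk->reuse_page ... */
	}

	static void vmemmap_restore_pte(pte_t *pte, unsigned long addr,
					struct vmemmap_remap_walk *walk)
	{
		/* The error path pulls replacement pages from the HEAD of the list. */
		struct page *page = list_first_entry(walk->vmemmap_pages,
						     struct page, lru);

		list_del(&page->lru);
		/* ... copy the data back and remap this pte to @page ... */
	}

With a per-hugetlb-page local list the asymmetry is harmless, since the list only ever holds that page's vmemmap pages. If the walk instead deposited pages directly onto a shared per-batch list, the restore path could pull pages that were contributed by a different hugetlb page, possibly from another node, which is the concern raised here.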
> On Sep 7, 2023, at 05:38, Mike Kravetz <mike.kravetz@oracle.com> wrote:
>
> On 09/06/23 15:38, Muchun Song wrote:
> [...]
>> IIUC, we could reuse the parameter of @vmemmap_pages directly instead of
>> a temporary variable, could it be dropped?
>
> I was concerned about the error case where we call vmemmap_remap_range a
> second time.  In the first call to vmemmap_remap_range with vmemmap_remap_pte,
> vmemmap pages to be freed are added to the end of the list (list_add_tail).
> In the call to vmemmap_remap_range after error with vmemmap_restore_pte,
> pages are taken off the head of the list (list_first_entry).  So, it seems
> that it would be possible to use a different set of pages in the restore

Yes.

> operation.  This would be an issue if pages had different characteristics
> such as being on different nodes.  Is that a real concern?

A good point. Now I see your concern; it is better to keep the same node
as before when an error occurs.

> I suppose we could change vmemmap_remap_pte to add pages to the head of
> the list?  I do not recall the reasoning behind adding to tail.

I think we could do this; the code will be a little simpler. Actually, there
is no reason behind adding to the tail (BTW, the first commit was introduced
by me, no secret here :-)).

Thanks.

> --
> Mike Kravetz
On 09/07/23 14:19, Muchun Song wrote:
> [...]
> > I suppose we could change vmemmap_remap_pte to add pages to the head of
> > the list?  I do not recall the reasoning behind adding to tail.
>
> I think we could do this; the code will be a little simpler. Actually, there
> is no reason behind adding to the tail (BTW, the first commit was introduced
> by me, no secret here :-)).

Ok, I will change the way pages are added to and removed from the list so
that in case of error we get the same pages back.  Then I can remove the
local list.
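For reference, the direction agreed on here plausibly amounts to the small change sketched below. This is a sketch only, not the posted follow-up, and it again omits the pte manipulation:

	static void vmemmap_remap_pte(pte_t *pte, unsigned long addr,
				      struct vmemmap_remap_walk *walk)
	{
		struct page *page = pte_page(*pte);

		/* Queue at the head (was list_add_tail) so the error path,
		 * which pops from the head, restores exactly the pages that
		 * were just queued, in LIFO order. */
		list_add(&page->lru, walk->vmemmap_pages);
		/* ... remap this pte to walk->reuse_page as before ... */
	}

With that in place, vmemmap_remap_free can point the walk at the caller's list directly and drop both the local freed_pages list and the list_splice(), as suggested earlier in the thread.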
diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
index 79de984919ef..a715712df831 100644
--- a/mm/hugetlb_vmemmap.c
+++ b/mm/hugetlb_vmemmap.c
@@ -306,18 +306,21 @@ static void vmemmap_restore_pte(pte_t *pte, unsigned long addr,
  * @end: end address of the vmemmap virtual address range that we want to
  *       remap.
  * @reuse: reuse address.
+ * @vmemmap_pages: list to deposit vmemmap pages to be freed.  It is callers
+ *		responsibility to free pages.
  *
  * Return: %0 on success, negative error code otherwise.
  */
 static int vmemmap_remap_free(unsigned long start, unsigned long end,
-			      unsigned long reuse)
+			      unsigned long reuse,
+			      struct list_head *vmemmap_pages)
 {
 	int ret;
-	LIST_HEAD(vmemmap_pages);
+	LIST_HEAD(freed_pages);
 	struct vmemmap_remap_walk walk = {
 		.remap_pte	= vmemmap_remap_pte,
 		.reuse_addr	= reuse,
-		.vmemmap_pages	= &vmemmap_pages,
+		.vmemmap_pages	= &freed_pages,
 	};
 	int nid = page_to_nid((struct page *)start);
 	gfp_t gfp_mask = GFP_KERNEL | __GFP_THISNODE | __GFP_NORETRY |
@@ -335,7 +338,7 @@ static int vmemmap_remap_free(unsigned long start, unsigned long end,
 	if (walk.reuse_page) {
 		copy_page(page_to_virt(walk.reuse_page),
 			  (void *)walk.reuse_addr);
-		list_add(&walk.reuse_page->lru, &vmemmap_pages);
+		list_add(&walk.reuse_page->lru, &freed_pages);
 	}
 
 	/*
@@ -366,15 +369,14 @@ static int vmemmap_remap_free(unsigned long start, unsigned long end,
 		walk = (struct vmemmap_remap_walk) {
 			.remap_pte	= vmemmap_restore_pte,
 			.reuse_addr	= reuse,
-			.vmemmap_pages	= &vmemmap_pages,
+			.vmemmap_pages	= &freed_pages,
 		};
 
 		vmemmap_remap_range(reuse, end, &walk);
 	}
 	mmap_read_unlock(&init_mm);
 
-	free_vmemmap_page_list(&vmemmap_pages);
-
+	list_splice(&freed_pages, vmemmap_pages);
 	return ret;
 }
 
@@ -553,17 +555,9 @@ static bool vmemmap_should_optimize(const struct hstate *h, const struct page *h
 	return true;
 }
 
-/**
- * hugetlb_vmemmap_optimize - optimize @head page's vmemmap pages.
- * @h: struct hstate.
- * @head: the head page whose vmemmap pages will be optimized.
- *
- * This function only tries to optimize @head's vmemmap pages and does not
- * guarantee that the optimization will succeed after it returns. The caller
- * can use HPageVmemmapOptimized(@head) to detect if @head's vmemmap pages
- * have been optimized.
- */
-void hugetlb_vmemmap_optimize(const struct hstate *h, struct page *head)
+static void __hugetlb_vmemmap_optimize(const struct hstate *h,
+					struct page *head,
+					struct list_head *vmemmap_pages)
 {
 	unsigned long vmemmap_start = (unsigned long)head, vmemmap_end;
 	unsigned long vmemmap_reuse;
@@ -580,21 +574,43 @@ void hugetlb_vmemmap_optimize(const struct hstate *h, struct page *head)
 
 	/*
 	 * Remap the vmemmap virtual address range [@vmemmap_start, @vmemmap_end)
-	 * to the page which @vmemmap_reuse is mapped to, then free the pages
-	 * which the range [@vmemmap_start, @vmemmap_end] is mapped to.
+	 * to the page which @vmemmap_reuse is mapped to.  Add pages previously
+	 * mapping the range to vmemmap_pages list so that they can be freed by
+	 * the caller.
 	 */
-	if (vmemmap_remap_free(vmemmap_start, vmemmap_end, vmemmap_reuse))
+	if (vmemmap_remap_free(vmemmap_start, vmemmap_end, vmemmap_reuse, vmemmap_pages))
 		static_branch_dec(&hugetlb_optimize_vmemmap_key);
 	else
 		SetHPageVmemmapOptimized(head);
 }
 
+/**
+ * hugetlb_vmemmap_optimize - optimize @head page's vmemmap pages.
+ * @h: struct hstate.
+ * @head: the head page whose vmemmap pages will be optimized.
+ *
+ * This function only tries to optimize @head's vmemmap pages and does not
+ * guarantee that the optimization will succeed after it returns. The caller
+ * can use HPageVmemmapOptimized(@head) to detect if @head's vmemmap pages
+ * have been optimized.
+ */
+void hugetlb_vmemmap_optimize(const struct hstate *h, struct page *head)
+{
+	LIST_HEAD(vmemmap_pages);
+
+	__hugetlb_vmemmap_optimize(h, head, &vmemmap_pages);
+	free_vmemmap_page_list(&vmemmap_pages);
+}
+
 void hugetlb_vmemmap_optimize_folios(struct hstate *h, struct list_head *folio_list)
 {
 	struct folio *folio;
+	LIST_HEAD(vmemmap_pages);
 
 	list_for_each_entry(folio, folio_list, lru)
-		hugetlb_vmemmap_optimize(h, &folio->page);
+		__hugetlb_vmemmap_optimize(h, &folio->page, &vmemmap_pages);
+
+	free_vmemmap_page_list(&vmemmap_pages);
 }
 
 static struct ctl_table hugetlb_vmemmap_sysctls[] = {
Now that batching of hugetlb vmemmap optimization processing is possible,
batch the freeing of vmemmap pages.  When freeing vmemmap pages for a
hugetlb page, we add them to a list that is freed after the entire batch
has been processed.

This enhances the ability to return contiguous ranges of memory to the
low level allocators.

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 mm/hugetlb_vmemmap.c | 60 ++++++++++++++++++++++++++++----------------
 1 file changed, 38 insertions(+), 22 deletions(-)
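The batch is ultimately released through free_vmemmap_page_list(), which is called by the patch but not shown in this diff. A minimal sketch of what such a batch-free helper does is below; the in-tree helper may differ in detail, but the idea is simply to walk the accumulated list and hand every queued vmemmap page back to the appropriate allocator:

	/* Return one vmemmap page to whichever allocator it came from. */
	static inline void free_vmemmap_page(struct page *page)
	{
		if (PageReserved(page))
			free_bootmem_page(page);	/* early/bootmem allocation */
		else
			__free_page(page);		/* ordinary buddy page */
	}

	/* Free every vmemmap page queued on the batch list. */
	static void free_vmemmap_page_list(struct list_head *list)
	{
		struct page *page, *next;

		list_for_each_entry_safe(page, next, list, lru) {
			list_del(&page->lru);
			free_vmemmap_page(page);
		}
	}

Deferring this walk until the whole batch has been processed is what lets contiguous runs of pages go back to the low level allocators together.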