Message ID | 20230905214412.89152-2-mike.kravetz@oracle.com
---|---
State | New
Series | Batch hugetlb vmemmap modification operations
On Tue, Sep 05, 2023 at 02:44:00PM -0700, Mike Kravetz wrote:
> @@ -456,6 +457,7 @@ int hugetlb_vmemmap_restore(const struct hstate *h, struct page *head)
>  	unsigned long vmemmap_start = (unsigned long)head, vmemmap_end;
>  	unsigned long vmemmap_reuse;
>
> +	VM_WARN_ON_ONCE(!PageHuge(head));
>  	if (!HPageVmemmapOptimized(head))
>  		return 0;
>
> @@ -550,6 +552,7 @@ void hugetlb_vmemmap_optimize(const struct hstate *h, struct page *head)
>  	unsigned long vmemmap_start = (unsigned long)head, vmemmap_end;
>  	unsigned long vmemmap_reuse;
>
> +	VM_WARN_ON_ONCE(!PageHuge(head));
>  	if (!vmemmap_should_optimize(h, head))
>  		return;

Someone who's looking for an easy patch or three should convert both
of these functions to take a folio instead of a page.  All callers
pass &folio->page.  Obviously do that work on top of Mike's patch set
to avoid creating more work for him.
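For illustration, a minimal sketch of the conversion being suggested, written
as a thin folio-taking wrapper around the existing page-based routine (the
wrapper name and placement here are assumptions, not part of the posted
series):

/*
 * Sketch only: a folio-taking wrapper, since every current caller
 * already passes &folio->page.  A full conversion would instead push
 * the folio type down into hugetlb_vmemmap_restore() itself.
 */
static inline int hugetlb_vmemmap_restore_folio(const struct hstate *h,
						struct folio *folio)
{
	return hugetlb_vmemmap_restore(h, &folio->page);
}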
On 09/06/23 01:48, Matthew Wilcox wrote:
> On Tue, Sep 05, 2023 at 02:44:00PM -0700, Mike Kravetz wrote:
> > @@ -456,6 +457,7 @@ int hugetlb_vmemmap_restore(const struct hstate *h, struct page *head)
> >  	unsigned long vmemmap_start = (unsigned long)head, vmemmap_end;
> >  	unsigned long vmemmap_reuse;
> >
> > +	VM_WARN_ON_ONCE(!PageHuge(head));
> >  	if (!HPageVmemmapOptimized(head))
> >  		return 0;
> >
> > @@ -550,6 +552,7 @@ void hugetlb_vmemmap_optimize(const struct hstate *h, struct page *head)
> >  	unsigned long vmemmap_start = (unsigned long)head, vmemmap_end;
> >  	unsigned long vmemmap_reuse;
> >
> > +	VM_WARN_ON_ONCE(!PageHuge(head));
> >  	if (!vmemmap_should_optimize(h, head))
> >  		return;
>
> Someone who's looking for an easy patch or three should convert both
> of these functions to take a folio instead of a page.  All callers
> pass &folio->page.  Obviously do that work on top of Mike's patch set
> to avoid creating more work for him.

I think Muchun already suggested this.  It would make sense as this
series is proposing two new routines taking a list of folios:
- hugetlb_vmemmap_optimize_folios
- hugetlb_vmemmap_restore_folios
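As a rough sketch of the general shape such a batch helper could take
(signature inferred from the routine name above; the actual interface and
error handling in the posted series may differ):

/*
 * Sketch only: batch variant walking a caller-supplied list of folios
 * and applying the existing per-page optimization to each one.
 */
void hugetlb_vmemmap_optimize_folios(struct hstate *h,
				     struct list_head *folio_list)
{
	struct folio *folio;

	list_for_each_entry(folio, folio_list, lru)
		hugetlb_vmemmap_optimize(h, &folio->page);
}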
On Tue, Sep 05, 2023 at 02:44:00PM -0700, Mike Kravetz wrote:
> Currently, vmemmap optimization of hugetlb pages is performed before the
> hugetlb flag (previously hugetlb destructor) is set identifying it as a
> hugetlb folio.  This means there is a window of time where an ordinary
> folio does not have all associated vmemmap present.  The core mm only
> expects vmemmap to be potentially optimized for hugetlb and device dax.
> This can cause problems in code such as memory error handling that may
> want to write to tail struct pages.
>
> There is only one call to perform hugetlb vmemmap optimization today.
> To fix this issue, simply set the hugetlb flag before that call.
>
> There was a similar issue in the free hugetlb path that was previously
> addressed.  The two routines that optimize or restore hugetlb vmemmap
> should only be passed hugetlb folios/pages.  To catch any callers not
> following this rule, add VM_WARN_ON calls to the routines.  In the
> hugetlb free code paths, some calls could be made to restore vmemmap
> after clearing the hugetlb flag.  This was 'safe' as in these cases
> vmemmap was already present and the call was a NOOP.  However, for
> consistency these calls were eliminated so that we can add the
> VM_WARN_ON checks.
>
> Fixes: f41f2ed43ca5 ("mm: hugetlb: free the vmemmap pages associated with each HugeTLB page")
> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>

I saw that VM_WARN_ON_ONCE() in hugetlb_vmemmap_restore is triggered when
memory_failure() is called on a free hugetlb page with vmemmap optimization
disabled (the warning is not triggered if vmemmap optimization is enabled).
I think that we need to check folio_test_hugetlb() before dissolve_free_huge_page()
calls hugetlb_vmemmap_restore_folio().

Could you consider adding some diff like below?

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2312,15 +2312,16 @@ int dissolve_free_huge_page(struct page *page)
 	 * Attempt to allocate vmemmmap here so that we can take
 	 * appropriate action on failure.
 	 */
-	rc = hugetlb_vmemmap_restore_folio(h, folio);
-	if (!rc) {
-		update_and_free_hugetlb_folio(h, folio, false);
-	} else {
-		spin_lock_irq(&hugetlb_lock);
-		add_hugetlb_folio(h, folio, false);
-		h->max_huge_pages++;
-		spin_unlock_irq(&hugetlb_lock);
+	if (folio_test_hugetlb(folio)) {
+		rc = hugetlb_vmemmap_restore_folio(h, folio);
+		if (rc) {
+			spin_lock_irq(&hugetlb_lock);
+			add_hugetlb_folio(h, folio, false);
+			h->max_huge_pages++;
+			goto out;
+		}
 	}
+	update_and_free_hugetlb_folio(h, folio, false);

 	return rc;
 }

Thanks,
Naoya Horiguchi
On 10/13/23 21:58, Naoya Horiguchi wrote:
> On Tue, Sep 05, 2023 at 02:44:00PM -0700, Mike Kravetz wrote:
> > Currently, vmemmap optimization of hugetlb pages is performed before the
> > hugetlb flag (previously hugetlb destructor) is set identifying it as a
> > hugetlb folio.  This means there is a window of time where an ordinary
> > folio does not have all associated vmemmap present.  The core mm only
> > expects vmemmap to be potentially optimized for hugetlb and device dax.
> > This can cause problems in code such as memory error handling that may
> > want to write to tail struct pages.
> >
> > There is only one call to perform hugetlb vmemmap optimization today.
> > To fix this issue, simply set the hugetlb flag before that call.
> >
> > There was a similar issue in the free hugetlb path that was previously
> > addressed.  The two routines that optimize or restore hugetlb vmemmap
> > should only be passed hugetlb folios/pages.  To catch any callers not
> > following this rule, add VM_WARN_ON calls to the routines.  In the
> > hugetlb free code paths, some calls could be made to restore vmemmap
> > after clearing the hugetlb flag.  This was 'safe' as in these cases
> > vmemmap was already present and the call was a NOOP.  However, for
> > consistency these calls were eliminated so that we can add the
> > VM_WARN_ON checks.
> >
> > Fixes: f41f2ed43ca5 ("mm: hugetlb: free the vmemmap pages associated with each HugeTLB page")
> > Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
>
> I saw that VM_WARN_ON_ONCE() in hugetlb_vmemmap_restore is triggered when
> memory_failure() is called on a free hugetlb page with vmemmap optimization
> disabled (the warning is not triggered if vmemmap optimization is enabled).
> I think that we need to check folio_test_hugetlb() before dissolve_free_huge_page()
> calls hugetlb_vmemmap_restore_folio().
>
> Could you consider adding some diff like below?

Thanks!  That case was indeed overlooked.

Andrew, this patch is currently in mm-stable.  How would you like to update?
- A new version of the patch
- A patch to the original patch
- Something else
On Fri, 13 Oct 2023 14:43:56 -0700 Mike Kravetz <mike.kravetz@oracle.com> wrote:

> > Could you consider adding some diff like below?
>
> Thanks!  That case was indeed overlooked.
>
> Andrew, this patch is currently in mm-stable.  How would you like to update?
> - A new version of the patch
> - A patch to the original patch
> - Something else

I guess just a fixup against what's currently in mm-stable, please.  It
happens sometimes.

Please let's get the Fixes: accurate.  I just had to rebase mm-stable to
drop Huang Ying's pcp auto-tuning series.  I'm now seeing

d8f5f7e445f0 hugetlb: set hugetlb page flag before optimizing vmemmap

which I think is what it used to be, so the rebasing shouldn't affect this
patch's hash.
On 10/13/23 21:58, Naoya Horiguchi wrote:
> On Tue, Sep 05, 2023 at 02:44:00PM -0700, Mike Kravetz wrote:
> >
> > Fixes: f41f2ed43ca5 ("mm: hugetlb: free the vmemmap pages associated with each HugeTLB page")
> > Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
>
> I saw that VM_WARN_ON_ONCE() in hugetlb_vmemmap_restore is triggered when
> memory_failure() is called on a free hugetlb page with vmemmap optimization
> disabled (the warning is not triggered if vmemmap optimization is enabled).
> I think that we need to check folio_test_hugetlb() before dissolve_free_huge_page()
> calls hugetlb_vmemmap_restore_folio().
>
> Could you consider adding some diff like below?
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -2312,15 +2312,16 @@ int dissolve_free_huge_page(struct page *page)
>  	 * Attempt to allocate vmemmmap here so that we can take
>  	 * appropriate action on failure.
>  	 */
> -	rc = hugetlb_vmemmap_restore_folio(h, folio);
> -	if (!rc) {
> -		update_and_free_hugetlb_folio(h, folio, false);
> -	} else {
> -		spin_lock_irq(&hugetlb_lock);
> -		add_hugetlb_folio(h, folio, false);
> -		h->max_huge_pages++;
> -		spin_unlock_irq(&hugetlb_lock);
> +	if (folio_test_hugetlb(folio)) {
> +		rc = hugetlb_vmemmap_restore_folio(h, folio);
> +		if (rc) {
> +			spin_lock_irq(&hugetlb_lock);
> +			add_hugetlb_folio(h, folio, false);
> +			h->max_huge_pages++;
> +			goto out;
> +		}
>  	}
> +	update_and_free_hugetlb_folio(h, folio, false);
>
>  	return rc;
>  }
>

Hi Naoya,

I believe we need to set 'rc = 0' in the !folio_test_hugetlb() case.  I put
together the following patch based on mm-stable.  Please take a look.

From f19fbfab324d7d17de4a1e814f95ee655950c58e Mon Sep 17 00:00:00 2001
From: Mike Kravetz <mike.kravetz@oracle.com>
Date: Mon, 16 Oct 2023 19:55:49 -0700
Subject: [PATCH] hugetlb: check for hugetlb folio before vmemmap_restore

In commit d8f5f7e445f0 ("hugetlb: set hugetlb page flag before
optimizing vmemmap") checks were added to print a warning if
hugetlb_vmemmap_restore was called on a non-hugetlb page.  This
was mostly due to ordering issues in the hugetlb page set up and
tear down sequences.  One place missed was the routine
dissolve_free_huge_page.  Naoya Horiguchi noted: "I saw that
VM_WARN_ON_ONCE() in hugetlb_vmemmap_restore is triggered when
memory_failure() is called on a free hugetlb page with vmemmap
optimization disabled (the warning is not triggered if vmemmap
optimization is enabled).  I think that we need to check
folio_test_hugetlb() before dissolve_free_huge_page() calls
hugetlb_vmemmap_restore_folio()."

Perform the check as suggested by Naoya.

Fixes: d8f5f7e445f0 ("hugetlb: set hugetlb page flag before optimizing vmemmap")
Suggested-by: Naoya Horiguchi <naoya.horiguchi@linux.dev>
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 mm/hugetlb.c | 24 +++++++++++++++---------
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 36b40bc9ac25..13736cbb2c19 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2290,17 +2290,23 @@ int dissolve_free_huge_page(struct page *page)
 	 * need to adjust max_huge_pages if the page is not freed.
 	 * Attempt to allocate vmemmmap here so that we can take
 	 * appropriate action on failure.
+	 *
+	 * The folio_test_hugetlb check here is because
+	 * remove_hugetlb_folio will clear hugetlb folio flag for
+	 * non-vmemmap optimized hugetlb folios.
 	 */
-	rc = hugetlb_vmemmap_restore(h, &folio->page);
-	if (!rc) {
-		update_and_free_hugetlb_folio(h, folio, false);
-	} else {
-		spin_lock_irq(&hugetlb_lock);
-		add_hugetlb_folio(h, folio, false);
-		h->max_huge_pages++;
-		spin_unlock_irq(&hugetlb_lock);
-	}
+	if (folio_test_hugetlb(folio)) {
+		rc = hugetlb_vmemmap_restore(h, &folio->page);
+		if (rc) {
+			spin_lock_irq(&hugetlb_lock);
+			add_hugetlb_folio(h, folio, false);
+			h->max_huge_pages++;
+			goto out;
+		}
+	} else
+		rc = 0;

+	update_and_free_hugetlb_folio(h, folio, false);
 		return rc;
 	}
 out:
On Mon, Oct 16, 2023 at 08:21:40PM -0700, Mike Kravetz wrote:
> On 10/13/23 21:58, Naoya Horiguchi wrote:
> > On Tue, Sep 05, 2023 at 02:44:00PM -0700, Mike Kravetz wrote:
> > >
> > > Fixes: f41f2ed43ca5 ("mm: hugetlb: free the vmemmap pages associated with each HugeTLB page")
> > > Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
> >
> > I saw that VM_WARN_ON_ONCE() in hugetlb_vmemmap_restore is triggered when
> > memory_failure() is called on a free hugetlb page with vmemmap optimization
> > disabled (the warning is not triggered if vmemmap optimization is enabled).
> > I think that we need to check folio_test_hugetlb() before dissolve_free_huge_page()
> > calls hugetlb_vmemmap_restore_folio().
> >
> > Could you consider adding some diff like below?
> >
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -2312,15 +2312,16 @@ int dissolve_free_huge_page(struct page *page)
> >  	 * Attempt to allocate vmemmmap here so that we can take
> >  	 * appropriate action on failure.
> >  	 */
> > -	rc = hugetlb_vmemmap_restore_folio(h, folio);
> > -	if (!rc) {
> > -		update_and_free_hugetlb_folio(h, folio, false);
> > -	} else {
> > -		spin_lock_irq(&hugetlb_lock);
> > -		add_hugetlb_folio(h, folio, false);
> > -		h->max_huge_pages++;
> > -		spin_unlock_irq(&hugetlb_lock);
> > +	if (folio_test_hugetlb(folio)) {
> > +		rc = hugetlb_vmemmap_restore_folio(h, folio);
> > +		if (rc) {
> > +			spin_lock_irq(&hugetlb_lock);
> > +			add_hugetlb_folio(h, folio, false);
> > +			h->max_huge_pages++;
> > +			goto out;
> > +		}
> >  	}
> > +	update_and_free_hugetlb_folio(h, folio, false);
> >
> >  	return rc;
> >  }
> >
>
> Hi Naoya,
>
> I believe we need to set 'rc = 0' in the !folio_test_hugetlb() case.  I put
> together the following patch based on mm-stable.  Please take a look.

Hi Mike, the patch looks good to me, and I confirmed that my test programs
showed no new problem on latest mm-stable + this patch.
Thank you very much.

Naoya Horiguchi

>
> From f19fbfab324d7d17de4a1e814f95ee655950c58e Mon Sep 17 00:00:00 2001
> From: Mike Kravetz <mike.kravetz@oracle.com>
> Date: Mon, 16 Oct 2023 19:55:49 -0700
> Subject: [PATCH] hugetlb: check for hugetlb folio before vmemmap_restore
>
> In commit d8f5f7e445f0 ("hugetlb: set hugetlb page flag before
> optimizing vmemmap") checks were added to print a warning if
> hugetlb_vmemmap_restore was called on a non-hugetlb page.  This
> was mostly due to ordering issues in the hugetlb page set up and
> tear down sequences.  One place missed was the routine
> dissolve_free_huge_page.  Naoya Horiguchi noted: "I saw that
> VM_WARN_ON_ONCE() in hugetlb_vmemmap_restore is triggered when
> memory_failure() is called on a free hugetlb page with vmemmap
> optimization disabled (the warning is not triggered if vmemmap
> optimization is enabled).  I think that we need to check
> folio_test_hugetlb() before dissolve_free_huge_page() calls
> hugetlb_vmemmap_restore_folio()."
>
> Perform the check as suggested by Naoya.
>
> Fixes: d8f5f7e445f0 ("hugetlb: set hugetlb page flag before optimizing vmemmap")
> Suggested-by: Naoya Horiguchi <naoya.horiguchi@linux.dev>
> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
> ---
>  mm/hugetlb.c | 24 +++++++++++++++---------
>  1 file changed, 15 insertions(+), 9 deletions(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 36b40bc9ac25..13736cbb2c19 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -2290,17 +2290,23 @@ int dissolve_free_huge_page(struct page *page)
>  	 * need to adjust max_huge_pages if the page is not freed.
>  	 * Attempt to allocate vmemmmap here so that we can take
>  	 * appropriate action on failure.
> +	 *
> +	 * The folio_test_hugetlb check here is because
> +	 * remove_hugetlb_folio will clear hugetlb folio flag for
> +	 * non-vmemmap optimized hugetlb folios.
>  	 */
> -	rc = hugetlb_vmemmap_restore(h, &folio->page);
> -	if (!rc) {
> -		update_and_free_hugetlb_folio(h, folio, false);
> -	} else {
> -		spin_lock_irq(&hugetlb_lock);
> -		add_hugetlb_folio(h, folio, false);
> -		h->max_huge_pages++;
> -		spin_unlock_irq(&hugetlb_lock);
> -	}
> +	if (folio_test_hugetlb(folio)) {
> +		rc = hugetlb_vmemmap_restore(h, &folio->page);
> +		if (rc) {
> +			spin_lock_irq(&hugetlb_lock);
> +			add_hugetlb_folio(h, folio, false);
> +			h->max_huge_pages++;
> +			goto out;
> +		}
> +	} else
> +		rc = 0;
>
> +	update_and_free_hugetlb_folio(h, folio, false);
>  		return rc;
>  	}
>  out:
> --
> 2.41.0
>
On 10/18/23 10:58, Naoya Horiguchi wrote:
> On Mon, Oct 16, 2023 at 08:21:40PM -0700, Mike Kravetz wrote:
> > On 10/13/23 21:58, Naoya Horiguchi wrote:
> > > On Tue, Sep 05, 2023 at 02:44:00PM -0700, Mike Kravetz wrote:
> > Hi Naoya,
> >
> > I believe we need to set 'rc = 0' in the !folio_test_hugetlb() case.  I put
> > together the following patch based on mm-stable.  Please take a look.
>
> Hi Mike, the patch looks good to me, and I confirmed that my test programs
> showed no new problem on latest mm-stable + this patch.
>

Thank you for testing!

My patch could be simpler/more elegant by eliminating that else clause
and just returning 0 after update_and_free_hugetlb_folio.  Oh well, it
is functionally the same.
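For illustration, the simpler structure Mike describes would look roughly
like this inside dissolve_free_huge_page() (a sketch only, not a posted
patch):

	/*
	 * Sketch of the simpler form: drop the else clause and return 0
	 * directly after freeing the folio.
	 */
	if (folio_test_hugetlb(folio)) {
		rc = hugetlb_vmemmap_restore(h, &folio->page);
		if (rc) {
			spin_lock_irq(&hugetlb_lock);
			add_hugetlb_folio(h, folio, false);
			h->max_huge_pages++;
			goto out;
		}
	}

	update_and_free_hugetlb_folio(h, folio, false);
	return 0;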
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index ba6d39b71cb1..c32ca241df4b 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1720,7 +1720,12 @@ static void __update_and_free_hugetlb_folio(struct hstate *h,
 	if (folio_test_hugetlb_raw_hwp_unreliable(folio))
 		return;
 
-	if (hugetlb_vmemmap_restore(h, &folio->page)) {
+	/*
+	 * If folio is not vmemmap optimized (!clear_dtor), then the folio
+	 * is no longer identified as a hugetlb page.  hugetlb_vmemmap_restore
+	 * can only be passed hugetlb pages and will BUG otherwise.
+	 */
+	if (clear_dtor && hugetlb_vmemmap_restore(h, &folio->page)) {
 		spin_lock_irq(&hugetlb_lock);
 		/*
 		 * If we cannot allocate vmemmap pages, just refuse to free the
@@ -1930,9 +1935,9 @@ static void __prep_account_new_huge_page(struct hstate *h, int nid)
 
 static void __prep_new_hugetlb_folio(struct hstate *h, struct folio *folio)
 {
+	folio_set_hugetlb(folio);
 	hugetlb_vmemmap_optimize(h, &folio->page);
 	INIT_LIST_HEAD(&folio->lru);
-	folio_set_hugetlb(folio);
 	hugetlb_set_folio_subpool(folio, NULL);
 	set_hugetlb_cgroup(folio, NULL);
 	set_hugetlb_cgroup_rsvd(folio, NULL);
@@ -3580,13 +3585,21 @@ static int demote_free_hugetlb_folio(struct hstate *h, struct folio *folio)
 	remove_hugetlb_folio_for_demote(h, folio, false);
 	spin_unlock_irq(&hugetlb_lock);
 
-	rc = hugetlb_vmemmap_restore(h, &folio->page);
-	if (rc) {
-		/* Allocation of vmemmmap failed, we can not demote folio */
-		spin_lock_irq(&hugetlb_lock);
-		folio_ref_unfreeze(folio, 1);
-		add_hugetlb_folio(h, folio, false);
-		return rc;
+	/*
+	 * If vmemmap already existed for folio, the remove routine above would
+	 * have cleared the hugetlb folio flag.  Hence the folio is technically
+	 * no longer a hugetlb folio.  hugetlb_vmemmap_restore can only be
+	 * passed hugetlb folios and will BUG otherwise.
+	 */
+	if (folio_test_hugetlb(folio)) {
+		rc = hugetlb_vmemmap_restore(h, &folio->page);
+		if (rc) {
+			/* Allocation of vmemmmap failed, we can not demote folio */
+			spin_lock_irq(&hugetlb_lock);
+			folio_ref_unfreeze(folio, 1);
+			add_hugetlb_folio(h, folio, false);
+			return rc;
+		}
 	}
 
 	/*
diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
index 4b9734777f69..aeb7dd889eee 100644
--- a/mm/hugetlb_vmemmap.c
+++ b/mm/hugetlb_vmemmap.c
@@ -13,6 +13,7 @@
 #include <linux/pgtable.h>
 #include <linux/moduleparam.h>
 #include <linux/bootmem_info.h>
+#include <linux/mmdebug.h>
 #include <asm/pgalloc.h>
 #include <asm/tlbflush.h>
 #include "hugetlb_vmemmap.h"
@@ -456,6 +457,7 @@ int hugetlb_vmemmap_restore(const struct hstate *h, struct page *head)
 	unsigned long vmemmap_start = (unsigned long)head, vmemmap_end;
 	unsigned long vmemmap_reuse;
 
+	VM_WARN_ON_ONCE(!PageHuge(head));
 	if (!HPageVmemmapOptimized(head))
 		return 0;
 
@@ -550,6 +552,7 @@ void hugetlb_vmemmap_optimize(const struct hstate *h, struct page *head)
 	unsigned long vmemmap_start = (unsigned long)head, vmemmap_end;
 	unsigned long vmemmap_reuse;
 
+	VM_WARN_ON_ONCE(!PageHuge(head));
 	if (!vmemmap_should_optimize(h, head))
 		return;
Currently, vmemmap optimization of hugetlb pages is performed before the
hugetlb flag (previously hugetlb destructor) is set identifying it as a
hugetlb folio.  This means there is a window of time where an ordinary
folio does not have all associated vmemmap present.  The core mm only
expects vmemmap to be potentially optimized for hugetlb and device dax.
This can cause problems in code such as memory error handling that may
want to write to tail struct pages.

There is only one call to perform hugetlb vmemmap optimization today.
To fix this issue, simply set the hugetlb flag before that call.

There was a similar issue in the free hugetlb path that was previously
addressed.  The two routines that optimize or restore hugetlb vmemmap
should only be passed hugetlb folios/pages.  To catch any callers not
following this rule, add VM_WARN_ON calls to the routines.  In the
hugetlb free code paths, some calls could be made to restore vmemmap
after clearing the hugetlb flag.  This was 'safe' as in these cases
vmemmap was already present and the call was a NOOP.  However, for
consistency these calls were eliminated so that we can add the
VM_WARN_ON checks.

Fixes: f41f2ed43ca5 ("mm: hugetlb: free the vmemmap pages associated with each HugeTLB page")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 mm/hugetlb.c         | 31 ++++++++++++++++++++++---------
 mm/hugetlb_vmemmap.c |  3 +++
 2 files changed, 25 insertions(+), 9 deletions(-)