Message ID | 20240404162515.527802-2-fvdl@google.com (mailing list archive)
---|---
State | New
Series | [1/2] mm/cma: drop incorrect alignment check in cma_init_reserved_mem
On Thu, Apr 04, 2024 at 04:25:15PM +0000, Frank van der Linden wrote:
> The hugetlb_cma code passes 0 in the order_per_bit argument to
> cma_declare_contiguous_nid (the alignment, computed using the
> page order, is correctly passed in).
>
> This causes a bit in the cma allocation bitmap to always represent
> a 4k page, making the bitmaps potentially very large, and slower.
>
> So, correctly pass in the order instead.
>
> Signed-off-by: Frank van der Linden <fvdl@google.com>
> Cc: Roman Gushchin <roman.gushchin@linux.dev>
> Fixes: cf11e85fc08c ("mm: hugetlb: optionally allocate gigantic hugepages using cma")

Hi Frank,

there is a comment just above your changes which explains why order_per_bit is 0.
Is this not true anymore? If so, please, fix the comment too. Please, clarify.

Thanks!
On Thu, Apr 4, 2024 at 11:56 AM Roman Gushchin <roman.gushchin@linux.dev> wrote:
>
> On Thu, Apr 04, 2024 at 04:25:15PM +0000, Frank van der Linden wrote:
> > The hugetlb_cma code passes 0 in the order_per_bit argument to
> > cma_declare_contiguous_nid (the alignment, computed using the
> > page order, is correctly passed in).
> >
> > This causes a bit in the cma allocation bitmap to always represent
> > a 4k page, making the bitmaps potentially very large, and slower.
> >
> > So, correctly pass in the order instead.
> >
> > Signed-off-by: Frank van der Linden <fvdl@google.com>
> > Cc: Roman Gushchin <roman.gushchin@linux.dev>
> > Fixes: cf11e85fc08c ("mm: hugetlb: optionally allocate gigantic hugepages using cma")
>
> Hi Frank,
>
> there is a comment just above your changes which explains why order_per_bit is 0.
> Is this not true anymore? If so, please, fix the comment too. Please, clarify.
>
> Thanks!

Hi Roman,

I'm assuming you're referring to this comment:

	/*
	 * Note that 'order per bit' is based on smallest size that
	 * may be returned to CMA allocator in the case of
	 * huge page demotion.
	 */

That comment was added in a01f43901cfb9 ("hugetlb: be sure to free
demoted CMA pages to CMA").

It talks about HUGETLB_PAGE_ORDER being the minimum order being given
back to the CMA allocator (after hugetlb demotion), therefore
order_per_bit must be HUGETLB_PAGE_ORDER. See the commit message for
that commit:

"Therefore, at region setup time we use HUGETLB_PAGE_ORDER as the
smallest possible huge page size that can be given back to CMA."

But the commit, while correctly changing the alignment, left the
order_per_bit argument at 0, even though it clearly intended to set
it at HUGETLB_PAGE_ORDER. The confusion may have been that
cma_declare_contiguous_nid has 9 arguments, several of which can be
left at 0 meaning 'use default', so it's easy to misread.

In other words, the comment was correct, but the code was not. After
this patch, comment and code match.
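For reference, the call being discussed takes nine arguments. Roughly (treat the
exact prototype as an approximation of include/linux/cma.h from this era, not an
authoritative copy), annotated with the values hugetlb_cma_reserve() passes:

```c
/*
 * Approximate prototype, for illustration only; check include/linux/cma.h
 * in the kernel tree you are working with for the authoritative version.
 */
int __init cma_declare_contiguous_nid(phys_addr_t base,          /* 0: let CMA choose the base */
				      phys_addr_t size,           /* per-node hugetlb_cma size */
				      phys_addr_t limit,          /* 0: no upper address limit */
				      phys_addr_t alignment,      /* PAGE_SIZE << HUGETLB_PAGE_ORDER */
				      unsigned int order_per_bit, /* was 0, should be HUGETLB_PAGE_ORDER */
				      bool fixed,                 /* false */
				      const char *name,           /* per-node area name */
				      struct cma **res_cma,       /* &hugetlb_cma[nid] */
				      int nid);                   /* node id */
```

With several of the physical-address arguments legitimately 0 ("use default"), a
stray 0 in the order_per_bit slot is easy to overlook in review.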
On 04.04.24 18:25, Frank van der Linden wrote:
> The hugetlb_cma code passes 0 in the order_per_bit argument to
> cma_declare_contiguous_nid (the alignment, computed using the
> page order, is correctly passed in).
>
> This causes a bit in the cma allocation bitmap to always represent
> a 4k page, making the bitmaps potentially very large, and slower.
>
> So, correctly pass in the order instead.
>
> Signed-off-by: Frank van der Linden <fvdl@google.com>
> Cc: Roman Gushchin <roman.gushchin@linux.dev>
> Fixes: cf11e85fc08c ("mm: hugetlb: optionally allocate gigantic hugepages using cma")

It might be suboptimal, but do we call it a "BUG" that needs "fixing"? I
know, controversial :)

> ---
>  mm/hugetlb.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 23ef240ba48a..6dc62d8b2a3a 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -7873,9 +7873,9 @@ void __init hugetlb_cma_reserve(int order)
>  	 * huge page demotion.
>  	 */
>  	res = cma_declare_contiguous_nid(0, size, 0,
> -					PAGE_SIZE << HUGETLB_PAGE_ORDER,
> -					0, false, name,
> -					&hugetlb_cma[nid], nid);
> +					PAGE_SIZE << HUGETLB_PAGE_ORDER,
> +					HUGETLB_PAGE_ORDER, false, name,
> +					&hugetlb_cma[nid], nid);
>  	if (res) {
>  		pr_warn("hugetlb_cma: reservation failed: err %d, node %d",
>  			res, nid);

... I'm afraid this is not completely correct.

For example, on arm64, HUGETLB_PAGE_ORDER is essentially PMD_ORDER.

... but we do support smaller hugetlb sizes than that (cont-pte hugetlb
size is 64 KiB, not 2 MiB -- PMD -- on a 4k kernel)
On Thu, 4 Apr 2024 16:25:15 +0000 Frank van der Linden <fvdl@google.com> wrote:

> The hugetlb_cma code passes 0 in the order_per_bit argument to
> cma_declare_contiguous_nid (the alignment, computed using the
> page order, is correctly passed in).
>
> This causes a bit in the cma allocation bitmap to always represent
> a 4k page, making the bitmaps potentially very large, and slower.
>
> So, correctly pass in the order instead.

Ditto. Should we backport this? Can we somewhat quantify "potentially very",
and understand under what circumstances this might occur?
On Thu, Apr 04, 2024 at 12:40:58PM -0700, Frank van der Linden wrote:
> On Thu, Apr 4, 2024 at 11:56 AM Roman Gushchin <roman.gushchin@linux.dev> wrote:
> >
> > On Thu, Apr 04, 2024 at 04:25:15PM +0000, Frank van der Linden wrote:
> > > The hugetlb_cma code passes 0 in the order_per_bit argument to
> > > cma_declare_contiguous_nid (the alignment, computed using the
> > > page order, is correctly passed in).
> > >
> > > This causes a bit in the cma allocation bitmap to always represent
> > > a 4k page, making the bitmaps potentially very large, and slower.
> > >
> > > So, correctly pass in the order instead.
> > >
> > > Signed-off-by: Frank van der Linden <fvdl@google.com>
> > > Cc: Roman Gushchin <roman.gushchin@linux.dev>
> > > Fixes: cf11e85fc08c ("mm: hugetlb: optionally allocate gigantic hugepages using cma")
> >
> > Hi Frank,
> >
> > there is a comment just above your changes which explains why order_per_bit is 0.
> > Is this not true anymore? If so, please, fix the comment too. Please, clarify.
> >
> > Thanks!
>
> Hi Roman,
>
> I'm assuming you're referring to this comment:
>
> 	/*
> 	 * Note that 'order per bit' is based on smallest size that
> 	 * may be returned to CMA allocator in the case of
> 	 * huge page demotion.
> 	 */
>
> That comment was added in a01f43901cfb9 ("hugetlb: be sure to free
> demoted CMA pages to CMA").
>
> It talks about HUGETLB_PAGE_ORDER being the minimum order being given
> back to the CMA allocator (after hugetlb demotion), therefore
> order_per_bit must be HUGETLB_PAGE_ORDER. See the commit message for
> that commit:
>
> "Therefore, at region setup time we use HUGETLB_PAGE_ORDER as the
> smallest possible huge page size that can be given back to CMA."
>
> But the commit, while correctly changing the alignment, left the
> order_per_bit argument at 0, even though it clearly intended to set
> it at HUGETLB_PAGE_ORDER. The confusion may have been that
> cma_declare_contiguous_nid has 9 arguments, several of which can be
> left at 0 meaning 'use default', so it's easy to misread.
>
> In other words, the comment was correct, but the code was not. After
> this patch, comment and code match.

Indeed, the mentioned commit which added a comment not aligned with the
code was confusing. It all makes sense now, thank you for the explanation!

Please, feel free to add

Acked-by: Roman Gushchin <roman.gushchin@linux.dev>

for your patch.

Thanks!
On Thu, Apr 04, 2024 at 10:13:21PM +0200, David Hildenbrand wrote:
> On 04.04.24 18:25, Frank van der Linden wrote:
> > The hugetlb_cma code passes 0 in the order_per_bit argument to
> > cma_declare_contiguous_nid (the alignment, computed using the
> > page order, is correctly passed in).
> >
> > This causes a bit in the cma allocation bitmap to always represent
> > a 4k page, making the bitmaps potentially very large, and slower.
> >
> > So, correctly pass in the order instead.
> >
> > Signed-off-by: Frank van der Linden <fvdl@google.com>
> > Cc: Roman Gushchin <roman.gushchin@linux.dev>
> > Fixes: cf11e85fc08c ("mm: hugetlb: optionally allocate gigantic hugepages using cma")
>
> It might be suboptimal, but do we call it a "BUG" that needs "fixing"? I
> know, controversial :)

We probably should not rush with a stable backporting, especially given your
next comment on page sizes on arm.
On Thu, Apr 4, 2024 at 1:13 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 04.04.24 18:25, Frank van der Linden wrote:
> > The hugetlb_cma code passes 0 in the order_per_bit argument to
> > cma_declare_contiguous_nid (the alignment, computed using the
> > page order, is correctly passed in).
> >
> > This causes a bit in the cma allocation bitmap to always represent
> > a 4k page, making the bitmaps potentially very large, and slower.
> >
> > So, correctly pass in the order instead.
> >
> > Signed-off-by: Frank van der Linden <fvdl@google.com>
> > Cc: Roman Gushchin <roman.gushchin@linux.dev>
> > Fixes: cf11e85fc08c ("mm: hugetlb: optionally allocate gigantic hugepages using cma")
>
> It might be suboptimal, but do we call it a "BUG" that needs "fixing"? I
> know, controversial :)
>
> > ---
> >  mm/hugetlb.c | 6 +++---
> >  1 file changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index 23ef240ba48a..6dc62d8b2a3a 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -7873,9 +7873,9 @@ void __init hugetlb_cma_reserve(int order)
> >  	 * huge page demotion.
> >  	 */
> >  	res = cma_declare_contiguous_nid(0, size, 0,
> > -					PAGE_SIZE << HUGETLB_PAGE_ORDER,
> > -					0, false, name,
> > -					&hugetlb_cma[nid], nid);
> > +					PAGE_SIZE << HUGETLB_PAGE_ORDER,
> > +					HUGETLB_PAGE_ORDER, false, name,
> > +					&hugetlb_cma[nid], nid);
> >  	if (res) {
> >  		pr_warn("hugetlb_cma: reservation failed: err %d, node %d",
> >  			res, nid);
>
> ... I'm afraid this is not completely correct.
>
> For example, on arm64, HUGETLB_PAGE_ORDER is essentially PMD_ORDER.
>
> ... but we do support smaller hugetlb sizes than that (cont-pte hugetlb
> size is 64 KiB, not 2 MiB -- PMD -- on a 4k kernel)

Right, smaller hugetlb page sizes exist. But, the value here is not
intended to represent the minimum hugetlb page size - it's the minimum
hugetlb page size that we can demote a CMA-allocated hugetlb page to.
See:

a01f43901cfb ("hugetlb: be sure to free demoted CMA pages to CMA")

So, this just restricts demotion of the gigantic, CMA-allocated pages
to HUGETLB_PAGE_ORDER-sized chunks.

- Frank
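As a rough illustration of that constraint (a hypothetical sketch with made-up
helper names, not the actual mm/hugetlb.c demotion path): demoting a CMA-allocated
gigantic page hands it back to CMA in HUGETLB_PAGE_ORDER-sized pieces, so the
bitmap granularity (order_per_bit) must not be coarser than HUGETLB_PAGE_ORDER.

```c
#include <linux/cma.h>
#include <linux/hugetlb.h>
#include <linux/mm.h>

/*
 * Hypothetical sketch: free each HUGETLB_PAGE_ORDER-sized piece of a
 * demoted gigantic page back to its CMA area.
 */
static void demote_gigantic_to_cma_sketch(struct cma *cma, struct page *page,
					  unsigned int gigantic_order)
{
	unsigned long pieces = 1UL << (gigantic_order - HUGETLB_PAGE_ORDER);
	unsigned long i;

	for (i = 0; i < pieces; i++) {
		/*
		 * cma_release() accounts in units of PAGE_SIZE << order_per_bit,
		 * so releasing a HUGETLB_PAGE_ORDER-sized piece can only be
		 * tracked if order_per_bit <= HUGETLB_PAGE_ORDER.
		 */
		cma_release(cma, page + (i << HUGETLB_PAGE_ORDER),
			    1UL << HUGETLB_PAGE_ORDER);
	}
}
```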
On Thu, Apr 4, 2024 at 1:17 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Thu, 4 Apr 2024 16:25:15 +0000 Frank van der Linden <fvdl@google.com> wrote:
>
> > The hugetlb_cma code passes 0 in the order_per_bit argument to
> > cma_declare_contiguous_nid (the alignment, computed using the
> > page order, is correctly passed in).
> >
> > This causes a bit in the cma allocation bitmap to always represent
> > a 4k page, making the bitmaps potentially very large, and slower.
> >
> > So, correctly pass in the order instead.
>
> Ditto. Should we backport this? Can we somewhat quantify "potentially very",
> and understand under what circumstances this might occur?

It would create bitmaps that would be pretty big. E.g. for a 4k page
size on x86, hugetlb_cma=64G would mean a bitmap size of (64G / 4k) / 8
== 2M. With HUGETLB_PAGE_ORDER as order_per_bit, as intended, this
would be (64G / 2M) / 8 == 4k. So, that's quite a difference :)

Also, this restricted the hugetlb_cma area to ((PAGE_SIZE <<
MAX_PAGE_ORDER) * 8) * PAGE_SIZE (e.g. 128G on x86), since
bitmap_alloc uses normal page allocation, and is thus restricted by
MAX_PAGE_ORDER. Specifying anything above that would fail the CMA
initialization.

- Frank
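A small userspace sketch of that arithmetic, with the x86 values assumed (4k base
pages, HUGETLB_PAGE_ORDER taken to be 9, i.e. 2 MiB huge pages):

```c
#include <stdio.h>

int main(void)
{
	unsigned long long cma_size = 64ULL << 30;	/* hugetlb_cma=64G */
	unsigned long long page_size = 4096;		/* 4k base pages */
	unsigned int hugetlb_page_order = 9;		/* assumed x86 value: 2 MiB */

	/* order_per_bit == 0: one bitmap bit per 4k base page */
	unsigned long long old_bytes = cma_size / page_size / 8;
	/* order_per_bit == HUGETLB_PAGE_ORDER: one bit per 2 MiB chunk */
	unsigned long long new_bytes = cma_size / (page_size << hugetlb_page_order) / 8;

	printf("order_per_bit=0 bitmap: %llu MiB\n", old_bytes >> 20);	/* 2 MiB */
	printf("order_per_bit=9 bitmap: %llu KiB\n", new_bytes >> 10);	/* 4 KiB */
	return 0;
}
```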
Rushing is never good, of course, but see my reply to David - while
smaller hugetlb page sizes than HUGETLB_PAGE_ORDER exist, that's not
the issue in that particular code path.

The only restriction for backports is, I think, that the two patches
need to go together.

I have backported them to 6.6 (which was just a clean apply), and
5.10, which doesn't have hugetlb page demotion, so it actually can
pass the full 1G as order_per_bit. That works fine if you also apply
the CMA align check fix, but would fail otherwise.

- Frank

On Thu, Apr 4, 2024 at 1:52 PM Roman Gushchin <roman.gushchin@linux.dev> wrote:
>
> On Thu, Apr 04, 2024 at 10:13:21PM +0200, David Hildenbrand wrote:
> > On 04.04.24 18:25, Frank van der Linden wrote:
> > > The hugetlb_cma code passes 0 in the order_per_bit argument to
> > > cma_declare_contiguous_nid (the alignment, computed using the
> > > page order, is correctly passed in).
> > >
> > > This causes a bit in the cma allocation bitmap to always represent
> > > a 4k page, making the bitmaps potentially very large, and slower.
> > >
> > > So, correctly pass in the order instead.
> > >
> > > Signed-off-by: Frank van der Linden <fvdl@google.com>
> > > Cc: Roman Gushchin <roman.gushchin@linux.dev>
> > > Fixes: cf11e85fc08c ("mm: hugetlb: optionally allocate gigantic hugepages using cma")
> >
> > It might be suboptimal, but do we call it a "BUG" that needs "fixing"? I
> > know, controversial :)
>
> We probably should not rush with a stable backporting, especially given your
> next comment on page sizes on arm.
On Thu, 4 Apr 2024 15:02:34 -0700 Frank van der Linden <fvdl@google.com> wrote:

> Rushing is never good, of course, but see my reply to David - while
> smaller hugetlb page sizes than HUGETLB_PAGE_ORDER exist, that's not
> the issue in that particular code path.
>
> The only restriction for backports is, I think, that the two patches
> need to go together.
>
> I have backported them to 6.6 (which was just a clean apply), and
> 5.10, which doesn't have hugetlb page demotion, so it actually can
> pass the full 1G as order_per_bit. That works fine if you also apply
> the CMA align check fix, but would fail otherwise.

OK, thanks. I added cc:stable to both patches and added this:

: It would create bitmaps that would be pretty big. E.g. for a 4k page
: size on x86, hugetlb_cma=64G would mean a bitmap size of (64G / 4k) / 8
: == 2M. With HUGETLB_PAGE_ORDER as order_per_bit, as intended, this
: would be (64G / 2M) / 8 == 4k. So, that's quite a difference.
:
: Also, this restricted the hugetlb_cma area to ((PAGE_SIZE <<
: MAX_PAGE_ORDER) * 8) * PAGE_SIZE (e.g. 128G on x86), since
: bitmap_alloc uses normal page allocation, and is thus restricted by
: MAX_PAGE_ORDER. Specifying anything above that would fail the CMA
: initialization.

to the [2/2] changelog.

For extra test & review I'll leave them in mm-[un]stable so they go into
mainline for 6.10-rc1, which will then trigger the backporting process.

This can of course all be altered...
On Thu, Apr 4, 2024 at 2:44 PM Frank van der Linden <fvdl@google.com> wrote:
>
> On Thu, Apr 4, 2024 at 1:13 PM David Hildenbrand <david@redhat.com> wrote:
> >
> > On 04.04.24 18:25, Frank van der Linden wrote:
> > > The hugetlb_cma code passes 0 in the order_per_bit argument to
> > > cma_declare_contiguous_nid (the alignment, computed using the
> > > page order, is correctly passed in).
> > >
> > > This causes a bit in the cma allocation bitmap to always represent
> > > a 4k page, making the bitmaps potentially very large, and slower.
> > >
> > > So, correctly pass in the order instead.
> > >
> > > Signed-off-by: Frank van der Linden <fvdl@google.com>
> > > Cc: Roman Gushchin <roman.gushchin@linux.dev>
> > > Fixes: cf11e85fc08c ("mm: hugetlb: optionally allocate gigantic hugepages using cma")
> >
> > It might be suboptimal, but do we call it a "BUG" that needs "fixing"? I
> > know, controversial :)
> >
> > > ---
> > >  mm/hugetlb.c | 6 +++---
> > >  1 file changed, 3 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > > index 23ef240ba48a..6dc62d8b2a3a 100644
> > > --- a/mm/hugetlb.c
> > > +++ b/mm/hugetlb.c
> > > @@ -7873,9 +7873,9 @@ void __init hugetlb_cma_reserve(int order)
> > >  	 * huge page demotion.
> > >  	 */
> > >  	res = cma_declare_contiguous_nid(0, size, 0,
> > > -					PAGE_SIZE << HUGETLB_PAGE_ORDER,
> > > -					0, false, name,
> > > -					&hugetlb_cma[nid], nid);
> > > +					PAGE_SIZE << HUGETLB_PAGE_ORDER,
> > > +					HUGETLB_PAGE_ORDER, false, name,
> > > +					&hugetlb_cma[nid], nid);
> > >  	if (res) {
> > >  		pr_warn("hugetlb_cma: reservation failed: err %d, node %d",
> > >  			res, nid);
> >
> > ... I'm afraid this is not completely correct.
> >
> > For example, on arm64, HUGETLB_PAGE_ORDER is essentially PMD_ORDER.
> >
> > ... but we do support smaller hugetlb sizes than that (cont-pte hugetlb
> > size is 64 KiB, not 2 MiB -- PMD -- on a 4k kernel)
>
> Right, smaller hugetlb page sizes exist. But, the value here is not
> intended to represent the minimum hugetlb page size - it's the minimum
> hugetlb page size that we can demote a CMA-allocated hugetlb page to.
> See:
>
> a01f43901cfb ("hugetlb: be sure to free demoted CMA pages to CMA")
>
> So, this just restricts demotion of the gigantic, CMA-allocated pages
> to HUGETLB_PAGE_ORDER-sized chunks.
>
> - Frank

Just to clarify what I'm saying here: the restriction of the size you can
demote a CMA-allocated gigantic page to was already there, my patch
doesn't change anything in that regard.
> On Apr 5, 2024, at 00:25, Frank van der Linden <fvdl@google.com> wrote:
>
> The hugetlb_cma code passes 0 in the order_per_bit argument to
> cma_declare_contiguous_nid (the alignment, computed using the
> page order, is correctly passed in).
>
> This causes a bit in the cma allocation bitmap to always represent
> a 4k page, making the bitmaps potentially very large, and slower.
>
> So, correctly pass in the order instead.
>
> Signed-off-by: Frank van der Linden <fvdl@google.com>
> Cc: Roman Gushchin <roman.gushchin@linux.dev>
> Fixes: cf11e85fc08c ("mm: hugetlb: optionally allocate gigantic hugepages using cma")
> ---
> mm/hugetlb.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 23ef240ba48a..6dc62d8b2a3a 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -7873,9 +7873,9 @@ void __init hugetlb_cma_reserve(int order)
> * huge page demotion.
> */
> res = cma_declare_contiguous_nid(0, size, 0,
> - PAGE_SIZE << HUGETLB_PAGE_ORDER,
> - 0, false, name,
> - &hugetlb_cma[nid], nid);
> + PAGE_SIZE << HUGETLB_PAGE_ORDER,
> + HUGETLB_PAGE_ORDER, false, name,

IIUC, we could take this optimization further and change order_per_bit to
'MAX_PAGE_ORDER + 1', since only gigantic hugetlb pages can be allocated
from the CMA pool, meaning any gigantic page is at least two to the power
of 'MAX_PAGE_ORDER + 1' in size.

Thanks.

> + &hugetlb_cma[nid], nid);
> if (res) {
> pr_warn("hugetlb_cma: reservation failed: err %d, node %d",
> res, nid);
> --
> 2.44.0.478.gd926399ef9-goog
>
On 04.04.24 23:44, Frank van der Linden wrote:
> On Thu, Apr 4, 2024 at 1:13 PM David Hildenbrand <david@redhat.com> wrote:
>>
>> On 04.04.24 18:25, Frank van der Linden wrote:
>>> The hugetlb_cma code passes 0 in the order_per_bit argument to
>>> cma_declare_contiguous_nid (the alignment, computed using the
>>> page order, is correctly passed in).
>>>
>>> This causes a bit in the cma allocation bitmap to always represent
>>> a 4k page, making the bitmaps potentially very large, and slower.
>>>
>>> So, correctly pass in the order instead.
>>>
>>> Signed-off-by: Frank van der Linden <fvdl@google.com>
>>> Cc: Roman Gushchin <roman.gushchin@linux.dev>
>>> Fixes: cf11e85fc08c ("mm: hugetlb: optionally allocate gigantic hugepages using cma")
>>
>> It might be suboptimal, but do we call it a "BUG" that needs "fixing"? I
>> know, controversial :)
>>
>>> ---
>>>  mm/hugetlb.c | 6 +++---
>>>  1 file changed, 3 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>>> index 23ef240ba48a..6dc62d8b2a3a 100644
>>> --- a/mm/hugetlb.c
>>> +++ b/mm/hugetlb.c
>>> @@ -7873,9 +7873,9 @@ void __init hugetlb_cma_reserve(int order)
>>>  	 * huge page demotion.
>>>  	 */
>>>  	res = cma_declare_contiguous_nid(0, size, 0,
>>> -					PAGE_SIZE << HUGETLB_PAGE_ORDER,
>>> -					0, false, name,
>>> -					&hugetlb_cma[nid], nid);
>>> +					PAGE_SIZE << HUGETLB_PAGE_ORDER,
>>> +					HUGETLB_PAGE_ORDER, false, name,
>>> +					&hugetlb_cma[nid], nid);
>>>  	if (res) {
>>>  		pr_warn("hugetlb_cma: reservation failed: err %d, node %d",
>>>  			res, nid);
>>
>> ... I'm afraid this is not completely correct.
>>
>> For example, on arm64, HUGETLB_PAGE_ORDER is essentially PMD_ORDER.
>>
>> ... but we do support smaller hugetlb sizes than that (cont-pte hugetlb
>> size is 64 KiB, not 2 MiB -- PMD -- on a 4k kernel)
>
> Right, smaller hugetlb page sizes exist. But, the value here is not
> intended to represent the minimum hugetlb page size - it's the minimum
> hugetlb page size that we can demote a CMA-allocated hugetlb page to.
> See:
>
> a01f43901cfb ("hugetlb: be sure to free demoted CMA pages to CMA")
>
> So, this just restricts demotion of the gigantic, CMA-allocated pages
> to HUGETLB_PAGE_ORDER-sized chunks.

Right, now my memory comes back. In v1 of that patch set, Mike did
support demoting to any smaller hugetlb order. And because that smallest
hugetlb order is not known when we call cma_declare_contiguous_nid(), he
used to pass "0" for the order.

In v2, he simplified that, though, and limited it to HUGETLB_PAGE_ORDER.
He didn't update the order here, though.

Acked-by: David Hildenbrand <david@redhat.com>

Note that I don't think any of these patches are CC-stable material.
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 23ef240ba48a..6dc62d8b2a3a 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -7873,9 +7873,9 @@ void __init hugetlb_cma_reserve(int order)
 	 * huge page demotion.
 	 */
 	res = cma_declare_contiguous_nid(0, size, 0,
-					PAGE_SIZE << HUGETLB_PAGE_ORDER,
-					0, false, name,
-					&hugetlb_cma[nid], nid);
+					PAGE_SIZE << HUGETLB_PAGE_ORDER,
+					HUGETLB_PAGE_ORDER, false, name,
+					&hugetlb_cma[nid], nid);
 	if (res) {
 		pr_warn("hugetlb_cma: reservation failed: err %d, node %d",
 			res, nid);
The hugetlb_cma code passes 0 in the order_per_bit argument to
cma_declare_contiguous_nid (the alignment, computed using the
page order, is correctly passed in).

This causes a bit in the cma allocation bitmap to always represent
a 4k page, making the bitmaps potentially very large, and slower.

So, correctly pass in the order instead.

Signed-off-by: Frank van der Linden <fvdl@google.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Fixes: cf11e85fc08c ("mm: hugetlb: optionally allocate gigantic hugepages using cma")
---
 mm/hugetlb.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)