Message ID | 20200830140418.605627-1-lixinhai.lxh@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Series | mm/hugetlb: try preferred node first when alloc gigantic page from cma |
On 8/30/20 7:04 AM, Li Xinhai wrote:
> Since commit cf11e85fc08cc6a4 ("mm: hugetlb: optionally allocate gigantic
> hugepages using cma"), the gigantic page would be allocated from node
> which is not the preferred node, although there are pages available from
> that node. The reason is that the nid parameter has been ignored in
> alloc_gigantic_page().
>
> After this patch, the preferred node is tried first before other allowed
> nodes.

Thank you!
This is an issue that needs to be fixed.

> Fixes: cf11e85fc08cc6a4 ("mm: hugetlb: optionally allocate gigantic hugepages using cma")
> Cc: Roman Gushchin <guro@fb.com>
> Cc: Mike Kravetz <mike.kravetz@oracle.com>
> Cc: Michal Hocko <mhocko@kernel.org>
> Signed-off-by: Li Xinhai <lixinhai.lxh@gmail.com>
> ---
>  mm/hugetlb.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index a301c2d672bf..4a28b8853d47 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1256,8 +1256,15 @@ static struct page *alloc_gigantic_page(struct hstate *h, gfp_t gfp_mask,
>          struct page *page;
>          int node;
>
> +        if (hugetlb_cma[nid]) {
> +            page = cma_alloc(hugetlb_cma[nid], nr_pages,
> +                    huge_page_order(h), true);
> +            if (page)
> +                return page;
> +        }
> +

When looking at your changes, I noticed that this code for allocation
from CMA does not take gfp_mask into account. The 'normal' use case
is to allocate pool pages with something similar to:

echo 16 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages

The routine alloc_pool_huge_page will try to interleave pages among nodes:

    ...
    gfp_t gfp_mask = htlb_alloc_mask(h) | __GFP_THISNODE;

    for_each_node_mask_to_alloc(h, nr_nodes, node, nodes_allowed) {
    ...

which will eventually call alloc_gigantic_page. If __GFP_THISNODE is
set we really do not want to execute the below for loop in alloc_gigantic_page.

I think the convention in the mm code is that only the lowest level
allocation routines should interpret the GFP flags. We may need to make
an exception here and check for __GFP_THISNODE. Michal would be the
best person to comment and perhaps make a recommendation.
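To make the interaction Mike describes easier to see outside of kernel context, here is a small self-contained userspace model. It is not kernel code: `try_cma_on_node()`, `NR_NODES` and `GFP_THISNODE` are toy stand-ins invented for the illustration. It shows why a caller that interleaves nodes itself and passes `__GFP_THISNODE` needs a per-node failure to be reported rather than silently satisfied from another node:

```c
#include <stdbool.h>
#include <stdio.h>

#define NR_NODES      4
#define GFP_THISNODE  0x1u            /* toy stand-in for __GFP_THISNODE */

/* Pretend only node 2 still has free pages in its CMA area. */
static bool try_cma_on_node(int node)
{
    return node == 2;
}

/* Model of alloc_gigantic_page(): preferred node first, then the rest. */
static int alloc_gigantic(unsigned int gfp, int nid)
{
    if (try_cma_on_node(nid))
        return nid;

    /*
     * The caller asked for exactly this node: report failure instead
     * of silently falling back to some other node.
     */
    if (gfp & GFP_THISNODE)
        return -1;

    for (int node = 0; node < NR_NODES; node++)
        if (node != nid && try_cma_on_node(node))
            return node;

    return -1;
}

int main(void)
{
    /*
     * Model of alloc_pool_huge_page(): it interleaves requests across
     * the nodes itself, passing __GFP_THISNODE, and moves on to the
     * next node when one node cannot satisfy the request.
     */
    for (int nid = 0; nid < NR_NODES; nid++) {
        int got = alloc_gigantic(GFP_THISNODE, nid);

        printf("preferred node %d -> %s\n", nid,
               got < 0 ? "fail, caller tries the next node" : "allocated");
    }
    return 0;
}
```

If the `GFP_THISNODE` early return is dropped from the model, every iteration reports success even though each page actually comes from node 2, which is exactly the silent cross-node fallback Mike is pointing at.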
On Mon 31-08-20 14:44:40, Mike Kravetz wrote:
> On 8/30/20 7:04 AM, Li Xinhai wrote:
> > Since commit cf11e85fc08cc6a4 ("mm: hugetlb: optionally allocate gigantic
> > hugepages using cma"), the gigantic page would be allocated from node
> > which is not the preferred node, although there are pages available from
> > that node. The reason is that the nid parameter has been ignored in
> > alloc_gigantic_page().
> >
> > After this patch, the preferred node is tried first before other allowed
> > nodes.
>
> Thank you!
> This is an issue that needs to be fixed.
>
> > Fixes: cf11e85fc08cc6a4 ("mm: hugetlb: optionally allocate gigantic hugepages using cma")
> > Cc: Roman Gushchin <guro@fb.com>
> > Cc: Mike Kravetz <mike.kravetz@oracle.com>
> > Cc: Michal Hocko <mhocko@kernel.org>
> > Signed-off-by: Li Xinhai <lixinhai.lxh@gmail.com>
> > ---
> >  mm/hugetlb.c | 9 ++++++++-
> >  1 file changed, 8 insertions(+), 1 deletion(-)
> >
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index a301c2d672bf..4a28b8853d47 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -1256,8 +1256,15 @@ static struct page *alloc_gigantic_page(struct hstate *h, gfp_t gfp_mask,
> >          struct page *page;
> >          int node;
> >
> > +        if (hugetlb_cma[nid]) {
> > +            page = cma_alloc(hugetlb_cma[nid], nr_pages,
> > +                    huge_page_order(h), true);
> > +            if (page)
> > +                return page;
> > +        }
> > +
>
> When looking at your changes, I noticed that this code for allocation
> from CMA does not take gfp_mask into account. The 'normal' use case
> is to allocate pool pages with something similar to:
>
> echo 16 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
>
> The routine alloc_pool_huge_page will try to interleave pages among nodes:
>
>     ...
>     gfp_t gfp_mask = htlb_alloc_mask(h) | __GFP_THISNODE;
>
>     for_each_node_mask_to_alloc(h, nr_nodes, node, nodes_allowed) {
>     ...
>
> which will eventually call alloc_gigantic_page. If __GFP_THISNODE is
> set we really do not want to execute the below for loop in alloc_gigantic_page.

Yes, this is the case indeed.

> I think the convention in the mm code is that only the lowest level
> allocation routines should interpret the GFP flags. We may need to make
> an exception here and check for __GFP_THISNODE.

Yes this is true, But alloc_gigantic_page is actually low level
allocation routine in fact.

I would go with the following

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index a301c2d672bf..124754240b56 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1256,6 +1256,16 @@ static struct page *alloc_gigantic_page(struct hstate *h, gfp_t gfp_mask,
         struct page *page;
         int node;
 
+        if (nid != NUMA_NO_NODE && hugetlb_cma[nid]) {
+            page = cma_alloc(hugetlb_cma[nid], nr_pages,
+                    huge_page_order(h), true);
+            if (page)
+                return page;
+        }
+
+        if (gfp_mask & __GFP_THISNODE)
+            return NULL;
+
         for_each_node_mask(node, *nodemask) {
             if (!hugetlb_cma[node])
                 continue;

I do not think we actually do have an explicit NUMA_NO_NODE user but it
is safer to not asume that here.
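A brief aside on the `nid != NUMA_NO_NODE` guard in the diff above (an illustration added for this write-up, not part of the thread): `NUMA_NO_NODE` is defined as -1 in the kernel, so indexing a per-node array with it would read before the start of the array. A minimal standalone sketch of the hazard, with `cma_area[]` and `lookup_cma()` as made-up stand-ins for `hugetlb_cma[]`:

```c
#include <stdio.h>

#define NUMA_NO_NODE  (-1)            /* same value the kernel uses */
#define MAX_NUMNODES  4

/* Toy stand-in for the per-node hugetlb_cma[] array. */
static const char *cma_area[MAX_NUMNODES] = { "cma0", NULL, "cma2", NULL };

static const char *lookup_cma(int nid)
{
    /* Without this guard, nid == NUMA_NO_NODE would index element -1. */
    if (nid == NUMA_NO_NODE)
        return NULL;
    return cma_area[nid];
}

int main(void)
{
    printf("node 0 -> %s\n", lookup_cma(0) ? lookup_cma(0) : "none");
    printf("no preferred node -> %s\n",
           lookup_cma(NUMA_NO_NODE) ? lookup_cma(NUMA_NO_NODE) : "none");
    return 0;
}
```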
On 2020-09-01 at 21:41 Michal Hocko wrote:
>On Mon 31-08-20 14:44:40, Mike Kravetz wrote:
>> On 8/30/20 7:04 AM, Li Xinhai wrote:
>> > Since commit cf11e85fc08cc6a4 ("mm: hugetlb: optionally allocate gigantic
>> > hugepages using cma"), the gigantic page would be allocated from node
>> > which is not the preferred node, although there are pages available from
>> > that node. The reason is that the nid parameter has been ignored in
>> > alloc_gigantic_page().
>> >
>> > After this patch, the preferred node is tried first before other allowed
>> > nodes.
>>
>> Thank you!
>> This is an issue that needs to be fixed.
>>
>> > Fixes: cf11e85fc08cc6a4 ("mm: hugetlb: optionally allocate gigantic hugepages using cma")
>> > Cc: Roman Gushchin <guro@fb.com>
>> > Cc: Mike Kravetz <mike.kravetz@oracle.com>
>> > Cc: Michal Hocko <mhocko@kernel.org>
>> > Signed-off-by: Li Xinhai <lixinhai.lxh@gmail.com>
>> > ---
>> >  mm/hugetlb.c | 9 ++++++++-
>> >  1 file changed, 8 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>> > index a301c2d672bf..4a28b8853d47 100644
>> > --- a/mm/hugetlb.c
>> > +++ b/mm/hugetlb.c
>> > @@ -1256,8 +1256,15 @@ static struct page *alloc_gigantic_page(struct hstate *h, gfp_t gfp_mask,
>> >          struct page *page;
>> >          int node;
>> >
>> > +        if (hugetlb_cma[nid]) {
>> > +            page = cma_alloc(hugetlb_cma[nid], nr_pages,
>> > +                    huge_page_order(h), true);
>> > +            if (page)
>> > +                return page;
>> > +        }
>> > +
>>
>> When looking at your changes, I noticed that this code for allocation
>> from CMA does not take gfp_mask into account. The 'normal' use case
>> is to allocate pool pages with something similar to:
>>
>> echo 16 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
>>
>> The routine alloc_pool_huge_page will try to interleave pages among nodes:
>>
>>     ...
>>     gfp_t gfp_mask = htlb_alloc_mask(h) | __GFP_THISNODE;
>>
>>     for_each_node_mask_to_alloc(h, nr_nodes, node, nodes_allowed) {
>>     ...
>>
>> which will eventually call alloc_gigantic_page. If __GFP_THISNODE is
>> set we really do not want to execute the below for loop in alloc_gigantic_page.
>
>Yes, this is the case indeed.
>
>> I think the convention in the mm code is that only the lowest level
>> allocation routines should interpret the GFP flags. We may need to make
>> an exception here and check for __GFP_THISNODE.
>
>Yes this is true, But alloc_gigantic_page is actually low level
>allocation routine in fact.
>
Thanks for the review, we need to consider the __GFP_THISNODE flag.

>I would go with the following
>diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>index a301c2d672bf..124754240b56 100644
>--- a/mm/hugetlb.c
>+++ b/mm/hugetlb.c
>@@ -1256,6 +1256,16 @@ static struct page *alloc_gigantic_page(struct hstate *h, gfp_t gfp_mask,
>         struct page *page;
>         int node;
> 
>+        if (nid != NUMA_NO_NODE && hugetlb_cma[nid]) {
>+            page = cma_alloc(hugetlb_cma[nid], nr_pages,
>+                    huge_page_order(h), true);
>+            if (page)
>+                return page;
>+        }
>+
>+        if (gfp_mask & __GFP_THISNODE)
>+            return NULL;
>+

I think in case of failed to allocate on THISNODE, it still needs to
call below alloc_contig_pages(), so we have one more chance to allcoate
successfully on the preferred node.

>         for_each_node_mask(node, *nodemask) {
>             if (!hugetlb_cma[node])
>                 continue;
>
>I do not think we actually do have an explicit NUMA_NO_NODE user but it
>is safer to not asume that here.
>--
>Michal Hocko
>SUSE Labs
On Tue 01-09-20 22:20:44, Li Xinhai wrote:
> On 2020-09-01 at 21:41 Michal Hocko wrote:
> >On Mon 31-08-20 14:44:40, Mike Kravetz wrote:
> >> On 8/30/20 7:04 AM, Li Xinhai wrote:
> >> > Since commit cf11e85fc08cc6a4 ("mm: hugetlb: optionally allocate gigantic
> >> > hugepages using cma"), the gigantic page would be allocated from node
> >> > which is not the preferred node, although there are pages available from
> >> > that node. The reason is that the nid parameter has been ignored in
> >> > alloc_gigantic_page().
> >> >
> >> > After this patch, the preferred node is tried first before other allowed
> >> > nodes.
> >>
> >> Thank you!
> >> This is an issue that needs to be fixed.
> >>
> >> > Fixes: cf11e85fc08cc6a4 ("mm: hugetlb: optionally allocate gigantic hugepages using cma")
> >> > Cc: Roman Gushchin <guro@fb.com>
> >> > Cc: Mike Kravetz <mike.kravetz@oracle.com>
> >> > Cc: Michal Hocko <mhocko@kernel.org>
> >> > Signed-off-by: Li Xinhai <lixinhai.lxh@gmail.com>
> >> > ---
> >> >  mm/hugetlb.c | 9 ++++++++-
> >> >  1 file changed, 8 insertions(+), 1 deletion(-)
> >> >
> >> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> >> > index a301c2d672bf..4a28b8853d47 100644
> >> > --- a/mm/hugetlb.c
> >> > +++ b/mm/hugetlb.c
> >> > @@ -1256,8 +1256,15 @@ static struct page *alloc_gigantic_page(struct hstate *h, gfp_t gfp_mask,
> >> >          struct page *page;
> >> >          int node;
> >> >
> >> > +        if (hugetlb_cma[nid]) {
> >> > +            page = cma_alloc(hugetlb_cma[nid], nr_pages,
> >> > +                    huge_page_order(h), true);
> >> > +            if (page)
> >> > +                return page;
> >> > +        }
> >> > +
> >>
> >> When looking at your changes, I noticed that this code for allocation
> >> from CMA does not take gfp_mask into account. The 'normal' use case
> >> is to allocate pool pages with something similar to:
> >>
> >> echo 16 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
> >>
> >> The routine alloc_pool_huge_page will try to interleave pages among nodes:
> >>
> >>     ...
> >>     gfp_t gfp_mask = htlb_alloc_mask(h) | __GFP_THISNODE;
> >>
> >>     for_each_node_mask_to_alloc(h, nr_nodes, node, nodes_allowed) {
> >>     ...
> >>
> >> which will eventually call alloc_gigantic_page. If __GFP_THISNODE is
> >> set we really do not want to execute the below for loop in alloc_gigantic_page.
> >
> >Yes, this is the case indeed.
> >
> >> I think the convention in the mm code is that only the lowest level
> >> allocation routines should interpret the GFP flags. We may need to make
> >> an exception here and check for __GFP_THISNODE.
> >
> >Yes this is true, But alloc_gigantic_page is actually low level
> >allocation routine in fact.
> >
> Thanks for the review, we need to consider the __GFP_THISNODE flag.

Yeah, my bad. Quite ugly but a larger rework would be needed to make it
nicer. Not sure this is worth it.

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index a301c2d672bf..55baaac848da 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1256,6 +1256,16 @@ static struct page *alloc_gigantic_page(struct hstate *h, gfp_t gfp_mask,
         struct page *page;
         int node;
 
+        if (nid != NUMA_NO_NODE && hugetlb_cma[nid]) {
+            page = cma_alloc(hugetlb_cma[node], nr_pages,
+                    huge_page_order(h), true);
+            if (page)
+                return page;
+        }
+
+        if (gfp_mask & __GFP_THISNODE)
+            goto fallback;
+
         for_each_node_mask(node, *nodemask) {
             if (!hugetlb_cma[node])
                 continue;
@@ -1266,6 +1276,7 @@ static struct page *alloc_gigantic_page(struct hstate *h, gfp_t gfp_mask,
                 return page;
             }
         }
+fallback:
 #endif
 
     return alloc_contig_pages(nr_pages, gfp_mask, nid, nodemask);
On 2020-09-01 at 22:53 Michal Hocko wrote:
>On Tue 01-09-20 22:20:44, Li Xinhai wrote:
>> On 2020-09-01 at 21:41 Michal Hocko wrote:
>> >On Mon 31-08-20 14:44:40, Mike Kravetz wrote:
>> >> On 8/30/20 7:04 AM, Li Xinhai wrote:
>> >> > Since commit cf11e85fc08cc6a4 ("mm: hugetlb: optionally allocate gigantic
>> >> > hugepages using cma"), the gigantic page would be allocated from node
>> >> > which is not the preferred node, although there are pages available from
>> >> > that node. The reason is that the nid parameter has been ignored in
>> >> > alloc_gigantic_page().
>> >> >
>> >> > After this patch, the preferred node is tried first before other allowed
>> >> > nodes.
>> >>
>> >> Thank you!
>> >> This is an issue that needs to be fixed.
>> >>
>> >> > Fixes: cf11e85fc08cc6a4 ("mm: hugetlb: optionally allocate gigantic hugepages using cma")
>> >> > Cc: Roman Gushchin <guro@fb.com>
>> >> > Cc: Mike Kravetz <mike.kravetz@oracle.com>
>> >> > Cc: Michal Hocko <mhocko@kernel.org>
>> >> > Signed-off-by: Li Xinhai <lixinhai.lxh@gmail.com>
>> >> > ---
>> >> >  mm/hugetlb.c | 9 ++++++++-
>> >> >  1 file changed, 8 insertions(+), 1 deletion(-)
>> >> >
>> >> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>> >> > index a301c2d672bf..4a28b8853d47 100644
>> >> > --- a/mm/hugetlb.c
>> >> > +++ b/mm/hugetlb.c
>> >> > @@ -1256,8 +1256,15 @@ static struct page *alloc_gigantic_page(struct hstate *h, gfp_t gfp_mask,
>> >> >          struct page *page;
>> >> >          int node;
>> >> >
>> >> > +        if (hugetlb_cma[nid]) {
>> >> > +            page = cma_alloc(hugetlb_cma[nid], nr_pages,
>> >> > +                    huge_page_order(h), true);
>> >> > +            if (page)
>> >> > +                return page;
>> >> > +        }
>> >> > +
>> >>
>> >> When looking at your changes, I noticed that this code for allocation
>> >> from CMA does not take gfp_mask into account. The 'normal' use case
>> >> is to allocate pool pages with something similar to:
>> >>
>> >> echo 16 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
>> >>
>> >> The routine alloc_pool_huge_page will try to interleave pages among nodes:
>> >>
>> >>     ...
>> >>     gfp_t gfp_mask = htlb_alloc_mask(h) | __GFP_THISNODE;
>> >>
>> >>     for_each_node_mask_to_alloc(h, nr_nodes, node, nodes_allowed) {
>> >>     ...
>> >>
>> >> which will eventually call alloc_gigantic_page. If __GFP_THISNODE is
>> >> set we really do not want to execute the below for loop in alloc_gigantic_page.
>> >
>> >Yes, this is the case indeed.
>> >
>> >> I think the convention in the mm code is that only the lowest level
>> >> allocation routines should interpret the GFP flags. We may need to make
>> >> an exception here and check for __GFP_THISNODE.
>> >
>> >Yes this is true, But alloc_gigantic_page is actually low level
>> >allocation routine in fact.
>> >
>> Thanks for the review, we need to consider the __GFP_THISNODE flag.
>
>Yeah, my bad. Quite ugly but a larger rework would be needed to make it
>nicer. Not sure this is worth it.
>
Just sent out the V2, and put the for-loop within the THISNODE check...

>diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>index a301c2d672bf..55baaac848da 100644
>--- a/mm/hugetlb.c
>+++ b/mm/hugetlb.c
>@@ -1256,6 +1256,16 @@ static struct page *alloc_gigantic_page(struct hstate *h, gfp_t gfp_mask,
>         struct page *page;
>         int node;
> 
>+        if (nid != NUMA_NO_NODE && hugetlb_cma[nid]) {
>+            page = cma_alloc(hugetlb_cma[node], nr_pages,
>+                    huge_page_order(h), true);
>+            if (page)
>+                return page;
>+        }
>+
>+        if (gfp_mask & __GFP_THISNODE)
>+            goto fallback;
>+
>         for_each_node_mask(node, *nodemask) {
>             if (!hugetlb_cma[node])
>                 continue;
>@@ -1266,6 +1276,7 @@ static struct page *alloc_gigantic_page(struct hstate *h, gfp_t gfp_mask,
>                 return page;
>             }
>         }
>+fallback:
> #endif
> 
>     return alloc_contig_pages(nr_pages, gfp_mask, nid, nodemask);
>--
>Michal Hocko
>SUSE Labs
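The v2 patch Li mentions is not included in this thread. Purely as an illustration of the restructuring he describes, the CMA section of alloc_gigantic_page() could be arranged roughly as in the fragment below: try the preferred node's CMA area first, walk the other nodes' CMA areas only when __GFP_THISNODE is not set, and always fall through to alloc_contig_pages() so the preferred node gets one more chance. This is a sketch, not the actual v2.

```c
/* Sketch of the shape described above (fragment of alloc_gigantic_page(),
 * not the actual v2 patch). */
#ifdef CONFIG_CMA
	if (nid != NUMA_NO_NODE && hugetlb_cma[nid]) {
		/* Preferred node's CMA area first. */
		page = cma_alloc(hugetlb_cma[nid], nr_pages,
				 huge_page_order(h), true);
		if (page)
			return page;
	}

	if (!(gfp_mask & __GFP_THISNODE)) {
		/* Only wander off the preferred node when allowed to. */
		for_each_node_mask(node, *nodemask) {
			if (node == nid || !hugetlb_cma[node])
				continue;

			page = cma_alloc(hugetlb_cma[node], nr_pages,
					 huge_page_order(h), true);
			if (page)
				return page;
		}
	}
#endif

	/* In all cases, give the buddy/contig allocator a chance too. */
	return alloc_contig_pages(nr_pages, gfp_mask, nid, nodemask);
```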
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index a301c2d672bf..4a28b8853d47 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1256,8 +1256,15 @@ static struct page *alloc_gigantic_page(struct hstate *h, gfp_t gfp_mask,
         struct page *page;
         int node;
 
+        if (hugetlb_cma[nid]) {
+            page = cma_alloc(hugetlb_cma[nid], nr_pages,
+                    huge_page_order(h), true);
+            if (page)
+                return page;
+        }
+
         for_each_node_mask(node, *nodemask) {
-            if (!hugetlb_cma[node])
+            if (node == nid || !hugetlb_cma[node])
                 continue;
 
             page = cma_alloc(hugetlb_cma[node], nr_pages,
Since commit cf11e85fc08cc6a4 ("mm: hugetlb: optionally allocate gigantic
hugepages using cma"), the gigantic page would be allocated from node
which is not the preferred node, although there are pages available from
that node. The reason is that the nid parameter has been ignored in
alloc_gigantic_page().

After this patch, the preferred node is tried first before other allowed
nodes.

Fixes: cf11e85fc08cc6a4 ("mm: hugetlb: optionally allocate gigantic hugepages using cma")
Cc: Roman Gushchin <guro@fb.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Li Xinhai <lixinhai.lxh@gmail.com>
---
 mm/hugetlb.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)