Message ID | 1626077374-81682-3-git-send-email-feng.tang@intel.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | Introduce multi-preference mempolicy | expand |
On Mon 12-07-21 16:09:30, Feng Tang wrote: > The semantics of MPOL_PREFERRED_MANY is similar to MPOL_PREFERRED, > that it will first try to allocate memory from the preferred node(s), > and fallback to all nodes in system when first try fails. > > Add a dedicated function for it just like 'interleave' policy. > > Link: https://lore.kernel.org/r/20200630212517.308045-9-ben.widawsky@intel.com > Suggested-by: Michal Hocko <mhocko@suse.com> > Co-developed-by: Ben Widawsky <ben.widawsky@intel.com> > Signed-off-by: Ben Widawsky <ben.widawsky@intel.com> > Signed-off-by: Feng Tang <feng.tang@intel.com> It would be better to squash this together with the actual user of the function added by the next patch. > --- > mm/mempolicy.c | 19 +++++++++++++++++++ > 1 file changed, 19 insertions(+) > > diff --git a/mm/mempolicy.c b/mm/mempolicy.c > index 17b5800b7dcc..d17bf018efcc 100644 > --- a/mm/mempolicy.c > +++ b/mm/mempolicy.c > @@ -2153,6 +2153,25 @@ static struct page *alloc_page_interleave(gfp_t gfp, unsigned order, > return page; > } > > +static struct page *alloc_page_preferred_many(gfp_t gfp, unsigned int order, > + struct mempolicy *pol) We likely want a node parameter to know which one we want to start with for locality. Callers should use policy_node for that. > +{ > + struct page *page; > + > + /* > + * This is a two pass approach. The first pass will only try the > + * preferred nodes but skip the direct reclaim and allow the > + * allocation to fail, while the second pass will try all the > + * nodes in system. > + */ > + page = __alloc_pages(((gfp | __GFP_NOWARN) & ~__GFP_DIRECT_RECLAIM), > + order, first_node(pol->nodes), &pol->nodes); Although most users will likely have some form of GFP_*USER* here and clearing __GFP_DIRECT_RECLAIM will put all other reclaim modifiers out of game I think it would be better to explicitly disable some of them to prevent from surprises. E.g. any potential __GFP_NOFAIL would be more than surprising here. We do not have any (hopefully) but this should be pretty cheap to exclude as we already have to modify already. preferred_gfp = gfp | __GFP_NOWARN; preferred_gfp &= ~(__GFP_DIRECT_RECLAIM | __GFP_NOFAIL) > + if (!page) > + page = __alloc_pages(gfp, order, numa_node_id(), NULL); > + > + return page; > +} > + > /** > * alloc_pages_vma - Allocate a page for a VMA. > * @gfp: GFP flags. > -- > 2.7.4
On Wed, Jul 28, 2021 at 02:42:26PM +0200, Michal Hocko wrote: > On Mon 12-07-21 16:09:30, Feng Tang wrote: > > The semantics of MPOL_PREFERRED_MANY is similar to MPOL_PREFERRED, > > that it will first try to allocate memory from the preferred node(s), > > and fallback to all nodes in system when first try fails. > > > > Add a dedicated function for it just like 'interleave' policy. > > > > Link: https://lore.kernel.org/r/20200630212517.308045-9-ben.widawsky@intel.com > > Suggested-by: Michal Hocko <mhocko@suse.com> > > Co-developed-by: Ben Widawsky <ben.widawsky@intel.com> > > Signed-off-by: Ben Widawsky <ben.widawsky@intel.com> > > Signed-off-by: Feng Tang <feng.tang@intel.com> > > It would be better to squash this together with the actual user of the > function added by the next patch. Ok, will do > > --- > > mm/mempolicy.c | 19 +++++++++++++++++++ > > 1 file changed, 19 insertions(+) > > > > diff --git a/mm/mempolicy.c b/mm/mempolicy.c > > index 17b5800b7dcc..d17bf018efcc 100644 > > --- a/mm/mempolicy.c > > +++ b/mm/mempolicy.c > > @@ -2153,6 +2153,25 @@ static struct page *alloc_page_interleave(gfp_t gfp, unsigned order, > > return page; > > } > > > > +static struct page *alloc_page_preferred_many(gfp_t gfp, unsigned int order, > > + struct mempolicy *pol) > > We likely want a node parameter to know which one we want to start with > for locality. Callers should use policy_node for that. Yes, locality should be considered, something like this? int pnid, lnid = numa_node_id(); if (is_nodeset(lnid, &pol->nodes)) pnid = local_nid; else pnid = first_node(pol->nodes); page = __alloc_pages(((gfp | __GFP_NOWARN) & ~__GFP_DIRECT_RECLAIM), order, pnid, &pol->nodes); if (!page) page = __alloc_pages(gfp, order, lnid, NULL); return page; > > +{ > > + struct page *page; > > + > > + /* > > + * This is a two pass approach. The first pass will only try the > > + * preferred nodes but skip the direct reclaim and allow the > > + * allocation to fail, while the second pass will try all the > > + * nodes in system. > > + */ > > + page = __alloc_pages(((gfp | __GFP_NOWARN) & ~__GFP_DIRECT_RECLAIM), > > + order, first_node(pol->nodes), &pol->nodes); > > Although most users will likely have some form of GFP_*USER* here and > clearing __GFP_DIRECT_RECLAIM will put all other reclaim modifiers out > of game I think it would be better to explicitly disable some of them to > prevent from surprises. E.g. any potential __GFP_NOFAIL would be more > than surprising here. We do not have any (hopefully) but this should be > pretty cheap to exclude as we already have to modify already. > > preferred_gfp = gfp | __GFP_NOWARN; > preferred_gfp &= ~(__GFP_DIRECT_RECLAIM | __GFP_NOFAIL) OK, will add. Thanks, Feng > > > + if (!page) > > + page = __alloc_pages(gfp, order, numa_node_id(), NULL); > > + > > + return page; > > +} > > + > > /** > > * alloc_pages_vma - Allocate a page for a VMA. > > * @gfp: GFP flags. > > -- > > 2.7.4 > > -- > Michal Hocko > SUSE Labs
On Wed, Jul 28, 2021 at 11:18:10PM +0800, Tang, Feng wrote: > On Wed, Jul 28, 2021 at 02:42:26PM +0200, Michal Hocko wrote: > > On Mon 12-07-21 16:09:30, Feng Tang wrote: > > > The semantics of MPOL_PREFERRED_MANY is similar to MPOL_PREFERRED, > > > that it will first try to allocate memory from the preferred node(s), > > > and fallback to all nodes in system when first try fails. > > > > > > Add a dedicated function for it just like 'interleave' policy. > > > > > > Link: https://lore.kernel.org/r/20200630212517.308045-9-ben.widawsky@intel.com > > > Suggested-by: Michal Hocko <mhocko@suse.com> > > > Co-developed-by: Ben Widawsky <ben.widawsky@intel.com> > > > Signed-off-by: Ben Widawsky <ben.widawsky@intel.com> > > > Signed-off-by: Feng Tang <feng.tang@intel.com> > > > > It would be better to squash this together with the actual user of the > > function added by the next patch. > > Ok, will do > > > > --- > > > mm/mempolicy.c | 19 +++++++++++++++++++ > > > 1 file changed, 19 insertions(+) > > > > > > diff --git a/mm/mempolicy.c b/mm/mempolicy.c > > > index 17b5800b7dcc..d17bf018efcc 100644 > > > --- a/mm/mempolicy.c > > > +++ b/mm/mempolicy.c > > > @@ -2153,6 +2153,25 @@ static struct page *alloc_page_interleave(gfp_t gfp, unsigned order, > > > return page; > > > } > > > > > > +static struct page *alloc_page_preferred_many(gfp_t gfp, unsigned int order, > > > + struct mempolicy *pol) > > > > We likely want a node parameter to know which one we want to start with > > for locality. Callers should use policy_node for that. > > Yes, locality should be considered, something like this? > > int pnid, lnid = numa_node_id(); > > if (is_nodeset(lnid, &pol->nodes)) > pnid = local_nid; > else > pnid = first_node(pol->nodes); One further thought is, if local node is not in the nodemask, should we compare the distance of all the nodes in nodemask to the local node and chose the shortest? Thanks, Feng > page = __alloc_pages(((gfp | __GFP_NOWARN) & ~__GFP_DIRECT_RECLAIM), > order, pnid, &pol->nodes); > if (!page) > page = __alloc_pages(gfp, order, lnid, NULL); > return page; >
On Wed 28-07-21 23:18:10, Feng Tang wrote: > On Wed, Jul 28, 2021 at 02:42:26PM +0200, Michal Hocko wrote: > > On Mon 12-07-21 16:09:30, Feng Tang wrote: > > > The semantics of MPOL_PREFERRED_MANY is similar to MPOL_PREFERRED, > > > that it will first try to allocate memory from the preferred node(s), > > > and fallback to all nodes in system when first try fails. > > > > > > Add a dedicated function for it just like 'interleave' policy. > > > > > > Link: https://lore.kernel.org/r/20200630212517.308045-9-ben.widawsky@intel.com > > > Suggested-by: Michal Hocko <mhocko@suse.com> > > > Co-developed-by: Ben Widawsky <ben.widawsky@intel.com> > > > Signed-off-by: Ben Widawsky <ben.widawsky@intel.com> > > > Signed-off-by: Feng Tang <feng.tang@intel.com> > > > > It would be better to squash this together with the actual user of the > > function added by the next patch. > > Ok, will do > > > > --- > > > mm/mempolicy.c | 19 +++++++++++++++++++ > > > 1 file changed, 19 insertions(+) > > > > > > diff --git a/mm/mempolicy.c b/mm/mempolicy.c > > > index 17b5800b7dcc..d17bf018efcc 100644 > > > --- a/mm/mempolicy.c > > > +++ b/mm/mempolicy.c > > > @@ -2153,6 +2153,25 @@ static struct page *alloc_page_interleave(gfp_t gfp, unsigned order, > > > return page; > > > } > > > > > > +static struct page *alloc_page_preferred_many(gfp_t gfp, unsigned int order, > > > + struct mempolicy *pol) > > > > We likely want a node parameter to know which one we want to start with > > for locality. Callers should use policy_node for that. > > Yes, locality should be considered, something like this? > > int pnid, lnid = numa_node_id(); > > if (is_nodeset(lnid, &pol->nodes)) > pnid = local_nid; > else > pnid = first_node(pol->nodes); > > page = __alloc_pages(((gfp | __GFP_NOWARN) & ~__GFP_DIRECT_RECLAIM), > order, pnid, &pol->nodes); > if (!page) > page = __alloc_pages(gfp, order, lnid, NULL); > return page; No. I really meant to get a node argument and use it as it is. Your callers already have some node preferences. Usually a local node and as we have a nodemask here then we do not really need to have any special logic here as mentioned in other email. The preferred node will act only as a source for the zone list.
On Wed 28-07-21 23:25:07, Feng Tang wrote: > On Wed, Jul 28, 2021 at 11:18:10PM +0800, Tang, Feng wrote: > > On Wed, Jul 28, 2021 at 02:42:26PM +0200, Michal Hocko wrote: > > > On Mon 12-07-21 16:09:30, Feng Tang wrote: > > > > The semantics of MPOL_PREFERRED_MANY is similar to MPOL_PREFERRED, > > > > that it will first try to allocate memory from the preferred node(s), > > > > and fallback to all nodes in system when first try fails. > > > > > > > > Add a dedicated function for it just like 'interleave' policy. > > > > > > > > Link: https://lore.kernel.org/r/20200630212517.308045-9-ben.widawsky@intel.com > > > > Suggested-by: Michal Hocko <mhocko@suse.com> > > > > Co-developed-by: Ben Widawsky <ben.widawsky@intel.com> > > > > Signed-off-by: Ben Widawsky <ben.widawsky@intel.com> > > > > Signed-off-by: Feng Tang <feng.tang@intel.com> > > > > > > It would be better to squash this together with the actual user of the > > > function added by the next patch. > > > > Ok, will do > > > > > > --- > > > > mm/mempolicy.c | 19 +++++++++++++++++++ > > > > 1 file changed, 19 insertions(+) > > > > > > > > diff --git a/mm/mempolicy.c b/mm/mempolicy.c > > > > index 17b5800b7dcc..d17bf018efcc 100644 > > > > --- a/mm/mempolicy.c > > > > +++ b/mm/mempolicy.c > > > > @@ -2153,6 +2153,25 @@ static struct page *alloc_page_interleave(gfp_t gfp, unsigned order, > > > > return page; > > > > } > > > > > > > > +static struct page *alloc_page_preferred_many(gfp_t gfp, unsigned int order, > > > > + struct mempolicy *pol) > > > > > > We likely want a node parameter to know which one we want to start with > > > for locality. Callers should use policy_node for that. > > > > Yes, locality should be considered, something like this? > > > > int pnid, lnid = numa_node_id(); > > > > if (is_nodeset(lnid, &pol->nodes)) > > pnid = local_nid; > > else > > pnid = first_node(pol->nodes); > > One further thought is, if local node is not in the nodemask, > should we compare the distance of all the nodes in nodemask > to the local node and chose the shortest? Nope, That is zonelist for. Nodemask will do the rest.
diff --git a/mm/mempolicy.c b/mm/mempolicy.c index 17b5800b7dcc..d17bf018efcc 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -2153,6 +2153,25 @@ static struct page *alloc_page_interleave(gfp_t gfp, unsigned order, return page; } +static struct page *alloc_page_preferred_many(gfp_t gfp, unsigned int order, + struct mempolicy *pol) +{ + struct page *page; + + /* + * This is a two pass approach. The first pass will only try the + * preferred nodes but skip the direct reclaim and allow the + * allocation to fail, while the second pass will try all the + * nodes in system. + */ + page = __alloc_pages(((gfp | __GFP_NOWARN) & ~__GFP_DIRECT_RECLAIM), + order, first_node(pol->nodes), &pol->nodes); + if (!page) + page = __alloc_pages(gfp, order, numa_node_id(), NULL); + + return page; +} + /** * alloc_pages_vma - Allocate a page for a VMA. * @gfp: GFP flags.