Message ID | 20201202084234.15797-3-ying.huang@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | numa balancing: Migrate on fault among multiple bound nodes | expand |
On Wed, Dec 02, 2020 at 04:42:33PM +0800, Huang Ying wrote: > Signed-off-by: "Huang, Ying" <ying.huang@intel.com> > --- > man2/set_mempolicy.2 | 9 +++++++++ > 1 file changed, 9 insertions(+) > > diff --git a/man2/set_mempolicy.2 b/man2/set_mempolicy.2 > index 68011eecb..3754b3e12 100644 > --- a/man2/set_mempolicy.2 > +++ b/man2/set_mempolicy.2 > @@ -113,6 +113,12 @@ A nonempty > .I nodemask > specifies node IDs that are relative to the set of > node IDs allowed by the process's current cpuset. > +.TP > +.BR MPOL_F_NUMA_BALANCING " (since Linux 5.11)" > +Enable the Linux kernel NUMA balancing for the task if it is supported > +by kernel. > +If the flag isn't supported by Linux kernel, return -1 and errno is > +set to EINVAL. > .PP > .I nodemask > points to a bit mask of node IDs that contains up to > @@ -293,6 +299,9 @@ argument specified both Should this be expanded more to clarify it applies to MPOL_BIND specifically? Maybe the first patch should be expanded more and explictly fail if MPOL_F_NUMA_BALANCING is used with anything other than MPOL_BIND? > .B MPOL_F_STATIC_NODES > and > .BR MPOL_F_RELATIVE_NODES . > +Or, the > +.B MPOL_F_NUMA_BALANCING > +isn't supported by the Linux kernel. This will be difficult for an app to distinguish but we can't go back in time and make this ENOSYS :( The linux-api people might have more guidance but it may go to the extent of including a small test program in the manual page for a sequence that tests whether MPOL_F_NUMA_BALANCING works. They might have a better recommendation on how it should be handled.
Hi Huang Ying, Please see a few fixes below. Michael, as always, some question for you too ;) Thanks, Alex On 12/2/20 9:42 AM, Huang Ying wrote: > Signed-off-by: "Huang, Ying" <ying.huang@intel.com> > --- > man2/set_mempolicy.2 | 9 +++++++++ > 1 file changed, 9 insertions(+) > > diff --git a/man2/set_mempolicy.2 b/man2/set_mempolicy.2 > index 68011eecb..3754b3e12 100644 > --- a/man2/set_mempolicy.2 > +++ b/man2/set_mempolicy.2 > @@ -113,6 +113,12 @@ A nonempty > .I nodemask > specifies node IDs that are relative to the set of > node IDs allowed by the process's current cpuset. > +.TP > +.BR MPOL_F_NUMA_BALANCING " (since Linux 5.11)" I'd prefer it to be in alphabetical order (rather than just adding at the bottom). That way, when lists grow, it's easier to find things. > +Enable the Linux kernel NUMA balancing for the task if it is supported > +by kernel. I'd s/Linux kernel/kernel/ when it doesn't specifically refer to the Linux kernel to differentiate it from other kernels. It only adds noise (IMHO). mtk? wfix: ... supported by _the_ kernel. > +If the flag isn't supported by Linux kernel, return -1 and errno is wfix: If the flag isn't supported by _the_ kernel, ... > +set to EINVAL. errno and EINVAL should use .I and .B respectively > .PP > .I nodemask > points to a bit mask of node IDs that contains up to > @@ -293,6 +299,9 @@ argument specified both > .B MPOL_F_STATIC_NODES > and > .BR MPOL_F_RELATIVE_NODES . > +Or, the > +.B MPOL_F_NUMA_BALANCING > +isn't supported by the Linux kernel. > .TP > .B ENOMEM > Insufficient kernel memory was available. >
Mel Gorman <mgorman@suse.de> writes: > On Wed, Dec 02, 2020 at 04:42:33PM +0800, Huang Ying wrote: >> Signed-off-by: "Huang, Ying" <ying.huang@intel.com> >> --- >> man2/set_mempolicy.2 | 9 +++++++++ >> 1 file changed, 9 insertions(+) >> >> diff --git a/man2/set_mempolicy.2 b/man2/set_mempolicy.2 >> index 68011eecb..3754b3e12 100644 >> --- a/man2/set_mempolicy.2 >> +++ b/man2/set_mempolicy.2 >> @@ -113,6 +113,12 @@ A nonempty >> .I nodemask >> specifies node IDs that are relative to the set of >> node IDs allowed by the process's current cpuset. >> +.TP >> +.BR MPOL_F_NUMA_BALANCING " (since Linux 5.11)" >> +Enable the Linux kernel NUMA balancing for the task if it is supported >> +by kernel. >> +If the flag isn't supported by Linux kernel, return -1 and errno is >> +set to EINVAL. >> .PP >> .I nodemask >> points to a bit mask of node IDs that contains up to >> @@ -293,6 +299,9 @@ argument specified both > > Should this be expanded more to clarify it applies to MPOL_BIND > specifically? > > Maybe the first patch should be expanded more and explictly fail if > MPOL_F_NUMA_BALANCING is used with anything other than MPOL_BIND? For MPOL_PREFERRED, why could we not use NUMA balancing to migrate pages to the accessing local node if it is same as the preferred node? We have a way to turn off NUMA balancing already, why could we not provide a way to enable it if that's intended? Even for MPOL_INTERLEAVE, if the target node is the same as the accessing local node, can we use NUMA balancing to migrate pages? So, I prefer to make MPOL_F_NUMA_BALANCING to be Optimizing with NUMA balancing if possible, and we may add more optimization in the future. Do you agree? Best Regards, Huang, Ying >> .B MPOL_F_STATIC_NODES >> and >> .BR MPOL_F_RELATIVE_NODES . >> +Or, the >> +.B MPOL_F_NUMA_BALANCING >> +isn't supported by the Linux kernel. > > This will be difficult for an app to distinguish but we can't go back in > time and make this ENOSYS :( > > The linux-api people might have more guidance but it may go to the > extent of including a small test program in the manual page for a > sequence that tests whether MPOL_F_NUMA_BALANCING works. They might have > a better recommendation on how it should be handled.
On Thu, Dec 03, 2020 at 09:49:02AM +0800, Huang, Ying wrote: > >> diff --git a/man2/set_mempolicy.2 b/man2/set_mempolicy.2 > >> index 68011eecb..3754b3e12 100644 > >> --- a/man2/set_mempolicy.2 > >> +++ b/man2/set_mempolicy.2 > >> @@ -113,6 +113,12 @@ A nonempty > >> .I nodemask > >> specifies node IDs that are relative to the set of > >> node IDs allowed by the process's current cpuset. > >> +.TP > >> +.BR MPOL_F_NUMA_BALANCING " (since Linux 5.11)" > >> +Enable the Linux kernel NUMA balancing for the task if it is supported > >> +by kernel. > >> +If the flag isn't supported by Linux kernel, return -1 and errno is > >> +set to EINVAL. > >> .PP > >> .I nodemask > >> points to a bit mask of node IDs that contains up to > >> @@ -293,6 +299,9 @@ argument specified both > > > > Should this be expanded more to clarify it applies to MPOL_BIND > > specifically? > > > > Maybe the first patch should be expanded more and explictly fail if > > MPOL_F_NUMA_BALANCING is used with anything other than MPOL_BIND? > > For MPOL_PREFERRED, why could we not use NUMA balancing to migrate pages > to the accessing local node if it is same as the preferred node? You could but the kernel patch does not do that by making preferred_nid stick to the preferred node when hinting faults are trapped on that VMA. It would have to be a separate patch coupled with a man page update. If you wanted to go in this direction in the future, then the patch should explicitly return an error *now* if MPOL_PREFERRED is or'd with MPOL_F_NUMA_BALANCING so that an application becomes aware of MPOL_F_NUMA_BALANCING then it can detect if support exists in the current running kernel. > Even for MPOL_INTERLEAVE, if the target node is the same as the > accessing local node, can we use NUMA balancing to migrate pages? > The intent of MPOL_INTERLEAVE is to average the costs of the memory access so the average cost across the VMA is roughly similar across the entire range. This may be particularly important if the VMA is shared between multiple threads that are spread out on multiple nodes. A change in semantics there should be clearly documented. Similar, if you want to go in this direction, MPOL_F_NUMA_BALANCING should be chcked against MPOL_INTERLEAVE and explicitly fail now so suport can be detected at runtime. > So, I prefer to make MPOL_F_NUMA_BALANCING to be > > Optimizing with NUMA balancing if possible, and we may add more > optimization in the future. > Maybe, but I think it's best that the actual behaviour of the kernel is documented instead of desired behaviour or future planning.
Hi, Alex, Sorry for late, I just notice this email today. "Alejandro Colomar (mailing lists; readonly)" <alx.mailinglists@gmail.com> writes: > Hi Huang Ying, > > Please see a few fixes below. > > Michael, as always, some question for you too ;) > > Thanks, > > Alex > > On 12/2/20 9:42 AM, Huang Ying wrote: >> Signed-off-by: "Huang, Ying" <ying.huang@intel.com> >> --- >> man2/set_mempolicy.2 | 9 +++++++++ >> 1 file changed, 9 insertions(+) >> >> diff --git a/man2/set_mempolicy.2 b/man2/set_mempolicy.2 >> index 68011eecb..3754b3e12 100644 >> --- a/man2/set_mempolicy.2 >> +++ b/man2/set_mempolicy.2 >> @@ -113,6 +113,12 @@ A nonempty >> .I nodemask >> specifies node IDs that are relative to the set of >> node IDs allowed by the process's current cpuset. >> +.TP >> +.BR MPOL_F_NUMA_BALANCING " (since Linux 5.11)" > > I'd prefer it to be in alphabetical order (rather than just adding at > the bottom). That's OK for me. But it's better to be done in another patch to distinguish contents from pure order change? > That way, when lists grow, it's easier to find things. > >> +Enable the Linux kernel NUMA balancing for the task if it is supported >> +by kernel. > > I'd s/Linux kernel/kernel/ when it doesn't specifically refer to the > Linux kernel to differentiate it from other kernels. It only adds noise > (IMHO). mtk? Sure. Will fix this and all following comments below. Thanks a lot for your help! I am new to man pages. Best Regards, Huang, Ying
Hi Huang, Ying, Sorry I forgot to answer. See below. BTW, Linux 5.10 has been released recently; is this series already merged for 5.11? If not yet, could you just write '5.??' and we'll fix it (and add a commit number in a comment) when we know the definitive version? Thanks, Alex On 12/8/20 9:13 AM, Huang, Ying wrote: > Hi, Alex, > > Sorry for late, I just notice this email today. > > "Alejandro Colomar (mailing lists; readonly)" > <alx.mailinglists@gmail.com> writes: > >> Hi Huang Ying, >> >> Please see a few fixes below. >> >> Michael, as always, some question for you too ;) >> >> Thanks, >> >> Alex >> >> On 12/2/20 9:42 AM, Huang Ying wrote: >>> Signed-off-by: "Huang, Ying" <ying.huang@intel.com> >>> --- >>> man2/set_mempolicy.2 | 9 +++++++++ >>> 1 file changed, 9 insertions(+) >>> >>> diff --git a/man2/set_mempolicy.2 b/man2/set_mempolicy.2 >>> index 68011eecb..3754b3e12 100644 >>> --- a/man2/set_mempolicy.2 >>> +++ b/man2/set_mempolicy.2 >>> @@ -113,6 +113,12 @@ A nonempty >>> .I nodemask >>> specifies node IDs that are relative to the set of >>> node IDs allowed by the process's current cpuset. >>> +.TP >>> +.BR MPOL_F_NUMA_BALANCING " (since Linux 5.11)" >> >> I'd prefer it to be in alphabetical order (rather than just adding at >> the bottom). > > That's OK for me. But it's better to be done in another patch to > distinguish contents from pure order change? Yes, if you could do a series of 2 patches with a reordering first, it would be great. > >> That way, when lists grow, it's easier to find things. >> >>> +Enable the Linux kernel NUMA balancing for the task if it is supported >>> +by kernel. >> >> I'd s/Linux kernel/kernel/ when it doesn't specifically refer to the >> Linux kernel to differentiate it from other kernels. It only adds noise >> (IMHO). mtk? > > Sure. Will fix this and all following comments below. Thanks a lot for > your help! I am new to man pages. Thank you! > > Best Regards, > Huang, Ying >
"Alejandro Colomar (mailing lists; readonly)" <alx.mailinglists@gmail.com> writes: > Hi Huang, Ying, > > Sorry I forgot to answer. > See below. > > BTW, Linux 5.10 has been released recently; > is this series already merged for 5.11? > If not yet, could you just write '5.??' and we'll fix it (and add a > commit number in a comment) when we know the definitive version? Sure. Will replace it with 5.12. Thanks for reminding! Best Regards, Huang, Ying > Thanks, > > Alex > > On 12/8/20 9:13 AM, Huang, Ying wrote: >> Hi, Alex, >> >> Sorry for late, I just notice this email today. >> >> "Alejandro Colomar (mailing lists; readonly)" >> <alx.mailinglists@gmail.com> writes: >> >>> Hi Huang Ying, >>> >>> Please see a few fixes below. >>> >>> Michael, as always, some question for you too ;) >>> >>> Thanks, >>> >>> Alex >>> >>> On 12/2/20 9:42 AM, Huang Ying wrote: >>>> Signed-off-by: "Huang, Ying" <ying.huang@intel.com> >>>> --- >>>> man2/set_mempolicy.2 | 9 +++++++++ >>>> 1 file changed, 9 insertions(+) >>>> >>>> diff --git a/man2/set_mempolicy.2 b/man2/set_mempolicy.2 >>>> index 68011eecb..3754b3e12 100644 >>>> --- a/man2/set_mempolicy.2 >>>> +++ b/man2/set_mempolicy.2 >>>> @@ -113,6 +113,12 @@ A nonempty >>>> .I nodemask >>>> specifies node IDs that are relative to the set of >>>> node IDs allowed by the process's current cpuset. >>>> +.TP >>>> +.BR MPOL_F_NUMA_BALANCING " (since Linux 5.11)" >>> >>> I'd prefer it to be in alphabetical order (rather than just adding at >>> the bottom). >> >> That's OK for me. But it's better to be done in another patch to >> distinguish contents from pure order change? > > Yes, if you could do a series of 2 patches with a reordering first, it > would be great. > >> >>> That way, when lists grow, it's easier to find things. >>> >>>> +Enable the Linux kernel NUMA balancing for the task if it is supported >>>> +by kernel. >>> >>> I'd s/Linux kernel/kernel/ when it doesn't specifically refer to the >>> Linux kernel to differentiate it from other kernels. It only adds noise >>> (IMHO). mtk? >> >> Sure. Will fix this and all following comments below. Thanks a lot for >> your help! I am new to man pages. > > Thank you! > >> >> Best Regards, >> Huang, Ying >>
diff --git a/man2/set_mempolicy.2 b/man2/set_mempolicy.2 index 68011eecb..3754b3e12 100644 --- a/man2/set_mempolicy.2 +++ b/man2/set_mempolicy.2 @@ -113,6 +113,12 @@ A nonempty .I nodemask specifies node IDs that are relative to the set of node IDs allowed by the process's current cpuset. +.TP +.BR MPOL_F_NUMA_BALANCING " (since Linux 5.11)" +Enable the Linux kernel NUMA balancing for the task if it is supported +by kernel. +If the flag isn't supported by Linux kernel, return -1 and errno is +set to EINVAL. .PP .I nodemask points to a bit mask of node IDs that contains up to @@ -293,6 +299,9 @@ argument specified both .B MPOL_F_STATIC_NODES and .BR MPOL_F_RELATIVE_NODES . +Or, the +.B MPOL_F_NUMA_BALANCING +isn't supported by the Linux kernel. .TP .B ENOMEM Insufficient kernel memory was available.
Signed-off-by: "Huang, Ying" <ying.huang@intel.com> --- man2/set_mempolicy.2 | 9 +++++++++ 1 file changed, 9 insertions(+)