Message ID | 1626077374-81682-1-git-send-email-feng.tang@intel.com (mailing list archive) |
---|---|
Series | Introduce multi-preference mempolicy |
On Mon, 12 Jul 2021 16:09:28 +0800 Feng Tang <feng.tang@intel.com> wrote:

> This patch series introduces the concept of the MPOL_PREFERRED_MANY mempolicy.
> This mempolicy mode can be used with either the set_mempolicy(2) or mbind(2)
> interfaces. Like the MPOL_PREFERRED interface, it allows an application to set a
> preference for nodes which will fulfil memory allocation requests. Unlike the
> MPOL_PREFERRED mode, it takes a set of nodes. Like the MPOL_BIND interface, it
> works over a set of nodes. Unlike MPOL_BIND, it will not cause a SIGSEGV or
> invoke the OOM killer if those preferred nodes are not available.

Do we have any real-world testing which demonstrates the benefits of all of this?
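[As a concrete illustration of the interface the cover letter describes, here is a minimal sketch of the set_mempolicy(2) flavour. The MPOL_PREFERRED_MANY constant and its numeric value are taken as assumptions from the patch series (they may not exist in released uapi headers), and nodes 2-3 are placeholder node numbers; this is not the series' own test code.]

```c
/* Minimal sketch: ask the kernel to prefer a set of nodes with the
 * proposed MPOL_PREFERRED_MANY mode.  The constant may not exist in
 * older <linux/mempolicy.h> headers, so it is defined here for
 * illustration; its value must match the kernel being tested. */
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef MPOL_PREFERRED_MANY
#define MPOL_PREFERRED_MANY 5   /* assumed value, check the uapi headers */
#endif

int main(void)
{
	unsigned long nodemask = (1UL << 2) | (1UL << 3); /* prefer nodes 2-3 */
	unsigned long maxnode = 8 * sizeof(nodemask) + 1;

	/* Allocations made by this task after the call should try nodes
	 * 2-3 first and fall back to other nodes when they are full,
	 * instead of failing or invoking the OOM killer. */
	if (syscall(SYS_set_mempolicy, MPOL_PREFERRED_MANY,
		    &nodemask, maxnode) != 0) {
		perror("set_mempolicy(MPOL_PREFERRED_MANY)");
		return 1;
	}
	printf("preferred-many policy installed for nodes 2-3\n");
	return 0;
}
```

[The contrast with MPOL_BIND over the same nodemask is that exhausting nodes 2-3 here is expected to mean silent fallback rather than an allocation failure.]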
Hi Andrew,

Thanks for reviewing!

On Wed, Jul 14, 2021 at 05:15:40PM -0700, Andrew Morton wrote:
> On Mon, 12 Jul 2021 16:09:28 +0800 Feng Tang <feng.tang@intel.com> wrote:
>
> > This patch series introduces the concept of the MPOL_PREFERRED_MANY mempolicy.
> > This mempolicy mode can be used with either the set_mempolicy(2) or mbind(2)
> > interfaces. Like the MPOL_PREFERRED interface, it allows an application to set a
> > preference for nodes which will fulfil memory allocation requests. Unlike the
> > MPOL_PREFERRED mode, it takes a set of nodes. Like the MPOL_BIND interface, it
> > works over a set of nodes. Unlike MPOL_BIND, it will not cause a SIGSEGV or
> > invoke the OOM killer if those preferred nodes are not available.
>
> Do we have any real-world testing which demonstrates the benefits of
> all of this?

We have done some internal tests, and we are actively working with an external
customer on using this new 'prefer-many' policy. They have different types of
memory (fast DRAM and slower persistent memory) in the system, and their
program wants to set a clear preference for several NUMA nodes in order to
better place its huge application data before the application runs.

We also hit another issue where a customer wanted to run a docker container
bound to 2 persistent memory nodes, which always failed. At that time we tried
2 hack patches to solve it:

https://lore.kernel.org/lkml/1604470210-124827-2-git-send-email-feng.tang@intel.com/
https://lore.kernel.org/lkml/1604470210-124827-3-git-send-email-feng.tang@intel.com/

That use case can be achieved easily with this new policy.

Thanks,
Feng
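[For the container-on-persistent-memory case above, the same mode can also be applied to a specific mapping through mbind(2), the other interface the cover letter names. A minimal sketch, assuming nodes 2 and 3 are the persistent-memory nodes and again hedging on the MPOL_PREFERRED_MANY value:]

```c
/* Hypothetical sketch of the mbind(2) flavour: place a large data
 * region on (assumed) PMEM nodes 2-3 while still allowing fallback to
 * DRAM once they fill up.  Node numbers and the MPOL_PREFERRED_MANY
 * value are placeholders for whatever the target system uses. */
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef MPOL_PREFERRED_MANY
#define MPOL_PREFERRED_MANY 5   /* assumed value, check the uapi headers */
#endif

int main(void)
{
	size_t len = 1UL << 30;                 /* 1 GiB of application data */
	void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	unsigned long nodemask = (1UL << 2) | (1UL << 3);
	unsigned long maxnode = 8 * sizeof(nodemask) + 1;

	/* Pages of this mapping should be allocated from nodes 2-3 when
	 * possible; unlike MPOL_BIND, exhausting them is not fatal. */
	if (syscall(SYS_mbind, buf, len, MPOL_PREFERRED_MANY,
		    &nodemask, maxnode, 0) != 0) {
		perror("mbind(MPOL_PREFERRED_MANY)");
		return 1;
	}
	return 0;
}
```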
On 7/14/21 5:15 PM, Andrew Morton wrote:
> On Mon, 12 Jul 2021 16:09:28 +0800 Feng Tang <feng.tang@intel.com> wrote:
>> This patch series introduces the concept of the MPOL_PREFERRED_MANY mempolicy.
>> This mempolicy mode can be used with either the set_mempolicy(2) or mbind(2)
>> interfaces. Like the MPOL_PREFERRED interface, it allows an application to set a
>> preference for nodes which will fulfil memory allocation requests. Unlike the
>> MPOL_PREFERRED mode, it takes a set of nodes. Like the MPOL_BIND interface, it
>> works over a set of nodes. Unlike MPOL_BIND, it will not cause a SIGSEGV or
>> invoke the OOM killer if those preferred nodes are not available.
> Do we have any real-world testing which demonstrates the benefits of
> all of this?

Yes, it's actually been quite useful in practice already.

If we take persistent memory media (PMEM) and hot-add/online it with the DAX
kmem driver, we get NUMA nodes with lots of capacity (~6TB is typical) but
weird performance; PMEM has good read speed but low write speed. That write
speed is *so* low that it dominates performance more than the distance from
the CPUs does. Folks who want PMEM really don't care about locality.

The discussions with the testers usually go something like this:

Tester: How do I make my test use PMEM on nodes 2 and 3?
Kernel Guys: Use 'numactl --membind=2-3'.
Tester: I tried that, but I'm getting allocation failures once I fill up PMEM.
        Shouldn't it fall back to DRAM?
Kernel Guys: Fine, use 'numactl --preferred=2-3'.
Tester: That worked, but it started using DRAM after it exhausted node 2.
Kernel Guys: Dang it. I forgot --preferred ignores everything after the first
             node. Fine, we'll patch the kernel.

This has happened more than once. End users want to be able to specify a
specific physical medium, but don't want to have to deal with the sharp edges
of strict binding. This has happened both with slow media like PMEM and
"faster" media like High-Bandwidth Memory.