Message ID | 20181226133351.703380444@intel.com (mailing list archive) |
---|---|
State | New, archived |
Series | PMEM NUMA node and hotness accounting/migration |
Fengguang Wu <fengguang.wu@intel.com> writes:

> From: Yao Yuan <yuan.yao@intel.com>
>
> Signed-off-by: Yao Yuan <yuan.yao@intel.com>
> Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
> ---
>  arch/x86/kvm/mmu.c | 12 +++++++++++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
>
> --- linux.orig/arch/x86/kvm/mmu.c	2018-12-26 20:54:48.846720344 +0800
> +++ linux/arch/x86/kvm/mmu.c	2018-12-26 20:54:48.842719614 +0800
> @@ -950,6 +950,16 @@ static void mmu_free_memory_cache(struct
>  	kmem_cache_free(cache, mc->objects[--mc->nobjs]);
>  }
>
> +static unsigned long __get_dram_free_pages(gfp_t gfp_mask)
> +{
> +	struct page *page;
> +
> +	page = __alloc_pages(GFP_KERNEL_ACCOUNT, 0, numa_node_id());
> +	if (!page)
> +		return 0;
> +	return (unsigned long) page_address(page);
> +}
> +

Maybe it is explained in other patches. What is preventing the
allocation from PMEM here? Is it that we are not using the memory
policy preferred node id and hence the zone list we built won't have
the PMEM node?

>  static int mmu_topup_memory_cache_page(struct kvm_mmu_memory_cache *cache,
>  				  int min)
>  {
> @@ -958,7 +968,7 @@ static int mmu_topup_memory_cache_page(s
>  	if (cache->nobjs >= min)
>  		return 0;
>  	while (cache->nobjs < ARRAY_SIZE(cache->objects)) {
> -		page = (void *)__get_free_page(GFP_KERNEL_ACCOUNT);
> +		page = (void *)__get_dram_free_pages(GFP_KERNEL_ACCOUNT);
>  		if (!page)
>  			return cache->nobjs >= min ? 0 : -ENOMEM;
>  		cache->objects[cache->nobjs++] = page;

-aneesh
On Tue, Jan 01, 2019 at 02:53:07PM +0530, Aneesh Kumar K.V wrote:
> Fengguang Wu <fengguang.wu@intel.com> writes:
>
> > From: Yao Yuan <yuan.yao@intel.com>
> >
> > Signed-off-by: Yao Yuan <yuan.yao@intel.com>
> > Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
> > ---
> >  arch/x86/kvm/mmu.c | 12 +++++++++++-
> >  1 file changed, 11 insertions(+), 1 deletion(-)
> >
> > --- linux.orig/arch/x86/kvm/mmu.c	2018-12-26 20:54:48.846720344 +0800
> > +++ linux/arch/x86/kvm/mmu.c	2018-12-26 20:54:48.842719614 +0800
> > @@ -950,6 +950,16 @@ static void mmu_free_memory_cache(struct
> >  	kmem_cache_free(cache, mc->objects[--mc->nobjs]);
> >  }
> >
> > +static unsigned long __get_dram_free_pages(gfp_t gfp_mask)
> > +{
> > +	struct page *page;
> > +
> > +	page = __alloc_pages(GFP_KERNEL_ACCOUNT, 0, numa_node_id());
> > +	if (!page)
> > +		return 0;
> > +	return (unsigned long) page_address(page);
> > +}
> > +
>
> Maybe it is explained in other patches. What is preventing the
> allocation from PMEM here? Is it that we are not using the memory
> policy preferred node id and hence the zone list we built won't have
> the PMEM node?

That's because the PMEM nodes are memory-only nodes in this patchset,
so numa_node_id() will always return the node id of a DRAM node.

About the zone list: yes, in patch 10/21 we build the PMEM nodes into
a separate zonelist, so DRAM nodes will not fall back to PMEM nodes.

> >  static int mmu_topup_memory_cache_page(struct kvm_mmu_memory_cache *cache,
> >  				  int min)
> >  {
> > @@ -958,7 +968,7 @@ static int mmu_topup_memory_cache_page(s
> >  	if (cache->nobjs >= min)
> >  		return 0;
> >  	while (cache->nobjs < ARRAY_SIZE(cache->objects)) {
> > -		page = (void *)__get_free_page(GFP_KERNEL_ACCOUNT);
> > +		page = (void *)__get_dram_free_pages(GFP_KERNEL_ACCOUNT);
> >  		if (!page)
> >  			return cache->nobjs >= min ? 0 : -ENOMEM;
> >  		cache->objects[cache->nobjs++] = page;
>
> -aneesh
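To make the reasoning in this reply concrete, here is a minimal sketch
(not part of the posted series; alloc_dram_local_page is an invented
name for illustration) of an order-0 allocation pinned to the CPU-local
node. Because memory-only PMEM nodes carry no CPUs, numa_node_id() can
only name a DRAM node, and with PMEM zones kept out of that node's
zonelist the allocation cannot fall back to PMEM:

#include <linux/gfp.h>
#include <linux/mm.h>
#include <linux/topology.h>

/*
 * Illustrative sketch only: allocate one page from the zonelist of the
 * node the current CPU sits on.  Memory-only PMEM nodes have no CPUs,
 * so numa_node_id() always points at a DRAM node here.
 */
static struct page *alloc_dram_local_page(gfp_t gfp_mask)
{
	int nid = numa_node_id();	/* CPU-local node => a DRAM node */

	/* order-0 allocation constrained to nid's (DRAM-only) zonelist */
	return __alloc_pages_node(nid, gfp_mask, 0);
}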
On 12/26/18 5:14 AM, Fengguang Wu wrote:
> +static unsigned long __get_dram_free_pages(gfp_t gfp_mask)
> +{
> +	struct page *page;
> +
> +	page = __alloc_pages(GFP_KERNEL_ACCOUNT, 0, numa_node_id());
> +	if (!page)
> +		return 0;
> +	return (unsigned long) page_address(page);
> +}

There seems to be a ton of *policy* baked into these patches.  For
instance: thou shalt not allocate page table pages from PMEM.  That's
surely not a policy we want to inflict on every Linux user until the
end of time.

I think the more important question is how we can have the specific
policy that this patch implements, but also leave open room for other
policies, such as: "I don't care how slow this VM runs, minimize the
amount of fast memory it eats."
On Wed, Jan 02, 2019 at 08:47:25AM -0800, Dave Hansen wrote:
> On 12/26/18 5:14 AM, Fengguang Wu wrote:
> > +static unsigned long __get_dram_free_pages(gfp_t gfp_mask)
> > +{
> > +	struct page *page;
> > +
> > +	page = __alloc_pages(GFP_KERNEL_ACCOUNT, 0, numa_node_id());
> > +	if (!page)
> > +		return 0;
> > +	return (unsigned long) page_address(page);
> > +}
>
> There seems to be a ton of *policy* baked into these patches.  For
> instance: thou shalt not allocate page table pages from PMEM.  That's
> surely not a policy we want to inflict on every Linux user until the
> end of time.

Right. It's a straightforward policy for users that care about
performance. The project is planned in 3 steps, and at this moment we
are in phase (1):

1) core functionalities, easy to backport
2) upstream-able total solution
3) upstream when API stabilized

The dumb kernel interface /proc/PID/idle_pages enables doing the
majority of policies in user space. However, for the other smaller
parts, it looks easier to just implement an obvious policy first, and
then consider more possibilities.

> I think the more important question is how we can have the specific
> policy that this patch implements, but also leave open room for other
> policies, such as: "I don't care how slow this VM runs, minimize the
> amount of fast memory it eats."

Agreed. I'm open to more ways. We can treat these patches as the
soliciting version. If anyone sends reasonable improvements or even a
totally different way of doing it, I'd be happy to incorporate them.

Thanks,
Fengguang
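As a rough illustration of the kind of knob this exchange is asking
for, the sketch below makes the DRAM-only choice a runtime policy
rather than a hard-coded one. The kvm_pgtable_dram_only parameter and
kvm_mmu_alloc_cache_page helper are invented for this example and do
not exist in the posted series:

#include <linux/gfp.h>
#include <linux/mm.h>
#include <linux/module.h>
#include <linux/topology.h>

/* Hypothetical knob: true = keep page-table pages in DRAM, false = don't care. */
static bool kvm_pgtable_dram_only = true;
module_param(kvm_pgtable_dram_only, bool, 0644);

static unsigned long kvm_mmu_alloc_cache_page(gfp_t gfp_mask)
{
	struct page *page;

	if (kvm_pgtable_dram_only)
		/* restrict to the CPU-local (DRAM) node's zonelist */
		page = __alloc_pages_node(numa_node_id(), gfp_mask, 0);
	else
		/* default policy: any allowed node, PMEM fallback included */
		page = alloc_page(gfp_mask);

	if (!page)
		return 0;
	return (unsigned long)page_address(page);
}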
--- linux.orig/arch/x86/kvm/mmu.c	2018-12-26 20:54:48.846720344 +0800
+++ linux/arch/x86/kvm/mmu.c	2018-12-26 20:54:48.842719614 +0800
@@ -950,6 +950,16 @@ static void mmu_free_memory_cache(struct
 	kmem_cache_free(cache, mc->objects[--mc->nobjs]);
 }
 
+static unsigned long __get_dram_free_pages(gfp_t gfp_mask)
+{
+	struct page *page;
+
+	page = __alloc_pages(GFP_KERNEL_ACCOUNT, 0, numa_node_id());
+	if (!page)
+		return 0;
+	return (unsigned long) page_address(page);
+}
+
 static int mmu_topup_memory_cache_page(struct kvm_mmu_memory_cache *cache,
 				  int min)
 {
@@ -958,7 +968,7 @@ static int mmu_topup_memory_cache_page(s
 	if (cache->nobjs >= min)
 		return 0;
 	while (cache->nobjs < ARRAY_SIZE(cache->objects)) {
-		page = (void *)__get_free_page(GFP_KERNEL_ACCOUNT);
+		page = (void *)__get_dram_free_pages(GFP_KERNEL_ACCOUNT);
 		if (!page)
 			return cache->nobjs >= min ? 0 : -ENOMEM;
 		cache->objects[cache->nobjs++] = page;