Message ID | 20191113095530.228959-1-shile.zhang@linux.alibaba.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | mm/vmalloc: Fix regression caused by needless vmalloc_sync_all() | expand |
On Wed, 2019-11-13 at 17:55 +0800, Shile Zhang wrote: > vmalloc_sync_all() was put in the common path in > __purge_vmap_area_lazy(), for one sync issue only happened on X86_32 > with PTI enabled. It is needless for X86_64, which caused a big regression > in UnixBench Shell8 testing on X86_64. > Similar regression also reported by 0-day kernel test robot in reaim > benchmarking: > https://lists.01.org/hyperkitty/list/lkp@lists.01.org/thread/4D3JPPHBNOSPFK2KEPC6KGKS6J25AIDB/ > > Fix it by adding more conditions. > > Fixes: 3f8fd02b1bf1 ("mm/vmalloc: Sync unmappings in __purge_vmap_area_lazy()") > > Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com> > --- > mm/vmalloc.c | 8 +++++++- > 1 file changed, 7 insertions(+), 1 deletion(-) > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c > index a3c70e275f4e..7b9fc7966da6 100644 > --- a/mm/vmalloc.c > +++ b/mm/vmalloc.c > @@ -1255,11 +1255,17 @@ static bool __purge_vmap_area_lazy(unsigned long start, unsigned long end) > if (unlikely(valist == NULL)) > return false; > > +#if defined(CONFIG_X86_32) && defined(CONFIG_X86_PAE) Is it cleaner to use if (IS_ENABLED()...) ? > /* > * First make sure the mappings are removed from all page-tables > * before they are freed. > + * > + * This is only needed on x86-32 with !SHARED_KERNEL_PMD, which is > + * the case on a PAE kernel with PTI enabled. > */ > - vmalloc_sync_all(); > + if (!SHARED_KERNEL_PMD && boot_cpu_has(X86_FEATURE_PTI)) > + vmalloc_sync_all(); > +#endif > > /* > * TODO: to calculate a flush range without looping.
On Wed, 13 Nov 2019 17:55:30 +0800 Shile Zhang <shile.zhang@linux.alibaba.com> wrote: > vmalloc_sync_all() was put in the common path in > __purge_vmap_area_lazy(), for one sync issue only happened on X86_32 > with PTI enabled. It is needless for X86_64, which caused a big regression > in UnixBench Shell8 testing on X86_64. > Similar regression also reported by 0-day kernel test robot in reaim > benchmarking: > https://lists.01.org/hyperkitty/list/lkp@lists.01.org/thread/4D3JPPHBNOSPFK2KEPC6KGKS6J25AIDB/ That is indeed a large performance regression. > Fix it by adding more conditions. > > Fixes: 3f8fd02b1bf1 ("mm/vmalloc: Sync unmappings in __purge_vmap_area_lazy()") > > ... > > --- a/mm/vmalloc.c > +++ b/mm/vmalloc.c > @@ -1255,11 +1255,17 @@ static bool __purge_vmap_area_lazy(unsigned long start, unsigned long end) > if (unlikely(valist == NULL)) > return false; > > +#if defined(CONFIG_X86_32) && defined(CONFIG_X86_PAE) Are we sure that x86_32 is the only architecture whcih is (or ever will be) affected? > /* > * First make sure the mappings are removed from all page-tables > * before they are freed. > + * > + * This is only needed on x86-32 with !SHARED_KERNEL_PMD, which is > + * the case on a PAE kernel with PTI enabled. > */ > - vmalloc_sync_all(); > + if (!SHARED_KERNEL_PMD && boot_cpu_has(X86_FEATURE_PTI)) > + vmalloc_sync_all(); > +#endif > > /* > * TODO: to calculate a flush range without looping. CONFIG_X86_PAE depends on CONFIG_X86_32 so no need to check CONFIG_X86_32? From: Andrew Morton <akpm@linux-foundation.org> Subject: mm-vmalloc-fix-regression-caused-by-needless-vmalloc_sync_all-fix simplify config expression, use IS_ENABLED() Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: Qian Cai <cai@lca.pw> Cc: Shile Zhang <shile.zhang@linux.alibaba.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- mm/vmalloc.c | 21 +++++++++++---------- 1 file changed, 11 insertions(+), 10 deletions(-) --- a/mm/vmalloc.c~mm-vmalloc-fix-regression-caused-by-needless-vmalloc_sync_all-fix +++ a/mm/vmalloc.c @@ -1255,16 +1255,17 @@ static bool __purge_vmap_area_lazy(unsig if (unlikely(valist == NULL)) return false; -#if defined(CONFIG_X86_32) && defined(CONFIG_X86_PAE) - /* - * First make sure the mappings are removed from all page-tables - * before they are freed. - * - * This is only needed on x86-32 with !SHARED_KERNEL_PMD, which is - * the case on a PAE kernel with PTI enabled. - */ - if (!SHARED_KERNEL_PMD && boot_cpu_has(X86_FEATURE_PTI)) - vmalloc_sync_all(); + if (IS_ENABLED(CONFIG_X86_PAE)) { + /* + * First make sure the mappings are removed from all page-tables + * before they are freed. + * + * This is only needed on x86-32 with !SHARED_KERNEL_PMD, which + * is the case on a PAE kernel with PTI enabled. + */ + if (!SHARED_KERNEL_PMD && boot_cpu_has(X86_FEATURE_PTI)) + vmalloc_sync_all(); + } #endif /*
On Wed 13-11-19 17:55:30, Shile Zhang wrote: > vmalloc_sync_all() was put in the common path in > __purge_vmap_area_lazy(), for one sync issue only happened on X86_32 > with PTI enabled. It is needless for X86_64, which caused a big regression > in UnixBench Shell8 testing on X86_64. > Similar regression also reported by 0-day kernel test robot in reaim > benchmarking: > https://lists.01.org/hyperkitty/list/lkp@lists.01.org/thread/4D3JPPHBNOSPFK2KEPC6KGKS6J25AIDB/ > > Fix it by adding more conditions. This doesn't really explain a lot. Could you explain why the syncing should be limited only to PAE and !SHARED_KERNEL_PMD? > Fixes: 3f8fd02b1bf1 ("mm/vmalloc: Sync unmappings in __purge_vmap_area_lazy()") > > Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com> > --- > mm/vmalloc.c | 8 +++++++- > 1 file changed, 7 insertions(+), 1 deletion(-) > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c > index a3c70e275f4e..7b9fc7966da6 100644 > --- a/mm/vmalloc.c > +++ b/mm/vmalloc.c > @@ -1255,11 +1255,17 @@ static bool __purge_vmap_area_lazy(unsigned long start, unsigned long end) > if (unlikely(valist == NULL)) > return false; > > +#if defined(CONFIG_X86_32) && defined(CONFIG_X86_PAE) > /* > * First make sure the mappings are removed from all page-tables > * before they are freed. > + * > + * This is only needed on x86-32 with !SHARED_KERNEL_PMD, which is > + * the case on a PAE kernel with PTI enabled. > */ > - vmalloc_sync_all(); > + if (!SHARED_KERNEL_PMD && boot_cpu_has(X86_FEATURE_PTI)) > + vmalloc_sync_all(); > +#endif > > /* > * TODO: to calculate a flush range without looping. > -- > 2.24.0.rc2 >
On Wed, Nov 13, 2019 at 05:55:30PM +0800, Shile Zhang wrote: > +#if defined(CONFIG_X86_32) && defined(CONFIG_X86_PAE) > /* > * First make sure the mappings are removed from all page-tables > * before they are freed. > + * > + * This is only needed on x86-32 with !SHARED_KERNEL_PMD, which is > + * the case on a PAE kernel with PTI enabled. > */ > - vmalloc_sync_all(); > + if (!SHARED_KERNEL_PMD && boot_cpu_has(X86_FEATURE_PTI)) > + vmalloc_sync_all(); > +#endif I already submitted another fix for this quite some time ago: https://lore.kernel.org/lkml/20191009124418.8286-1-joro@8bytes.org/ Regards, Joerg
On Thu, 14 Nov 2019 18:12:31 +0100 Joerg Roedel <jroedel@suse.de> wrote: > On Wed, Nov 13, 2019 at 05:55:30PM +0800, Shile Zhang wrote: > > +#if defined(CONFIG_X86_32) && defined(CONFIG_X86_PAE) > > /* > > * First make sure the mappings are removed from all page-tables > > * before they are freed. > > + * > > + * This is only needed on x86-32 with !SHARED_KERNEL_PMD, which is > > + * the case on a PAE kernel with PTI enabled. > > */ > > - vmalloc_sync_all(); > > + if (!SHARED_KERNEL_PMD && boot_cpu_has(X86_FEATURE_PTI)) > > + vmalloc_sync_all(); > > +#endif > > I already submitted another fix for this quite some time ago: > > https://lore.kernel.org/lkml/20191009124418.8286-1-joro@8bytes.org/ > Your patch is more verbose but looks quite a bit nicer than this patch (Link: http://lkml.kernel.org/r/20191113095530.228959-1-shile.zhang@linux.alibaba.com). The separation of sync-for-a-mapping versus sync-for-an-unmapping adds clarity. It's fairly urgent - I consider this to be -stable material. Thomas & co, was that a deliberate skip?
On Thu, Nov 14, 2019 at 04:01:08PM -0800, Andrew Morton wrote: > It's fairly urgent - I consider this to be -stable material. > > Thomas & co, was that a deliberate skip? More like lost in the mail flood. I'm assuming you're taking it? In any case, patch looks ok to me - the "unmappings" naming sounds like coming from someone with a very wild phantasy... :-) Thx.
On Fri, 15 Nov 2019 09:38:47 +0100 Borislav Petkov <bp@alien8.de> wrote: > On Thu, Nov 14, 2019 at 04:01:08PM -0800, Andrew Morton wrote: > > It's fairly urgent - I consider this to be -stable material. > > > > Thomas & co, was that a deliberate skip? > > More like lost in the mail flood. I'm assuming you're taking it? > > In any case, patch looks ok to me - the "unmappings" naming sounds like > coming from someone with a very wild phantasy... :-) > OK, thanks, here's what I queued. Reviews, acks and testing, please?? From: Joerg Roedel <jroedel@suse.de> Subject: x86/mm: Split vmalloc_sync_all() Commit 3f8fd02b1bf1 ("mm/vmalloc: Sync unmappings in __purge_vmap_area_lazy()") introduced a call to vmalloc_sync_all() in the vunmap() code-path. While this change was necessary to maintain correctness on x86-32-pae kernels, it also adds additional cycles for architectures that don't need it. Specifically on x86-64 with CONFIG_VMAP_STACK=y some people reported severe performance regressions in micro-benchmarks because it now also calls the x86-64 implementation of vmalloc_sync_all() on vunmap(). But the vmalloc_sync_all() implementation on x86-64 is only needed for newly created mappings. To avoid the unnecessary work on x86-64 and to gain the performance back, split up vmalloc_sync_all() into two functions: * vmalloc_sync_mappings(), and * vmalloc_sync_unmappings() Most call-sites to vmalloc_sync_all() only care about new mappings being synchronized. The only exception is the new call-site added in the above mentioned commit. Shile Zhang directed us to a report of an 80% regression in reaim throughput. Link: http://lkml.kernel.org/r/20191009124418.8286-1-joro@8bytes.org Link: https://lists.01.org/hyperkitty/list/lkp@lists.01.org/thread/4D3JPPHBNOSPFK2KEPC6KGKS6J25AIDB/ Link: http://lkml.kernel.org/r/20191113095530.228959-1-shile.zhang@linux.alibaba.com Fixes: 3f8fd02b1bf1 ("mm/vmalloc: Sync unmappings in __purge_vmap_area_lazy()") Signed-off-by: Joerg Roedel <jroedel@suse.de> Reported-by: kernel test robot <oliver.sang@intel.com> Reported-by: Shile Zhang <shile.zhang@linux.alibaba.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> [GHES] Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- arch/x86/mm/fault.c | 26 ++++++++++++++++++++++++-- drivers/acpi/apei/ghes.c | 2 +- include/linux/vmalloc.h | 5 +++-- kernel/notifier.c | 2 +- mm/nommu.c | 10 +++++++--- mm/vmalloc.c | 11 +++++++---- 6 files changed, 43 insertions(+), 13 deletions(-) --- a/arch/x86/mm/fault.c~x86-mm-split-vmalloc_sync_all +++ a/arch/x86/mm/fault.c @@ -189,7 +189,7 @@ static inline pmd_t *vmalloc_sync_one(pg return pmd_k; } -void vmalloc_sync_all(void) +static void vmalloc_sync(void) { unsigned long address; @@ -216,6 +216,16 @@ void vmalloc_sync_all(void) } } +void vmalloc_sync_mappings(void) +{ + vmalloc_sync(); +} + +void vmalloc_sync_unmappings(void) +{ + vmalloc_sync(); +} + /* * 32-bit: * @@ -318,11 +328,23 @@ out: #else /* CONFIG_X86_64: */ -void vmalloc_sync_all(void) +void vmalloc_sync_mappings(void) { + /* + * 64-bit mappings might allocate new p4d/pud pages + * that need to be propagated to all tasks' PGDs. + */ sync_global_pgds(VMALLOC_START & PGDIR_MASK, VMALLOC_END); } +void vmalloc_sync_unmappings(void) +{ + /* + * Unmappings never allocate or free p4d/pud pages. + * No work is required here. + */ +} + /* * 64-bit: * --- a/drivers/acpi/apei/ghes.c~x86-mm-split-vmalloc_sync_all +++ a/drivers/acpi/apei/ghes.c @@ -171,7 +171,7 @@ int ghes_estatus_pool_init(int num_ghes) * New allocation must be visible in all pgd before it can be found by * an NMI allocating from the pool. */ - vmalloc_sync_all(); + vmalloc_sync_mappings(); rc = gen_pool_add(ghes_estatus_pool, addr, PAGE_ALIGN(len), -1); if (rc) --- a/include/linux/vmalloc.h~x86-mm-split-vmalloc_sync_all +++ a/include/linux/vmalloc.h @@ -126,8 +126,9 @@ extern int remap_vmalloc_range_partial(s extern int remap_vmalloc_range(struct vm_area_struct *vma, void *addr, unsigned long pgoff); -void vmalloc_sync_all(void); - +void vmalloc_sync_mappings(void); +void vmalloc_sync_unmappings(void); + /* * Lowlevel-APIs (not for driver use!) */ --- a/kernel/notifier.c~x86-mm-split-vmalloc_sync_all +++ a/kernel/notifier.c @@ -554,7 +554,7 @@ NOKPROBE_SYMBOL(notify_die); int register_die_notifier(struct notifier_block *nb) { - vmalloc_sync_all(); + vmalloc_sync_mappings(); return atomic_notifier_chain_register(&die_chain, nb); } EXPORT_SYMBOL_GPL(register_die_notifier); --- a/mm/nommu.c~x86-mm-split-vmalloc_sync_all +++ a/mm/nommu.c @@ -359,10 +359,14 @@ void vm_unmap_aliases(void) EXPORT_SYMBOL_GPL(vm_unmap_aliases); /* - * Implement a stub for vmalloc_sync_all() if the architecture chose not to - * have one. + * Implement a stub for vmalloc_sync_[un]mapping() if the architecture + * chose not to have one. */ -void __weak vmalloc_sync_all(void) +void __weak vmalloc_sync_mappings(void) +{ +} + +void __weak vmalloc_sync_unmappings(void) { } --- a/mm/vmalloc.c~x86-mm-split-vmalloc_sync_all +++ a/mm/vmalloc.c @@ -1259,7 +1259,7 @@ static bool __purge_vmap_area_lazy(unsig * First make sure the mappings are removed from all page-tables * before they are freed. */ - vmalloc_sync_all(); + vmalloc_sync_unmappings(); /* * TODO: to calculate a flush range without looping. @@ -3050,16 +3050,19 @@ int remap_vmalloc_range(struct vm_area_s EXPORT_SYMBOL(remap_vmalloc_range); /* - * Implement a stub for vmalloc_sync_all() if the architecture chose not to - * have one. + * Implement stubs for vmalloc_sync_[un]mappings () if the architecture chose + * not to have one. * * The purpose of this function is to make sure the vmalloc area * mappings are identical in all page-tables in the system. */ -void __weak vmalloc_sync_all(void) +void __weak vmalloc_sync_mappings(void) { } +void __weak vmalloc_sync_unmappings(void) +{ +} static int f(pte_t *pte, unsigned long addr, void *data) {
On Fri, Nov 15, 2019 at 03:37:21PM -0800, Andrew Morton wrote: > OK, thanks, here's what I queued. Reviews, acks and testing, please?? > > From: Joerg Roedel <jroedel@suse.de> > Subject: x86/mm: Split vmalloc_sync_all() > > Commit 3f8fd02b1bf1 ("mm/vmalloc: Sync unmappings in > __purge_vmap_area_lazy()") introduced a call to vmalloc_sync_all() in the > vunmap() code-path. While this change was necessary to maintain > correctness on x86-32-pae kernels, it also adds additional cycles for > architectures that don't need it. ... > arch/x86/mm/fault.c | 26 ++++++++++++++++++++++++-- > drivers/acpi/apei/ghes.c | 2 +- > include/linux/vmalloc.h | 5 +++-- > kernel/notifier.c | 2 +- > mm/nommu.c | 10 +++++++--- > mm/vmalloc.c | 11 +++++++---- > 6 files changed, 43 insertions(+), 13 deletions(-) Booting it looks ok here: Tested-by: Borislav Petkov <bp@suse.de>
diff --git a/mm/vmalloc.c b/mm/vmalloc.c index a3c70e275f4e..7b9fc7966da6 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -1255,11 +1255,17 @@ static bool __purge_vmap_area_lazy(unsigned long start, unsigned long end) if (unlikely(valist == NULL)) return false; +#if defined(CONFIG_X86_32) && defined(CONFIG_X86_PAE) /* * First make sure the mappings are removed from all page-tables * before they are freed. + * + * This is only needed on x86-32 with !SHARED_KERNEL_PMD, which is + * the case on a PAE kernel with PTI enabled. */ - vmalloc_sync_all(); + if (!SHARED_KERNEL_PMD && boot_cpu_has(X86_FEATURE_PTI)) + vmalloc_sync_all(); +#endif /* * TODO: to calculate a flush range without looping.
vmalloc_sync_all() was put in the common path in __purge_vmap_area_lazy(), for one sync issue only happened on X86_32 with PTI enabled. It is needless for X86_64, which caused a big regression in UnixBench Shell8 testing on X86_64. Similar regression also reported by 0-day kernel test robot in reaim benchmarking: https://lists.01.org/hyperkitty/list/lkp@lists.01.org/thread/4D3JPPHBNOSPFK2KEPC6KGKS6J25AIDB/ Fix it by adding more conditions. Fixes: 3f8fd02b1bf1 ("mm/vmalloc: Sync unmappings in __purge_vmap_area_lazy()") Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com> --- mm/vmalloc.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-)