Message ID | 20160929073411.3154-1-jszhang@marvell.com (mailing list archive) |
---|---|
State | New, archived |
On Thu, 29 Sep 2016 09:18:18 +0100 Chris Wilson wrote:

> On Thu, Sep 29, 2016 at 03:34:11PM +0800, Jisheng Zhang wrote:
> > On Marvell berlin arm64 platforms, I see the preemptoff tracer report
> > a max 26543 us latency at __purge_vmap_area_lazy, this latency is an
> > awfully bad for STB. And the ftrace log also shows __free_vmap_area
> > contributes most latency now. I noticed that Joel mentioned the same
> > issue[1] on x86 platform and gave two solutions, but it seems no patch
> > is sent out for this purpose.
> >
> > This patch adopts Joel's first solution, but I use 16MB per core
> > rather than 8MB per core for the number of lazy_max_pages. After this
> > patch, the preemptoff tracer reports a max 6455us latency, reduced to
> > 1/4 of original result.
>
> My understanding is that
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 91f44e78c516..3f7c6d6969ac 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -626,7 +626,6 @@ void set_iounmap_nonlazy(void)
>  static void __purge_vmap_area_lazy(unsigned long *start, unsigned long *end,
>  					int sync, int force_flush)
>  {
> -	static DEFINE_SPINLOCK(purge_lock);
>  	struct llist_node *valist;
>  	struct vmap_area *va;
>  	struct vmap_area *n_va;
> @@ -637,12 +636,6 @@ static void __purge_vmap_area_lazy(unsigned long *start, unsigned long *end,
>  	 * should not expect such behaviour. This just simplifies locking for
>  	 * the case that isn't actually used at the moment anyway.
>  	 */
> -	if (!sync && !force_flush) {
> -		if (!spin_trylock(&purge_lock))
> -			return;
> -	} else
> -		spin_lock(&purge_lock);
> -
>  	if (sync)
>  		purge_fragmented_blocks_allcpus();
>  
> @@ -667,7 +660,6 @@ static void __purge_vmap_area_lazy(unsigned long *start, unsigned long *end,
>  			__free_vmap_area(va);
>  		spin_unlock(&vmap_area_lock);

Hi Chris,

Per my test, the bottleneck now is __free_vmap_area() over the valist;
the iteration is protected by the spinlock vmap_area_lock. So the larger
lazy_max_pages is, the longer the valist and the bigger the latency.

So besides the above patch, we still need to remove vmap_area_lock or
replace it with a mutex.

Thanks,
Jisheng

>  	}
> -	spin_unlock(&purge_lock);
>  }
>  
>  /*
>
>
> should now be safe. That should significantly reduce the preempt-disabled
> section, I think.
> -Chris
>
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 91f44e7..66f377a 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -596,7 +596,7 @@ static unsigned long lazy_max_pages(void)
 
 	log = fls(num_online_cpus());
 
-	return log * (32UL * 1024 * 1024 / PAGE_SIZE);
+	return log * (16UL * 1024 * 1024 / PAGE_SIZE);
 }
 
 static atomic_t vmap_lazy_nr = ATOMIC_INIT(0);
On Marvell berlin arm64 platforms, I see the preemptoff tracer report
a max 26543 us latency at __purge_vmap_area_lazy, which is awfully bad
for STB. The ftrace log also shows that __free_vmap_area contributes
most of the latency now. I noticed that Joel mentioned the same issue[1]
on the x86 platform and gave two solutions, but it seems no patch has
been sent out for this purpose.

This patch adopts Joel's first solution, but I use 16MB per core
rather than 8MB per core for the number of lazy_max_pages. After this
patch, the preemptoff tracer reports a max 6455 us latency, reduced to
about 1/4 of the original result.

[1] http://lkml.iu.edu/hypermail/linux/kernel/1603.2/04803.html

Signed-off-by: Jisheng Zhang <jszhang@marvell.com>
---
 mm/vmalloc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)