[v4,0/9] per lruvec lru_lock for memcg

Message ID: 1574166203-151975-1-git-send-email-alex.shi@linux.alibaba.com

Message

Alex Shi Nov. 19, 2019, 12:23 p.m. UTC
Hi all,

This patchset moves lru_lock into lruvec, giving each lruvec its own
lru_lock, and thus a separate lru_lock for each memcg per node.
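
For readers new to the structures involved, a minimal sketch of the shape of
the change (field placement and the surrounding members are assumptions for
illustration, not copied from the patches):

  /*
   * Before: a single spinlock in struct pglist_data serialized LRU
   * operations for every cgroup on the node.
   *
   * After: each lruvec carries its own lock, so each memcg gets an
   * independent lru_lock per node.
   */
  struct lruvec {
          struct list_head        lists[NR_LRU_LISTS];
          spinlock_t              lru_lock;       /* new in this series */
          /* ... */
  };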

According to Daniel Jordan's suggestion, I ran 64 'dd' tasks in 32
containers on my 2-socket * 8-core * HT box with the modified case:
  https://git.kernel.org/pub/scm/linux/kernel/git/wfg/vm-scalability.git/tree/case-lru-file-readtwice

With this change, the above lru_lock-sensitive test improved by 17% in the
multiple-containers scenario, with no performance loss without mem_cgroup.

Thanks to Hugh Dickins and Konstantin Khlebnikov, who both proposed the same
idea 7 years ago. Considering my testing results and Google's internal use of
a similar approach, I believe this feature clearly benefits multi-container
users.

So I'd like to introduce it here.

Thanks for all the comments from Hugh Dickins, Konstantin Khlebnikov, Daniel Jordan,
Johannes Weiner, Mel Gorman, Shakeel Butt, Rong Chen, Fengguang Wu, Yun Wang, and others.

v4:
  a, fix the page->mem_cgroup dereferencing issue, thanks to Johannes Weiner
  b, remove the irqsave flags changes, thanks to Matthew Wilcox
  c, merge/split patches for easier review and bisection

v3: rebase on linux-next, and fold the relock fix patch into the introducing patch

v2: bypass a performance regression bug and fix some function issues

v1: initial version; aim testing showed a 5% performance increase


Alex Shi (9):
  mm/swap: fix uninitialized compiler warning
  mm/huge_memory: fix uninitialized compiler warning
  mm/lru: replace pgdat lru_lock with lruvec lock
  mm/mlock: only change the lru_lock iff page's lruvec is different
  mm/swap: only change the lru_lock iff page's lruvec is different
  mm/vmscan: only change the lru_lock iff page's lruvec is different
  mm/pgdat: remove pgdat lru_lock
  mm/lru: likely enhancement
  mm/lru: revise the comments of lru_lock
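
The three "only change the lru_lock iff page's lruvec is different" patches
share one idea: when walking a list of pages, keep holding the current
lruvec's lock and only drop/retake it when the next page belongs to a
different lruvec. A hedged sketch of that relocking pattern (the helper name
and exact signature are illustrative, not taken from the series):

  static struct lruvec *relock_page_lruvec_irq(struct page *page,
                                               struct lruvec *locked)
  {
          struct lruvec *lruvec = mem_cgroup_page_lruvec(page,
                                                         page_pgdat(page));

          if (lruvec == locked)
                  return lruvec;          /* same lruvec: lock already held */

          if (locked)
                  spin_unlock_irq(&locked->lru_lock);
          spin_lock_irq(&lruvec->lru_lock);
          return lruvec;
  }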

 Documentation/admin-guide/cgroup-v1/memcg_test.rst | 15 +----
 Documentation/admin-guide/cgroup-v1/memory.rst     |  6 +-
 Documentation/trace/events-kmem.rst                |  2 +-
 Documentation/vm/unevictable-lru.rst               | 22 +++----
 include/linux/memcontrol.h                         | 68 ++++++++++++++++++++
 include/linux/mm_types.h                           |  2 +-
 include/linux/mmzone.h                             |  5 +-
 mm/compaction.c                                    | 67 +++++++++++++------
 mm/filemap.c                                       |  4 +-
 mm/huge_memory.c                                   | 17 ++---
 mm/memcontrol.c                                    | 75 +++++++++++++++++-----
 mm/mlock.c                                         | 27 ++++----
 mm/mmzone.c                                        |  1 +
 mm/page_alloc.c                                    |  1 -
 mm/page_idle.c                                     |  5 +-
 mm/rmap.c                                          |  2 +-
 mm/swap.c                                          | 74 +++++++++------------
 mm/vmscan.c                                        | 74 ++++++++++-----------
 18 files changed, 287 insertions(+), 180 deletions(-)

Comments

Konstantin Khlebnikov Nov. 24, 2019, 3:49 p.m. UTC | #1
On 19/11/2019 15.23, Alex Shi wrote:
> Hi all,
> 
> This patchset moves lru_lock into lruvec, giving each lruvec its own
> lru_lock, and thus a separate lru_lock for each memcg per node.
> 
> According to Daniel Jordan's suggestion, I ran 64 'dd' tasks in 32
> containers on my 2-socket * 8-core * HT box with the modified case:
>    https://git.kernel.org/pub/scm/linux/kernel/git/wfg/vm-scalability.git/tree/case-lru-file-readtwice
> 
> With this change, the above lru_lock-sensitive test improved by 17% in the
> multiple-containers scenario, with no performance loss without mem_cgroup.

Splitting lru_lock isn't the only option for solving this lock contention.
It also doesn't help if all of this happens within one cgroup.

I think better batching could solve more problems with less overhead.

For example, larger per-CPU vectors, or queues for each NUMA node or even for
each lruvec. These would pre-sort and aggregate pages, so the actual
modification under lru_lock would be much cheaper and more fine-grained.
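
For illustration, a rough sketch of what such a per-lruvec staging queue
could look like (all names here are hypothetical, not from any posted patch):

  #define LRU_BATCH_MAX   63

  struct lru_batch {
          struct page     *pages[LRU_BATCH_MAX];
          unsigned int    nr;
  };

  /* Stage pages locally; take lru_lock once per batch, not per page. */
  static void lru_batch_add(struct lruvec *lruvec, struct lru_batch *b,
                            struct page *page)
  {
          b->pages[b->nr++] = page;
          if (b->nr < LRU_BATCH_MAX)
                  return;

          spin_lock_irq(&lruvec->lru_lock);
          while (b->nr) {
                  struct page *p = b->pages[--b->nr];

                  list_add(&p->lru, &lruvec->lists[page_lru(p)]);
          }
          spin_unlock_irq(&lruvec->lru_lock);
  }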
