[v3,0/3] protect page cache from freeing inode

Message ID 1578499437-1664-1-git-send-email-laoar.shao@gmail.com (mailing list archive)

Message

Yafang Shao Jan. 8, 2020, 4:03 p.m. UTC
On my server there are some running MEMCGs protected by memory.{min, low},
but I found that the usage of these MEMCGs abruptly dropped far below the
protection limit. That confused me, and I finally found that the cause
was inode stealing.
Once an inode is freed, all of its page cache pages are dropped as
well, no matter how many pages it has. So if we intend to protect the
page cache in a memcg, we must protect its host (the inode) first.
Otherwise the memcg protection can easily be bypassed by freeing the inode,
especially if there are big files in this memcg.
The inherent mismatch between memcg and inode is a problem. One inode can
be shared by different MEMCGs, although that is a very rare case. If an
inode is shared, its page cache pages may be charged to different MEMCGs.
Currently there is no perfect solution for this kind of issue, but the
inode majority-writer ownership switching can mitigate it to some extent.
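
To illustrate the problem described above, here is a minimal user-space
sketch, not the code from this patchset. All names (model_memcg,
model_inode, memcg_is_protected, free_inode_unprotected,
free_inode_protected) are hypothetical, and the protection check is a
simplified assumption: memory.min is always honoured, and memory.low is
honoured unless the reclaimer is in low-reclaim mode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; these are not the kernel's types. */
struct model_memcg {
	unsigned long usage;	/* pages currently charged */
	unsigned long min;	/* models memory.min, in pages */
	unsigned long low;	/* models memory.low, in pages */
};

struct model_inode {
	unsigned long nr_cached_pages;	/* page cache pages hosted by the inode */
	struct model_memcg *memcg;	/* memcg the pages are charged to, may be NULL */
};

/* Simplified assumption: is the memcg still within its protection? */
static bool memcg_is_protected(const struct model_memcg *memcg, bool low_reclaim)
{
	if (memcg->usage <= memcg->min)
		return true;	/* min protection holds in every mode */
	if (!low_reclaim && memcg->usage <= memcg->low)
		return true;	/* low protection holds unless low reclaim is on */
	return false;
}

/* Models the problem: freeing the inode drops all of its cached pages. */
static unsigned long free_inode_unprotected(struct model_inode *inode)
{
	unsigned long dropped = inode->nr_cached_pages;

	if (inode->memcg) {
		assert(inode->memcg->usage >= dropped);
		inode->memcg->usage -= dropped;
	}
	inode->nr_cached_pages = 0;
	return dropped;
}

/* Models the idea: skip an inode whose pages belong to a protected memcg. */
static unsigned long free_inode_protected(struct model_inode *inode, bool low_reclaim)
{
	if (inode->memcg && inode->nr_cached_pages > 0 &&
	    memcg_is_protected(inode->memcg, low_reclaim))
		return 0;	/* keep the inode, and with it the page cache */
	return free_inode_unprotected(inode);
}
```

In this model, a memcg with usage below its low limit keeps a big file's
pages when the inode is considered for freeing, while the unprotected
path drops all of them at once regardless of the limit.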

- Changes against v2:
    1. Separate the memcg patches from this patchset, as suggested by Roman.
       A separate patch has already been ACKed by Roman; MEMCG
       maintainers, please help take a look at it[1].
    2. Improve the code around the usage of for_each_mem_cgroup(), as
       suggested by Dave.
    3. Use memcg_low_reclaim passed from scan_control, instead of
       introducing a new member in struct mem_cgroup.
    4. Some other code improvements suggested by Dave.


- Changes against v1:
Use the memcg passed from the shrink_control, instead of getting it from
the inode itself, as suggested by Dave. That could make the layering better.
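
The layering idea can be sketched in user space as follows. This is only
an illustration under assumptions, not the actual list_lru or shrinker
interface: every name here (sketch_memcg, sketch_item, sketch_lru_walk,
sketch_inode_isolate) is hypothetical. The point is that the walker hands
the memcg it is reclaiming from to the isolation callback, so the callback
does not have to derive a memcg from the item itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not the kernel's types. */
struct sketch_memcg {
	bool protected_now;	/* outcome of a protection check made by the caller */
};

struct sketch_item {
	struct sketch_item *next;
};

enum sketch_lru_status { SKETCH_LRU_REMOVED, SKETCH_LRU_SKIP };

/* The callback receives the memcg being reclaimed as an explicit argument. */
typedef enum sketch_lru_status (*sketch_isolate_fn)(struct sketch_item *item,
						    struct sketch_memcg *memcg,
						    void *cb_arg);

/*
 * Walk the list and count the items the callback agrees to isolate.
 * For brevity this model only counts; it does not unlink anything.
 */
static unsigned long sketch_lru_walk(struct sketch_item *head,
				     struct sketch_memcg *memcg,
				     sketch_isolate_fn isolate, void *cb_arg)
{
	unsigned long isolated = 0;
	struct sketch_item *item;

	assert(isolate != NULL);
	for (item = head; item != NULL; item = item->next)
		if (isolate(item, memcg, cb_arg) == SKETCH_LRU_REMOVED)
			isolated++;
	return isolated;
}

/* Example callback: skip every item while the walked memcg is protected. */
static enum sketch_lru_status sketch_inode_isolate(struct sketch_item *item,
						   struct sketch_memcg *memcg,
						   void *cb_arg)
{
	(void)item;
	(void)cb_arg;
	if (memcg != NULL && memcg->protected_now)
		return SKETCH_LRU_SKIP;
	return SKETCH_LRU_REMOVED;
}
```

A NULL memcg models reclaim that is not tied to any memcg; in that case
the example callback isolates every item.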

[1]
https://lore.kernel.org/linux-mm/CALOAHbBhPgh3WEuLu2B6e2vj1J8K=gGOyCKzb8tKWmDqFs-rfQ@mail.gmail.com/

Yafang Shao (3):
  mm, list_lru: make memcg visible to lru walker isolation function
  mm, shrinker: make memcg low reclaim visible to lru walker isolation
    function
  memcg, inode: protect page cache from freeing inode

 fs/inode.c                 | 78 ++++++++++++++++++++++++++++++++++++++++++++--
 include/linux/memcontrol.h | 21 +++++++++++++
 include/linux/shrinker.h   |  3 ++
 mm/list_lru.c              | 47 +++++++++++++++++-----------
 mm/memcontrol.c            | 15 ---------
 mm/vmscan.c                | 27 +++++++++-------
 6 files changed, 143 insertions(+), 48 deletions(-)

Comments

Yafang Shao Jan. 22, 2020, 1:46 p.m. UTC | #1
Dave,  Johannes,

Any comments on this new version ?

Thanks
Yafang
Dave Chinner Feb. 4, 2020, 9:19 p.m. UTC | #2
On Wed, Jan 22, 2020 at 09:46:57PM +0800, Yafang Shao wrote:
> Dave,  Johannes,
> 
> Any comments on this new version ?

Sorry, I lost track of this amongst travel and conferences in
mid-January. Can you update and post it again once -rc1 is out?

Cheers,

Dave.
Yafang Shao Feb. 5, 2020, 1:19 a.m. UTC | #3
On Wed, Feb 5, 2020 at 5:20 AM Dave Chinner <david@fromorbit.com> wrote:
>
> On Wed, Jan 22, 2020 at 09:46:57PM +0800, Yafang Shao wrote:
> > Dave,  Johannes,
> >
> > Any comments on this new version ?
>
> Sorry, I lost track of this amongst travel and conferences in
> mid-January. Can you update and post it again once -rc1 is out?
>

Sure, I will do it.
Thanks for your reply.

Thanks
Yafang