Message ID | 0a27b6fcbd1f7af104d7f4cf0adc6a31e0e7dd19.1582216294.git.schatzberg.dan@gmail.com (mailing list archive)
---|---
State | New, archived
Series | Charge loop device i/o to issuing cgroup
On Thu, Feb 20, 2020 at 8:52 AM Dan Schatzberg <schatzberg.dan@gmail.com> wrote:
>
> memalloc_use_memcg() worked for kernel allocations but was silently
> ignored for user pages.
>
> This patch establishes a precedence order for who gets charged:
>
> 1. If there is a memcg associated with the page already, that memcg is
>    charged. This happens during swapin.
>
> 2. If an explicit mm is passed, mm->memcg is charged. This happens
>    during page faults, which can be triggered in remote VMs (eg gup).
>
> 3. Otherwise consult the current process context. If it has configured
>    a current->active_memcg, use that.

What if css_tryget_online(current->active_memcg) in
get_mem_cgroup_from_current() fails? Do we want to change this to
css_tryget() and even if that fails should we fallback to
root_mem_cgroup or current->mm->memcg?

> Otherwise, current->mm->memcg.
>
> Previously, if a NULL mm was passed to mem_cgroup_try_charge (case 3) it
> would always charge the root cgroup. Now it looks up the current
> active_memcg first (falling back to charging the root cgroup if not
> set).
>
> Signed-off-by: Dan Schatzberg <schatzberg.dan@gmail.com>
> Acked-by: Johannes Weiner <hannes@cmpxchg.org>
> Acked-by: Tejun Heo <tj@kernel.org>
> ---
>  mm/memcontrol.c | 11 ++++++++---
>  mm/shmem.c      |  2 +-
>  2 files changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 6f6dc8712e39..b174aff4f069 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -6317,7 +6317,8 @@ enum mem_cgroup_protection mem_cgroup_protected(struct mem_cgroup *root,
>   * @compound: charge the page as compound or small page
>   *
>   * Try to charge @page to the memcg that @mm belongs to, reclaiming
> - * pages according to @gfp_mask if necessary.
> + * pages according to @gfp_mask if necessary. If @mm is NULL, try to
> + * charge to the active memcg.
>   *
>   * Returns 0 on success, with *@memcgp pointing to the charged memcg.
>   * Otherwise, an error code is returned.
> @@ -6361,8 +6362,12 @@ int mem_cgroup_try_charge(struct page *page, struct mm_struct *mm,
>  		}
>  	}
>
> -	if (!memcg)
> -		memcg = get_mem_cgroup_from_mm(mm);
> +	if (!memcg) {
> +		if (!mm)
> +			memcg = get_mem_cgroup_from_current();
> +		else
> +			memcg = get_mem_cgroup_from_mm(mm);
> +	}
>
>  	ret = try_charge(memcg, gfp_mask, nr_pages);
>
> diff --git a/mm/shmem.c b/mm/shmem.c
> index c8f7540ef048..7c7f5acf89d6 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -1766,7 +1766,7 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
>  	}
>
>  	sbinfo = SHMEM_SB(inode->i_sb);
> -	charge_mm = vma ? vma->vm_mm : current->mm;
> +	charge_mm = vma ? vma->vm_mm : NULL;
>
>  	page = find_lock_entry(mapping, index);
>  	if (xa_is_value(page)) {
> --
> 2.17.1
>
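[Editor's note: for context, a minimal sketch of the helper Shakeel is asking about. This approximates the mm/memcontrol.c get_mem_cgroup_from_current() of this era and is reconstructed from memory, not quoted from the thread; the point is that a failed css_tryget_online() silently falls back to the root cgroup.]

```c
/*
 * Sketch (not verbatim upstream code) of the lookup order under
 * discussion: prefer current->active_memcg when set, but only if a
 * css reference can still be taken; otherwise fall back.
 */
static struct mem_cgroup *get_mem_cgroup_from_current(void)
{
	if (unlikely(current->active_memcg)) {
		struct mem_cgroup *memcg = root_mem_cgroup;

		rcu_read_lock();
		/*
		 * Shakeel's question: this can fail for an offlined
		 * cgroup, silently redirecting the charge to root.
		 */
		if (css_tryget_online(&current->active_memcg->css))
			memcg = current->active_memcg;
		rcu_read_unlock();
		return memcg;
	}
	return get_mem_cgroup_from_mm(current->mm);
}
```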
Dan Schatzberg writes:
>memalloc_use_memcg() worked for kernel allocations but was silently
>ignored for user pages.
>
>This patch establishes a precedence order for who gets charged:
>
>1. If there is a memcg associated with the page already, that memcg is
>   charged. This happens during swapin.
>
>2. If an explicit mm is passed, mm->memcg is charged. This happens
>   during page faults, which can be triggered in remote VMs (eg gup).
>
>3. Otherwise consult the current process context. If it has configured
>   a current->active_memcg, use that. Otherwise, current->mm->memcg.
>
>Previously, if a NULL mm was passed to mem_cgroup_try_charge (case 3) it
>would always charge the root cgroup. Now it looks up the current
>active_memcg first (falling back to charging the root cgroup if not
>set).
>
>Signed-off-by: Dan Schatzberg <schatzberg.dan@gmail.com>
>Acked-by: Johannes Weiner <hannes@cmpxchg.org>
>Acked-by: Tejun Heo <tj@kernel.org>

Acked-by: Chris Down <chris@chrisdown.name>

Thanks! The clarification in the v2 thread for this made things clear to me.
Hey Shakeel!

On Thu, Feb 20, 2020 at 10:14:45AM -0800, Shakeel Butt wrote:
> On Thu, Feb 20, 2020 at 8:52 AM Dan Schatzberg <schatzberg.dan@gmail.com> wrote:
> >
> > memalloc_use_memcg() worked for kernel allocations but was silently
> > ignored for user pages.
> >
> > This patch establishes a precedence order for who gets charged:
> >
> > 1. If there is a memcg associated with the page already, that memcg is
> >    charged. This happens during swapin.
> >
> > 2. If an explicit mm is passed, mm->memcg is charged. This happens
> >    during page faults, which can be triggered in remote VMs (eg gup).
> >
> > 3. Otherwise consult the current process context. If it has configured
> >    a current->active_memcg, use that.
>
> What if css_tryget_online(current->active_memcg) in
> get_mem_cgroup_from_current() fails? Do we want to change this to
> css_tryget() and even if that fails should we fallback to
> root_mem_cgroup or current->mm->memcg?

Good questions.

I think we can switch to css_tryget(). If a cgroup goes offline
between issuing the IO and the loop layer executing that IO, the
resources used could end up in the root instead of the closest
ancestor of the offlined group. However, the risk of that actually
happening and causing problems is probably pretty small, and the
behavior isn't really worse than before Dan's patches.

Would you mind sending a separate patch for this? AFAICS similar
concerns apply to all users of foreign charging.

As for tryget failing: can that actually happen? AFAICS, all current
users acquire a reference first (get_memcg_from_somewhere()) that they
assign to current->active_memcg. We should probably codify this rule
and do

	WARN_ON(!css_tryget()) /* current->active_memcg must hold a ref */
Hi Johannes,

On Thu, Feb 20, 2020 at 1:03 PM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> Hey Shakeel!
>
> On Thu, Feb 20, 2020 at 10:14:45AM -0800, Shakeel Butt wrote:
> > On Thu, Feb 20, 2020 at 8:52 AM Dan Schatzberg <schatzberg.dan@gmail.com> wrote:
> > >
> > > memalloc_use_memcg() worked for kernel allocations but was silently
> > > ignored for user pages.
> > >
> > > This patch establishes a precedence order for who gets charged:
> > >
> > > 1. If there is a memcg associated with the page already, that memcg is
> > >    charged. This happens during swapin.
> > >
> > > 2. If an explicit mm is passed, mm->memcg is charged. This happens
> > >    during page faults, which can be triggered in remote VMs (eg gup).
> > >
> > > 3. Otherwise consult the current process context. If it has configured
> > >    a current->active_memcg, use that.
> >
> > What if css_tryget_online(current->active_memcg) in
> > get_mem_cgroup_from_current() fails? Do we want to change this to
> > css_tryget() and even if that fails should we fallback to
> > root_mem_cgroup or current->mm->memcg?
>
> Good questions.
>
> I think we can switch to css_tryget(). If a cgroup goes offline
> between issuing the IO and the loop layer executing that IO, the
> resources used could end up in the root instead of the closest
> ancestor of the offlined group. However, the risk of that actually
> happening and causing problems is probably pretty small, and the
> behavior isn't really worse than before Dan's patches.

Agreed.

> Would you mind sending a separate patch for this? AFAICS similar
> concerns apply to all users of foreign charging.

Sure and yes similar concerns apply to other users as well.

> As for tryget failing: can that actually happen? AFAICS, all current
> users acquire a reference first (get_memcg_from_somewhere()) that they
> assign to current->active_memcg. We should probably codify this rule
> and do WARN_ON(!css_tryget()) /* current->active_memcg must hold a ref */

Yes, we should WARN_ON().

Shakeel
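[Editor's note: to make the change agreed on above concrete, a hypothetical sketch of what such a follow-up patch might do inside the helper. No such patch appears in this thread; the WARN_ON comment text is taken from Johannes's suggestion, the rest is an assumption.]

```c
/* Hypothetical follow-up sketch, not a posted patch. */
if (unlikely(current->active_memcg)) {
	struct mem_cgroup *memcg = root_mem_cgroup;

	rcu_read_lock();
	/*
	 * current->active_memcg must hold a ref, so tryget failing
	 * would be a setter bug, not an expected race; and plain
	 * css_tryget() (not the _online variant) keeps charges in an
	 * offlined memcg's subtree instead of escaping to the root.
	 */
	if (!WARN_ON_ONCE(!css_tryget(&current->active_memcg->css)))
		memcg = current->active_memcg;
	rcu_read_unlock();
	return memcg;
}
```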
On Thu, Feb 20, 2020 at 8:52 AM Dan Schatzberg <schatzberg.dan@gmail.com> wrote:
>
> memalloc_use_memcg() worked for kernel allocations but was silently
> ignored for user pages.
>
> This patch establishes a precedence order for who gets charged:
>
> 1. If there is a memcg associated with the page already, that memcg is
>    charged. This happens during swapin.
>
> 2. If an explicit mm is passed, mm->memcg is charged. This happens
>    during page faults, which can be triggered in remote VMs (eg gup).
>
> 3. Otherwise consult the current process context. If it has configured
>    a current->active_memcg, use that. Otherwise, current->mm->memcg.
>
> Previously, if a NULL mm was passed to mem_cgroup_try_charge (case 3) it
> would always charge the root cgroup. Now it looks up the current
> active_memcg first (falling back to charging the root cgroup if not
> set).
>
> Signed-off-by: Dan Schatzberg <schatzberg.dan@gmail.com>
> Acked-by: Johannes Weiner <hannes@cmpxchg.org>
> Acked-by: Tejun Heo <tj@kernel.org>

Reviewed-by: Shakeel Butt <shakeelb@google.com>
On Thu, 20 Feb 2020, Dan Schatzberg wrote:

> memalloc_use_memcg() worked for kernel allocations but was silently
> ignored for user pages.
>
> This patch establishes a precedence order for who gets charged:
>
> 1. If there is a memcg associated with the page already, that memcg is
>    charged. This happens during swapin.
>
> 2. If an explicit mm is passed, mm->memcg is charged. This happens
>    during page faults, which can be triggered in remote VMs (eg gup).
>
> 3. Otherwise consult the current process context. If it has configured
>    a current->active_memcg, use that. Otherwise, current->mm->memcg.
>
> Previously, if a NULL mm was passed to mem_cgroup_try_charge (case 3) it
> would always charge the root cgroup. Now it looks up the current
> active_memcg first (falling back to charging the root cgroup if not
> set).
>
> Signed-off-by: Dan Schatzberg <schatzberg.dan@gmail.com>
> Acked-by: Johannes Weiner <hannes@cmpxchg.org>
> Acked-by: Tejun Heo <tj@kernel.org>

Acked-by: Hugh Dickins <hughd@google.com>

Yes, internally we have some further not-yet-upstreamed complications
here (mainly, the "memcg=" mount option for all charges on a tmpfs to
be charged to that memcg); but what you're doing here does not obstruct
adding that later, they fit in well with the hierarchy that you (and
Johannes) mapped out above, and it's really an improvement for shmem
not to be referring to current there - thanks.

> ---
>  mm/memcontrol.c | 11 ++++++++---
>  mm/shmem.c      |  2 +-
>  2 files changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 6f6dc8712e39..b174aff4f069 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -6317,7 +6317,8 @@ enum mem_cgroup_protection mem_cgroup_protected(struct mem_cgroup *root,
>   * @compound: charge the page as compound or small page
>   *
>   * Try to charge @page to the memcg that @mm belongs to, reclaiming
> - * pages according to @gfp_mask if necessary.
> + * pages according to @gfp_mask if necessary. If @mm is NULL, try to
> + * charge to the active memcg.
>   *
>   * Returns 0 on success, with *@memcgp pointing to the charged memcg.
>   * Otherwise, an error code is returned.
> @@ -6361,8 +6362,12 @@ int mem_cgroup_try_charge(struct page *page, struct mm_struct *mm,
>  		}
>  	}
>
> -	if (!memcg)
> -		memcg = get_mem_cgroup_from_mm(mm);
> +	if (!memcg) {
> +		if (!mm)
> +			memcg = get_mem_cgroup_from_current();
> +		else
> +			memcg = get_mem_cgroup_from_mm(mm);
> +	}
>
>  	ret = try_charge(memcg, gfp_mask, nr_pages);
>
> diff --git a/mm/shmem.c b/mm/shmem.c
> index c8f7540ef048..7c7f5acf89d6 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -1766,7 +1766,7 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
>  	}
>
>  	sbinfo = SHMEM_SB(inode->i_sb);
> -	charge_mm = vma ? vma->vm_mm : current->mm;
> +	charge_mm = vma ? vma->vm_mm : NULL;
>
>  	page = find_lock_entry(mapping, index);
>  	if (xa_is_value(page)) {
> --
> 2.17.1
On Sun, 23 Feb 2020, Hugh Dickins wrote:
> On Thu, 20 Feb 2020, Dan Schatzberg wrote:
>
> > memalloc_use_memcg() worked for kernel allocations but was silently
> > ignored for user pages.
> >
> > This patch establishes a precedence order for who gets charged:
> >
> > 1. If there is a memcg associated with the page already, that memcg is
> >    charged. This happens during swapin.
> >
> > 2. If an explicit mm is passed, mm->memcg is charged. This happens
> >    during page faults, which can be triggered in remote VMs (eg gup).
> >
> > 3. Otherwise consult the current process context. If it has configured
> >    a current->active_memcg, use that. Otherwise, current->mm->memcg.
> >
> > Previously, if a NULL mm was passed to mem_cgroup_try_charge (case 3) it
> > would always charge the root cgroup. Now it looks up the current
> > active_memcg first (falling back to charging the root cgroup if not
> > set).
> >
> > Signed-off-by: Dan Schatzberg <schatzberg.dan@gmail.com>
> > Acked-by: Johannes Weiner <hannes@cmpxchg.org>
> > Acked-by: Tejun Heo <tj@kernel.org>
>
> Acked-by: Hugh Dickins <hughd@google.com>
>
> Yes, internally we have some further not-yet-upstreamed complications
> here (mainly, the "memcg=" mount option for all charges on a tmpfs to
> be charged to that memcg); but what you're doing here does not obstruct
> adding that later, they fit in well with the hierarchy that you (and
> Johannes) mapped out above, and it's really an improvement for shmem
> not to be referring to current there - thanks.

I acked slightly too soon. There are two other uses of "try_charge" in
mm/shmem.c: we can be confident that the userfaultfd one knows what mm
it's dealing with, but the shmem_swapin_page() instance has a similar
use of current->mm, that you also want to adjust to NULL, don't you?

Hugh
On Sun, Feb 23, 2020 at 05:11:12PM -0800, Hugh Dickins wrote:
> On Sun, 23 Feb 2020, Hugh Dickins wrote:
> > On Thu, 20 Feb 2020, Dan Schatzberg wrote:
> >
> > > memalloc_use_memcg() worked for kernel allocations but was silently
> > > ignored for user pages.
> > >
> > > This patch establishes a precedence order for who gets charged:
> > >
> > > 1. If there is a memcg associated with the page already, that memcg is
> > >    charged. This happens during swapin.
> > >
> > > 2. If an explicit mm is passed, mm->memcg is charged. This happens
> > >    during page faults, which can be triggered in remote VMs (eg gup).
> > >
> > > 3. Otherwise consult the current process context. If it has configured
> > >    a current->active_memcg, use that. Otherwise, current->mm->memcg.
> > >
> > > Previously, if a NULL mm was passed to mem_cgroup_try_charge (case 3) it
> > > would always charge the root cgroup. Now it looks up the current
> > > active_memcg first (falling back to charging the root cgroup if not
> > > set).
> > >
> > > Signed-off-by: Dan Schatzberg <schatzberg.dan@gmail.com>
> > > Acked-by: Johannes Weiner <hannes@cmpxchg.org>
> > > Acked-by: Tejun Heo <tj@kernel.org>
> >
> > Acked-by: Hugh Dickins <hughd@google.com>
> >
> > Yes, internally we have some further not-yet-upstreamed complications
> > here (mainly, the "memcg=" mount option for all charges on a tmpfs to
> > be charged to that memcg); but what you're doing here does not obstruct
> > adding that later, they fit in well with the hierarchy that you (and
> > Johannes) mapped out above, and it's really an improvement for shmem
> > not to be referring to current there - thanks.
>
> I acked slightly too soon. There are two other uses of "try_charge" in
> mm/shmem.c: we can be confident that the userfaultfd one knows what mm
> it's dealing with, but the shmem_swapin_page() instance has a similar
> use of current->mm, that you also want to adjust to NULL, don't you?
>
> Hugh

Yes, you're right. I'll change shmem_swapin_page as well.
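[Editor's note: a sketch of the follow-up Dan agrees to here, applying the same pattern as the shmem_getpage_gfp() hunk to shmem_swapin_page(). The surrounding variable names (gfp, memcg, error) are assumptions mirroring that hunk, not quoted from a posted revision.]

```c
/*
 * Sketch only: with a NULL vma there is no mm to charge, so pass
 * NULL and let mem_cgroup_try_charge() consult the active memcg
 * rather than implicitly charging current->mm.
 */
struct mm_struct *charge_mm = vma ? vma->vm_mm : NULL;

error = mem_cgroup_try_charge(page, charge_mm, gfp, &memcg, false);
```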
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 6f6dc8712e39..b174aff4f069 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6317,7 +6317,8 @@ enum mem_cgroup_protection mem_cgroup_protected(struct mem_cgroup *root,
  * @compound: charge the page as compound or small page
  *
  * Try to charge @page to the memcg that @mm belongs to, reclaiming
- * pages according to @gfp_mask if necessary.
+ * pages according to @gfp_mask if necessary. If @mm is NULL, try to
+ * charge to the active memcg.
  *
  * Returns 0 on success, with *@memcgp pointing to the charged memcg.
  * Otherwise, an error code is returned.
@@ -6361,8 +6362,12 @@ int mem_cgroup_try_charge(struct page *page, struct mm_struct *mm,
 		}
 	}

-	if (!memcg)
-		memcg = get_mem_cgroup_from_mm(mm);
+	if (!memcg) {
+		if (!mm)
+			memcg = get_mem_cgroup_from_current();
+		else
+			memcg = get_mem_cgroup_from_mm(mm);
+	}

 	ret = try_charge(memcg, gfp_mask, nr_pages);

diff --git a/mm/shmem.c b/mm/shmem.c
index c8f7540ef048..7c7f5acf89d6 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1766,7 +1766,7 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
 	}

 	sbinfo = SHMEM_SB(inode->i_sb);
-	charge_mm = vma ? vma->vm_mm : current->mm;
+	charge_mm = vma ? vma->vm_mm : NULL;

 	page = find_lock_entry(mapping, index);
 	if (xa_is_value(page)) {
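[Editor's note: to show why the NULL-mm path matters for this series, a hedged usage sketch of how a caller doing I/O on another cgroup's behalf would combine memalloc_use_memcg() with the new behavior. The caller, its memcg/page variables, and the error handling are hypothetical; commit/cancel of the charge is elided.]

```c
struct mem_cgroup *charged_memcg;
int ret;

memalloc_use_memcg(memcg);	/* sets current->active_memcg */
/*
 * NULL mm: with this patch, mem_cgroup_try_charge() consults the
 * active memcg instead of unconditionally charging the root cgroup.
 */
ret = mem_cgroup_try_charge(page, NULL, GFP_KERNEL, &charged_memcg, false);
memalloc_unuse_memcg();		/* clears current->active_memcg */
```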