[2/2] vmalloc: Account memcg per vmalloc

Message ID: 20241211043252.3295947-2-willy@infradead.org
State: New
Series: [1/2] vmalloc: Fix accounting of VmallocUsed with i915

Commit Message

Matthew Wilcox Dec. 11, 2024, 4:32 a.m. UTC
Today we account each page individually to the memcg, which works well
enough, if a little inefficiently (N atomic operations for an N-page
allocation instead of one).  Unfortunately, the stats can get out of sync
when i915 calls vmap() with VM_MAP_PUT_PAGES.  The pages being passed
were not allocated by vmalloc, so the MEMCG_VMALLOC counter is never
incremented, but it is decremented when the pages are freed with vfree().

Solve all of this by tracking the memcg at the vm_struct level.
This logic has to live in the memcontrol file as it calls several
functions which are currently static.

Fixes: b944afc9d64d ("mm: add a VM_MAP_PUT_PAGES flag for vmap")
Cc: stable@vger.kernel.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/memcontrol.h |  7 ++++++
 include/linux/vmalloc.h    |  3 +++
 mm/memcontrol.c            | 46 ++++++++++++++++++++++++++++++++++++++
 mm/vmalloc.c               | 14 ++++++------
 4 files changed, 63 insertions(+), 7 deletions(-)

Comments

Shakeel Butt Dec. 11, 2024, 5:06 a.m. UTC | #1
On Wed, Dec 11, 2024 at 04:32:50AM +0000, Matthew Wilcox (Oracle) wrote:
[...]
> +int obj_cgroup_charge_vmalloc(struct obj_cgroup **objcgp,
> +		unsigned int nr_pages, gfp_t gfp)
> +{
> +	struct obj_cgroup *objcg;
> +	int err;
> +
> +	if (mem_cgroup_disabled() || !(gfp & __GFP_ACCOUNT))
> +		return 0;
> +
> +	objcg = current_obj_cgroup();
> +	if (!objcg)
> +		return 0;
> +
> +	err = obj_cgroup_charge_pages(objcg, gfp, nr_pages);
> +	if (err)
> +		return err;
> +	obj_cgroup_get(objcg);
> +	mod_memcg_state(obj_cgroup_memcg(objcg), MEMCG_VMALLOC, nr_pages);

obj_cgroup_memcg() needs to be called within an RCU read-side critical
section. See MEMCG_PERCPU_B as an example.
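
For reference, the locking pattern being asked for looks roughly like
this (a sketch modeled on how MEMCG_PERCPU_B is updated in
mm/memcontrol.c; the exact placement may differ in the final version):

	rcu_read_lock();
	mod_memcg_state(obj_cgroup_memcg(objcg), MEMCG_VMALLOC, nr_pages);
	rcu_read_unlock();

The RCU read lock keeps the objcg-to-memcg association stable across the
update, since an objcg can be reparented when its cgroup is removed.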

> +	*objcgp = objcg;
> +
> +	return 0;
> +}
> +
> +/**
> + * obj_cgroup_uncharge_vmalloc - Uncharge vmalloc memory
> + * @objcg: The object cgroup
> + * @nr_pages: Number of pages
> + */
> +void obj_cgroup_uncharge_vmalloc(struct obj_cgroup *objcg,
> +		unsigned int nr_pages)
> +{
> +	if (!objcg)
> +		return;
> +	mod_memcg_state(objcg->memcg, MEMCG_VMALLOC, 0L - nr_pages);

Please use obj_cgroup_memcg() above instead of objcg->memcg, and call
it under the RCU read lock.
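
Along the same lines, the uncharge side would then look something like
this (again only a sketch, not the code as merged):

	rcu_read_lock();
	mod_memcg_state(obj_cgroup_memcg(objcg), MEMCG_VMALLOC, 0L - nr_pages);
	rcu_read_unlock();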

Overall the patch looks good.
Johannes Weiner Dec. 11, 2024, 4:09 p.m. UTC | #2
On Wed, Dec 11, 2024 at 04:32:50AM +0000, Matthew Wilcox (Oracle) wrote:
> Today we account each page individually to the memcg, which works well
> enough, if a little inefficiently (N atomic operations for an N-page
> allocation instead of one).  Unfortunately, the stats can get out of sync
> when i915 calls vmap() with VM_MAP_PUT_PAGES.  The pages being passed
> were not allocated by vmalloc, so the MEMCG_VMALLOC counter is never
> incremented, but it is decremented when the pages are freed with vfree().
>
> Solve all of this by tracking the memcg at the vm_struct level.
> This logic has to live in the memcontrol file as it calls several
> functions which are currently static.
>
> Fixes: b944afc9d64d ("mm: add a VM_MAP_PUT_PAGES flag for vmap")
> Cc: stable@vger.kernel.org
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  include/linux/memcontrol.h |  7 ++++++
>  include/linux/vmalloc.h    |  3 +++
>  mm/memcontrol.c            | 46 ++++++++++++++++++++++++++++++++++++++
>  mm/vmalloc.c               | 14 ++++++------
>  4 files changed, 63 insertions(+), 7 deletions(-)

This would work, but it seems somewhat complicated. The atomics in
memcg charging and the vmstat updates are batched, and the per-page
overhead is for the most part cheap per-cpu ops. Not an issue per se.

You could do for MEMCG_VMALLOC what you did for nr_vmalloc_pages:

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 634162271c00..a889bb04405c 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -3353,7 +3353,11 @@ void vfree(const void *addr)
 		struct page *page = vm->pages[i];
 
 		BUG_ON(!page);
-		mod_memcg_page_state(page, MEMCG_VMALLOC, -1);
+
+		/* Pages were allocated elsewhere */
+		if (!(vm->flags & VM_MAP_PUT_PAGES))
+			mod_memcg_page_state(page, MEMCG_VMALLOC, -1);
+
 		/*
 		 * High-order allocs for huge vmallocs are split, so
 		 * can be freed as an array of order-0 allocations
Matthew Wilcox Dec. 11, 2024, 4:50 p.m. UTC | #3
On Wed, Dec 11, 2024 at 11:09:56AM -0500, Johannes Weiner wrote:
> This would work, but it seems somewhat complicated. The atomics in
> memcg charging and the vmstat updates are batched, and the per-page
> overhead is for the most part cheap per-cpu ops. Not an issue per se.

OK, fair enough, I hadn't realised it was a percpu-refcount.  Still,
we might consume several batches (batch size of 64) when we could do it
all in one shot.

Perhaps you'd be more persuaded by:

(a) If we clear __GFP_ACCOUNT then alloc_pages_bulk() will work, and
that's a pretty significant performance win over calling alloc_pages()
in a loop.

(b) Once we get to memdescs, calling alloc_pages() with __GFP_ACCOUNT
set is going to require allocating a memdesc to store the obj_cgroup
in, so in the future we'll save an allocation.

Your proposed alternative will work and is way less churn.  But it's
not preparing us for memdescs ;-)
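
For context, point (a) targets the bulk path in vm_area_alloc_pages().
A rough sketch of the idea against the v6.13-era bulk API, with names
following vm_area_alloc_pages() (the real code also deals with NUMA
policy, huge pages and fallback):

	/*
	 * One call for many order-0 pages instead of alloc_pages() per
	 * page; the bulk allocator does not handle accounted allocations,
	 * so accounting has to move up to the vmalloc level.
	 */
	nr_allocated = alloc_pages_bulk_array_node(gfp & ~__GFP_ACCOUNT,
						   nid, nr_pages, pages);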
Shakeel Butt Dec. 11, 2024, 7:32 p.m. UTC | #4
On Wed, Dec 11, 2024 at 04:50:39PM +0000, Matthew Wilcox wrote:
> On Wed, Dec 11, 2024 at 11:09:56AM -0500, Johannes Weiner wrote:
> > This would work, but it seems somewhat complicated. The atomics in
> > memcg charging and the vmstat updates are batched, and the per-page
> > overhead is for the most part cheap per-cpu ops. Not an issue per se.
> 
> OK, fair enough, I hadn't realised it was a percpu-refcount.  Still,
> we might consume several batches (batch size of 64) when we could do it
> all in one shot.
> 
> Perhaps you'd be more persuaded by:
> 
> (a) If we clear __GFP_ACCOUNT then alloc_pages_bulk() will work, and
> that's a pretty significant performance win over calling alloc_pages()
> in a loop.
> 
> (b) Once we get to memdescs, calling alloc_pages() with __GFP_ACCOUNT
> set is going to require allocating a memdesc to store the obj_cgroup
> in, so in the future we'll save an allocation.
> 
> Your proposed alternative will work and is way less churn.  But it's
> not preparing us for memdescs ;-)

We can make alloc_pages_bulk() work with __GFP_ACCOUNT but your second
argument is more compelling.

I am trying to think of what we will miss if we remove this per-page
memcg metadata. One thing I can think of is debugging a live system
or kdump where I need to track where a given page came from. I think
memory profiling will still be useful in combination with going through
all vmalloc regions where this page is mapped (is there an easy way to
tell if a page is from a vmalloc region?). So, for now I think we will
have an alternative way to extract the useful information.

I think we can go with Johannes' solution for stable and discuss the
future direction more separately.
Matthew Wilcox Dec. 11, 2024, 8:20 p.m. UTC | #5
On Wed, Dec 11, 2024 at 11:32:13AM -0800, Shakeel Butt wrote:
> On Wed, Dec 11, 2024 at 04:50:39PM +0000, Matthew Wilcox wrote:
> > Perhaps you'd be more persuaded by:
> > 
> > (a) If we clear __GFP_ACCOUNT then alloc_pages_bulk() will work, and
> > that's a pretty significant performance win over calling alloc_pages()
> > in a loop.
> > 
> > (b) Once we get to memdescs, calling alloc_pages() with __GFP_ACCOUNT
> > set is going to require allocating a memdesc to store the obj_cgroup
> > in, so in the future we'll save an allocation.
> > 
> > Your proposed alternative will work and is way less churn.  But it's
> > not preparing us for memdescs ;-)
> 
> We can make alloc_pages_bulk() work with __GFP_ACCOUNT but your second
> argument is more compelling.
> 
> I am trying to think of what we will miss if we remove this per-page
> memcg metadata. One thing I can think of is debugging a live system
> or kdump where I need to track where a given page came from. I think

Umm, I don't think you know which vmalloc allocation a page came from
today?  I've sent patches to add that information before, but they were
rejected.  In fact, I don't think we know even _that_ a page belongs to
vmalloc today, do we?  Yes, we know that the page is accounted, and
which memcg it belongs to ... but nothing more.

I actually want to improve this, without adding additional overhead.
What I'm working on right now (before I got waylaid by this bug) is:

+struct choir {
+       struct kref refcount;
+       unsigned int nr;
+       struct page *pages[] __counted_by(nr);
+};

and rewriting vmalloc to be based on choirs instead of its own pages.
One thing I've come to realise today is that the obj_cgroup pointer
needs to be in the choir and not in the vm_struct so that we uncharge the
allocation when the choir refcount drops to 0, not when the allocation
is unmapped.
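
Concretely, that might look something like the following (purely a
hypothetical sketch: the objcg field and choir_release() are
illustrations of the idea, not code that exists anywhere yet):

	struct choir {
		struct kref refcount;
		unsigned int nr;
		struct obj_cgroup *objcg;	/* hypothetical */
		struct page *pages[] __counted_by(nr);
	};

	static void choir_release(struct kref *kref)
	{
		struct choir *choir = container_of(kref, struct choir, refcount);

		/* Uncharge at the final put, not when the area is unmapped. */
		obj_cgroup_uncharge_vmalloc(choir->objcg, choir->nr);
		/* ... free choir->pages[] and the choir itself ... */
	}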

A regular choir allocation will (today) mark the pages in it as being
allocated to a choir (and thus not having their own refcount / mapcount),
but I'll give vmalloc a way to mark the pages as specifically being
from vmalloc.

There's a lot of moving parts to this ... it's proving quite tricky!

> I think we can go with Johannes' solution for stable and discuss the
> future direction more separately.

OK, I'll send a patch to do that.
Shakeel Butt Dec. 11, 2024, 8:58 p.m. UTC | #6
On Wed, Dec 11, 2024 at 08:20:36PM +0000, Matthew Wilcox wrote:
> On Wed, Dec 11, 2024 at 11:32:13AM -0800, Shakeel Butt wrote:
> > On Wed, Dec 11, 2024 at 04:50:39PM +0000, Matthew Wilcox wrote:
> > > Perhaps you'd be more persuaded by:
> > > 
> > > (a) If we clear __GFP_ACCOUNT then alloc_pages_bulk() will work, and
> > > that's a pretty significant performance win over calling alloc_pages()
> > > in a loop.
> > > 
> > > (b) Once we get to memdescs, calling alloc_pages() with __GFP_ACCOUNT
> > > set is going to require allocating a memdesc to store the obj_cgroup
> > > in, so in the future we'll save an allocation.
> > > 
> > > Your proposed alternative will work and is way less churn.  But it's
> > > not preparing us for memdescs ;-)
> > 
> > We can make alloc_pages_bulk() work with __GFP_ACCOUNT but your second
> > argument is more compelling.
> > 
> > I am trying to think of what we will miss if we remove this per-page
> > memcg metadata. One thing I can think of is debugging a live system
> > or kdump where I need to track where a given page came from. I think
> 
> Umm, I don't think you know which vmalloc allocation a page came from
> today?  I've sent patches to add that information before, but they were
> rejected. 

Do you have a link handy for that discussion?

> In fact, I don't think we know even _that_ a page belongs to
> vmalloc today, do we?  Yes, we know that the page is accounted, and
> which memcg it belongs to ... but nothing more.

Yes, you are correct. At the moment it is guesswork and an exhaustive
search across multiple sources.

> 
> I actually want to improve this, without adding additional overhead.
> What I'm working on right now (before I got waylaid by this bug) is:
> 
> +struct choir {
> +       struct kref refcount;
> +       unsigned int nr;
> +       struct page *pages[] __counted_by(nr);
> +};
> 
> and rewriting vmalloc to be based on choirs instead of its own pages.
> One thing I've come to realise today is that the obj_cgroup pointer
> needs to be in the choir and not in the vm_struct so that we uncharge the
> allocation when the choir refcount drops to 0, not when the allocation
> is unmapped.

What/who else can take a reference on a choir?

> 
> A regular choir allocation will (today) mark the pages in it as being
> allocated to a choir (and thus not having their own refcount / mapcount),
> but I'll give vmalloc a way to mark the pages as specifically being
> from vmalloc.

This sounds good. Thanks for the awesome work.
Matthew Wilcox Dec. 11, 2024, 9:08 p.m. UTC | #7
On Wed, Dec 11, 2024 at 12:58:36PM -0800, Shakeel Butt wrote:
> On Wed, Dec 11, 2024 at 08:20:36PM +0000, Matthew Wilcox wrote:
> > Umm, I don't think you know which vmalloc allocation a page came from
> > today?  I've sent patches to add that information before, but they were
> > rejected. 
> 
> Do you have a link handy for that discussion?

It's not really relevant any more ...

https://lore.kernel.org/linux-mm/20180518194519.3820-18-willy@infradead.org/

and subsequent discussion:

https://lore.kernel.org/linux-mm/20180611121129.GB12912@bombadil.infradead.org/

It all predates memdesc.

> Yes, you are correct. At the moment it is guesswork and an exhaustive
> search across multiple sources.

At least I should be able to narrow it down somewhat if we have a
PGTY_vmalloc.

> > I actually want to improve this, without adding additional overhead.
> > What I'm working on right now (before I got waylaid by this bug) is:
> > 
> > +struct choir {
> > +       struct kref refcount;
> > +       unsigned int nr;
> > +       struct page *pages[] __counted_by(nr);
> > +};
> > 
> > and rewriting vmalloc to be based on choirs instead of its own pages.
> > One thing I've come to realise today is that the obj_cgroup pointer
> > needs to be in the choir and not in the vm_struct so that we uncharge the
> > allocation when the choir refcount drops to 0, not when the allocation
> > is unmapped.
> 
> What/who else can take a reference on a choir?

The first user is remap_vmalloc_range() which today takes a
mapcount+refcount on the page underlying the vmalloc inside
vm_insert_page().

But I see choirs being useful more widely; for example in the XFS buffer
cache (which wouldn't be mappable to userspace).  They might even be
the right approach for zswap, replacing zpdesc.  Haven't looked into
that in enough detail yet.
kernel test robot Dec. 11, 2024, 10:17 p.m. UTC | #8
Hi Matthew,

kernel test robot noticed the following build errors:

[auto build test ERROR on akpm-mm/mm-everything]
[also build test ERROR on linus/master v6.13-rc2 next-20241211]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Matthew-Wilcox-Oracle/vmalloc-Account-memcg-per-vmalloc/20241211-123433
base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link:    https://lore.kernel.org/r/20241211043252.3295947-2-willy%40infradead.org
patch subject: [PATCH 2/2] vmalloc: Account memcg per vmalloc
config: s390-randconfig-001-20241212 (https://download.01.org/0day-ci/archive/20241212/202412120605.mggOamQB-lkp@intel.com/config)
compiler: clang version 18.1.8 (https://github.com/llvm/llvm-project 3b5b5c1ec4a3095ab096dd780e84d7ab81f3d7ff)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241212/202412120605.mggOamQB-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202412120605.mggOamQB-lkp@intel.com/

All errors (new ones prefixed by >>):

   s390x-linux-ld: mm/vmalloc.o: in function `vfree':
>> vmalloc.c:(.text+0xb040): undefined reference to `obj_cgroup_uncharge_vmalloc'
   s390x-linux-ld: mm/vmalloc.o: in function `__vmalloc_node_range_noprof':
>> vmalloc.c:(.text+0xc5ec): undefined reference to `obj_cgroup_charge_vmalloc'
kernel test robot Dec. 11, 2024, 11:36 p.m. UTC | #9
Hi Matthew,

kernel test robot noticed the following build errors:

[auto build test ERROR on akpm-mm/mm-everything]
[also build test ERROR on linus/master v6.13-rc2 next-20241211]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Matthew-Wilcox-Oracle/vmalloc-Account-memcg-per-vmalloc/20241211-123433
base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link:    https://lore.kernel.org/r/20241211043252.3295947-2-willy%40infradead.org
patch subject: [PATCH 2/2] vmalloc: Account memcg per vmalloc
config: sparc64-randconfig-001-20241212 (https://download.01.org/0day-ci/archive/20241212/202412120722.YWlWwpqk-lkp@intel.com/config)
compiler: sparc64-linux-gcc (GCC) 14.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241212/202412120722.YWlWwpqk-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202412120722.YWlWwpqk-lkp@intel.com/

All errors (new ones prefixed by >>):

   sparc64-linux-ld: mm/vmalloc.o: in function `vfree':
>> mm/vmalloc.c:3386:(.text+0x6bbc): undefined reference to `obj_cgroup_uncharge_vmalloc'
   sparc64-linux-ld: mm/vmalloc.o: in function `__vmalloc_area_node':
>> mm/vmalloc.c:3676:(.text+0x6f94): undefined reference to `obj_cgroup_charge_vmalloc'


vim +3386 mm/vmalloc.c

  3329	
  3330	/**
  3331	 * vfree - Release memory allocated by vmalloc()
  3332	 * @addr:  Memory base address
  3333	 *
  3334	 * Free the virtually continuous memory area starting at @addr, as obtained
  3335	 * from one of the vmalloc() family of APIs.  This will usually also free the
  3336	 * physical memory underlying the virtual allocation, but that memory is
  3337	 * reference counted, so it will not be freed until the last user goes away.
  3338	 *
  3339	 * If @addr is NULL, no operation is performed.
  3340	 *
  3341	 * Context:
  3342	 * May sleep if called *not* from interrupt context.
  3343	 * Must not be called in NMI context (strictly speaking, it could be
  3344	 * if we have CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG, but making the calling
  3345	 * conventions for vfree() arch-dependent would be a really bad idea).
  3346	 */
  3347	void vfree(const void *addr)
  3348	{
  3349		struct vm_struct *vm;
  3350		int i;
  3351	
  3352		if (unlikely(in_interrupt())) {
  3353			vfree_atomic(addr);
  3354			return;
  3355		}
  3356	
  3357		BUG_ON(in_nmi());
  3358		kmemleak_free(addr);
  3359		might_sleep();
  3360	
  3361		if (!addr)
  3362			return;
  3363	
  3364		vm = remove_vm_area(addr);
  3365		if (unlikely(!vm)) {
  3366			WARN(1, KERN_ERR "Trying to vfree() nonexistent vm area (%p)\n",
  3367					addr);
  3368			return;
  3369		}
  3370	
  3371		if (unlikely(vm->flags & VM_FLUSH_RESET_PERMS))
  3372			vm_reset_perms(vm);
  3373		for (i = 0; i < vm->nr_pages; i++) {
  3374			struct page *page = vm->pages[i];
  3375	
  3376			BUG_ON(!page);
  3377			/*
  3378			 * High-order allocs for huge vmallocs are split, so
  3379			 * can be freed as an array of order-0 allocations
  3380			 */
  3381			__free_page(page);
  3382			cond_resched();
  3383		}
  3384		if (!(vm->flags & VM_MAP_PUT_PAGES))
  3385			atomic_long_sub(vm->nr_pages, &nr_vmalloc_pages);
> 3386		obj_cgroup_uncharge_vmalloc(vm->objcg, vm->nr_pages);
  3387		kvfree(vm->pages);
  3388		kfree(vm);
  3389	}
  3390	EXPORT_SYMBOL(vfree);
  3391

Patch

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 5502aa8e138e..83ebcadebba6 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -1676,6 +1676,10 @@  static inline struct obj_cgroup *get_obj_cgroup_from_current(void)
 
 int obj_cgroup_charge(struct obj_cgroup *objcg, gfp_t gfp, size_t size);
 void obj_cgroup_uncharge(struct obj_cgroup *objcg, size_t size);
+int obj_cgroup_charge_vmalloc(struct obj_cgroup **objcgp,
+		unsigned int nr_pages, gfp_t gfp);
+void obj_cgroup_uncharge_vmalloc(struct obj_cgroup *objcg,
+		unsigned int nr_pages);
 
 extern struct static_key_false memcg_bpf_enabled_key;
 static inline bool memcg_bpf_enabled(void)
@@ -1756,6 +1760,9 @@  static inline void __memcg_kmem_uncharge_page(struct page *page, int order)
 {
 }
 
+/* Must be macros to avoid dereferencing objcg in vm_struct */
+#define obj_cgroup_charge_vmalloc(objcgp, nr_pages, gfp)	0
+#define obj_cgroup_uncharge_vmalloc(objcg, nr_pages)	do { } while (0)
 static inline struct obj_cgroup *get_obj_cgroup_from_folio(struct folio *folio)
 {
 	return NULL;
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 31e9ffd936e3..ec7c2d607382 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -60,6 +60,9 @@  struct vm_struct {
 #endif
 	unsigned int		nr_pages;
 	phys_addr_t		phys_addr;
+#ifdef CONFIG_MEMCG
+	struct obj_cgroup	*objcg;
+#endif
 	const void		*caller;
 };
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 7b3503d12aaf..629bffc3e26d 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5472,4 +5472,50 @@  static int __init mem_cgroup_swap_init(void)
 }
 subsys_initcall(mem_cgroup_swap_init);
 
+/**
+ * obj_cgroup_charge_vmalloc - Charge vmalloc memory
+ * @objcgp: Pointer to an object cgroup
+ * @nr_pages: Number of pages
+ * @gfp: Memory allocation flags
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int obj_cgroup_charge_vmalloc(struct obj_cgroup **objcgp,
+		unsigned int nr_pages, gfp_t gfp)
+{
+	struct obj_cgroup *objcg;
+	int err;
+
+	if (mem_cgroup_disabled() || !(gfp & __GFP_ACCOUNT))
+		return 0;
+
+	objcg = current_obj_cgroup();
+	if (!objcg)
+		return 0;
+
+	err = obj_cgroup_charge_pages(objcg, gfp, nr_pages);
+	if (err)
+		return err;
+	obj_cgroup_get(objcg);
+	mod_memcg_state(obj_cgroup_memcg(objcg), MEMCG_VMALLOC, nr_pages);
+	*objcgp = objcg;
+
+	return 0;
+}
+
+/**
+ * obj_cgroup_uncharge_vmalloc - Uncharge vmalloc memory
+ * @objcg: The object cgroup
+ * @nr_pages: Number of pages
+ */
+void obj_cgroup_uncharge_vmalloc(struct obj_cgroup *objcg,
+		unsigned int nr_pages)
+{
+	if (!objcg)
+		return;
+	mod_memcg_state(objcg->memcg, MEMCG_VMALLOC, 0L - nr_pages);
+	obj_cgroup_uncharge_pages(objcg, nr_pages);
+	obj_cgroup_put(objcg);
+}
+
 #endif /* CONFIG_SWAP */
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index bc9c91f3b373..d5e9068d9091 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -3374,7 +3374,6 @@  void vfree(const void *addr)
 		struct page *page = vm->pages[i];
 
 		BUG_ON(!page);
-		mod_memcg_page_state(page, MEMCG_VMALLOC, -1);
 		/*
 		 * High-order allocs for huge vmallocs are split, so
 		 * can be freed as an array of order-0 allocations
@@ -3384,6 +3383,7 @@  void vfree(const void *addr)
 	}
 	if (!(vm->flags & VM_MAP_PUT_PAGES))
 		atomic_long_sub(vm->nr_pages, &nr_vmalloc_pages);
+	obj_cgroup_uncharge_vmalloc(vm->objcg, vm->nr_pages);
 	kvfree(vm->pages);
 	kfree(vm);
 }
@@ -3537,6 +3537,9 @@  vm_area_alloc_pages(gfp_t gfp, int nid,
 	struct page *page;
 	int i;
 
+	/* Accounting handled in caller */
+	gfp &= ~__GFP_ACCOUNT;
+
 	/*
 	 * For order-0 pages we make use of bulk allocator, if
 	 * the page array is partly or not at all populated due
@@ -3670,12 +3673,9 @@  static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 		node, page_order, nr_small_pages, area->pages);
 
 	atomic_long_add(area->nr_pages, &nr_vmalloc_pages);
-	if (gfp_mask & __GFP_ACCOUNT) {
-		int i;
-
-		for (i = 0; i < area->nr_pages; i++)
-			mod_memcg_page_state(area->pages[i], MEMCG_VMALLOC, 1);
-	}
+	ret = obj_cgroup_charge_vmalloc(&area->objcg, area->nr_pages, gfp_mask);
+	if (ret)
+		goto fail;
 
 	/*
 	 * If not enough pages were obtained to accomplish an