
[08/41] mm: introduce CONFIG_PER_VMA_LOCK

Message ID 20230109205336.3665937-9-surenb@google.com (mailing list archive)
State New
Series Per-VMA locks

Commit Message

Suren Baghdasaryan Jan. 9, 2023, 8:53 p.m. UTC
This configuration variable will be used to build the support for VMA
locking during page fault handling.

This is enabled by default on supported architectures with SMP and MMU
set.

The architecture support is needed since the page fault handler is called
from the architecture's page faulting code which needs modifications to
handle faults under VMA lock.

Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
 mm/Kconfig | 13 +++++++++++++
 1 file changed, 13 insertions(+)
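
To illustrate what the last paragraph means by architecture support, the
arch page-fault path would first try to handle the fault under the VMA
lock and fall back to the usual mmap_lock path when that is not possible.
This is only a rough sketch; the helper names are assumptions for
illustration, not taken from this series:

  #ifdef CONFIG_PER_VMA_LOCK
  /*
   * Sketch of the arch-side hook (names assumed): try to handle the fault
   * under the per-VMA read lock; return true if the fault was handled,
   * false if the caller must retry under mmap_lock.
   */
  static bool try_vma_locked_fault(struct mm_struct *mm, unsigned long address,
                                   unsigned int flags, struct pt_regs *regs)
  {
          struct vm_area_struct *vma;
          vm_fault_t fault;

          vma = lock_vma_under_rcu(mm, address);  /* assumed helper */
          if (!vma)
                  return false;
          fault = handle_mm_fault(vma, address, flags | FAULT_FLAG_VMA_LOCK, regs);
          vma_end_read(vma);                      /* assumed helper */
          return !(fault & VM_FAULT_RETRY);
  }
  #endif /* CONFIG_PER_VMA_LOCK */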

Comments

Davidlohr Bueso Jan. 11, 2023, 12:13 a.m. UTC | #1
On Mon, 09 Jan 2023, Suren Baghdasaryan wrote:

>This configuration variable will be used to build the support for VMA
>locking during page fault handling.
>
>This is enabled by default on supported architectures with SMP and MMU
>set.
>
>The architecture support is needed since the page fault handler is called
>from the architecture's page faulting code which needs modifications to
>handle faults under VMA lock.

I don't think that per-vma locking should be something that is user-configurable.
It should just be dependent on the arch. So maybe just remove CONFIG_PER_VMA_LOCK?

Thanks,
Davidlohr
Suren Baghdasaryan Jan. 11, 2023, 12:44 a.m. UTC | #2
On Tue, Jan 10, 2023 at 4:39 PM Davidlohr Bueso <dave@stgolabs.net> wrote:
>
> On Mon, 09 Jan 2023, Suren Baghdasaryan wrote:
>
> >This configuration variable will be used to build the support for VMA
> >locking during page fault handling.
> >
> >This is enabled by default on supported architectures with SMP and MMU
> >set.
> >
> >The architecture support is needed since the page fault handler is called
> >from the architecture's page faulting code which needs modifications to
> >handle faults under VMA lock.
>
> I don't think that per-vma locking should be something that is user-configurable.
> It should just be dependent on the arch. So maybe just remove CONFIG_PER_VMA_LOCK?

Thanks for the suggestion! I would be happy to make that change if
there are no objections. I think the only pushback might have been the
vma size increase but with the latest optimization in the last patch
maybe that's less of an issue?

>
> Thanks,
> Davidlohr
>
Michal Hocko Jan. 11, 2023, 8:23 a.m. UTC | #3
On Tue 10-01-23 16:44:42, Suren Baghdasaryan wrote:
> On Tue, Jan 10, 2023 at 4:39 PM Davidlohr Bueso <dave@stgolabs.net> wrote:
> >
> > On Mon, 09 Jan 2023, Suren Baghdasaryan wrote:
> >
> > >This configuration variable will be used to build the support for VMA
> > >locking during page fault handling.
> > >
> > >This is enabled by default on supported architectures with SMP and MMU
> > >set.
> > >
> > >The architecture support is needed since the page fault handler is called
> > >from the architecture's page faulting code which needs modifications to
> > >handle faults under VMA lock.
> >
> > I don't think that per-vma locking should be something that is user-configurable.
> > It should just be dependent on the arch. So maybe just remove CONFIG_PER_VMA_LOCK?
> 
> Thanks for the suggestion! I would be happy to make that change if
> there are no objections. I think the only pushback might have been the
> vma size increase but with the latest optimization in the last patch
> maybe that's less of an issue?

Has vma size ever been a real problem? Sure there might be a lot of
those but your patch increases it by rwsem (without the last patch)
which is something like 40B on top of 136B vma so we are talking about
400B in total which even with wild mapcount limits shouldn't really be
prohibitive. With a default map count limit we are talking about 2M
increase at most (per address space).

Or are you aware of any specific usecases where vma size is a real
problem?
Ingo Molnar Jan. 11, 2023, 9:54 a.m. UTC | #4
* Michal Hocko <mhocko@suse.com> wrote:

> On Tue 10-01-23 16:44:42, Suren Baghdasaryan wrote:
> > On Tue, Jan 10, 2023 at 4:39 PM Davidlohr Bueso <dave@stgolabs.net> wrote:
> > >
> > > On Mon, 09 Jan 2023, Suren Baghdasaryan wrote:
> > >
> > > >This configuration variable will be used to build the support for VMA
> > > >locking during page fault handling.
> > > >
> > > >This is enabled by default on supported architectures with SMP and MMU
> > > >set.
> > > >
> > > >The architecture support is needed since the page fault handler is called
> > > >from the architecture's page faulting code which needs modifications to
> > > >handle faults under VMA lock.
> > >
> > > I don't think that per-vma locking should be something that is user-configurable.
> > > It should just be dependent on the arch. So maybe just remove CONFIG_PER_VMA_LOCK?
> > 
> > Thanks for the suggestion! I would be happy to make that change if
> > there are no objections. I think the only pushback might have been the
> > vma size increase but with the latest optimization in the last patch
> > maybe that's less of an issue?
> 
> Has vma size ever been a real problem? Sure there might be a lot of those 
> but your patch increases it by rwsem (without the last patch) which is 
> something like 40B on top of 136B vma so we are talking about 400B in 
> total which even with wild mapcount limits shouldn't really be 
> prohibitive. With a default map count limit we are talking about 2M 
> increase at most (per address space).
> 
> Or are you aware of any specific usecases where vma size is a real 
> problem?

40 bytes for the rwsem, plus the patch also adds a 32-bit sequence counter:

  + int vm_lock_seq;
  + struct rw_semaphore lock;

So it's +44 bytes.

Thanks,

	Ingo
David Laight Jan. 11, 2023, 10:02 a.m. UTC | #5
From: Ingo Molnar
> Sent: 11 January 2023 09:54
> 
> * Michal Hocko <mhocko@suse.com> wrote:
> 
> > On Tue 10-01-23 16:44:42, Suren Baghdasaryan wrote:
> > > On Tue, Jan 10, 2023 at 4:39 PM Davidlohr Bueso <dave@stgolabs.net> wrote:
> > > >
> > > > On Mon, 09 Jan 2023, Suren Baghdasaryan wrote:
> > > >
> > > > >This configuration variable will be used to build the support for VMA
> > > > >locking during page fault handling.
> > > > >
> > > > >This is enabled by default on supported architectures with SMP and MMU
> > > > >set.
> > > > >
> > > > >The architecture support is needed since the page fault handler is called
> > > > >from the architecture's page faulting code which needs modifications to
> > > > >handle faults under VMA lock.
> > > >
> > > > I don't think that per-vma locking should be something that is user-configurable.
> > > > It should just be dependent on the arch. So maybe just remove CONFIG_PER_VMA_LOCK?
> > >
> > > Thanks for the suggestion! I would be happy to make that change if
> > > there are no objections. I think the only pushback might have been the
> > > vma size increase but with the latest optimization in the last patch
> > > maybe that's less of an issue?
> >
> > Has vma size ever been a real problem? Sure there might be a lot of those
> > but your patch increases it by rwsem (without the last patch) which is
> > something like 40B on top of 136B vma so we are talking about 400B in
> > total which even with wild mapcount limits shouldn't really be
> > prohibitive. With a default map count limit we are talking about 2M
> > increase at most (per address space).
> >
> > Or are you aware of any specific usecases where vma size is a real
> > problem?
> 
> 40 bytes for the rwsem, plus the patch also adds a 32-bit sequence counter:
> 
>   + int vm_lock_seq;
>   + struct rw_semaphore lock;
> 
> So it's +44 bytes.

Depending on whether vm_lock_seq goes into a padding hole or not,
it will be 40 or 48 bytes.

But if these structures are allocated individually (not an array)
then it depends on how many items kmalloc() fits into a page (or 2, 4).

	David
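
A quick userspace toy to check the 40-vs-48 arithmetic; the structs below
are stand-ins with assumed LP64 sizes, not the real vm_area_struct or
rw_semaphore:

  #include <stdio.h>

  struct rwsem_standin { long a, b; int c, d; void *e, *f; };  /* 40 bytes on LP64 */

  /* 136-byte base with no spare 4-byte hole: the new int forces padding. */
  struct vma_no_hole       { void *p[17]; };
  struct vma_no_hole_grown { void *p[17]; int vm_lock_seq;
                             struct rwsem_standin lock; };

  /* 136-byte base that already has a 4-byte tail hole: the int slots in. */
  struct vma_hole          { void *p[16]; int some_flag; };
  struct vma_hole_grown    { void *p[16]; int some_flag; int vm_lock_seq;
                             struct rwsem_standin lock; };

  int main(void)
  {
          printf("no hole:   %zu -> %zu bytes\n", sizeof(struct vma_no_hole),
                 sizeof(struct vma_no_hole_grown));
          printf("with hole: %zu -> %zu bytes\n", sizeof(struct vma_hole),
                 sizeof(struct vma_hole_grown));
          return 0;
  }

This prints 136 -> 184 (+48) for the first layout and 136 -> 176 (+40) for
the second, which is the 40-or-48 difference described above; on top of
that, slab/kmalloc bucket sizes decide what each allocation really costs.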

Suren Baghdasaryan Jan. 11, 2023, 4:28 p.m. UTC | #6
On Wed, Jan 11, 2023 at 2:03 AM David Laight <David.Laight@aculab.com> wrote:
>
> From: Ingo Molnar
> > Sent: 11 January 2023 09:54
> >
> > * Michal Hocko <mhocko@suse.com> wrote:
> >
> > > On Tue 10-01-23 16:44:42, Suren Baghdasaryan wrote:
> > > > On Tue, Jan 10, 2023 at 4:39 PM Davidlohr Bueso <dave@stgolabs.net> wrote:
> > > > >
> > > > > On Mon, 09 Jan 2023, Suren Baghdasaryan wrote:
> > > > >
> > > > > >This configuration variable will be used to build the support for VMA
> > > > > >locking during page fault handling.
> > > > > >
> > > > > >This is enabled by default on supported architectures with SMP and MMU
> > > > > >set.
> > > > > >
> > > > > >The architecture support is needed since the page fault handler is called
> > > > > >from the architecture's page faulting code which needs modifications to
> > > > > >handle faults under VMA lock.
> > > > >
> > > > > I don't think that per-vma locking should be something that is user-configurable.
> > > > > It should just be dependent on the arch. So maybe just remove CONFIG_PER_VMA_LOCK?
> > > >
> > > > Thanks for the suggestion! I would be happy to make that change if
> > > > there are no objections. I think the only pushback might have been the
> > > > vma size increase but with the latest optimization in the last patch
> > > > maybe that's less of an issue?
> > >
> > > Has vma size ever been a real problem? Sure there might be a lot of those
> > > but your patch increases it by rwsem (without the last patch) which is
> > > something like 40B on top of 136B vma so we are talking about 400B in
> > > total which even with wild mapcount limits shouldn't really be
> > > prohibitive. With a default map count limit we are talking about 2M
> > > increase at most (per address space).
> > >
> > > Or are you aware of any specific usecases where vma size is a real
> > > problem?

Well, when fixing the cacheline bouncing problem in the initial design
I was adding 44 bytes to the 152-byte vm_area_struct (CONFIG_NUMA
enabled), pushing it just above 192 bytes, while allocating these
structures from a cache-aligned slab (keeping the lock in a separate
cacheline to prevent cacheline bouncing). That would use a whole 256
bytes per VMA, and it did make me nervous. The current design, with no
need to cache-align vm_area_structs and with the 44-byte overhead
trimmed down to 16 bytes, seems much more palatable.
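
For what it's worth, here is a userspace sketch of the pointer-indirection
idea behind that trimming (stand-in types and field names assumed, not the
actual layout of the last patch): only a pointer plus the sequence number
stay inside vm_area_struct, and the separately allocated lock can be
cacheline-aligned on its own without inflating every VMA.

  #include <stdio.h>

  struct rwsem_standin    { long word[5]; };  /* ~40-byte rw_semaphore stand-in */
  struct vma_lock_standin { struct rwsem_standin lock; };

  struct vma_base     { void *p[17]; };       /* ~136-byte vm_area_struct stand-in */
  /* Lock embedded directly in the VMA (the earlier approach). */
  struct vma_embedded { void *p[17]; int vm_lock_seq; struct rwsem_standin lock; };
  /* Lock behind a pointer: 8 bytes of pointer + 4 of seqcount + 4 of padding. */
  struct vma_pointer  { void *p[17]; int vm_lock_seq; struct vma_lock_standin *vm_lock; };

  int main(void)
  {
          printf("embedded lock adds %zu bytes, pointer to lock adds %zu bytes\n",
                 sizeof(struct vma_embedded) - sizeof(struct vma_base),
                 sizeof(struct vma_pointer)  - sizeof(struct vma_base));
          return 0;
  }

The embedded case comes out at +48 rather than +44 here purely because of
structure padding (the 40-or-48 point above), while the pointer case is
the 16 bytes mentioned.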

> >
> > 40 bytes for the rwsem, plus the patch also adds a 32-bit sequence counter:
> >
> >   + int vm_lock_seq;
> >   + struct rw_semaphore lock;
> >
> > So it's +44 bytes.

Correct.

>
> Depending on whether vm_lock_seq goes into a padding hole or not,
> it will be 40 or 48 bytes.
>
> But if these structures are allocated individually (not an array)
> then it depends on how many items kmalloc() fits into a page (or 2, 4).

Yep. Depends on how we arrange the fields.

Anyhow. Sounds like the overhead of the current design is small enough
to remove CONFIG_PER_VMA_LOCK and let it depend only on architecture
support?
Thanks,
Suren.

>
>         David
>
Michal Hocko Jan. 11, 2023, 4:44 p.m. UTC | #7
On Wed 11-01-23 08:28:49, Suren Baghdasaryan wrote:
[...]
> Anyhow. Sounds like the overhead of the current design is small enough
> to remove CONFIG_PER_VMA_LOCK and let it depend only on architecture
> support?

Yes. Further optimizations can be done on top. Let's not over optimize
at this stage.
Suren Baghdasaryan Jan. 11, 2023, 5:04 p.m. UTC | #8
On Wed, Jan 11, 2023 at 8:44 AM Michal Hocko <mhocko@suse.com> wrote:
>
> On Wed 11-01-23 08:28:49, Suren Baghdasaryan wrote:
> [...]
> > Anyhow. Sounds like the overhead of the current design is small enough
> > to remove CONFIG_PER_VMA_LOCK and let it depend only on architecture
> > support?
>
> Yes. Further optimizations can be done on top. Let's not over optimize
> at this stage.

Sure, I won't optimize any further.
Just to expand on your question. Original design would be problematic
for embedded systems like Android. It notoriously has a high number of
VMAs due to anonymous VMAs being named, which prevents them from
merging. A 2M per-process increase would raise questions, so I felt
the need to optimize the memory overhead, which is done in the last
patch.
Thanks for the feedback!

> --
> Michal Hocko
> SUSE Labs
Michal Hocko Jan. 11, 2023, 5:37 p.m. UTC | #9
On Wed 11-01-23 09:04:41, Suren Baghdasaryan wrote:
> On Wed, Jan 11, 2023 at 8:44 AM Michal Hocko <mhocko@suse.com> wrote:
> >
> > On Wed 11-01-23 08:28:49, Suren Baghdasaryan wrote:
> > [...]
> > > Anyhow. Sounds like the overhead of the current design is small enough
> > > to remove CONFIG_PER_VMA_LOCK and let it depend only on architecture
> > > support?
> >
> > Yes. Further optimizations can be done on top. Let's not over optimize
> > at this stage.
> 
> Sure, I won't optimize any further.
> Just to expand on your question. Original design would be problematic
> for embedded systems like Android. It notoriously has a high number of
> VMAs due to anonymous VMAs being named, which prevents them from
> merging.

What is the usual number of VMAs in that environment?
Suren Baghdasaryan Jan. 11, 2023, 5:49 p.m. UTC | #10
On Wed, Jan 11, 2023 at 9:37 AM Michal Hocko <mhocko@suse.com> wrote:
>
> On Wed 11-01-23 09:04:41, Suren Baghdasaryan wrote:
> > On Wed, Jan 11, 2023 at 8:44 AM Michal Hocko <mhocko@suse.com> wrote:
> > >
> > > On Wed 11-01-23 08:28:49, Suren Baghdasaryan wrote:
> > > [...]
> > > > Anyhow. Sounds like the overhead of the current design is small enough
> > > > to remove CONFIG_PER_VMA_LOCK and let it depend only on architecture
> > > > support?
> > >
> > > Yes. Further optimizations can be done on top. Let's not over optimize
> > > at this stage.
> >
> > Sure, I won't optimize any further.
> > Just to expand on your question. Original design would be problematic
> > for embedded systems like Android. It notoriously has a high number of
> > VMAs due to anonymous VMAs being named, which prevents them from
> > merging.
>
> What is the usual number of VMAs in that environment?

I've seen some games which had over 4000 VMAs, but that's on the upper
side. In my calculations I used 40000 VMAs as a ballpark number; rough
calculations showed that, before the size optimization, memory
consumption would increase by ~2M (it would vary a bit depending on the
lock placement in vm_area_struct). In Android, the performance team
flags any change that exceeds 500KB, so it would raise questions.
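
For reference, the arithmetic behind those numbers, using the 40000-VMA
ballpark and the 40-48 bytes per VMA from earlier in the thread (a
back-of-the-envelope check under those assumptions, not a measurement):

  #include <stdio.h>

  int main(void)
  {
          const double nr_vmas = 40000;   /* ballpark figure from above */
          const double lo = 40, hi = 48;  /* per-VMA growth before the optimization */

          printf("%.0f VMAs -> %.1f-%.1f MB extra, against a 500KB review threshold\n",
                 nr_vmas, nr_vmas * lo / (1024 * 1024), nr_vmas * hi / (1024 * 1024));
          return 0;
  }

This prints roughly 1.5-1.8 MB, i.e. the ~2M figure quoted above.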

>
> --
> Michal Hocko
> SUSE Labs
Michal Hocko Jan. 11, 2023, 6:02 p.m. UTC | #11
On Wed 11-01-23 09:49:08, Suren Baghdasaryan wrote:
> On Wed, Jan 11, 2023 at 9:37 AM Michal Hocko <mhocko@suse.com> wrote:
> >
> > On Wed 11-01-23 09:04:41, Suren Baghdasaryan wrote:
> > > On Wed, Jan 11, 2023 at 8:44 AM Michal Hocko <mhocko@suse.com> wrote:
> > > >
> > > > On Wed 11-01-23 08:28:49, Suren Baghdasaryan wrote:
> > > > [...]
> > > > > Anyhow. Sounds like the overhead of the current design is small enough
> > > > > to remove CONFIG_PER_VMA_LOCK and let it depend only on architecture
> > > > > support?
> > > >
> > > > Yes. Further optimizations can be done on top. Let's not over optimize
> > > > at this stage.
> > >
> > > Sure, I won't optimize any further.
> > > Just to expand on your question. Original design would be problematic
> > > for embedded systems like Android. It notoriously has a high number of
> > > VMAs due to anonymous VMAs being named, which prevents them from
> > > merging.
> >
> > What is the usual number of VMAs in that environment?
> 
> I've seen some games which had over 4000 VMAs but that's on the upper
> side. In my calculations I used 40000 VMAs as a ballpark number and
> rough calculations before size optimization would increase memory
> consumption by ~2M (depending on the lock placement in vm_area_struct
> it would vary a bit). In Android, the performance team flags any
> change that exceeds 500KB, so it would raise questions.

Thanks, that is useful information! This is just slightly off-topic,
but I am wondering how much memory those vma names consume. Are there
that many unique names, or do they just happen to alternate so that
neighboring ones tend to be different?
Suren Baghdasaryan Jan. 11, 2023, 6:09 p.m. UTC | #12
On Wed, Jan 11, 2023 at 10:03 AM Michal Hocko <mhocko@suse.com> wrote:
>
> On Wed 11-01-23 09:49:08, Suren Baghdasaryan wrote:
> > On Wed, Jan 11, 2023 at 9:37 AM Michal Hocko <mhocko@suse.com> wrote:
> > >
> > > On Wed 11-01-23 09:04:41, Suren Baghdasaryan wrote:
> > > > On Wed, Jan 11, 2023 at 8:44 AM Michal Hocko <mhocko@suse.com> wrote:
> > > > >
> > > > > On Wed 11-01-23 08:28:49, Suren Baghdasaryan wrote:
> > > > > [...]
> > > > > > Anyhow. Sounds like the overhead of the current design is small enough
> > > > > > to remove CONFIG_PER_VMA_LOCK and let it depend only on architecture
> > > > > > support?
> > > > >
> > > > > Yes. Further optimizations can be done on top. Let's not over optimize
> > > > > at this stage.
> > > >
> > > > Sure, I won't optimize any further.
> > > > Just to expand on your question. Original design would be problematic
> > > > for embedded systems like Android. It notoriously has a high number of
> > > > VMAs due to anonymous VMAs being named, which prevents them from
> > > > merging.
> > >
> > > What is the usual number of VMAs in that environment?
> >
> > I've seen some games which had over 4000 VMAs but that's on the upper
> > side. In my calculations I used 40000 VMAs as a ballpark number and
> > rough calculations before size optimization would increase memory
> > consumption by ~2M (depending on the lock placement in vm_area_struct
> > it would vary a bit). In Android, the performance team flags any
> > change that exceeds 500KB, so it would raise questions.
>
> Thanks, that is useful information! This is just slightly off-topic,
> but I am wondering how much memory those vma names consume. Are there
> that many unique names, or do they just happen to alternate so that
> neighboring ones tend to be different?

Good question. I don't have a ready answer to that but will try to
collect some stats. I know that many names are standardized but
haven't looked at how they are distributed in the address space. Will
follow up once I collect the data.
Thanks,
Suren.

> --
> Michal Hocko
> SUSE Labs

Patch

diff --git a/mm/Kconfig b/mm/Kconfig
index ff7b209dec05..0aeca3794972 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -1183,6 +1183,19 @@  config LRU_GEN_STATS
 	  This option has a per-memcg and per-node memory overhead.
 # }
 
+config ARCH_SUPPORTS_PER_VMA_LOCK
+       def_bool n
+
+config PER_VMA_LOCK
+	bool "Per-vma locking support"
+	default y
+	depends on ARCH_SUPPORTS_PER_VMA_LOCK && MMU && SMP
+	help
+	  Allow per-vma locking during page fault handling.
+
+	  This feature allows locking each virtual memory area separately when
+	  handling page faults instead of taking mmap_lock.
+
 source "mm/damon/Kconfig"
 
 endmenu