
[08/41] mm: introduce CONFIG_PER_VMA_LOCK

Message ID 20230109205336.3665937-9-surenb@google.com (mailing list archive)
State New
Series Per-VMA locks

Commit Message

Suren Baghdasaryan Jan. 9, 2023, 8:53 p.m. UTC
This configuration variable will be used to build the support for VMA
locking during page fault handling.

This is enabled by default on supported architectures with SMP and MMU
set.

The architecture support is needed since the page fault handler is called
from the architecture's page faulting code which needs modifications to
handle faults under VMA lock.

Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
 mm/Kconfig | 13 +++++++++++++
 1 file changed, 13 insertions(+)
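
To illustrate what the last paragraph means by architecture support, the
arch page-fault path would first try to handle the fault under the VMA
lock and fall back to the usual mmap_lock path when that is not possible.
This is only a rough sketch; the helper names are assumptions for
illustration, not taken from this series:

  #ifdef CONFIG_PER_VMA_LOCK
  /*
   * Sketch of the arch-side hook (names assumed): try to handle the fault
   * under the per-VMA read lock; return true if the fault was handled,
   * false if the caller must retry under mmap_lock.
   */
  static bool try_vma_locked_fault(struct mm_struct *mm, unsigned long address,
                                   unsigned int flags, struct pt_regs *regs)
  {
          struct vm_area_struct *vma;
          vm_fault_t fault;

          vma = lock_vma_under_rcu(mm, address);  /* assumed helper */
          if (!vma)
                  return false;
          fault = handle_mm_fault(vma, address, flags | FAULT_FLAG_VMA_LOCK, regs);
          vma_end_read(vma);                      /* assumed helper */
          return !(fault & VM_FAULT_RETRY);
  }
  #endif /* CONFIG_PER_VMA_LOCK */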

Comments

Davidlohr Bueso Jan. 11, 2023, 12:13 a.m. UTC | #1
On Mon, 09 Jan 2023, Suren Baghdasaryan wrote:

>This configuration variable will be used to build the support for VMA
>locking during page fault handling.
>
>This is enabled by default on supported architectures with SMP and MMU
>set.
>
>The architecture support is needed since the page fault handler is called
>from the architecture's page faulting code which needs modifications to
>handle faults under VMA lock.

I don't think that per-vma locking should be something that is user-configurable.
It should just be dependent on the arch. So maybe just remove CONFIG_PER_VMA_LOCK?

Thanks,
Davidlohr
Suren Baghdasaryan Jan. 11, 2023, 12:44 a.m. UTC | #2
On Tue, Jan 10, 2023 at 4:39 PM Davidlohr Bueso <dave@stgolabs.net> wrote:
>
> On Mon, 09 Jan 2023, Suren Baghdasaryan wrote:
>
> >This configuration variable will be used to build the support for VMA
> >locking during page fault handling.
> >
> >This is enabled by default on supported architectures with SMP and MMU
> >set.
> >
> >The architecture support is needed since the page fault handler is called
> >from the architecture's page faulting code which needs modifications to
> >handle faults under VMA lock.
>
> I don't think that per-vma locking should be something that is user-configurable.
> It should just be dependent on the arch. So maybe just remove CONFIG_PER_VMA_LOCK?

Thanks for the suggestion! I would be happy to make that change if
there are no objections. I think the only pushback might have been the
vma size increase but with the latest optimization in the last patch
maybe that's less of an issue?

>
> Thanks,
> Davidlohr
>
Michal Hocko Jan. 11, 2023, 8:23 a.m. UTC | #3
On Tue 10-01-23 16:44:42, Suren Baghdasaryan wrote:
> On Tue, Jan 10, 2023 at 4:39 PM Davidlohr Bueso <dave@stgolabs.net> wrote:
> >
> > On Mon, 09 Jan 2023, Suren Baghdasaryan wrote:
> >
> > >This configuration variable will be used to build the support for VMA
> > >locking during page fault handling.
> > >
> > >This is enabled by default on supported architectures with SMP and MMU
> > >set.
> > >
> > >The architecture support is needed since the page fault handler is called
> > >from the architecture's page faulting code which needs modifications to
> > >handle faults under VMA lock.
> >
> > I don't think that per-vma locking should be something that is user-configurable.
> > It should just be dependent on the arch. So maybe just remove CONFIG_PER_VMA_LOCK?
> 
> Thanks for the suggestion! I would be happy to make that change if
> there are no objections. I think the only pushback might have been the
> vma size increase but with the latest optimization in the last patch
> maybe that's less of an issue?

Has vma size ever been a real problem? Sure there might be a lot of
those but your patch increases it by rwsem (without the last patch)
which is something like 40B on top of 136B vma so we are talking about
400B in total which even with wild mapcount limits shouldn't really be
prohibitive. With a default map count limit we are talking about 2M
increase at most (per address space).

Or are you aware of any specific usecases where vma size is a real
problem?
Ingo Molnar Jan. 11, 2023, 9:54 a.m. UTC | #4
* Michal Hocko <mhocko@suse.com> wrote:

> On Tue 10-01-23 16:44:42, Suren Baghdasaryan wrote:
> > On Tue, Jan 10, 2023 at 4:39 PM Davidlohr Bueso <dave@stgolabs.net> wrote:
> > >
> > > On Mon, 09 Jan 2023, Suren Baghdasaryan wrote:
> > >
> > > >This configuration variable will be used to build the support for VMA
> > > >locking during page fault handling.
> > > >
> > > >This is enabled by default on supported architectures with SMP and MMU
> > > >set.
> > > >
> > > >The architecture support is needed since the page fault handler is called
> > > >from the architecture's page faulting code which needs modifications to
> > > >handle faults under VMA lock.
> > >
> > > I don't think that per-vma locking should be something that is user-configurable.
> > > It should just be dependent on the arch. So maybe just remove CONFIG_PER_VMA_LOCK?
> > 
> > Thanks for the suggestion! I would be happy to make that change if
> > there are no objections. I think the only pushback might have been the
> > vma size increase but with the latest optimization in the last patch
> > maybe that's less of an issue?
> 
> Has vma size ever been a real problem? Sure there might be a lot of those 
> but your patch increases it by rwsem (without the last patch) which is 
> something like 40B on top of 136B vma so we are talking about 400B in 
> total which even with wild mapcount limits shouldn't really be 
> prohibitive. With a default map count limit we are talking about 2M 
> increase at most (per address space).
> 
> Or are you aware of any specific usecases where vma size is a real 
> problem?

40 bytes for the rwsem, plus the patch also adds a 32-bit sequence counter:

  + int vm_lock_seq;
  + struct rw_semaphore lock;

So it's +44 bytes.

Thanks,

	Ingo
David Laight Jan. 11, 2023, 10:02 a.m. UTC | #5
From: Ingo Molnar
> Sent: 11 January 2023 09:54
> 
> * Michal Hocko <mhocko@suse.com> wrote:
> 
> > On Tue 10-01-23 16:44:42, Suren Baghdasaryan wrote:
> > > On Tue, Jan 10, 2023 at 4:39 PM Davidlohr Bueso <dave@stgolabs.net> wrote:
> > > >
> > > > On Mon, 09 Jan 2023, Suren Baghdasaryan wrote:
> > > >
> > > > >This configuration variable will be used to build the support for VMA
> > > > >locking during page fault handling.
> > > > >
> > > > >This is enabled by default on supported architectures with SMP and MMU
> > > > >set.
> > > > >
> > > > >The architecture support is needed since the page fault handler is called
> > > > >from the architecture's page faulting code which needs modifications to
> > > > >handle faults under VMA lock.
> > > >
> > > > I don't think that per-vma locking should be something that is user-configurable.
> > > > It should just be dependent on the arch. So maybe just remove CONFIG_PER_VMA_LOCK?
> > >
> > > Thanks for the suggestion! I would be happy to make that change if
> > > there are no objections. I think the only pushback might have been the
> > > vma size increase but with the latest optimization in the last patch
> > > maybe that's less of an issue?
> >
> > Has vma size ever been a real problem? Sure there might be a lot of those
> > but your patch increases it by rwsem (without the last patch) which is
> > something like 40B on top of 136B vma so we are talking about 400B in
> > total which even with wild mapcount limits shouldn't really be
> > prohibitive. With a default map count limit we are talking about 2M
> > increase at most (per address space).
> >
> > Or are you aware of any specific usecases where vma size is a real
> > problem?
> 
> 40 bytes for the rwsem, plus the patch also adds a 32-bit sequence counter:
> 
>   + int vm_lock_seq;
>   + struct rw_semaphore lock;
> 
> So it's +44 bytes.

Depending on whether vm_lock_seq goes into a padding hole or not,
it will be 40 or 48 bytes.

But if these structures are allocated individually (not an array)
then it depends on how many items kmalloc() fits into a page (or 2, 4).

	David
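
A quick userspace toy to check the 40-vs-48 arithmetic; the structs below
are stand-ins with assumed LP64 sizes, not the real vm_area_struct or
rw_semaphore:

  #include <stdio.h>

  struct rwsem_standin { long a, b; int c, d; void *e, *f; };  /* 40 bytes on LP64 */

  /* 136-byte base with no spare 4-byte hole: the new int forces padding. */
  struct vma_no_hole       { void *p[17]; };
  struct vma_no_hole_grown { void *p[17]; int vm_lock_seq;
                             struct rwsem_standin lock; };

  /* 136-byte base that already has a 4-byte tail hole: the int slots in. */
  struct vma_hole          { void *p[16]; int some_flag; };
  struct vma_hole_grown    { void *p[16]; int some_flag; int vm_lock_seq;
                             struct rwsem_standin lock; };

  int main(void)
  {
          printf("no hole:   %zu -> %zu bytes\n", sizeof(struct vma_no_hole),
                 sizeof(struct vma_no_hole_grown));
          printf("with hole: %zu -> %zu bytes\n", sizeof(struct vma_hole),
                 sizeof(struct vma_hole_grown));
          return 0;
  }

This prints 136 -> 184 (+48) for the first layout and 136 -> 176 (+40) for
the second, which is the 40-or-48 difference described above; on top of
that, slab/kmalloc bucket sizes decide what each allocation really costs.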

Suren Baghdasaryan Jan. 11, 2023, 4:28 p.m. UTC | #6
On Wed, Jan 11, 2023 at 2:03 AM David Laight <David.Laight@aculab.com> wrote:
>
> From: Ingo Molnar
> > Sent: 11 January 2023 09:54
> >
> > * Michal Hocko <mhocko@suse.com> wrote:
> >
> > > On Tue 10-01-23 16:44:42, Suren Baghdasaryan wrote:
> > > > On Tue, Jan 10, 2023 at 4:39 PM Davidlohr Bueso <dave@stgolabs.net> wrote:
> > > > >
> > > > > On Mon, 09 Jan 2023, Suren Baghdasaryan wrote:
> > > > >
> > > > > >This configuration variable will be used to build the support for VMA
> > > > > >locking during page fault handling.
> > > > > >
> > > > > >This is enabled by default on supported architectures with SMP and MMU
> > > > > >set.
> > > > > >
> > > > > >The architecture support is needed since the page fault handler is called
> > > > > >from the architecture's page faulting code which needs modifications to
> > > > > >handle faults under VMA lock.
> > > > >
> > > > > I don't think that per-vma locking should be something that is user-configurable.
> > > > > It should just be dependent on the arch. So maybe just remove CONFIG_PER_VMA_LOCK?
> > > >
> > > > Thanks for the suggestion! I would be happy to make that change if
> > > > there are no objections. I think the only pushback might have been the
> > > > vma size increase but with the latest optimization in the last patch
> > > > maybe that's less of an issue?
> > >
> > > Has vma size ever been a real problem? Sure there might be a lot of those
> > > but your patch increases it by rwsem (without the last patch) which is
> > > something like 40B on top of 136B vma so we are talking about 400B in
> > > total which even with wild mapcount limits shouldn't really be
> > > prohibitive. With a default map count limit we are talking about 2M
> > > increase at most (per address space).
> > >
> > > Or are you aware of any specific usecases where vma size is a real
> > > problem?

Well, when fixing the cacheline bouncing problem in the initial design
I was adding 44 bytes to the 152-byte vm_area_struct (CONFIG_NUMA
enabled), pushing it just above 192 bytes, while allocating these
structures from a cache-aligned slab (keeping the lock in a separate
cacheline to prevent cacheline bouncing). That would use a whole 256
bytes per VMA, and it did make me nervous. The current design, with no
need to cache-align vm_area_structs and with the 44-byte overhead
trimmed down to 16 bytes, seems much more palatable.
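
For what it's worth, here is a userspace sketch of the pointer-indirection
idea behind that trimming (stand-in types and field names assumed, not the
actual layout of the last patch): only a pointer plus the sequence number
stay inside vm_area_struct, and the separately allocated lock can be
cacheline-aligned on its own without inflating every VMA.

  #include <stdio.h>

  struct rwsem_standin    { long word[5]; };  /* ~40-byte rw_semaphore stand-in */
  struct vma_lock_standin { struct rwsem_standin lock; };

  struct vma_base     { void *p[17]; };       /* ~136-byte vm_area_struct stand-in */
  /* Lock embedded directly in the VMA (the earlier approach). */
  struct vma_embedded { void *p[17]; int vm_lock_seq; struct rwsem_standin lock; };
  /* Lock behind a pointer: 8 bytes of pointer + 4 of seqcount + 4 of padding. */
  struct vma_pointer  { void *p[17]; int vm_lock_seq; struct vma_lock_standin *vm_lock; };

  int main(void)
  {
          printf("embedded lock adds %zu bytes, pointer to lock adds %zu bytes\n",
                 sizeof(struct vma_embedded) - sizeof(struct vma_base),
                 sizeof(struct vma_pointer)  - sizeof(struct vma_base));
          return 0;
  }

The embedded case comes out at +48 rather than +44 here purely because of
structure padding (the 40-or-48 point above), while the pointer case is
the 16 bytes mentioned.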

> >
> > 40 bytes for the rwsem, plus the patch also adds a 32-bit sequence counter:
> >
> >   + int vm_lock_seq;
> >   + struct rw_semaphore lock;
> >
> > So it's +44 bytes.

Correct.

>
> Depending on whether vm_lock_seq goes into a padding hole or not,
> it will be 40 or 48 bytes.
>
> But if these structures are allocated individually (not an array)
> then it depends on how many items kmalloc() fits into a page (or 2, 4).

Yep. Depends on how we arrange the fields.

Anyhow. Sounds like the overhead of the current design is small enough
to remove CONFIG_PER_VMA_LOCK and let it depend only on architecture
support?
Thanks,
Suren.

>
>         David
>
Michal Hocko Jan. 11, 2023, 4:44 p.m. UTC | #7
On Wed 11-01-23 08:28:49, Suren Baghdasaryan wrote:
[...]
> Anyhow. Sounds like the overhead of the current design is small enough
> to remove CONFIG_PER_VMA_LOCK and let it depend only on architecture
> support?

Yes. Further optimizations can be done on top. Let's not over optimize
at this stage.
Suren Baghdasaryan Jan. 11, 2023, 5:04 p.m. UTC | #8
On Wed, Jan 11, 2023 at 8:44 AM Michal Hocko <mhocko@suse.com> wrote:
>
> On Wed 11-01-23 08:28:49, Suren Baghdasaryan wrote:
> [...]
> > Anyhow. Sounds like the overhead of the current design is small enough
> > to remove CONFIG_PER_VMA_LOCK and let it depend only on architecture
> > support?
>
> Yes. Further optimizations can be done on top. Let's not over optimize
> at this stage.

Sure, I won't optimize any further.
Just to expand on your question. Original design would be problematic
for embedded systems like Android. It notoriously has a high number of
VMAs due to anonymous VMAs being named, which prevents them from
merging. A 2M per-process increase would raise questions, so I felt
the need to optimize the memory overhead, which is done in the last
patch.
Thanks for the feedback!

> --
> Michal Hocko
> SUSE Labs
Michal Hocko Jan. 11, 2023, 5:37 p.m. UTC | #9
On Wed 11-01-23 09:04:41, Suren Baghdasaryan wrote:
> On Wed, Jan 11, 2023 at 8:44 AM Michal Hocko <mhocko@suse.com> wrote:
> >
> > On Wed 11-01-23 08:28:49, Suren Baghdasaryan wrote:
> > [...]
> > > Anyhow. Sounds like the overhead of the current design is small enough
> > > to remove CONFIG_PER_VMA_LOCK and let it depend only on architecture
> > > support?
> >
> > Yes. Further optimizations can be done on top. Let's not over optimize
> > at this stage.
> 
> Sure, I won't optimize any further.
> Just to expand on your question. Original design would be problematic
> for embedded systems like Android. It notoriously has a high number of
> VMAs due to anonymous VMAs being named, which prevents them from
> merging.

What is the usual number of VMAs in that environment?
Suren Baghdasaryan Jan. 11, 2023, 5:49 p.m. UTC | #10
On Wed, Jan 11, 2023 at 9:37 AM Michal Hocko <mhocko@suse.com> wrote:
>
> On Wed 11-01-23 09:04:41, Suren Baghdasaryan wrote:
> > On Wed, Jan 11, 2023 at 8:44 AM Michal Hocko <mhocko@suse.com> wrote:
> > >
> > > On Wed 11-01-23 08:28:49, Suren Baghdasaryan wrote:
> > > [...]
> > > > Anyhow. Sounds like the overhead of the current design is small enough
> > > > to remove CONFIG_PER_VMA_LOCK and let it depend only on architecture
> > > > support?
> > >
> > > Yes. Further optimizations can be done on top. Let's not over optimize
> > > at this stage.
> >
> > Sure, I won't optimize any further.
> > Just to expand on your question. Original design would be problematic
> > for embedded systems like Android. It notoriously has a high number of
> > VMAs due to anonymous VMAs being named, which prevents them from
> > merging.
>
> What is the usual number of VMAs in that environment?

I've seen some games which had over 4000 VMAs, but that's on the upper
side. In my calculations I used 40000 VMAs as a ballpark number; rough
calculations showed that, before the size optimization, memory
consumption would increase by ~2M (it would vary a bit depending on the
lock placement in vm_area_struct). In Android, the performance team
flags any change that exceeds 500KB, so it would raise questions.
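
For reference, the arithmetic behind those numbers, using the 40000-VMA
ballpark and the 40-48 bytes per VMA from earlier in the thread (a
back-of-the-envelope check under those assumptions, not a measurement):

  #include <stdio.h>

  int main(void)
  {
          const double nr_vmas = 40000;   /* ballpark figure from above */
          const double lo = 40, hi = 48;  /* per-VMA growth before the optimization */

          printf("%.0f VMAs -> %.1f-%.1f MB extra, against a 500KB review threshold\n",
                 nr_vmas, nr_vmas * lo / (1024 * 1024), nr_vmas * hi / (1024 * 1024));
          return 0;
  }

This prints roughly 1.5-1.8 MB, i.e. the ~2M figure quoted above.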

>
> --
> Michal Hocko
> SUSE Labs
Michal Hocko Jan. 11, 2023, 6:02 p.m. UTC | #11
On Wed 11-01-23 09:49:08, Suren Baghdasaryan wrote:
> On Wed, Jan 11, 2023 at 9:37 AM Michal Hocko <mhocko@suse.com> wrote:
> >
> > On Wed 11-01-23 09:04:41, Suren Baghdasaryan wrote:
> > > On Wed, Jan 11, 2023 at 8:44 AM Michal Hocko <mhocko@suse.com> wrote:
> > > >
> > > > On Wed 11-01-23 08:28:49, Suren Baghdasaryan wrote:
> > > > [...]
> > > > > Anyhow. Sounds like the overhead of the current design is small enough
> > > > > to remove CONFIG_PER_VMA_LOCK and let it depend only on architecture
> > > > > support?
> > > >
> > > > Yes. Further optimizations can be done on top. Let's not over optimize
> > > > at this stage.
> > >
> > > Sure, I won't optimize any further.
> > > Just to expand on your question. Original design would be problematic
> > > for embedded systems like Android. It notoriously has a high number of
> > > VMAs due to anonymous VMAs being named, which prevents them from
> > > merging.
> >
> > What is the usual number of VMAs in that environment?
> 
> I've seen some games which had over 4000 VMAs but that's on the upper
> side. In my calculations I used 40000 VMAs as a ballpark number and
> rough calculations before size optimization would increase memory
> consumption by ~2M (depending on the lock placement in vm_area_struct
> it would vary a bit). In Android, the performance team flags any
> change that exceeds 500KB, so it would raise questions.

Thanks, that is useful information! This is just slightly off-topic,
but I am wondering how much memory those vma names consume. Are there
that many unique names, or do they just happen to alternate so that
neighboring ones tend to be different?
Suren Baghdasaryan Jan. 11, 2023, 6:09 p.m. UTC | #12
On Wed, Jan 11, 2023 at 10:03 AM Michal Hocko <mhocko@suse.com> wrote:
>
> On Wed 11-01-23 09:49:08, Suren Baghdasaryan wrote:
> > On Wed, Jan 11, 2023 at 9:37 AM Michal Hocko <mhocko@suse.com> wrote:
> > >
> > > On Wed 11-01-23 09:04:41, Suren Baghdasaryan wrote:
> > > > On Wed, Jan 11, 2023 at 8:44 AM Michal Hocko <mhocko@suse.com> wrote:
> > > > >
> > > > > On Wed 11-01-23 08:28:49, Suren Baghdasaryan wrote:
> > > > > [...]
> > > > > > Anyhow. Sounds like the overhead of the current design is small enough
> > > > > > to remove CONFIG_PER_VMA_LOCK and let it depend only on architecture
> > > > > > support?
> > > > >
> > > > > Yes. Further optimizations can be done on top. Let's not over optimize
> > > > > at this stage.
> > > >
> > > > Sure, I won't optimize any further.
> > > > Just to expand on your question. Original design would be problematic
> > > > for embedded systems like Android. It notoriously has a high number of
> > > > VMAs due to anonymous VMAs being named, which prevents them from
> > > > merging.
> > >
> > > What is the usual number of VMAs in that environment?
> >
> > I've seen some games which had over 4000 VMAs but that's on the upper
> > side. In my calculations I used 40000 VMAs as a ballpark number and
> > rough calculations before size optimization would increase memory
> > consumption by ~2M (depending on the lock placement in vm_area_struct
> > it would vary a bit). In Android, the performance team flags any
> > change that exceeds 500KB, so it would raise questions.
>
> Thanks, that is useful information! This is just slightly off-topic,
> but I am wondering how much memory those vma names consume. Are there
> that many unique names, or do they just happen to alternate so that
> neighboring ones tend to be different?

Good question. I don't have a ready answer to that but will try to
collect some stats. I know that many names are standardized but
haven't looked at how they are distributed in the address space. Will
follow up once I collect the data.
Thanks,
Suren.

> --
> Michal Hocko
> SUSE Labs

Patch

diff --git a/mm/Kconfig b/mm/Kconfig
index ff7b209dec05..0aeca3794972 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -1183,6 +1183,19 @@  config LRU_GEN_STATS
 	  This option has a per-memcg and per-node memory overhead.
 # }
 
+config ARCH_SUPPORTS_PER_VMA_LOCK
+       def_bool n
+
+config PER_VMA_LOCK
+	bool "Per-vma locking support"
+	default y
+	depends on ARCH_SUPPORTS_PER_VMA_LOCK && MMU && SMP
+	help
+	  Allow per-vma locking during page fault handling.
+
+	  This feature allows locking each virtual memory area separately when
+	  handling page faults instead of taking mmap_lock.
+
 source "mm/damon/Kconfig"
 
 endmenu