Message ID | 20210827191858.2037087-4-surenb@google.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | Anonymous VMA naming patches | expand |
On Fri, Aug 27, 2021 at 12:18:58PM -0700, Suren Baghdasaryan wrote: > While forking a process with high number (64K) of named anonymous vmas the > overhead caused by strdup() is noticeable. Experiments with ARM64 Android > device show up to 40% performance regression when forking a process with > 64k unpopulated anonymous vmas using the max name lengths vs the same > process with the same number of anonymous vmas having no name. > Introduce anon_vma_name refcounted structure to avoid the overhead of > copying vma names during fork() and when splitting named anonymous vmas. > When a vma is duplicated, instead of copying the name we increment the > refcount of this structure. Multiple vmas can point to the same > anon_vma_name as long as they increment the refcount. The name member of > anon_vma_name structure is assigned at structure allocation time and is > never changed. If vma name changes then the refcount of the original > structure is dropped, a new anon_vma_name structure is allocated > to hold the new name and the vma pointer is updated to point to the new > structure. > With this approach the fork() performance regressions is reduced 3-4x > times and with usecases using more reasonable number of VMAs (a few > thousand) the regressions is not measurable. I like the refcounting; thank you! Since patch2 adds a lot of things that are changed by patch3; maybe combine them?
On Fri, Aug 27, 2021 at 10:28 PM Kees Cook <keescook@chromium.org> wrote: > > On Fri, Aug 27, 2021 at 12:18:58PM -0700, Suren Baghdasaryan wrote: > > While forking a process with high number (64K) of named anonymous vmas the > > overhead caused by strdup() is noticeable. Experiments with ARM64 Android > > device show up to 40% performance regression when forking a process with > > 64k unpopulated anonymous vmas using the max name lengths vs the same > > process with the same number of anonymous vmas having no name. > > Introduce anon_vma_name refcounted structure to avoid the overhead of > > copying vma names during fork() and when splitting named anonymous vmas. > > When a vma is duplicated, instead of copying the name we increment the > > refcount of this structure. Multiple vmas can point to the same > > anon_vma_name as long as they increment the refcount. The name member of > > anon_vma_name structure is assigned at structure allocation time and is > > never changed. If vma name changes then the refcount of the original > > structure is dropped, a new anon_vma_name structure is allocated > > to hold the new name and the vma pointer is updated to point to the new > > structure. > > With this approach the fork() performance regressions is reduced 3-4x > > times and with usecases using more reasonable number of VMAs (a few > > thousand) the regressions is not measurable. > > I like the refcounting; thank you! > > Since patch2 adds a lot of things that are changed by patch3; maybe > combine them? I thought it would be easier to review with the main logic being written using a basic type (string) first and then replace the basic type with a more complex refcounted structure. Also, if someone would like to rerun the tests and measure the regression of strdup vs refcounting approach, keeping this patch separate makes it easier to set up these tests. If that's not convenient I can absolutely squash them together. > > -- > Kees Cook
Am Freitag, 27. August 2021, 21:18:58 CEST schrieb Suren Baghdasaryan: > While forking a process with high number (64K) of named anonymous vmas the > overhead caused by strdup() is noticeable. Experiments with ARM64 Android > device show up to 40% performance regression when forking a process with > 64k unpopulated anonymous vmas using the max name lengths vs the same > process with the same number of anonymous vmas having no name. > Introduce anon_vma_name refcounted structure to avoid the overhead of > copying vma names during fork() and when splitting named anonymous vmas. > When a vma is duplicated, instead of copying the name we increment the > refcount of this structure. Multiple vmas can point to the same > anon_vma_name as long as they increment the refcount. The name member of > anon_vma_name structure is assigned at structure allocation time and is > never changed. If vma name changes then the refcount of the original > structure is dropped, a new anon_vma_name structure is allocated > to hold the new name and the vma pointer is updated to point to the new > structure. > With this approach the fork() performance regressions is reduced 3-4x > times and with usecases using more reasonable number of VMAs (a few > thousand) the regressions is not measurable. > > Signed-off-by: Suren Baghdasaryan <surenb@google.com> > --- > include/linux/mm_types.h | 9 ++++++++- > mm/madvise.c | 42 +++++++++++++++++++++++++++++++++------- > 2 files changed, 43 insertions(+), 8 deletions(-) > > diff --git a/mm/madvise.c b/mm/madvise.c > index bc029f3fca6a..32ac5dc5ebf3 100644 > --- a/mm/madvise.c > +++ b/mm/madvise.c > @@ -63,6 +63,27 @@ static int madvise_need_mmap_write(int behavior) > } > } > > +static struct anon_vma_name *anon_vma_name_alloc(const char *name) > +{ > + struct anon_vma_name *anon_name; > + size_t len = strlen(name); > + > + /* Add 1 for NUL terminator at the end of the anon_name->name */ > + anon_name = kzalloc(sizeof(*anon_name) + len + 1, > + GFP_KERNEL); > + kref_init(&anon_name->kref); > + strcpy(anon_name->name, name); > + > + return anon_name; > +} Given that you overwrite anything in that struct anyway this could be reduced to kmalloc(), no? And it definitely needs a NULL check. Eike
On Mon, Aug 30, 2021 at 12:03 AM Rolf Eike Beer <eb@emlix.com> wrote: > > Am Freitag, 27. August 2021, 21:18:58 CEST schrieb Suren Baghdasaryan: > > While forking a process with high number (64K) of named anonymous vmas the > > overhead caused by strdup() is noticeable. Experiments with ARM64 Android > > device show up to 40% performance regression when forking a process with > > 64k unpopulated anonymous vmas using the max name lengths vs the same > > process with the same number of anonymous vmas having no name. > > Introduce anon_vma_name refcounted structure to avoid the overhead of > > copying vma names during fork() and when splitting named anonymous vmas. > > When a vma is duplicated, instead of copying the name we increment the > > refcount of this structure. Multiple vmas can point to the same > > anon_vma_name as long as they increment the refcount. The name member of > > anon_vma_name structure is assigned at structure allocation time and is > > never changed. If vma name changes then the refcount of the original > > structure is dropped, a new anon_vma_name structure is allocated > > to hold the new name and the vma pointer is updated to point to the new > > structure. > > With this approach the fork() performance regressions is reduced 3-4x > > times and with usecases using more reasonable number of VMAs (a few > > thousand) the regressions is not measurable. > > > > Signed-off-by: Suren Baghdasaryan <surenb@google.com> > > --- > > include/linux/mm_types.h | 9 ++++++++- > > mm/madvise.c | 42 +++++++++++++++++++++++++++++++++------- > > 2 files changed, 43 insertions(+), 8 deletions(-) > > > > diff --git a/mm/madvise.c b/mm/madvise.c > > index bc029f3fca6a..32ac5dc5ebf3 100644 > > --- a/mm/madvise.c > > +++ b/mm/madvise.c > > @@ -63,6 +63,27 @@ static int madvise_need_mmap_write(int behavior) > > } > > } > > > > +static struct anon_vma_name *anon_vma_name_alloc(const char *name) > > +{ > > + struct anon_vma_name *anon_name; > > + size_t len = strlen(name); > > + > > + /* Add 1 for NUL terminator at the end of the anon_name->name */ > > + anon_name = kzalloc(sizeof(*anon_name) + len + 1, > > + GFP_KERNEL); > > + kref_init(&anon_name->kref); > > + strcpy(anon_name->name, name); > > + > > + return anon_name; > > +} > > Given that you overwrite anything in that struct anyway this could be reduced > to kmalloc(), no? And it definitely needs a NULL check. Ack. I'll address both points in the next revision. Thanks! Suren. > > Eike > -- > Rolf Eike Beer, emlix GmbH, http://www.emlix.com > Fon +49 551 30664-0, Fax +49 551 30664-11 > Gothaer Platz 3, 37083 Göttingen, Germany > Sitz der Gesellschaft: Göttingen, Amtsgericht Göttingen HR B 3160 > Geschäftsführung: Heike Jordan, Dr. Uwe Kracke – Ust-IdNr.: DE 205 198 055 > > emlix - smart embedded open source
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 26a30f7a5228..a7361acf2921 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -5,6 +5,7 @@ #include <linux/mm_types_task.h> #include <linux/auxvec.h> +#include <linux/kref.h> #include <linux/list.h> #include <linux/spinlock.h> #include <linux/rbtree.h> @@ -302,6 +303,12 @@ struct vm_userfaultfd_ctx { struct vm_userfaultfd_ctx {}; #endif /* CONFIG_USERFAULTFD */ +struct anon_vma_name { + struct kref kref; + /* The name needs to be at the end because it is dynamically sized. */ + char name[]; +}; + /* * This struct describes a virtual memory area. There is one of these * per VM-area/task. A VM area is any part of the process virtual memory @@ -353,7 +360,7 @@ struct vm_area_struct { unsigned long rb_subtree_last; } shared; /* Serialized by mmap_sem. */ - char *anon_name; + struct anon_vma_name *anon_name; }; /* diff --git a/mm/madvise.c b/mm/madvise.c index bc029f3fca6a..32ac5dc5ebf3 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -63,6 +63,27 @@ static int madvise_need_mmap_write(int behavior) } } +static struct anon_vma_name *anon_vma_name_alloc(const char *name) +{ + struct anon_vma_name *anon_name; + size_t len = strlen(name); + + /* Add 1 for NUL terminator at the end of the anon_name->name */ + anon_name = kzalloc(sizeof(*anon_name) + len + 1, + GFP_KERNEL); + kref_init(&anon_name->kref); + strcpy(anon_name->name, name); + + return anon_name; +} + +static void vma_anon_name_free(struct kref *kref) +{ + struct anon_vma_name *anon_name = + container_of(kref, struct anon_vma_name, kref); + kfree(anon_name); +} + static inline bool has_vma_anon_name(struct vm_area_struct *vma) { return !vma->vm_file && vma->anon_name; @@ -75,7 +96,7 @@ const char *vma_anon_name(struct vm_area_struct *vma) mmap_assert_locked(vma->vm_mm); - return vma->anon_name; + return vma->anon_name->name; } void dup_vma_anon_name(struct vm_area_struct *orig_vma, @@ -84,37 +105,44 @@ void dup_vma_anon_name(struct vm_area_struct *orig_vma, if (!has_vma_anon_name(orig_vma)) return; - new_vma->anon_name = kstrdup(orig_vma->anon_name, GFP_KERNEL); + kref_get(&orig_vma->anon_name->kref); + new_vma->anon_name = orig_vma->anon_name; } void free_vma_anon_name(struct vm_area_struct *vma) { + struct anon_vma_name *anon_name; + if (!has_vma_anon_name(vma)) return; - kfree(vma->anon_name); + anon_name = vma->anon_name; vma->anon_name = NULL; + kref_put(&anon_name->kref, vma_anon_name_free); } /* mmap_lock should be write-locked */ static void replace_vma_anon_name(struct vm_area_struct *vma, const char *name) { + const char *anon_name; + if (!name) { free_vma_anon_name(vma); return; } - if (vma->anon_name) { + anon_name = vma_anon_name(vma); + if (anon_name) { /* Should never happen, to dup use dup_vma_anon_name() */ - WARN_ON(vma->anon_name == name); + WARN_ON(anon_name == name); /* Same name, nothing to do here */ - if (!strcmp(name, vma->anon_name)) + if (!strcmp(name, anon_name)) return; free_vma_anon_name(vma); } - vma->anon_name = kstrdup(name, GFP_KERNEL); + vma->anon_name = anon_vma_name_alloc(name); } /*
While forking a process with high number (64K) of named anonymous vmas the overhead caused by strdup() is noticeable. Experiments with ARM64 Android device show up to 40% performance regression when forking a process with 64k unpopulated anonymous vmas using the max name lengths vs the same process with the same number of anonymous vmas having no name. Introduce anon_vma_name refcounted structure to avoid the overhead of copying vma names during fork() and when splitting named anonymous vmas. When a vma is duplicated, instead of copying the name we increment the refcount of this structure. Multiple vmas can point to the same anon_vma_name as long as they increment the refcount. The name member of anon_vma_name structure is assigned at structure allocation time and is never changed. If vma name changes then the refcount of the original structure is dropped, a new anon_vma_name structure is allocated to hold the new name and the vma pointer is updated to point to the new structure. With this approach the fork() performance regressions is reduced 3-4x times and with usecases using more reasonable number of VMAs (a few thousand) the regressions is not measurable. Signed-off-by: Suren Baghdasaryan <surenb@google.com> --- include/linux/mm_types.h | 9 ++++++++- mm/madvise.c | 42 +++++++++++++++++++++++++++++++++------- 2 files changed, 43 insertions(+), 8 deletions(-)