Message ID | 20190612142811.24894-1-mkoutny@suse.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [RFC] binfmt_elf: Protect mm_struct access with mmap_sem | expand |
On Wed, Jun 12, 2019 at 04:28:11PM +0200, Michal Koutný wrote: > - /* N.B. passed_fileno might not be initialized? */ > + Why did you delete this comment?
On Wed, Jun 12, 2019 at 10:00:34AM -0700, Matthew Wilcox <willy@infradead.org> wrote: > On Wed, Jun 12, 2019 at 04:28:11PM +0200, Michal Koutný wrote: > > - /* N.B. passed_fileno might not be initialized? */ > > + > > Why did you delete this comment? The variable got removed in d20894a23708 ("Remove a.out interpreter support in ELF loader") so it is not relevant anymore.
On Wed, Jun 12, 2019 at 07:29:15PM +0200, Michal Koutný wrote: > On Wed, Jun 12, 2019 at 10:00:34AM -0700, Matthew Wilcox <willy@infradead.org> wrote: > > On Wed, Jun 12, 2019 at 04:28:11PM +0200, Michal Koutný wrote: > > > - /* N.B. passed_fileno might not be initialized? */ > > > + > > > > Why did you delete this comment? > The variable got removed in > d20894a23708 ("Remove a.out interpreter support in ELF loader") > so it is not relevant anymore. Better put that in the changelog for v2 then. or even make it a separate patch.
On Wed, Jun 12, 2019 at 04:28:11PM +0200, Michal Koutný wrote: > find_extend_vma assumes the caller holds mmap_sem as a reader (explained > in expand_downwards()). The path when we are extending the stack VMA to > accomodate argv[] pointers happens without the lock. > > I was not able to cause an mm_struct corruption but > BUG_ON(!rwsem_is_locked(&mm->mmap_sem)) in find_extend_vma could be > triggered as > > # <bigfile xargs echo > xargs: echo: terminated by signal 11 > > (bigfile needs to have more than RLIMIT_STACK / sizeof(char *) rows) > > Other accesses to mm_struct in exec path are protected by mmap_sem, so > conservatively, protect also this one. Besides that, explain why we omit > mm_struct.arg_lock in the exec(2) path. > > Cc: Cyrill Gorcunov <gorcunov@gmail.com> > Signed-off-by: Michal Koutný <mkoutny@suse.com> > --- > > When I was attempting to reduce usage of mmap_sem I came across this > unprotected access and increased number of its holders :-/ > > I'm not sure whether there is a real concurrent writer at this early > stages (I considered khugepaged especially as setup_arg_pages invokes > khugepaged_enter_vma_merge but we're lucky because khugepaged skips it > because of VM_STACK_INCOMPLETE_SETUP). > > A nicer approach would perhaps be to do all this exec setup when the > mm_struct is still not exposed via current->mm (and hence no need to > synchronize via mmap_sem). But I didn't look enough into binfmt specific > whether it is even doable and worth it. > > So I'm sending this for a discussion. > > fs/binfmt_elf.c | 10 +++++++++- > fs/exec.c | 3 ++- > 2 files changed, 11 insertions(+), 2 deletions(-) > > diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c > index 8264b468f283..48e169760a9c 100644 > --- a/fs/binfmt_elf.c > +++ b/fs/binfmt_elf.c > @@ -299,7 +299,11 @@ create_elf_tables(struct linux_binprm *bprm, struct elfhdr *exec, > * Grow the stack manually; some architectures have a limit on how > * far ahead a user-space access may be in order to grow the stack. > */ > + if (down_read_killable(¤t->mm->mmap_sem)) > + return -EINTR; > vma = find_extend_vma(current->mm, bprm->p); > + up_read(¤t->mm->mmap_sem); > + Good catch, Michal! Actually the loader code is heavy on its own so I think having readlock taken here should not cause any perf problems but worth having for consistency. Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com>
On Wed, Jun 12, 2019 at 10:51:59AM -0700, Matthew Wilcox wrote: > On Wed, Jun 12, 2019 at 07:29:15PM +0200, Michal Koutný wrote: > > On Wed, Jun 12, 2019 at 10:00:34AM -0700, Matthew Wilcox <willy@infradead.org> wrote: > > > On Wed, Jun 12, 2019 at 04:28:11PM +0200, Michal Koutný wrote: > > > > - /* N.B. passed_fileno might not be initialized? */ > > > > + > > > > > > Why did you delete this comment? > > The variable got removed in > > d20894a23708 ("Remove a.out interpreter support in ELF loader") > > so it is not relevant anymore. > > Better put that in the changelog for v2 then. or even make it a > separate patch. Just updated changelog should be fine, I guess. A separate commit just to remove an obsolete comment is too much.
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c index 8264b468f283..48e169760a9c 100644 --- a/fs/binfmt_elf.c +++ b/fs/binfmt_elf.c @@ -299,7 +299,11 @@ create_elf_tables(struct linux_binprm *bprm, struct elfhdr *exec, * Grow the stack manually; some architectures have a limit on how * far ahead a user-space access may be in order to grow the stack. */ + if (down_read_killable(¤t->mm->mmap_sem)) + return -EINTR; vma = find_extend_vma(current->mm, bprm->p); + up_read(¤t->mm->mmap_sem); + if (!vma) return -EFAULT; @@ -1123,11 +1127,15 @@ static int load_elf_binary(struct linux_binprm *bprm) goto out; #endif /* ARCH_HAS_SETUP_ADDITIONAL_PAGES */ + /* + * Don't take mm->arg_lock. The concurrent change might happen only + * from prctl_set_mm but after de_thread we are certainly alone here. + */ retval = create_elf_tables(bprm, &loc->elf_ex, load_addr, interp_load_addr); if (retval < 0) goto out; - /* N.B. passed_fileno might not be initialized? */ + current->mm->end_code = end_code; current->mm->start_code = start_code; current->mm->start_data = start_data; diff --git a/fs/exec.c b/fs/exec.c index 89a500bb897a..d5b55c92019a 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -212,7 +212,8 @@ static struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos, /* * We are doing an exec(). 'current' is the process - * doing the exec and bprm->mm is the new process's mm. + * doing the exec and bprm->mm is the new process's mm that is not + * shared yet, so no synchronization on mmap_sem. */ ret = get_user_pages_remote(current, bprm->mm, pos, 1, gup_flags, &page, NULL, NULL);
find_extend_vma assumes the caller holds mmap_sem as a reader (explained in expand_downwards()). The path when we are extending the stack VMA to accomodate argv[] pointers happens without the lock. I was not able to cause an mm_struct corruption but BUG_ON(!rwsem_is_locked(&mm->mmap_sem)) in find_extend_vma could be triggered as # <bigfile xargs echo xargs: echo: terminated by signal 11 (bigfile needs to have more than RLIMIT_STACK / sizeof(char *) rows) Other accesses to mm_struct in exec path are protected by mmap_sem, so conservatively, protect also this one. Besides that, explain why we omit mm_struct.arg_lock in the exec(2) path. Cc: Cyrill Gorcunov <gorcunov@gmail.com> Signed-off-by: Michal Koutný <mkoutny@suse.com> --- When I was attempting to reduce usage of mmap_sem I came across this unprotected access and increased number of its holders :-/ I'm not sure whether there is a real concurrent writer at this early stages (I considered khugepaged especially as setup_arg_pages invokes khugepaged_enter_vma_merge but we're lucky because khugepaged skips it because of VM_STACK_INCOMPLETE_SETUP). A nicer approach would perhaps be to do all this exec setup when the mm_struct is still not exposed via current->mm (and hence no need to synchronize via mmap_sem). But I didn't look enough into binfmt specific whether it is even doable and worth it. So I'm sending this for a discussion. fs/binfmt_elf.c | 10 +++++++++- fs/exec.c | 3 ++- 2 files changed, 11 insertions(+), 2 deletions(-)