Message ID | YRPaodsBm3ambw8z@miu.piliscsaba.redhat.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | mmap denywrite mess (Was: [GIT PULL] overlayfs fixes for 5.14-rc6) | expand |
On 11.08.21 16:11, Miklos Szeredi wrote: > On Mon, Aug 09, 2021 at 02:25:17PM -0700, Linus Torvalds wrote: > >> Ugh. Th edances with denywrite and mapping_unmap_writable are really >> really annoying. > > Attached version has error and success paths separated. Was that your > complaint? > >> I get the feeling that the whole thing with deny_write_access and >> mapping_map_writable could possibly be done after-the-fact somehow as >> part of actually inserting the vma in the vma tree, rather than done >> as the vma is prepared. > > I don't know if that's doable or not. The final denywrite count is obtained in > __vma_link_file(), called after __vma_link(). The questions are: > > - does the order of those helper calls matter? > > - if it does, could the __vma_link() be safely undone after an unsuccessful > __vmal_link_file()? > >> And most users of vma_set_file() probably really don't want that whole >> thing at all (ie the DRM stuff that just switches out a local thing. >> They also don't check for the new error cases you've added. > > Christian König wants to follow up with those checks (which should be asserts, > if the code wasn't buggy in the first place). > >> So I really think this is quite questionable, and those cases should >> probably have been done entirely inside ovlfs rather than polluting >> the cases that don't care and don't check. > > I don't get that. mmap_region() currently drops the deny counts from the > original file. That doesn't work for overlayfs since it needs to take new temp > counts on the override file. > > So mmap_region() is changed to drop the counts on vma->vm_file, but then all > callers of vma_set_file() will need to do that switch of temp counts, there's no > way around that. > > Thanks, > Miklos > > For reference, here's the previous discussion: > > https://lore.kernel.org/linux-mm/YNHXzBgzRrZu1MrD@miu.piliscsaba.redhat.com/ > > --- > fs/overlayfs/file.c | 4 +++- > include/linux/mm.h | 2 +- > mm/mmap.c | 2 +- > mm/util.c | 31 ++++++++++++++++++++++++++++++- > 4 files changed, 35 insertions(+), 4 deletions(-) > > --- a/fs/overlayfs/file.c > +++ b/fs/overlayfs/file.c > @@ -475,7 +475,9 @@ static int ovl_mmap(struct file *file, s > if (WARN_ON(file != vma->vm_file)) > return -EIO; > > - vma_set_file(vma, realfile); > + ret = vma_set_file(vma, realfile); > + if (ret) > + return ret; > > old_cred = ovl_override_creds(file_inode(file)->i_sb); > ret = call_mmap(vma->vm_file, vma); > --- a/include/linux/mm.h > +++ b/include/linux/mm.h > @@ -2780,7 +2780,7 @@ static inline void vma_set_page_prot(str > } > #endif > > -void vma_set_file(struct vm_area_struct *vma, struct file *file); > +int /* __must_check */ vma_set_file(struct vm_area_struct *vma, struct file *file); > > #ifdef CONFIG_NUMA_BALANCING > unsigned long change_prot_numa(struct vm_area_struct *vma, > --- a/mm/mmap.c > +++ b/mm/mmap.c > @@ -1806,6 +1806,7 @@ unsigned long mmap_region(struct file *f > */ > vma->vm_file = get_file(file); > error = call_mmap(file, vma); > + file = vma->vm_file; > if (error) > goto unmap_and_free_vma; > > @@ -1867,7 +1868,6 @@ unsigned long mmap_region(struct file *f > if (vm_flags & VM_DENYWRITE) > allow_write_access(file); > } > - file = vma->vm_file; > out: > perf_event_mmap(vma); > > --- a/mm/util.c > +++ b/mm/util.c > @@ -314,12 +314,41 @@ int vma_is_stack_for_current(struct vm_a > /* > * Change backing file, only valid to use during initial VMA setup. > */ > -void vma_set_file(struct vm_area_struct *vma, struct file *file) > +int vma_set_file(struct vm_area_struct *vma, struct file *file) > { > + vm_flags_t vm_flags = vma->vm_flags; > + int err; > + > + /* Get temporary denial counts on replacement */ > + if (vm_flags & VM_DENYWRITE) { > + err = deny_write_access(file); > + if (err) > + return err; > + } > + if (vm_flags & VM_SHARED) { > + err = mapping_map_writable(file->f_mapping); > + if (err) > + goto undo_denywrite; > + } > + > /* Changing an anonymous vma with this is illegal */ > get_file(file); > swap(vma->vm_file, file); > + > + /* Undo temporary denial counts on replaced */ > + if (vm_flags & VM_SHARED) > + mapping_unmap_writable(file->f_mapping); > + > + if (vm_flags & VM_DENYWRITE) > + allow_write_access(file); > + > fput(file); > + return 0; > + > +undo_denywrite: > + if (vm_flags & VM_DENYWRITE) > + allow_write_access(file); > + return err; > } > EXPORT_SYMBOL(vma_set_file); > > I proposed a while ago to get rid of VM_DENYWRITE completely: https://lkml.kernel.org/r/20210423131640.20080-1-david@redhat.com I haven't looked how much it still applies to current upstream, but maybe that might help cleaning up that code.
On Wed, Aug 11, 2021 at 4:45 AM David Hildenbrand <david@redhat.com> wrote: > > I proposed a while ago to get rid of VM_DENYWRITE completely: > > https://lkml.kernel.org/r/20210423131640.20080-1-david@redhat.com > > I haven't looked how much it still applies to current upstream, but > maybe that might help cleaning up that code. I like it. I agree that we could - and probably should - just do it this way. We don't expose MAP_DENYWRITE to user space any more - and the old legacy library loading code certainly isn't worth it - and so effectively the only way to set it is with execve(). And yes, it gets rid of all the silly games with the per-mapping flags. Linus
On 11.08.21 18:20, Linus Torvalds wrote: > On Wed, Aug 11, 2021 at 4:45 AM David Hildenbrand <david@redhat.com> wrote: >> >> I proposed a while ago to get rid of VM_DENYWRITE completely: >> >> https://lkml.kernel.org/r/20210423131640.20080-1-david@redhat.com >> >> I haven't looked how much it still applies to current upstream, but >> maybe that might help cleaning up that code. > > I like it. > > I agree that we could - and probably should - just do it this way. > > We don't expose MAP_DENYWRITE to user space any more - and the old > legacy library loading code certainly isn't worth it - and so > effectively the only way to set it is with execve(). > > And yes, it gets rid of all the silly games with the per-mapping flags. I'll rebase, retest and resend, putting you on cc. Then we can discuss if/how/when we might want to go that path.
--- a/fs/overlayfs/file.c +++ b/fs/overlayfs/file.c @@ -475,7 +475,9 @@ static int ovl_mmap(struct file *file, s if (WARN_ON(file != vma->vm_file)) return -EIO; - vma_set_file(vma, realfile); + ret = vma_set_file(vma, realfile); + if (ret) + return ret; old_cred = ovl_override_creds(file_inode(file)->i_sb); ret = call_mmap(vma->vm_file, vma); --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2780,7 +2780,7 @@ static inline void vma_set_page_prot(str } #endif -void vma_set_file(struct vm_area_struct *vma, struct file *file); +int /* __must_check */ vma_set_file(struct vm_area_struct *vma, struct file *file); #ifdef CONFIG_NUMA_BALANCING unsigned long change_prot_numa(struct vm_area_struct *vma, --- a/mm/mmap.c +++ b/mm/mmap.c @@ -1806,6 +1806,7 @@ unsigned long mmap_region(struct file *f */ vma->vm_file = get_file(file); error = call_mmap(file, vma); + file = vma->vm_file; if (error) goto unmap_and_free_vma; @@ -1867,7 +1868,6 @@ unsigned long mmap_region(struct file *f if (vm_flags & VM_DENYWRITE) allow_write_access(file); } - file = vma->vm_file; out: perf_event_mmap(vma); --- a/mm/util.c +++ b/mm/util.c @@ -314,12 +314,41 @@ int vma_is_stack_for_current(struct vm_a /* * Change backing file, only valid to use during initial VMA setup. */ -void vma_set_file(struct vm_area_struct *vma, struct file *file) +int vma_set_file(struct vm_area_struct *vma, struct file *file) { + vm_flags_t vm_flags = vma->vm_flags; + int err; + + /* Get temporary denial counts on replacement */ + if (vm_flags & VM_DENYWRITE) { + err = deny_write_access(file); + if (err) + return err; + } + if (vm_flags & VM_SHARED) { + err = mapping_map_writable(file->f_mapping); + if (err) + goto undo_denywrite; + } + /* Changing an anonymous vma with this is illegal */ get_file(file); swap(vma->vm_file, file); + + /* Undo temporary denial counts on replaced */ + if (vm_flags & VM_SHARED) + mapping_unmap_writable(file->f_mapping); + + if (vm_flags & VM_DENYWRITE) + allow_write_access(file); + fput(file); + return 0; + +undo_denywrite: + if (vm_flags & VM_DENYWRITE) + allow_write_access(file); + return err; } EXPORT_SYMBOL(vma_set_file);