Message ID | 20201007164426.1812530-6-daniel.vetter@ffwll.ch (mailing list archive) |
---|---|
State | Not Applicable |
Headers | show |
Series | follow_pfn and other iomap races | expand |
On Wed, Oct 07, 2020 at 06:44:18PM +0200, Daniel Vetter wrote: > > - /* > - * While get_vaddr_frames() could be used for transient (kernel > - * controlled lifetime) pinning of memory pages all current > - * users establish long term (userspace controlled lifetime) > - * page pinning. Treat get_vaddr_frames() like > - * get_user_pages_longterm() and disallow it for filesystem-dax > - * mappings. > - */ > - if (vma_is_fsdax(vma)) { > - ret = -EOPNOTSUPP; > - goto out; > - } > - > - if (!(vma->vm_flags & (VM_IO | VM_PFNMAP))) { > - vec->got_ref = true; > - vec->is_pfns = false; > - ret = pin_user_pages_locked(start, nr_frames, > - gup_flags, (struct page **)(vec->ptrs), &locked); > - goto out; > - } The vm_flags still need to be checked before going into the while loop. If the break is taken then nothing would check vm_flags Jason
On Wed, Oct 7, 2020 at 6:53 PM Jason Gunthorpe <jgg@ziepe.ca> wrote: > > On Wed, Oct 07, 2020 at 06:44:18PM +0200, Daniel Vetter wrote: > > > > - /* > > - * While get_vaddr_frames() could be used for transient (kernel > > - * controlled lifetime) pinning of memory pages all current > > - * users establish long term (userspace controlled lifetime) > > - * page pinning. Treat get_vaddr_frames() like > > - * get_user_pages_longterm() and disallow it for filesystem-dax > > - * mappings. > > - */ > > - if (vma_is_fsdax(vma)) { > > - ret = -EOPNOTSUPP; > > - goto out; > > - } > > - > > - if (!(vma->vm_flags & (VM_IO | VM_PFNMAP))) { > > - vec->got_ref = true; > > - vec->is_pfns = false; > > - ret = pin_user_pages_locked(start, nr_frames, > > - gup_flags, (struct page **)(vec->ptrs), &locked); > > - goto out; > > - } > > The vm_flags still need to be checked before going into the while > loop. If the break is taken then nothing would check vm_flags Hm right that's a bin inconsistent. follow_pfn also checks for this, so I think we can just ditch this entirely both here and in the do {} while () check, simplifying the latter to just while (vma). Well, just make it a real loop with less confusing control flow probably. Or prefer I keep this and touch the code less? -Daniel
On Wed, Oct 07, 2020 at 07:12:24PM +0200, Daniel Vetter wrote: > On Wed, Oct 7, 2020 at 6:53 PM Jason Gunthorpe <jgg@ziepe.ca> wrote: > > > > On Wed, Oct 07, 2020 at 06:44:18PM +0200, Daniel Vetter wrote: > > > > > > - /* > > > - * While get_vaddr_frames() could be used for transient (kernel > > > - * controlled lifetime) pinning of memory pages all current > > > - * users establish long term (userspace controlled lifetime) > > > - * page pinning. Treat get_vaddr_frames() like > > > - * get_user_pages_longterm() and disallow it for filesystem-dax > > > - * mappings. > > > - */ > > > - if (vma_is_fsdax(vma)) { > > > - ret = -EOPNOTSUPP; > > > - goto out; > > > - } > > > - > > > - if (!(vma->vm_flags & (VM_IO | VM_PFNMAP))) { > > > - vec->got_ref = true; > > > - vec->is_pfns = false; > > > - ret = pin_user_pages_locked(start, nr_frames, > > > - gup_flags, (struct page **)(vec->ptrs), &locked); > > > - goto out; > > > - } > > > > The vm_flags still need to be checked before going into the while > > loop. If the break is taken then nothing would check vm_flags > > Hm right that's a bin inconsistent. follow_pfn also checks for this, > so I think we can just ditch this entirely both here and in the do {} > while () check, simplifying the latter to just while (vma). Well, just > make it a real loop with less confusing control flow probably. It does read very poorly with the redundant check, espeically since I keep forgetting follow_pfn does it too :\ Jason
On 10/7/20 9:44 AM, Daniel Vetter wrote: > This is used by media/videbuf2 for persistent dma mappings, not just > for a single dma operation and then freed again, so needs > FOLL_LONGTERM. > > Unfortunately current pup_locked doesn't support FOLL_LONGTERM due to > locking issues. Rework the code to pull the pup path out from the > mmap_sem critical section as suggested by Jason. > > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> > Cc: Jason Gunthorpe <jgg@ziepe.ca> > Cc: Pawel Osciak <pawel@osciak.com> > Cc: Marek Szyprowski <m.szyprowski@samsung.com> > Cc: Kyungmin Park <kyungmin.park@samsung.com> > Cc: Tomasz Figa <tfiga@chromium.org> > Cc: Mauro Carvalho Chehab <mchehab@kernel.org> > Cc: Andrew Morton <akpm@linux-foundation.org> > Cc: John Hubbard <jhubbard@nvidia.com> > Cc: Jérôme Glisse <jglisse@redhat.com> > Cc: Jan Kara <jack@suse.cz> > Cc: Dan Williams <dan.j.williams@intel.com> > Cc: linux-mm@kvack.org > Cc: linux-arm-kernel@lists.infradead.org > Cc: linux-samsung-soc@vger.kernel.org > Cc: linux-media@vger.kernel.org > --- > mm/frame_vector.c | 36 +++++++++++------------------------- > 1 file changed, 11 insertions(+), 25 deletions(-) > > diff --git a/mm/frame_vector.c b/mm/frame_vector.c > index 10f82d5643b6..39db520a51dc 100644 > --- a/mm/frame_vector.c > +++ b/mm/frame_vector.c > @@ -38,7 +38,6 @@ int get_vaddr_frames(unsigned long start, unsigned int nr_frames, > struct vm_area_struct *vma; > int ret = 0; > int err; > - int locked; > > if (nr_frames == 0) > return 0; > @@ -48,35 +47,22 @@ int get_vaddr_frames(unsigned long start, unsigned int nr_frames, > > start = untagged_addr(start); > > + ret = pin_user_pages_fast(start, nr_frames, > + FOLL_FORCE | FOLL_WRITE | FOLL_LONGTERM, > + (struct page **)(vec->ptrs)); > + if (ret > 0) { > + vec->got_ref = true; > + vec->is_pfns = false; > + goto out_unlocked; > + } This part looks good, and changing to _fast is a potential performance improvement, too. > + > mmap_read_lock(mm); > - locked = 1; > vma = find_vma_intersection(mm, start, start + 1); > if (!vma) { > ret = -EFAULT; > goto out; > } > > - /* > - * While get_vaddr_frames() could be used for transient (kernel > - * controlled lifetime) pinning of memory pages all current > - * users establish long term (userspace controlled lifetime) > - * page pinning. Treat get_vaddr_frames() like > - * get_user_pages_longterm() and disallow it for filesystem-dax > - * mappings. > - */ > - if (vma_is_fsdax(vma)) { > - ret = -EOPNOTSUPP; > - goto out; > - } Are you sure we don't need to check vma_is_fsdax() anymore? > - > - if (!(vma->vm_flags & (VM_IO | VM_PFNMAP))) { > - vec->got_ref = true; > - vec->is_pfns = false; > - ret = pin_user_pages_locked(start, nr_frames, > - gup_flags, (struct page **)(vec->ptrs), &locked); > - goto out; > - } > - > vec->got_ref = false; > vec->is_pfns = true; > do { > @@ -101,8 +87,8 @@ int get_vaddr_frames(unsigned long start, unsigned int nr_frames, > vma = find_vma_intersection(mm, start, start + 1); > } while (vma && vma->vm_flags & (VM_IO | VM_PFNMAP)); > out: > - if (locked) > - mmap_read_unlock(mm); > + mmap_read_unlock(mm); > +out_unlocked: > if (!ret) > ret = -EFAULT; > if (ret > 0) > All of the error handling still looks accurate there. thanks,
On Wed, Oct 7, 2020 at 11:13 PM John Hubbard <jhubbard@nvidia.com> wrote: > > On 10/7/20 9:44 AM, Daniel Vetter wrote: > > This is used by media/videbuf2 for persistent dma mappings, not just > > for a single dma operation and then freed again, so needs > > FOLL_LONGTERM. > > > > Unfortunately current pup_locked doesn't support FOLL_LONGTERM due to > > locking issues. Rework the code to pull the pup path out from the > > mmap_sem critical section as suggested by Jason. > > > > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> > > Cc: Jason Gunthorpe <jgg@ziepe.ca> > > Cc: Pawel Osciak <pawel@osciak.com> > > Cc: Marek Szyprowski <m.szyprowski@samsung.com> > > Cc: Kyungmin Park <kyungmin.park@samsung.com> > > Cc: Tomasz Figa <tfiga@chromium.org> > > Cc: Mauro Carvalho Chehab <mchehab@kernel.org> > > Cc: Andrew Morton <akpm@linux-foundation.org> > > Cc: John Hubbard <jhubbard@nvidia.com> > > Cc: Jérôme Glisse <jglisse@redhat.com> > > Cc: Jan Kara <jack@suse.cz> > > Cc: Dan Williams <dan.j.williams@intel.com> > > Cc: linux-mm@kvack.org > > Cc: linux-arm-kernel@lists.infradead.org > > Cc: linux-samsung-soc@vger.kernel.org > > Cc: linux-media@vger.kernel.org > > --- > > mm/frame_vector.c | 36 +++++++++++------------------------- > > 1 file changed, 11 insertions(+), 25 deletions(-) > > > > diff --git a/mm/frame_vector.c b/mm/frame_vector.c > > index 10f82d5643b6..39db520a51dc 100644 > > --- a/mm/frame_vector.c > > +++ b/mm/frame_vector.c > > @@ -38,7 +38,6 @@ int get_vaddr_frames(unsigned long start, unsigned int nr_frames, > > struct vm_area_struct *vma; > > int ret = 0; > > int err; > > - int locked; > > > > if (nr_frames == 0) > > return 0; > > @@ -48,35 +47,22 @@ int get_vaddr_frames(unsigned long start, unsigned int nr_frames, > > > > start = untagged_addr(start); > > > > + ret = pin_user_pages_fast(start, nr_frames, > > + FOLL_FORCE | FOLL_WRITE | FOLL_LONGTERM, > > + (struct page **)(vec->ptrs)); > > + if (ret > 0) { > > + vec->got_ref = true; > > + vec->is_pfns = false; > > + goto out_unlocked; > > + } > > This part looks good, and changing to _fast is a potential performance improvement, > too. > > > + > > mmap_read_lock(mm); > > - locked = 1; > > vma = find_vma_intersection(mm, start, start + 1); > > if (!vma) { > > ret = -EFAULT; > > goto out; > > } > > > > - /* > > - * While get_vaddr_frames() could be used for transient (kernel > > - * controlled lifetime) pinning of memory pages all current > > - * users establish long term (userspace controlled lifetime) > > - * page pinning. Treat get_vaddr_frames() like > > - * get_user_pages_longterm() and disallow it for filesystem-dax > > - * mappings. > > - */ > > - if (vma_is_fsdax(vma)) { > > - ret = -EOPNOTSUPP; > > - goto out; > > - } > > Are you sure we don't need to check vma_is_fsdax() anymore? Since FOLL_LONGTERM checks for this and can only return struct page backed memory, and explicitly excludes VM_IO | VM_PFNMAP, was assuming this is not needed for follow_pfn. And the get_user_pages_locked this used back then didn't have the same check, hence why it was added (and FOLL_LONGTERM still doesn't work for the _locked versions, as you pointed out on the last round of this discussion). But now that you're asking, I have no idea whether fsdax vma can also be of VM_IO | VM_PFNMAP type. I'm not seeing that set anywhere in fs/dax.c, but that says nothing :-) Dan, you added this check originally, do we need it for VM_SPECIAL vmas too? Thanks, Daniel > > > - > > - if (!(vma->vm_flags & (VM_IO | VM_PFNMAP))) { > > - vec->got_ref = true; > > - vec->is_pfns = false; > > - ret = pin_user_pages_locked(start, nr_frames, > > - gup_flags, (struct page **)(vec->ptrs), &locked); > > - goto out; > > - } > > - > > vec->got_ref = false; > > vec->is_pfns = true; > > do { > > @@ -101,8 +87,8 @@ int get_vaddr_frames(unsigned long start, unsigned int nr_frames, > > vma = find_vma_intersection(mm, start, start + 1); > > } while (vma && vma->vm_flags & (VM_IO | VM_PFNMAP)); > > out: > > - if (locked) > > - mmap_read_unlock(mm); > > + mmap_read_unlock(mm); > > +out_unlocked: > > if (!ret) > > ret = -EFAULT; > > if (ret > 0) > > > > All of the error handling still looks accurate there. > > thanks, > -- > John Hubbard > NVIDIA
diff --git a/mm/frame_vector.c b/mm/frame_vector.c index 10f82d5643b6..39db520a51dc 100644 --- a/mm/frame_vector.c +++ b/mm/frame_vector.c @@ -38,7 +38,6 @@ int get_vaddr_frames(unsigned long start, unsigned int nr_frames, struct vm_area_struct *vma; int ret = 0; int err; - int locked; if (nr_frames == 0) return 0; @@ -48,35 +47,22 @@ int get_vaddr_frames(unsigned long start, unsigned int nr_frames, start = untagged_addr(start); + ret = pin_user_pages_fast(start, nr_frames, + FOLL_FORCE | FOLL_WRITE | FOLL_LONGTERM, + (struct page **)(vec->ptrs)); + if (ret > 0) { + vec->got_ref = true; + vec->is_pfns = false; + goto out_unlocked; + } + mmap_read_lock(mm); - locked = 1; vma = find_vma_intersection(mm, start, start + 1); if (!vma) { ret = -EFAULT; goto out; } - /* - * While get_vaddr_frames() could be used for transient (kernel - * controlled lifetime) pinning of memory pages all current - * users establish long term (userspace controlled lifetime) - * page pinning. Treat get_vaddr_frames() like - * get_user_pages_longterm() and disallow it for filesystem-dax - * mappings. - */ - if (vma_is_fsdax(vma)) { - ret = -EOPNOTSUPP; - goto out; - } - - if (!(vma->vm_flags & (VM_IO | VM_PFNMAP))) { - vec->got_ref = true; - vec->is_pfns = false; - ret = pin_user_pages_locked(start, nr_frames, - gup_flags, (struct page **)(vec->ptrs), &locked); - goto out; - } - vec->got_ref = false; vec->is_pfns = true; do { @@ -101,8 +87,8 @@ int get_vaddr_frames(unsigned long start, unsigned int nr_frames, vma = find_vma_intersection(mm, start, start + 1); } while (vma && vma->vm_flags & (VM_IO | VM_PFNMAP)); out: - if (locked) - mmap_read_unlock(mm); + mmap_read_unlock(mm); +out_unlocked: if (!ret) ret = -EFAULT; if (ret > 0)
This is used by media/videbuf2 for persistent dma mappings, not just for a single dma operation and then freed again, so needs FOLL_LONGTERM. Unfortunately current pup_locked doesn't support FOLL_LONGTERM due to locking issues. Rework the code to pull the pup path out from the mmap_sem critical section as suggested by Jason. Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Pawel Osciak <pawel@osciak.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Kyungmin Park <kyungmin.park@samsung.com> Cc: Tomasz Figa <tfiga@chromium.org> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Dan Williams <dan.j.williams@intel.com> Cc: linux-mm@kvack.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-samsung-soc@vger.kernel.org Cc: linux-media@vger.kernel.org --- mm/frame_vector.c | 36 +++++++++++------------------------- 1 file changed, 11 insertions(+), 25 deletions(-)