Message ID | 20190115181300.27547-1-dave@stgolabs.net (mailing list archive) |
---|---|
Series | mm: make pinned_vm atomic and simplify users |
Also Ccing lkml, sorry.

On Tue, 15 Jan 2019, Davidlohr Bueso wrote:

>Hi,
>
>The following patches aim to provide cleanups to users that pin pages
>(mostly infiniband) by converting the counter to atomic -- note that
>Daniel Jordan also has patches[1] for the locked_vm counterpart and vfio.
>
>Apart from removing a source of mmap_sem writer, we benefit in that
>we can get rid of a lot of code that defers work when the lock cannot
>be acquired, as well as drivers avoiding mmap_sem altogether by also
>converting gup to gup_fast() and letting the mm handle it. Users
>that do the gup_longterm() remain of course under at least reader mmap_sem.
>
>Everything has been compile-tested _only_ so I hope I didn't do anything
>too stupid. Please consider for v5.1.
>
>On a similar topic and potential follow up, it would be nice to resurrect
>Peter's VM_PINNED idea in that the broken semantics that occurred after
>bc3e53f682 ("mm: distinguish between mlocked and pinned pages") are still
>present. Also encapsulating internal mm logic via mm[un]pin() instead of
>drivers having to know about internals and playing nice with compaction are
>all wins.
>
>Thanks!
>
>[1] https://lkml.org/lkml/2018/11/5/854
>
>Davidlohr Bueso (6):
>  mm: make mm->pinned_vm an atomic counter
>  mic/scif: do not use mmap_sem
>  drivers/IB,qib: do not use mmap_sem
>  drivers/IB,hfi1: do not se mmap_sem
>  drivers/IB,usnic: reduce scope of mmap_sem
>  drivers/IB,core: reduce scope of mmap_sem
>
> drivers/infiniband/core/umem.c              | 47 +++-----------------
> drivers/infiniband/hw/hfi1/user_pages.c     | 12 ++---
> drivers/infiniband/hw/qib/qib_user_pages.c  | 69 ++++++++++-------------------
> drivers/infiniband/hw/usnic/usnic_ib_main.c |  2 -
> drivers/infiniband/hw/usnic/usnic_uiom.c    | 56 +++--------------------
> drivers/infiniband/hw/usnic/usnic_uiom.h    |  1 -
> drivers/misc/mic/scif/scif_rma.c            | 38 +++++-----------
> fs/proc/task_mmu.c                          |  2 +-
> include/linux/mm_types.h                    |  2 +-
> kernel/events/core.c                        |  8 ++--
> kernel/fork.c                               |  2 +-
> mm/debug.c                                  |  3 +-
> 12 files changed, 57 insertions(+), 185 deletions(-)
>
>--
>2.16.4
>
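
A minimal userspace sketch of the cover letter's point about deferred work, assuming C11 `<stdatomic.h>` rather than the kernel's own atomics. It is not code from the series: `struct old_mm`, `struct new_mm`, `old_dec_trylock()` and `new_dec()` are hypothetical names, and the `atomic_flag` is only a stand-in for `mmap_sem`. With a lock-protected counter, a failed trylock leaves the caller with an error to handle or work to defer; with an atomic counter the decrement cannot fail.

```c
#include <stdatomic.h>

/* Hypothetical model: `lock` stands in for mmap_sem, `pinned` for pinned_vm. */
struct old_mm {
	atomic_flag lock;
	long pinned;		/* only valid to touch with `lock` held */
};

struct new_mm {
	atomic_long pinned;	/* no lock needed for accounting */
};

/*
 * Old shape: the counter may only change under the lock, so a busy lock
 * means the update does not happen and the caller has to cope with that.
 */
static int old_dec_trylock(struct old_mm *mm, long nr_pages)
{
	if (atomic_flag_test_and_set(&mm->lock))
		return -1;	/* lock busy: counter left untouched */
	mm->pinned -= nr_pages;
	atomic_flag_clear(&mm->lock);
	return 0;
}

/* New shape: a single atomic subtraction, no failure path to handle. */
static int new_dec(struct new_mm *mm, long nr_pages)
{
	atomic_fetch_sub(&mm->pinned, nr_pages);
	return 0;
}
```

In this model the `-1` path of `old_dec_trylock()` is what forces callers to carry retry or deferral logic, which is the kind of code the series deletes.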
On Tue, Jan 15, 2019 at 10:12:56AM -0800, Davidlohr Bueso wrote:
> The driver uses mmap_sem for both pinned_vm accounting and
> get_user_pages(). By using gup_fast() and letting the mm handle
> the lock if needed, we can no longer rely on the semaphore and
> simplify the whole thing.
>
> Cc: sudeep.dutt@intel.com
> Cc: ashutosh.dixit@intel.com
> Signed-off-by: Davidlohr Bueso <dbueso@suse.de>

Reviewed-by: Ira Weiny <ira.weiny@intel.com>

> ---
>  drivers/misc/mic/scif/scif_rma.c | 36 +++++++++++-------------------------
>  1 file changed, 11 insertions(+), 25 deletions(-)
>
> diff --git a/drivers/misc/mic/scif/scif_rma.c b/drivers/misc/mic/scif/scif_rma.c
> index a92b4d6f099c..445529ce2ad7 100644
> --- a/drivers/misc/mic/scif/scif_rma.c
> +++ b/drivers/misc/mic/scif/scif_rma.c
> @@ -272,21 +272,12 @@ static inline void __scif_release_mm(struct mm_struct *mm)
>
>  static inline int
>  __scif_dec_pinned_vm_lock(struct mm_struct *mm,
> -			  int nr_pages, bool try_lock)
> +			  int nr_pages)
>  {
>  	if (!mm || !nr_pages || !scif_ulimit_check)
>  		return 0;
> -	if (try_lock) {
> -		if (!down_write_trylock(&mm->mmap_sem)) {
> -			dev_err(scif_info.mdev.this_device,
> -				"%s %d err\n", __func__, __LINE__);
> -			return -1;
> -		}
> -	} else {
> -		down_write(&mm->mmap_sem);
> -	}
> +
>  	atomic_long_sub(nr_pages, &mm->pinned_vm);
> -	up_write(&mm->mmap_sem);
>  	return 0;
>  }
>
> @@ -298,16 +289,16 @@ static inline int __scif_check_inc_pinned_vm(struct mm_struct *mm,
>  	if (!mm || !nr_pages || !scif_ulimit_check)
>  		return 0;
>
> -	locked = nr_pages;
> -	locked += atomic_long_read(&mm->pinned_vm);
>  	lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
> +	locked = atomic_long_add_return(nr_pages, &mm->pinned_vm);
> +
>  	if ((locked > lock_limit) && !capable(CAP_IPC_LOCK)) {
> +		atomic_long_sub(nr_pages, &mm->pinned_vm);
>  		dev_err(scif_info.mdev.this_device,
>  			"locked(%lu) > lock_limit(%lu)\n",
>  			locked, lock_limit);
>  		return -ENOMEM;
>  	}
> -	atomic_long_set(&mm->pinned_vm, locked);
>  	return 0;
>  }
>
> @@ -326,7 +317,7 @@ int scif_destroy_window(struct scif_endpt *ep, struct scif_window *window)
>
>  	might_sleep();
>  	if (!window->temp && window->mm) {
> -		__scif_dec_pinned_vm_lock(window->mm, window->nr_pages, 0);
> +		__scif_dec_pinned_vm_lock(window->mm, window->nr_pages);
>  		__scif_release_mm(window->mm);
>  		window->mm = NULL;
>  	}
> @@ -737,7 +728,7 @@ int scif_unregister_window(struct scif_window *window)
>  					      ep->rma_info.dma_chan);
>  		} else {
>  			if (!__scif_dec_pinned_vm_lock(window->mm,
> -						       window->nr_pages, 1)) {
> +						       window->nr_pages)) {
>  				__scif_release_mm(window->mm);
>  				window->mm = NULL;
>  			}
> @@ -1385,28 +1376,23 @@ int __scif_pin_pages(void *addr, size_t len, int *out_prot,
>  		prot |= SCIF_PROT_WRITE;
>  retry:
>  		mm = current->mm;
> -		down_write(&mm->mmap_sem);
>  		if (ulimit) {
>  			err = __scif_check_inc_pinned_vm(mm, nr_pages);
>  			if (err) {
> -				up_write(&mm->mmap_sem);
>  				pinned_pages->nr_pages = 0;
>  				goto error_unmap;
>  			}
>  		}
>
> -		pinned_pages->nr_pages = get_user_pages(
> +		pinned_pages->nr_pages = get_user_pages_fast(
>  				(u64)addr,
>  				nr_pages,
>  				(prot & SCIF_PROT_WRITE) ? FOLL_WRITE : 0,
> -				pinned_pages->pages,
> -				NULL);
> -		up_write(&mm->mmap_sem);
> +				pinned_pages->pages);
>  		if (nr_pages != pinned_pages->nr_pages) {
>  			if (try_upgrade) {
>  				if (ulimit)
> -					__scif_dec_pinned_vm_lock(mm,
> -								  nr_pages, 0);
> +					__scif_dec_pinned_vm_lock(mm, nr_pages);
>  				/* Roll back any pinned pages */
>  				for (i = 0; i < pinned_pages->nr_pages; i++) {
>  					if (pinned_pages->pages[i])
> @@ -1433,7 +1419,7 @@ int __scif_pin_pages(void *addr, size_t len, int *out_prot,
>  	return err;
>  dec_pinned:
>  	if (ulimit)
> -		__scif_dec_pinned_vm_lock(mm, nr_pages, 0);
> +		__scif_dec_pinned_vm_lock(mm, nr_pages);
>  	/* Something went wrong! Rollback */
>  error_unmap:
>  	pinned_pages->nr_pages = nr_pages;
> --
> 2.16.4
>
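
The accounting order in the patched `__scif_check_inc_pinned_vm()` (add first, compare the returned total against the limit, subtract again on failure) can be sketched in userspace roughly as follows. This assumes C11 `<stdatomic.h>` in place of the kernel's `atomic_long_*` helpers; `struct pin_account`, `pin_charge()`, `pin_uncharge()` and the `privileged` flag (standing in for `capable(CAP_IPC_LOCK)`) are hypothetical names, not part of the driver.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for mm->pinned_vm; not the kernel's mm_struct. */
struct pin_account {
	atomic_long pinned;	/* pages currently accounted as pinned */
};

/*
 * Charge nr_pages against the account: add optimistically, then check
 * the resulting total and roll back if it exceeds the limit.
 */
static int pin_charge(struct pin_account *acct, long nr_pages,
		      long limit, bool privileged)
{
	/* atomic_fetch_add() returns the old value; add nr_pages to get
	 * what the kernel's atomic_long_add_return() would return. */
	long total = atomic_fetch_add(&acct->pinned, nr_pages) + nr_pages;

	if (total > limit && !privileged) {
		atomic_fetch_sub(&acct->pinned, nr_pages);	/* undo */
		return -1;	/* the driver returns -ENOMEM here */
	}
	return 0;
}

/* Release a previous charge; no lock and no failure path. */
static void pin_uncharge(struct pin_account *acct, long nr_pages)
{
	long prev = atomic_fetch_sub(&acct->pinned, nr_pages);

	assert(prev >= nr_pages);	/* sketch-only underflow check */
	(void)prev;
}
```

One consequence of this ordering is that a concurrent charger can briefly observe a total that includes pages about to be rolled back, so it may be rejected even though the final total would have fit; the old read-check-set sequence avoided that only by holding `mmap_sem` for writing.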