| Message ID | 1380724087-13927-24-git-send-email-jack@suse.cz (mailing list archive) |
|---|---|
| State | Rejected |
| Headers | show |
Thanks!!

I would like to test these two patches and also do the stable work for the
deadlock as well.  Do you have these patches in a repo somewhere to save me
a bit of work?

We had been working on an internal version of the deadlock portion of this
patch that uses get_user_pages_fast() vs. the new get_user_pages_unlocked().

The risk of GUP fast is the loss of the "force" arg on GUP fast, which I
don't see as significant given our use case.

Some nits on the subject and commit message:
1. use IB/qib, IB/ipath vs. ib
2. use the correct ipath vs. qib in the commit message text

Mike

> -----Original Message-----
> From: Jan Kara [mailto:jack@suse.cz]
> Sent: Wednesday, October 02, 2013 10:28 AM
> To: LKML
> Cc: linux-mm@kvack.org; Jan Kara; infinipath; Roland Dreier; linux-rdma@vger.kernel.org
> Subject: [PATCH 23/26] ib: Convert qib_get_user_pages() to get_user_pages_unlocked()
>
> Convert qib_get_user_pages() to use get_user_pages_unlocked(). This
> shortens the section where we hold mmap_sem for writing and also removes
> the knowledge about get_user_pages() locking from ipath driver. We also fix
> a bug in testing pinned number of pages when changing the code.
>
> CC: Mike Marciniszyn <infinipath@intel.com>
> CC: Roland Dreier <roland@kernel.org>
> CC: linux-rdma@vger.kernel.org
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  drivers/infiniband/hw/qib/qib_user_pages.c | 62 +++++++++++++----------------
>  1 file changed, 26 insertions(+), 36 deletions(-)
>
> diff --git a/drivers/infiniband/hw/qib/qib_user_pages.c b/drivers/infiniband/hw/qib/qib_user_pages.c
> index 2bc1d2b96298..57ce83c2d1d9 100644
> --- a/drivers/infiniband/hw/qib/qib_user_pages.c
> +++ b/drivers/infiniband/hw/qib/qib_user_pages.c
> @@ -48,39 +48,55 @@ static void __qib_release_user_pages(struct page **p, size_t num_pages,
>  	}
>  }
>
> -/*
> - * Call with current->mm->mmap_sem held.
> +/**
> + * qib_get_user_pages - lock user pages into memory
> + * @start_page: the start page
> + * @num_pages: the number of pages
> + * @p: the output page structures
> + *
> + * This function takes a given start page (page aligned user virtual
> + * address) and pins it and the following specified number of pages. For
> + * now, num_pages is always 1, but that will probably change at some point
> + * (because caller is doing expected sends on a single virtually contiguous
> + * buffer, so we can do all pages at once).
>   */
> -static int __qib_get_user_pages(unsigned long start_page, size_t num_pages,
> -				struct page **p, struct vm_area_struct **vma)
> +int qib_get_user_pages(unsigned long start_page, size_t num_pages,
> +		       struct page **p)
>  {
>  	unsigned long lock_limit;
>  	size_t got;
>  	int ret;
> +	struct mm_struct *mm = current->mm;
>
> +	down_write(&mm->mmap_sem);
>  	lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
>
> -	if (num_pages > lock_limit && !capable(CAP_IPC_LOCK)) {
> +	if (mm->pinned_vm + num_pages > lock_limit && !capable(CAP_IPC_LOCK)) {
> +		up_write(&mm->mmap_sem);
>  		ret = -ENOMEM;
>  		goto bail;
>  	}
> +	mm->pinned_vm += num_pages;
> +	up_write(&mm->mmap_sem);
>
>  	for (got = 0; got < num_pages; got += ret) {
> -		ret = get_user_pages(current, current->mm,
> -				     start_page + got * PAGE_SIZE,
> -				     num_pages - got, 1, 1,
> -				     p + got, vma);
> +		ret = get_user_pages_unlocked(current, mm,
> +					      start_page + got * PAGE_SIZE,
> +					      num_pages - got, 1, 1,
> +					      p + got);
>  		if (ret < 0)
>  			goto bail_release;
>  	}
>
> -	current->mm->pinned_vm += num_pages;
>
>  	ret = 0;
>  	goto bail;
>
> bail_release:
>  	__qib_release_user_pages(p, got, 0);
> +	down_write(&mm->mmap_sem);
> +	mm->pinned_vm -= num_pages;
> +	up_write(&mm->mmap_sem);
> bail:
>  	return ret;
>  }
> @@ -117,32 +133,6 @@ dma_addr_t qib_map_page(struct pci_dev *hwdev, struct page *page,
>  	return phys;
>  }
>
> -/**
> - * qib_get_user_pages - lock user pages into memory
> - * @start_page: the start page
> - * @num_pages: the number of pages
> - * @p: the output page structures
> - *
> - * This function takes a given start page (page aligned user virtual
> - * address) and pins it and the following specified number of pages. For
> - * now, num_pages is always 1, but that will probably change at some point
> - * (because caller is doing expected sends on a single virtually contiguous
> - * buffer, so we can do all pages at once).
> - */
> -int qib_get_user_pages(unsigned long start_page, size_t num_pages,
> -		       struct page **p)
> -{
> -	int ret;
> -
> -	down_write(&current->mm->mmap_sem);
> -
> -	ret = __qib_get_user_pages(start_page, num_pages, p, NULL);
> -
> -	up_write(&current->mm->mmap_sem);
> -
> -	return ret;
> -}
> -
>  void qib_release_user_pages(struct page **p, size_t num_pages)
>  {
>  	if (current->mm) /* during close after signal, mm can be NULL */
> --
> 1.8.1.4

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed 02-10-13 14:54:59, Marciniszyn, Mike wrote:
> Thanks!!
>
> I would like to test these two patches and also do the stable work for
> the deadlock as well. Do you have these patches in a repo somewhere to
> save me a bit of work?
  I've pushed the patches to:

git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs.git

into the branch 'get_user_pages'.

> We had been working on an internal version of the deadlock portion of
> this patch that uses get_user_pages_fast() vs. the new
> get_user_pages_unlocked().
>
> The risk of GUP fast is the loss of the "force" arg on GUP fast, which I
> don't see as significant given our use case.
  Yes. I was discussing with Roland some time ago whether the force
argument is needed and he said it is. So I kept the arguments of
get_user_pages() intact and just simplified the locking...

BTW: Infiniband still needs mmap_sem for modification of mm->pinned_vm.
It might be worthwhile to actually change that to atomic_long_t (only
kernel/events/core.c would need an update besides infiniband) and avoid
taking mmap_sem in infiniband code altogether.

> Some nits on the subject and commit message:
> 1. use IB/qib, IB/ipath vs. ib
> 2. use the correct ipath vs. qib in the commit message text
  Sure, will do. Thanks for the quick reply and for the comments.

								Honza

> > -----Original Message-----
> > From: Jan Kara [mailto:jack@suse.cz]
> > Sent: Wednesday, October 02, 2013 10:28 AM
> > To: LKML
> > Cc: linux-mm@kvack.org; Jan Kara; infinipath; Roland Dreier; linux-rdma@vger.kernel.org
> > Subject: [PATCH 23/26] ib: Convert qib_get_user_pages() to get_user_pages_unlocked()
> >
> > Convert qib_get_user_pages() to use get_user_pages_unlocked(). This
> > shortens the section where we hold mmap_sem for writing and also removes
> > the knowledge about get_user_pages() locking from ipath driver. We also fix
> > a bug in testing pinned number of pages when changing the code.
> >
> > CC: Mike Marciniszyn <infinipath@intel.com>
> > CC: Roland Dreier <roland@kernel.org>
> > CC: linux-rdma@vger.kernel.org
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > ---
> >  drivers/infiniband/hw/qib/qib_user_pages.c | 62 +++++++++++++----------------
> >  1 file changed, 26 insertions(+), 36 deletions(-)
> >
> > diff --git a/drivers/infiniband/hw/qib/qib_user_pages.c b/drivers/infiniband/hw/qib/qib_user_pages.c
> > index 2bc1d2b96298..57ce83c2d1d9 100644
> > --- a/drivers/infiniband/hw/qib/qib_user_pages.c
> > +++ b/drivers/infiniband/hw/qib/qib_user_pages.c
> > @@ -48,39 +48,55 @@ static void __qib_release_user_pages(struct page **p, size_t num_pages,
> >  	}
> >  }
> >
> > -/*
> > - * Call with current->mm->mmap_sem held.
> > +/**
> > + * qib_get_user_pages - lock user pages into memory
> > + * @start_page: the start page
> > + * @num_pages: the number of pages
> > + * @p: the output page structures
> > + *
> > + * This function takes a given start page (page aligned user virtual
> > + * address) and pins it and the following specified number of pages. For
> > + * now, num_pages is always 1, but that will probably change at some point
> > + * (because caller is doing expected sends on a single virtually contiguous
> > + * buffer, so we can do all pages at once).
> >   */
> > -static int __qib_get_user_pages(unsigned long start_page, size_t num_pages,
> > -				struct page **p, struct vm_area_struct **vma)
> > +int qib_get_user_pages(unsigned long start_page, size_t num_pages,
> > +		       struct page **p)
> >  {
> >  	unsigned long lock_limit;
> >  	size_t got;
> >  	int ret;
> > +	struct mm_struct *mm = current->mm;
> >
> > +	down_write(&mm->mmap_sem);
> >  	lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
> >
> > -	if (num_pages > lock_limit && !capable(CAP_IPC_LOCK)) {
> > +	if (mm->pinned_vm + num_pages > lock_limit && !capable(CAP_IPC_LOCK)) {
> > +		up_write(&mm->mmap_sem);
> >  		ret = -ENOMEM;
> >  		goto bail;
> >  	}
> > +	mm->pinned_vm += num_pages;
> > +	up_write(&mm->mmap_sem);
> >
> >  	for (got = 0; got < num_pages; got += ret) {
> > -		ret = get_user_pages(current, current->mm,
> > -				     start_page + got * PAGE_SIZE,
> > -				     num_pages - got, 1, 1,
> > -				     p + got, vma);
> > +		ret = get_user_pages_unlocked(current, mm,
> > +					      start_page + got * PAGE_SIZE,
> > +					      num_pages - got, 1, 1,
> > +					      p + got);
> >  		if (ret < 0)
> >  			goto bail_release;
> >  	}
> >
> > -	current->mm->pinned_vm += num_pages;
> >
> >  	ret = 0;
> >  	goto bail;
> >
> > bail_release:
> >  	__qib_release_user_pages(p, got, 0);
> > +	down_write(&mm->mmap_sem);
> > +	mm->pinned_vm -= num_pages;
> > +	up_write(&mm->mmap_sem);
> > bail:
> >  	return ret;
> >  }
> > @@ -117,32 +133,6 @@ dma_addr_t qib_map_page(struct pci_dev *hwdev, struct page *page,
> >  	return phys;
> >  }
> >
> > -/**
> > - * qib_get_user_pages - lock user pages into memory
> > - * @start_page: the start page
> > - * @num_pages: the number of pages
> > - * @p: the output page structures
> > - *
> > - * This function takes a given start page (page aligned user virtual
> > - * address) and pins it and the following specified number of pages. For
> > - * now, num_pages is always 1, but that will probably change at some point
> > - * (because caller is doing expected sends on a single virtually contiguous
> > - * buffer, so we can do all pages at once).
> > - */
> > -int qib_get_user_pages(unsigned long start_page, size_t num_pages,
> > -		       struct page **p)
> > -{
> > -	int ret;
> > -
> > -	down_write(&current->mm->mmap_sem);
> > -
> > -	ret = __qib_get_user_pages(start_page, num_pages, p, NULL);
> > -
> > -	up_write(&current->mm->mmap_sem);
> > -
> > -	return ret;
> > -}
> > -
> >  void qib_release_user_pages(struct page **p, size_t num_pages)
> >  {
> >  	if (current->mm) /* during close after signal, mm can be NULL */
> > --
> > 1.8.1.4
> > The risk of GUP fast is the loss of the "force" arg on GUP fast, which
> > I don't see as significant given our use case.
> Yes. I was discussing with Roland some time ago whether the force
> argument is needed and he said it is. So I kept the arguments of
> get_user_pages() intact and just simplified the locking...

The PSM side of the code is a more traditional use of GUP (like direct
I/O), so I think it is a different use case than the locking for IB
memory regions.

Mike
On Wed 02-10-13 15:32:47, Marciniszyn, Mike wrote:
> > > The risk of GUP fast is the loss of the "force" arg on GUP fast, which
> > > I don't see as significant given our use case.
> > Yes. I was discussing with Roland some time ago whether the force
> > argument is needed and he said it is. So I kept the arguments of
> > get_user_pages() intact and just simplified the locking...
>
> The PSM side of the code is a more traditional use of GUP (like direct
> I/O), so I think it is a different use case than the locking for IB
> memory regions.
  Ah, I see. Whatever suits you best. I don't really care as long as
get_user_pages() locking doesn't leak into IB drivers :)

								Honza
> -----Original Message-----
> From: Jan Kara [mailto:jack@suse.cz]
> Sent: Wednesday, October 02, 2013 11:39 AM
> To: Marciniszyn, Mike
> Cc: Jan Kara; LKML; linux-mm@kvack.org; infinipath; Roland Dreier; linux-rdma@vger.kernel.org
> Subject: Re: [PATCH 23/26] ib: Convert qib_get_user_pages() to get_user_pages_unlocked()
>
> On Wed 02-10-13 15:32:47, Marciniszyn, Mike wrote:
> > > > The risk of GUP fast is the loss of the "force" arg on GUP fast,
> > > > which I don't see as significant given our use case.
> > > Yes. I was discussing with Roland some time ago whether the force
> > > argument is needed and he said it is. So I kept the arguments of
> > > get_user_pages() intact and just simplified the locking...
> >
> > The PSM side of the code is a more traditional use of GUP (like direct
> > I/O), so I think it is a different use case than the locking for IB
> > memory regions.
>   Ah, I see. Whatever suits you best. I don't really care as long as
> get_user_pages() locking doesn't leak into IB drivers :)
>
> 							Honza
> --
> Jan Kara <jack@suse.cz>
> SUSE Labs, CR
> The PSM side of the code is a more traditional use of GUP (like direct I/O),
> so I think it is a different use case than the locking for IB memory regions.

I have resubmitted the two deadlock fixes using get_user_pages_fast() and
marked them stable.

See http://marc.info/?l=linux-rdma&m=138089335506355&w=2

Mike
Inadvertent send!

Mike

> -----Original Message-----
> From: Marciniszyn, Mike
> Sent: Friday, October 04, 2013 9:39 AM
> To: Jan Kara
> Cc: LKML; linux-mm@kvack.org; infinipath; Roland Dreier; linux-rdma@vger.kernel.org
> Subject: RE: [PATCH 23/26] ib: Convert qib_get_user_pages() to get_user_pages_unlocked()
>
> > -----Original Message-----
> > From: Jan Kara [mailto:jack@suse.cz]
> > Sent: Wednesday, October 02, 2013 11:39 AM
> > To: Marciniszyn, Mike
> > Cc: Jan Kara; LKML; linux-mm@kvack.org; infinipath; Roland Dreier; linux-rdma@vger.kernel.org
> > Subject: Re: [PATCH 23/26] ib: Convert qib_get_user_pages() to get_user_pages_unlocked()
> >
> > On Wed 02-10-13 15:32:47, Marciniszyn, Mike wrote:
> > > > > The risk of GUP fast is the loss of the "force" arg on GUP fast,
> > > > > which I don't see as significant given our use case.
> > > > Yes. I was discussing with Roland some time ago whether the force
> > > > argument is needed and he said it is. So I kept the arguments of
> > > > get_user_pages() intact and just simplified the locking...
> > >
> > > The PSM side of the code is a more traditional use of GUP (like
> > > direct I/O), so I think it is a different use case than the locking
> > > for IB memory regions.
> >   Ah, I see. Whatever suits you best. I don't really care as long as
> > get_user_pages() locking doesn't leak into IB drivers :)
> >
> > 							Honza
> > --
> > Jan Kara <jack@suse.cz>
> > SUSE Labs, CR
> Convert qib_get_user_pages() to use get_user_pages_unlocked(). This
> shortens the section where we hold mmap_sem for writing and also removes
> the knowledge about get_user_pages() locking from ipath driver. We also fix
> a bug in testing pinned number of pages when changing the code.

This patch and the sibling ipath patch will nominally take the mmap_sem
twice where the old routine only took it once.  This is a performance
issue.

Is the intent here to deprecate get_user_pages()?

I agree, the old code's lock limit test is broken and needs to be fixed.
I like the elimination of the silly wrapper routine!

Could the lock limit test be pushed into another version of the wrapper
so that there is only one set of mmap_sem transactions?

Mike
On Fri 04-10-13 13:52:49, Marciniszyn, Mike wrote:
> > Convert qib_get_user_pages() to use get_user_pages_unlocked(). This
> > shortens the section where we hold mmap_sem for writing and also removes
> > the knowledge about get_user_pages() locking from ipath driver. We also fix
> > a bug in testing pinned number of pages when changing the code.
>
> This patch and the sibling ipath patch will nominally take the mmap_sem
> twice where the old routine only took it once. This is a performance
> issue.
  It will take mmap_sem only once during normal operation. Only if
get_user_pages_unlocked() fails do we have to take mmap_sem again to undo
the change of mm->pinned_vm.

> Is the intent here to deprecate get_user_pages()?
  Well, as much as I'd like to, there are really places in mm code which
need to call get_user_pages() while holding mmap_sem to be able to inspect
the corresponding vmas etc. So I want to reduce get_user_pages() use as
much as possible, but I'm not really hoping to remove it completely.

> I agree, the old code's lock limit test is broken and needs to be fixed.
> I like the elimination of the silly wrapper routine!
>
> Could the lock limit test be pushed into another version of the wrapper
> so that there is only one set of mmap_sem transactions?
  I'm sorry, I don't understand what you mean here...

								Honza
> -----Original Message-----
> From: Jan Kara [mailto:jack@suse.cz]
> Sent: Friday, October 04, 2013 2:33 PM
> To: Marciniszyn, Mike
> Cc: Jan Kara; LKML; linux-mm@kvack.org; infinipath; Roland Dreier; linux-rdma@vger.kernel.org
> Subject: Re: [PATCH 23/26] ib: Convert qib_get_user_pages() to get_user_pages_unlocked()
>
> On Fri 04-10-13 13:52:49, Marciniszyn, Mike wrote:
> > > Convert qib_get_user_pages() to use get_user_pages_unlocked(). This
> > > shortens the section where we hold mmap_sem for writing and also
> > > removes the knowledge about get_user_pages() locking from ipath
> > > driver. We also fix a bug in testing pinned number of pages when
> > > changing the code.
> >
> > This patch and the sibling ipath patch will nominally take the mmap_sem
> > twice where the old routine only took it once. This is a performance
> > issue.
>   It will take mmap_sem only once during normal operation. Only if
> get_user_pages_unlocked() fails do we have to take mmap_sem again to undo
> the change of mm->pinned_vm.
>
> > Is the intent here to deprecate get_user_pages()?
>   Well, as much as I'd like to, there are really places in mm code which
> need to call get_user_pages() while holding mmap_sem to be able to inspect
> the corresponding vmas etc. So I want to reduce get_user_pages() use as
> much as possible, but I'm not really hoping to remove it completely.
>
> > I agree, the old code's lock limit test is broken and needs to be fixed.
> > I like the elimination of the silly wrapper routine!
> >
> > Could the lock limit test be pushed into another version of the
> > wrapper so that there is only one set of mmap_sem transactions?
>   I'm sorry, I don't understand what you mean here...
>
> 							Honza
> --
> Jan Kara <jack@suse.cz>
> SUSE Labs, CR
> > This patch and the sibling ipath patch will nominally take the mmap_sem
> > twice where the old routine only took it once. This is a performance
> > issue.
> It will take mmap_sem only once during normal operation. Only if
> get_user_pages_unlocked() fails do we have to take mmap_sem again to undo
> the change of mm->pinned_vm.
>
> > Is the intent here to deprecate get_user_pages()?

The old code looked like:

__qib_get_user_pages()
	(broken) ulimit test
	for (...)
		get_user_pages()

qib_get_user_pages()
	mmap_sem lock
	__qib_get_user_pages()
	mmap_sem unlock

The new code is:

get_user_pages_unlocked()
	mmap_sem lock
	get_user_pages()
	mmap_sem unlock

qib_get_user_pages()
	mmap_sem lock
	ulimit test and locked pages maintenance
	mmap_sem unlock
	for (...)
		get_user_pages_unlocked()

I count an additional pair of mmap_sem transactions in the normal case.

> > Could the lock limit test be pushed into another version of the
> > wrapper so that there is only one set of mmap_sem transactions?
> I'm sorry, I don't understand what you mean here...

This is what I had in mind:

get_user_pages_ulimit_unlocked()
	mmap_sem lock
	ulimit test and locked pages maintenance (from qib/ipath)
	for (...)
		get_user_pages_unlocked()
	mmap_sem unlock

qib_get_user_pages()
	get_user_pages_ulimit_unlocked()

This really pushes the code into a new wrapper common to ipath/qib and
any others that might want to combine locking with ulimit enforcement.

Mike
On Mon 07-10-13 15:38:24, Marciniszyn, Mike wrote:
> > > This patch and the sibling ipath patch will nominally take the mmap_sem
> > > twice where the old routine only took it once. This is a performance
> > > issue.
> > It will take mmap_sem only once during normal operation. Only if
> > get_user_pages_unlocked() fails do we have to take mmap_sem again to
> > undo the change of mm->pinned_vm.
> >
> > > Is the intent here to deprecate get_user_pages()?
>
> The old code looked like:
>
> __qib_get_user_pages()
> 	(broken) ulimit test
> 	for (...)
> 		get_user_pages()
>
> qib_get_user_pages()
> 	mmap_sem lock
> 	__qib_get_user_pages()
> 	mmap_sem unlock
>
> The new code is:
>
> get_user_pages_unlocked()
> 	mmap_sem lock
> 	get_user_pages()
> 	mmap_sem unlock
>
> qib_get_user_pages()
> 	mmap_sem lock
> 	ulimit test and locked pages maintenance
> 	mmap_sem unlock
> 	for (...)
> 		get_user_pages_unlocked()
>
> I count an additional pair of mmap_sem transactions in the normal case.
  Ah, sorry, you are right.

> > > Could the lock limit test be pushed into another version of the
> > > wrapper so that there is only one set of mmap_sem transactions?
> > I'm sorry, I don't understand what you mean here...
>
> This is what I had in mind:
>
> get_user_pages_ulimit_unlocked()
> 	mmap_sem lock
> 	ulimit test and locked pages maintenance (from qib/ipath)
> 	for (...)
> 		get_user_pages_unlocked()
> 	mmap_sem unlock
>
> qib_get_user_pages()
> 	get_user_pages_ulimit_unlocked()
>
> This really pushes the code into a new wrapper common to ipath/qib and
> any others that might want to combine locking with ulimit enforcement.
  We could do that but frankly, I'd rather change ulimit enforcement to
not require mmap_sem and use an atomic counter instead. I'll see what I
can do.

								Honza
diff --git a/drivers/infiniband/hw/qib/qib_user_pages.c b/drivers/infiniband/hw/qib/qib_user_pages.c
index 2bc1d2b96298..57ce83c2d1d9 100644
--- a/drivers/infiniband/hw/qib/qib_user_pages.c
+++ b/drivers/infiniband/hw/qib/qib_user_pages.c
@@ -48,39 +48,55 @@ static void __qib_release_user_pages(struct page **p, size_t num_pages,
 	}
 }
 
-/*
- * Call with current->mm->mmap_sem held.
+/**
+ * qib_get_user_pages - lock user pages into memory
+ * @start_page: the start page
+ * @num_pages: the number of pages
+ * @p: the output page structures
+ *
+ * This function takes a given start page (page aligned user virtual
+ * address) and pins it and the following specified number of pages. For
+ * now, num_pages is always 1, but that will probably change at some point
+ * (because caller is doing expected sends on a single virtually contiguous
+ * buffer, so we can do all pages at once).
  */
-static int __qib_get_user_pages(unsigned long start_page, size_t num_pages,
-				struct page **p, struct vm_area_struct **vma)
+int qib_get_user_pages(unsigned long start_page, size_t num_pages,
+		       struct page **p)
 {
 	unsigned long lock_limit;
 	size_t got;
 	int ret;
+	struct mm_struct *mm = current->mm;
 
+	down_write(&mm->mmap_sem);
 	lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
 
-	if (num_pages > lock_limit && !capable(CAP_IPC_LOCK)) {
+	if (mm->pinned_vm + num_pages > lock_limit && !capable(CAP_IPC_LOCK)) {
+		up_write(&mm->mmap_sem);
 		ret = -ENOMEM;
 		goto bail;
 	}
+	mm->pinned_vm += num_pages;
+	up_write(&mm->mmap_sem);
 
 	for (got = 0; got < num_pages; got += ret) {
-		ret = get_user_pages(current, current->mm,
-				     start_page + got * PAGE_SIZE,
-				     num_pages - got, 1, 1,
-				     p + got, vma);
+		ret = get_user_pages_unlocked(current, mm,
+					      start_page + got * PAGE_SIZE,
+					      num_pages - got, 1, 1,
+					      p + got);
 		if (ret < 0)
 			goto bail_release;
 	}
 
-	current->mm->pinned_vm += num_pages;
 
 	ret = 0;
 	goto bail;
 
 bail_release:
 	__qib_release_user_pages(p, got, 0);
+	down_write(&mm->mmap_sem);
+	mm->pinned_vm -= num_pages;
+	up_write(&mm->mmap_sem);
 bail:
 	return ret;
 }
@@ -117,32 +133,6 @@ dma_addr_t qib_map_page(struct pci_dev *hwdev, struct page *page,
 	return phys;
 }
 
-/**
- * qib_get_user_pages - lock user pages into memory
- * @start_page: the start page
- * @num_pages: the number of pages
- * @p: the output page structures
- *
- * This function takes a given start page (page aligned user virtual
- * address) and pins it and the following specified number of pages. For
- * now, num_pages is always 1, but that will probably change at some point
- * (because caller is doing expected sends on a single virtually contiguous
- * buffer, so we can do all pages at once).
- */
-int qib_get_user_pages(unsigned long start_page, size_t num_pages,
-		       struct page **p)
-{
-	int ret;
-
-	down_write(&current->mm->mmap_sem);
-
-	ret = __qib_get_user_pages(start_page, num_pages, p, NULL);
-
-	up_write(&current->mm->mmap_sem);
-
-	return ret;
-}
-
 void qib_release_user_pages(struct page **p, size_t num_pages)
 {
 	if (current->mm) /* during close after signal, mm can be NULL */
Convert qib_get_user_pages() to use get_user_pages_unlocked(). This
shortens the section where we hold mmap_sem for writing and also removes
the knowledge about get_user_pages() locking from the ipath driver. We also
fix a bug in testing the pinned number of pages when changing the code.

CC: Mike Marciniszyn <infinipath@intel.com>
CC: Roland Dreier <roland@kernel.org>
CC: linux-rdma@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
---
 drivers/infiniband/hw/qib/qib_user_pages.c | 62 +++++++++++++-----------------
 1 file changed, 26 insertions(+), 36 deletions(-)