Message ID | 1592913499-15558-1-git-send-email-jrdr.linux@gmail.com (mailing list archive)
---|---
State | New, archived
Series | [RFC,v2] xen/privcmd: Convert get_user_pages*() to pin_user_pages*()
On 6/23/20 7:58 AM, Souptick Joarder wrote:
> In 2019, we introduced pin_user_pages*() and now we are converting
> get_user_pages*() to the new API as appropriate. See [1] and [2] for
> more information. This is case 5 as per document [1].
>
> As discussed, pages need to be marked as dirty before being unpinned.
>
> Previously, if lock_pages() ended up partially mapping pages, it
> returned -ERRNO, due to which unlock_pages() had to go through each
> pages[i] up to *nr_pages* to validate them. This can be avoided by
> returning the number of partially mapped pages and -ERRNO separately
> when lock_pages() fails.
> With this fix, unlock_pages() doesn't need to validate pages[i] up to
> *nr_pages* in the error scenario.


This should be split into two patches, please. The first one will fix
the return value bug (and will need to go to stable branches) and the
second will use the new routine to pin pages.


> @@ -580,25 +580,30 @@ static long privcmd_ioctl_mmap_batch(
>
>  static int lock_pages(
>  	struct privcmd_dm_op_buf kbufs[], unsigned int num,
> -	struct page *pages[], unsigned int nr_pages)
> +	struct page *pages[], unsigned int nr_pages, int *errno)


I'd prefer if you used the more traditional way of returning an error
code from the function, and passed the number of pinned pages as an
argument. This will also make the call site simpler.


-boris
On Tue, Jun 23, 2020 at 11:11 PM Boris Ostrovsky
<boris.ostrovsky@oracle.com> wrote:
>
> On 6/23/20 7:58 AM, Souptick Joarder wrote:
> > In 2019, we introduced pin_user_pages*() and now we are converting
> > get_user_pages*() to the new API as appropriate. See [1] and [2] for
> > more information. This is case 5 as per document [1].
> >
> > As discussed, pages need to be marked as dirty before being unpinned.
> >
> > Previously, if lock_pages() ended up partially mapping pages, it
> > returned -ERRNO, due to which unlock_pages() had to go through each
> > pages[i] up to *nr_pages* to validate them. This can be avoided by
> > returning the number of partially mapped pages and -ERRNO separately
> > when lock_pages() fails.
> > With this fix, unlock_pages() doesn't need to validate pages[i] up to
> > *nr_pages* in the error scenario.
>
>
> This should be split into two patches, please. The first one will fix
> the return value bug (and will need to go to stable branches) and the
> second will use the new routine to pin pages.

Initially I split the patches into two commits. But at the last moment
I figured out that this bug fix (better called a coding error; it
doesn't look like it leads to any runtime bug) is tightly coupled to
the second commit, the pin_user_pages*() conversion, which means we
don't need the bug fix patch if we are not converting the API to
pin_user_pages*()/unpin_user_pages_dirty_lock(). That's the reason I
clubbed these two commits into a single one.

If this looks unreasonable, I will split it into two patches again.

>
> > @@ -580,25 +580,30 @@ static long privcmd_ioctl_mmap_batch(
> >
> >  static int lock_pages(
> >  	struct privcmd_dm_op_buf kbufs[], unsigned int num,
> > -	struct page *pages[], unsigned int nr_pages)
> > +	struct page *pages[], unsigned int nr_pages, int *errno)
>
>
> I'd prefer if you used the more traditional way of returning an error
> code from the function, and passed the number of pinned pages as an
> argument. This will also make the call site simpler.

Sure, will do it.

>
> -boris
>
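The calling convention suggested here (return 0 or -errno, report the pinned count through an out-parameter) can be sketched in plain userspace C. This is an illustration only, not the privcmd code: fake_pin_batch() and its failure model are invented stand-ins for pin_user_pages_fast().

```c
#include <errno.h>

/*
 * Illustration only: lock_pages_sketch() returns 0 on success or a
 * -errno value on failure, and reports the number of successfully
 * "pinned" items through *pinned, so the caller can clean up exactly
 * that many.  fake_pin_batch() stands in for pin_user_pages_fast();
 * it "pins" up to `budget` items and then faults.
 */
static int fake_pin_batch(int requested, int budget)
{
	if (budget <= 0)
		return -EFAULT;		/* pretend the fault hits here */
	return requested < budget ? requested : budget;
}

static int lock_pages_sketch(int requested, int fail_after, int *pinned)
{
	*pinned = 0;
	while (requested > 0) {
		int got = fake_pin_batch(requested, fail_after - *pinned);

		if (got < 0)
			return got;	/* *pinned already holds the count */
		*pinned += got;
		requested -= got;
	}
	return 0;
}
```

With this shape the call site stays simple: check the return value for an error, and on failure unpin exactly `*pinned` entries.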
On 6/23/20 9:36 PM, Souptick Joarder wrote:
> On Tue, Jun 23, 2020 at 11:11 PM Boris Ostrovsky
> <boris.ostrovsky@oracle.com> wrote:
>> On 6/23/20 7:58 AM, Souptick Joarder wrote:
>>> In 2019, we introduced pin_user_pages*() and now we are converting
>>> get_user_pages*() to the new API as appropriate. See [1] and [2] for
>>> more information. This is case 5 as per document [1].
>>>
>>> As discussed, pages need to be marked as dirty before being unpinned.
>>>
>>> Previously, if lock_pages() ended up partially mapping pages, it
>>> returned -ERRNO, due to which unlock_pages() had to go through each
>>> pages[i] up to *nr_pages* to validate them. This can be avoided by
>>> returning the number of partially mapped pages and -ERRNO separately
>>> when lock_pages() fails.
>>> With this fix, unlock_pages() doesn't need to validate pages[i] up to
>>> *nr_pages* in the error scenario.
>>
>> This should be split into two patches, please. The first one will fix
>> the return value bug (and will need to go to stable branches) and the
>> second will use the new routine to pin pages.
> Initially I split the patches into two commits. But at the last moment
> I figured out that this bug fix (better called a coding error; it
> doesn't look like it leads to any runtime bug) is tightly coupled to
> the second commit, the pin_user_pages*() conversion, which means we
> don't need the bug fix patch if we are not converting the API to
> pin_user_pages*()/unpin_user_pages_dirty_lock(). That's the reason I
> clubbed these two commits into a single one.


I am not sure I understand why the two issues are connected. Failure
of either get_user_pages_fast() or pin_user_pages() will result in the
same kind of overall ioctl failure, won't it?


One concern, though, is that we are changing how the user will see
this error. Currently the Xen devicemodel (which AFAIK is the only
caller) ignores it, which I think is wrong. So another option would be
to fix this in Xen and continue returning a positive number here. I
guess it's back to Paul again.


-boris
On Wed, Jun 24, 2020 at 9:07 PM Boris Ostrovsky
<boris.ostrovsky@oracle.com> wrote:
>
> On 6/23/20 9:36 PM, Souptick Joarder wrote:
> > On Tue, Jun 23, 2020 at 11:11 PM Boris Ostrovsky
> > <boris.ostrovsky@oracle.com> wrote:
> >> On 6/23/20 7:58 AM, Souptick Joarder wrote:
> >>> In 2019, we introduced pin_user_pages*() and now we are converting
> >>> get_user_pages*() to the new API as appropriate. See [1] and [2] for
> >>> more information. This is case 5 as per document [1].
> >>>
> >>> As discussed, pages need to be marked as dirty before being unpinned.
> >>>
> >>> Previously, if lock_pages() ended up partially mapping pages, it
> >>> returned -ERRNO, due to which unlock_pages() had to go through each
> >>> pages[i] up to *nr_pages* to validate them. This can be avoided by
> >>> returning the number of partially mapped pages and -ERRNO separately
> >>> when lock_pages() fails.
> >>> With this fix, unlock_pages() doesn't need to validate pages[i] up to
> >>> *nr_pages* in the error scenario.
> >>
> >> This should be split into two patches, please. The first one will fix
> >> the return value bug (and will need to go to stable branches) and the
> >> second will use the new routine to pin pages.
> > Initially I split the patches into two commits. But at the last moment
> > I figured out that this bug fix (better called a coding error; it
> > doesn't look like it leads to any runtime bug) is tightly coupled to
> > the second commit, the pin_user_pages*() conversion, which means we
> > don't need the bug fix patch if we are not converting the API to
> > pin_user_pages*()/unpin_user_pages_dirty_lock(). That's the reason I
> > clubbed these two commits into a single one.
>
>
> I am not sure I understand why the two issues are connected. Failure
> of either get_user_pages_fast() or pin_user_pages() will result in the
> same kind of overall ioctl failure, won't it?
>

Sorry, I think I need to go through the bug again for my clarity.

My understanding is:

First, in case of success, lock_pages() returns 0, so what
privcmd_ioctl_dm_op() will return to the user depends on the return
value of HYPERVISOR_dm_op(), and at the end all the pages are
unpinned. At present I am not clear about the return value of
HYPERVISOR_dm_op(), but this path of the code looks correct.

Second, in case of failure, lock_pages() will return either -ENOSPC or
the -ERRNO received from {get|pin}_user_pages_fast() inside the for
loop (at which point there might be some partially mapped pages, as it
maps inside the loop). Upon receiving -ERRNO, privcmd_ioctl_dm_op()
will pass the same -ERRNO to the user (not the partially mapped page
count). This looks like correct behaviour from user space.

The problem I was trying to address is, in the second scenario, when
lock_pages() returns -ERRNO and there are already existing mapped
pages. In this scenario, when unlock_pages() is called it will iterate
up to *nr_pages* to validate and unmap the pages, but it is supposed
to unmap only the mapped pages, which is what I am trying to address
as part of the bug fix.

Let me know if I am in sync with what you are expecting?


> One concern though is that we are changing how user will see this error.

My understanding from the above is that the user will always see the
right -ERRNO and will not be impacted.

Another scenario I was thinking of: if {get|pin}_user_pages_fast()
returns a number of pinned pages which is fewer than the number
requested, then lock_pages() returns 0 to the caller, and the
caller/user space will continue with the assumption that all pages
were pinned correctly. Is this the expected behaviour as per the
current code?

> Currently the Xen devicemodel (which AFAIK is the only caller) ignores
> it, which I think is wrong. So another option would be to fix this in
> Xen and continue returning a positive number here. I guess it's back
> to Paul again.
>
>
> -boris
>
>
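The cleanup problem described in the second scenario can be modelled in a few lines of userspace C. Nothing here is the actual privcmd code: ints stand in for struct page pointers, and pin_some()/unpin_exact() are invented for the sketch. The point is that once the pinner reports how many slots it actually filled, the cleanup path can release exactly that many, instead of scanning all *nr_pages* entries and testing each one.

```c
#include <string.h>

#define NR_SLOTS 8

/*
 * Toy model: pin_some() fills array slots until a simulated failure
 * and returns how many it filled; unpin_exact() releases exactly that
 * many.  With this convention, the cleanup loop no longer needs an
 * "is this entry valid?" check over the whole array.
 */
static int pin_some(int pages[], int nr_slots, int fail_after)
{
	int i;

	for (i = 0; i < nr_slots; i++) {
		if (i == fail_after)
			return i;	/* partial success: i slots filled */
		pages[i] = 1;
	}
	return nr_slots;
}

static void unpin_exact(int pages[], int pinned)
{
	int i;

	for (i = 0; i < pinned; i++)
		pages[i] = 0;	/* stands in for unpinning one page */
}
```

In the old code the equivalent of unpin_exact() walked all NR_SLOTS entries and relied on the unfilled ones being NULL; passing the real pinned count makes that guard unnecessary.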
On 6/24/20 12:41 PM, Souptick Joarder wrote:
> On Wed, Jun 24, 2020 at 9:07 PM Boris Ostrovsky
> <boris.ostrovsky@oracle.com> wrote:
>> On 6/23/20 9:36 PM, Souptick Joarder wrote:
>>> On Tue, Jun 23, 2020 at 11:11 PM Boris Ostrovsky
>>> <boris.ostrovsky@oracle.com> wrote:
>>>> On 6/23/20 7:58 AM, Souptick Joarder wrote:
>>>>> In 2019, we introduced pin_user_pages*() and now we are converting
>>>>> get_user_pages*() to the new API as appropriate. See [1] and [2] for
>>>>> more information. This is case 5 as per document [1].
>>>>>
>>>>> As discussed, pages need to be marked as dirty before being unpinned.
>>>>>
>>>>> Previously, if lock_pages() ended up partially mapping pages, it
>>>>> returned -ERRNO, due to which unlock_pages() had to go through each
>>>>> pages[i] up to *nr_pages* to validate them. This can be avoided by
>>>>> returning the number of partially mapped pages and -ERRNO separately
>>>>> when lock_pages() fails.
>>>>> With this fix, unlock_pages() doesn't need to validate pages[i] up to
>>>>> *nr_pages* in the error scenario.
>>>> This should be split into two patches, please. The first one will fix
>>>> the return value bug (and will need to go to stable branches) and the
>>>> second will use the new routine to pin pages.
>>> Initially I split the patches into two commits. But at the last moment
>>> I figured out that this bug fix (better called a coding error; it
>>> doesn't look like it leads to any runtime bug) is tightly coupled to
>>> the second commit, the pin_user_pages*() conversion, which means we
>>> don't need the bug fix patch if we are not converting the API to
>>> pin_user_pages*()/unpin_user_pages_dirty_lock(). That's the reason I
>>> clubbed these two commits into a single one.
>>
>> I am not sure I understand why the two issues are connected. Failure
>> of either get_user_pages_fast() or pin_user_pages() will result in the
>> same kind of overall ioctl failure, won't it?
>>
> Sorry, I think I need to go through the bug again for my clarity.
>
> My understanding is:
>
> First, in case of success, lock_pages() returns 0, so what
> privcmd_ioctl_dm_op() will return to the user depends on the return
> value of HYPERVISOR_dm_op(), and at the end all the pages are
> unpinned. At present I am not clear about the return value of
> HYPERVISOR_dm_op(), but this path of the code looks correct.
>
> Second, in case of failure, lock_pages() will return either -ENOSPC or
> the -ERRNO received from {get|pin}_user_pages_fast() inside the for
> loop (at which point there might be some partially mapped pages, as it
> maps inside the loop). Upon receiving -ERRNO, privcmd_ioctl_dm_op()
> will pass the same -ERRNO to the user (not the partially mapped page
> count). This looks like correct behaviour from user space.


Sigh... I am sorry, you are absolutely correct. It does return -errno
on get_user_pages_fast() failure. I don't know what (or whether) I was
thinking. (And so Paul, ignore my question.)


-boris


> The problem I was trying to address is, in the second scenario, when
> lock_pages() returns -ERRNO and there are already existing mapped
> pages. In this scenario, when unlock_pages() is called it will iterate
> up to *nr_pages* to validate and unmap the pages, but it is supposed
> to unmap only the mapped pages, which is what I am trying to address
> as part of the bug fix.
>
> Let me know if I am in sync with what you are expecting?
>
>
>> One concern though is that we are changing how user will see this error.
> My understanding from the above is that the user will always see the
> right -ERRNO and will not be impacted.
>
> Another scenario I was thinking of: if {get|pin}_user_pages_fast()
> returns a number of pinned pages which is fewer than the number
> requested, then lock_pages() returns 0 to the caller, and the
> caller/user space will continue with the assumption that all pages
> were pinned correctly. Is this the expected behaviour as per the
> current code?
>
>
>> Currently the Xen devicemodel (which AFAIK is the only caller) ignores
>> it, which I think is wrong. So another option would be to fix this in
>> Xen and continue returning a positive number here. I guess it's back
>> to Paul again.
>>
>>
>> -boris
>>
>>
diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
index a250d11..013d23b 100644
--- a/drivers/xen/privcmd.c
+++ b/drivers/xen/privcmd.c
@@ -580,25 +580,30 @@ static long privcmd_ioctl_mmap_batch(
 
 static int lock_pages(
 	struct privcmd_dm_op_buf kbufs[], unsigned int num,
-	struct page *pages[], unsigned int nr_pages)
+	struct page *pages[], unsigned int nr_pages, int *errno)
 {
 	unsigned int i;
+	int pinned = 0, rc = 0;
 
 	for (i = 0; i < num; i++) {
 		unsigned int requested;
-		int pinned;
 
+		rc += pinned;
 		requested = DIV_ROUND_UP(
 			offset_in_page(kbufs[i].uptr) +
 			kbufs[i].size, PAGE_SIZE);
-		if (requested > nr_pages)
-			return -ENOSPC;
+		if (requested > nr_pages) {
+			*errno = -ENOSPC;
+			return rc;
+		}
 
-		pinned = get_user_pages_fast(
+		pinned = pin_user_pages_fast(
 			(unsigned long) kbufs[i].uptr,
 			requested, FOLL_WRITE, pages);
-		if (pinned < 0)
-			return pinned;
+		if (pinned < 0) {
+			*errno = pinned;
+			return rc;
+		}
 
 		nr_pages -= pinned;
 		pages += pinned;
@@ -609,15 +614,10 @@ static int lock_pages(
 
 static void unlock_pages(struct page *pages[], unsigned int nr_pages)
 {
-	unsigned int i;
-
 	if (!pages)
 		return;
 
-	for (i = 0; i < nr_pages; i++) {
-		if (pages[i])
-			put_page(pages[i]);
-	}
+	unpin_user_pages_dirty_lock(pages, nr_pages, 1);
 }
 
 static long privcmd_ioctl_dm_op(struct file *file, void __user *udata)
@@ -630,6 +630,7 @@ static long privcmd_ioctl_dm_op(struct file *file, void __user *udata)
 	struct xen_dm_op_buf *xbufs = NULL;
 	unsigned int i;
 	long rc;
+	int errno = 0;
 
 	if (copy_from_user(&kdata, udata, sizeof(kdata)))
 		return -EFAULT;
@@ -683,9 +684,12 @@ static long privcmd_ioctl_dm_op(struct file *file, void __user *udata)
 		goto out;
 	}
 
-	rc = lock_pages(kbufs, kdata.num, pages, nr_pages);
-	if (rc)
+	rc = lock_pages(kbufs, kdata.num, pages, nr_pages, &errno);
+	if (errno < 0) {
+		nr_pages = rc;
+		rc = errno;
 		goto out;
+	}
 
 	for (i = 0; i < kdata.num; i++) {
 		set_xen_guest_handle(xbufs[i].h, kbufs[i].uptr);
In 2019, we introduced pin_user_pages*() and now we are converting
get_user_pages*() to the new API as appropriate. See [1] and [2] for
more information. This is case 5 as per document [1].

As discussed, pages need to be marked as dirty before being unpinned.

Previously, if lock_pages() ended up partially mapping pages, it
returned -ERRNO, due to which unlock_pages() had to go through each
pages[i] up to *nr_pages* to validate them. This can be avoided by
returning the number of partially mapped pages and -ERRNO separately
when lock_pages() fails.
With this fix, unlock_pages() doesn't need to validate pages[i] up to
*nr_pages* in the error scenario.

[1] Documentation/core-api/pin_user_pages.rst

[2] "Explicit pinning of user-space pages":
    https://lwn.net/Articles/807108/

Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
 drivers/xen/privcmd.c | 34 +++++++++++++++++++---------------
 1 file changed, 19 insertions(+), 15 deletions(-)
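The "marked as dirty before being unpinned" requirement from the commit message can be illustrated with a toy userspace model. struct toy_page and toy_unpin_pages_dirty() are invented for this sketch; the real kernel API being modelled is unpin_user_pages_dirty_lock().

```c
#include <stdbool.h>

/*
 * Toy model of unpin_user_pages_dirty_lock(): each "page" carries a
 * pin count and a dirty bit, and the dirty bit is set while the pin
 * is still held, mirroring the rule that pages written through a pin
 * must be marked dirty before the pin is dropped.
 */
struct toy_page {
	int pincount;
	bool dirty;
};

static void toy_unpin_pages_dirty(struct toy_page *pages, int npages,
				  bool make_dirty)
{
	int i;

	for (i = 0; i < npages; i++) {
		if (make_dirty)
			pages[i].dirty = true;	/* before the unpin */
		pages[i].pincount--;		/* stands in for unpin */
	}
}
```

This is why the conversion replaces the old put_page() loop with a single call that handles both the dirtying and the unpinning in the right order.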