[RFC] xen/privcmd: Convert get_user_pages*() to pin_user_pages*()

Message ID 1592363698-4266-1-git-send-email-jrdr.linux@gmail.com (mailing list archive)
State Superseded

Commit Message

Souptick Joarder June 17, 2020, 3:14 a.m. UTC
In 2019, we introduced pin_user_pages*() and now we are converting
get_user_pages*() to the new API as appropriate. [1] & [2] could
be referred for more information.

[1] Documentation/core-api/pin_user_pages.rst

[2] "Explicit pinning of user-space pages":
        https://lwn.net/Articles/807108/

Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
Cc: John Hubbard <jhubbard@nvidia.com>
---
Hi,

I have compile tested this patch but unable to run-time test,
so any testing help is much appreciated.

Also have a question, why the existing code is not marking the
pages dirty (since it did FOLL_WRITE) ?

 drivers/xen/privcmd.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

Comments

Boris Ostrovsky June 17, 2020, 5:57 p.m. UTC | #1
On 6/16/20 11:14 PM, Souptick Joarder wrote:
> In 2019, we introduced pin_user_pages*() and now we are converting
> get_user_pages*() to the new API as appropriate. [1] & [2] could
> be referred for more information.
>
> [1] Documentation/core-api/pin_user_pages.rst
>
> [2] "Explicit pinning of user-space pages":
>         https://lwn.net/Articles/807108/
>
> Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
> Cc: John Hubbard <jhubbard@nvidia.com>
> ---
> Hi,
>
> I have compile tested this patch but unable to run-time test,
> so any testing help is much appreciated.
>
> Also have a question, why the existing code is not marking the
> pages dirty (since it did FOLL_WRITE) ?


Indeed, seems to me it should. Paul?


>
>  drivers/xen/privcmd.c | 7 ++-----
>  1 file changed, 2 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
> index a250d11..543739e 100644
> --- a/drivers/xen/privcmd.c
> +++ b/drivers/xen/privcmd.c
> @@ -594,7 +594,7 @@ static int lock_pages(
>  		if (requested > nr_pages)
>  			return -ENOSPC;
>  
> -		pinned = get_user_pages_fast(
> +		pinned = pin_user_pages_fast(
>  			(unsigned long) kbufs[i].uptr,
>  			requested, FOLL_WRITE, pages);
>  		if (pinned < 0)
> @@ -614,10 +614,7 @@ static void unlock_pages(struct page *pages[], unsigned int nr_pages)
>  	if (!pages)
>  		return;
>  
> -	for (i = 0; i < nr_pages; i++) {
> -		if (pages[i])
> -			put_page(pages[i]);
> -	}
> +	unpin_user_pages(pages, nr_pages);


Why are you no longer checking for valid pages?


-boris
Souptick Joarder June 19, 2020, 3:12 a.m. UTC | #2
On Wed, Jun 17, 2020 at 11:29 PM Boris Ostrovsky
<boris.ostrovsky@oracle.com> wrote:
>
> On 6/16/20 11:14 PM, Souptick Joarder wrote:
> > In 2019, we introduced pin_user_pages*() and now we are converting
> > get_user_pages*() to the new API as appropriate. [1] & [2] could
> > be referred for more information.
> >
> > [1] Documentation/core-api/pin_user_pages.rst
> >
> > [2] "Explicit pinning of user-space pages":
> >         https://lwn.net/Articles/807108/
> >
> > Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
> > Cc: John Hubbard <jhubbard@nvidia.com>
> > ---
> > Hi,
> >
> > I have compile tested this patch but unable to run-time test,
> > so any testing help is much appreciated.
> >
> > Also have a question, why the existing code is not marking the
> > pages dirty (since it did FOLL_WRITE) ?
>
>
> Indeed, seems to me it should. Paul?
>
>
> >
> >  drivers/xen/privcmd.c | 7 ++-----
> >  1 file changed, 2 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
> > index a250d11..543739e 100644
> > --- a/drivers/xen/privcmd.c
> > +++ b/drivers/xen/privcmd.c
> > @@ -594,7 +594,7 @@ static int lock_pages(
> >               if (requested > nr_pages)
> >                       return -ENOSPC;
> >
> > -             pinned = get_user_pages_fast(
> > +             pinned = pin_user_pages_fast(
> >                       (unsigned long) kbufs[i].uptr,
> >                       requested, FOLL_WRITE, pages);
> >               if (pinned < 0)
> > @@ -614,10 +614,7 @@ static void unlock_pages(struct page *pages[], unsigned int nr_pages)
> >       if (!pages)
> >               return;
> >
> > -     for (i = 0; i < nr_pages; i++) {
> > -             if (pages[i])
> > -                     put_page(pages[i]);
> > -     }
> > +     unpin_user_pages(pages, nr_pages);
>
>
> Why are you no longer checking for valid pages?

My understanding is that when lock_pages() ends up pinning only some of the
pages, we should pass the number of pages actually pinned to unlock_pages(),
not nr_pages. That avoids the extra check to validate pages[i].

And if lock_pages() returns 0 on success, all the pages[i] are valid anyway.
I will try to correct it in v2.

But I agree, there is no harm in checking pages[i], and I believe
unpin_user_pages() is the right place to do so.

John, any thoughts?
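
A minimal userspace sketch of the point above, assuming a simplified model
rather than the real privcmd code: struct mock_page, mock_pin_pages(),
mock_unlock_pages() and demo_partial_pin() are hypothetical names, and the
"pin" is just a counter. It shows that when the release path is given the
number of pages actually pinned, it never reaches a NULL slot, so no
per-entry check is needed.

```c
#include <assert.h>
#include <stddef.h>

#define MOCK_MAX_PAGES 8

/* Hypothetical stand-in for the kernel's struct page: just a pin count. */
struct mock_page {
	int pincount;
};

/*
 * Hypothetical stand-in for pin_user_pages_fast(): pins at most `avail`
 * of the `requested` pages and returns how many it pinned.
 */
static int mock_pin_pages(struct mock_page *pool, int requested, int avail,
			  struct mock_page **pages)
{
	int i, n = requested < avail ? requested : avail;

	for (i = 0; i < n; i++) {
		pool[i].pincount++;
		pages[i] = &pool[i];
	}
	return n;
}

/*
 * Release exactly `npinned` entries. No NULL test is needed because the
 * caller passes the number of pages actually pinned, not the array size.
 */
static void mock_unlock_pages(struct mock_page **pages, int npinned)
{
	int i;

	for (i = 0; i < npinned; i++)
		pages[i]->pincount--;
}

/*
 * Pin `requested` pages when only `avail` can be pinned, release them by
 * the pinned count, and return how many pins are still held afterwards.
 */
int demo_partial_pin(int requested, int avail)
{
	struct mock_page pool[MOCK_MAX_PAGES] = { { 0 } };
	struct mock_page *pages[MOCK_MAX_PAGES] = { NULL };
	int i, pinned, leaked = 0;

	assert(requested >= 0 && requested <= MOCK_MAX_PAGES);

	pinned = mock_pin_pages(pool, requested, avail, pages);
	mock_unlock_pages(pages, pinned);

	for (i = 0; i < MOCK_MAX_PAGES; i++)
		leaked += pool[i].pincount;
	return leaked;
}
```

Calling demo_partial_pin(5, 3) models a request for five pages of which only
three get pinned; the entries past the third stay NULL and are never touched.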
John Hubbard June 19, 2020, 7:30 a.m. UTC | #3
On 2020-06-18 20:12, Souptick Joarder wrote:
> On Wed, Jun 17, 2020 at 11:29 PM Boris Ostrovsky
> <boris.ostrovsky@oracle.com> wrote:
>>
>> On 6/16/20 11:14 PM, Souptick Joarder wrote:
>>> In 2019, we introduced pin_user_pages*() and now we are converting
>>> get_user_pages*() to the new API as appropriate. [1] & [2] could
>>> be referred for more information.


Ideally, the commit description should say which case, in
pin_user_pages.rst, that this is.


>>>
>>> [1] Documentation/core-api/pin_user_pages.rst
>>>
>>> [2] "Explicit pinning of user-space pages":
>>>          https://lwn.net/Articles/807108/
>>>
>>> Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
>>> Cc: John Hubbard <jhubbard@nvidia.com>
>>> ---
>>> Hi,
>>>
>>> I have compile tested this patch but unable to run-time test,
>>> so any testing help is much appreciated.
>>>
>>> Also have a question, why the existing code is not marking the
>>> pages dirty (since it did FOLL_WRITE) ?
>>
>>
>> Indeed, seems to me it should. Paul?

Definitely good to get an answer from an expert in this code, but
meanwhile, it's reasonable to just mark them dirty. Below...

>>
>>
>>>
>>>   drivers/xen/privcmd.c | 7 ++-----
>>>   1 file changed, 2 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
>>> index a250d11..543739e 100644
>>> --- a/drivers/xen/privcmd.c
>>> +++ b/drivers/xen/privcmd.c
>>> @@ -594,7 +594,7 @@ static int lock_pages(
>>>                if (requested > nr_pages)
>>>                        return -ENOSPC;
>>>
>>> -             pinned = get_user_pages_fast(
>>> +             pinned = pin_user_pages_fast(
>>>                        (unsigned long) kbufs[i].uptr,
>>>                        requested, FOLL_WRITE, pages);
>>>                if (pinned < 0)
>>> @@ -614,10 +614,7 @@ static void unlock_pages(struct page *pages[], unsigned int nr_pages)
>>>        if (!pages)
>>>                return;
>>>
>>> -     for (i = 0; i < nr_pages; i++) {
>>> -             if (pages[i])
>>> -                     put_page(pages[i]);
>>> -     }
>>> +     unpin_user_pages(pages, nr_pages);


...so just use unpin_user_pages_dirty_lock() here, I think.


>>
>>
>> Why are you no longer checking for valid pages?
> 
> My understanding is, in case of lock_pages() end up returning partial
> mapped pages,
> we should pass no. of partial mapped pages to unlock_pages(), not nr_pages.
> This will avoid checking extra check to validate the pages[i].
> 
> and if lock_pages() returns 0 in success, anyway we have all the pages[i] valid.
> I will try to correct it in v2.
> 
> But I agree, there is no harm to check for pages[i] and I believe,


Generally, it *is* harmful to do unnecessary checks, in most code, but especially
in most kernel code. If you can convince yourself that the check for null pages
is redundant here, then please let's remove that check. The code then
becomes shorter, simpler, and faster.


> unpin_user_pages()
> is the right place to do so.
> 
> John any thought ?


So far I haven't seen any cases to justify changing the implementation of
unpin_user_pages().


thanks,
Paul Durrant June 19, 2020, 9:03 a.m. UTC | #4
> -----Original Message-----
> From: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> Sent: 17 June 2020 18:57
> To: Souptick Joarder <jrdr.linux@gmail.com>; jgross@suse.com; sstabellini@kernel.org
> Cc: xen-devel@lists.xenproject.org; linux-kernel@vger.kernel.org; John Hubbard <jhubbard@nvidia.com>;
> paul@xen.org
> Subject: Re: [RFC PATCH] xen/privcmd: Convert get_user_pages*() to pin_user_pages*()
> 
> On 6/16/20 11:14 PM, Souptick Joarder wrote:
> > In 2019, we introduced pin_user_pages*() and now we are converting
> > get_user_pages*() to the new API as appropriate. [1] & [2] could
> > be referred for more information.
> >
> > [1] Documentation/core-api/pin_user_pages.rst
> >
> > [2] "Explicit pinning of user-space pages":
> >         https://lwn.net/Articles/807108/
> >
> > Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
> > Cc: John Hubbard <jhubbard@nvidia.com>
> > ---
> > Hi,
> >
> > I have compile tested this patch but unable to run-time test,
> > so any testing help is much appreciated.
> >
> > Also have a question, why the existing code is not marking the
> > pages dirty (since it did FOLL_WRITE) ?
> 
> 
> Indeed, seems to me it should. Paul?
> 

Yes, it looks like that was an oversight. The hypercall may well result in data being copied back into the buffers so the whole pages array should be considered dirty.

  Paul

> 
> >
> >  drivers/xen/privcmd.c | 7 ++-----
> >  1 file changed, 2 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
> > index a250d11..543739e 100644
> > --- a/drivers/xen/privcmd.c
> > +++ b/drivers/xen/privcmd.c
> > @@ -594,7 +594,7 @@ static int lock_pages(
> >  		if (requested > nr_pages)
> >  			return -ENOSPC;
> >
> > -		pinned = get_user_pages_fast(
> > +		pinned = pin_user_pages_fast(
> >  			(unsigned long) kbufs[i].uptr,
> >  			requested, FOLL_WRITE, pages);
> >  		if (pinned < 0)
> > @@ -614,10 +614,7 @@ static void unlock_pages(struct page *pages[], unsigned int nr_pages)
> >  	if (!pages)
> >  		return;
> >
> > -	for (i = 0; i < nr_pages; i++) {
> > -		if (pages[i])
> > -			put_page(pages[i]);
> > -	}
> > +	unpin_user_pages(pages, nr_pages);
> 
> 
> Why are you no longer checking for valid pages?
> 
> 
> -boris
> 
> 
>
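
A small userspace model of the dirty-marking behavior discussed above, under
stated assumptions: struct mock_page, mock_unpin_dirty() and
demo_dirty_count() are hypothetical names, and the real
unpin_user_pages_dirty_lock(pages, npages, make_dirty) works on struct page
with proper locking. The sketch only illustrates the contract: when
make_dirty is true, every released page is marked dirty before its pin is
dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MOCK_MAX_PAGES 8

/* Hypothetical stand-in for struct page: a pin count and a dirty flag. */
struct mock_page {
	int pincount;
	bool dirty;
};

/*
 * Userspace model of unpin_user_pages_dirty_lock(): optionally mark each
 * page dirty, then drop its pin. Locking is deliberately left out.
 */
static void mock_unpin_dirty(struct mock_page **pages, size_t npages,
			     bool make_dirty)
{
	size_t i;

	for (i = 0; i < npages; i++) {
		if (make_dirty)
			pages[i]->dirty = true;
		pages[i]->pincount--;
	}
}

/* Pin `npages` mock pages, release them, and count how many are dirty. */
int demo_dirty_count(size_t npages, bool make_dirty)
{
	struct mock_page pool[MOCK_MAX_PAGES] = { { 0, false } };
	struct mock_page *pages[MOCK_MAX_PAGES] = { NULL };
	size_t i;
	int dirty = 0;

	assert(npages <= MOCK_MAX_PAGES);

	for (i = 0; i < npages; i++) {
		pool[i].pincount = 1;
		pages[i] = &pool[i];
	}

	mock_unpin_dirty(pages, npages, make_dirty);

	for (i = 0; i < npages; i++) {
		assert(pool[i].pincount == 0);	/* every pin was dropped */
		dirty += pool[i].dirty ? 1 : 0;
	}
	return dirty;
}
```

Since the hypercall may write into any of the pinned buffers, the privcmd
release path would pass make_dirty as true for the whole pages array.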
Souptick Joarder June 22, 2020, 6:52 p.m. UTC | #5
On Fri, Jun 19, 2020 at 1:00 PM John Hubbard <jhubbard@nvidia.com> wrote:
>
> On 2020-06-18 20:12, Souptick Joarder wrote:
> > On Wed, Jun 17, 2020 at 11:29 PM Boris Ostrovsky
> > <boris.ostrovsky@oracle.com> wrote:
> >>
> >> On 6/16/20 11:14 PM, Souptick Joarder wrote:
> >>> In 2019, we introduced pin_user_pages*() and now we are converting
> >>> get_user_pages*() to the new API as appropriate. [1] & [2] could
> >>> be referred for more information.
>
>
> Ideally, the commit description should say which case, in
> pin_user_pages.rst, that this is.
>

Ok.

>
> >>>
> >>> [1] Documentation/core-api/pin_user_pages.rst
> >>>
> >>> [2] "Explicit pinning of user-space pages":
> >>>          https://lwn.net/Articles/807108/
> >>>
> >>> Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
> >>> Cc: John Hubbard <jhubbard@nvidia.com>
> >>> ---
> >>> Hi,
> >>>
> >>> I have compile tested this patch but unable to run-time test,
> > >>> so any testing help is much appreciated.
> >>>
> >>> Also have a question, why the existing code is not marking the
> >>> pages dirty (since it did FOLL_WRITE) ?
> >>
> >>
> >> Indeed, seems to me it should. Paul?
>
> Definitely good to get an answer from an expert in this code, but
> meanwhile, it's reasonable to just mark them dirty. Below...
>
> >>
> >>
> >>>
> >>>   drivers/xen/privcmd.c | 7 ++-----
> >>>   1 file changed, 2 insertions(+), 5 deletions(-)
> >>>
> >>> diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
> >>> index a250d11..543739e 100644
> >>> --- a/drivers/xen/privcmd.c
> >>> +++ b/drivers/xen/privcmd.c
> >>> @@ -594,7 +594,7 @@ static int lock_pages(
> >>>                if (requested > nr_pages)
> >>>                        return -ENOSPC;
> >>>
> >>> -             pinned = get_user_pages_fast(
> >>> +             pinned = pin_user_pages_fast(
> >>>                        (unsigned long) kbufs[i].uptr,
> >>>                        requested, FOLL_WRITE, pages);
> >>>                if (pinned < 0)
> >>> @@ -614,10 +614,7 @@ static void unlock_pages(struct page *pages[], unsigned int nr_pages)
> >>>        if (!pages)
> >>>                return;
> >>>
> >>> -     for (i = 0; i < nr_pages; i++) {
> >>> -             if (pages[i])
> >>> -                     put_page(pages[i]);
> >>> -     }
> >>> +     unpin_user_pages(pages, nr_pages);
>
>
> ...so just use unpin_user_pages_dirty_lock() here, I think.
>
>
> >>
> >>
> >> Why are you no longer checking for valid pages?
> >
> > My understanding is, in case of lock_pages() end up returning partial
> > mapped pages,
> > we should pass no. of partial mapped pages to unlock_pages(), not nr_pages.
> > This will avoid checking extra check to validate the pages[i].
> >
> > and if lock_pages() returns 0 in success, anyway we have all the pages[i] valid.
> > I will try to correct it in v2.
> >
> > But I agree, there is no harm to check for pages[i] and I believe,
>
>
> Generally, it *is* harmful to do unnecessary checks, in most code, but especially
> in most kernel code. If you can convince yourself that the check for null pages
> is redundant here, then please let's remove that check. The code then
> becomes shorter, simpler, and faster.

I read the code again. I think, this check is needed to handle a scenario when
lock_pages() returns -ENOSPC. Better to keep this check. Let me post v2 of this
RFC for a clear view.

>
>
> > unpin_user_pages()
> > is the right place to do so.
> >
> > John any thought ?
>
>
> So far I haven't seen any cases to justify changing the implementation of
> unpin_user_pages().
>
>
> thanks,
> --
> John Hubbard
> NVIDIA
Boris Ostrovsky June 22, 2020, 7:10 p.m. UTC | #6
On 6/22/20 2:52 PM, Souptick Joarder wrote:
>
> I read the code again. I think, this check is needed to handle a scenario when
> lock_pages() returns -ENOSPC. Better to keep this check. Let me post v2 of this
> RFC for a clear view.


Actually, error handling seems to be somewhat broken here. If
lock_pages() returns number of pinned pages then that's what we end up
returning from privcmd_ioctl_dm_op(), all the way to user ioctl(). Which
I don't think is right, we should return proper (negative) error.


Do you mind fixing that as well? Then you should be able to avoid
testing pages in a loop.


-boris
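
One possible shape for the error handling suggested above, as a userspace
sketch under stated assumptions and not the actual v2 patch: all mock_* names
and demo_lock_unlock() are hypothetical, only a single buffer is modelled,
and returning -EFAULT for a short pin is an assumption, since the thread
leaves that choice open. The lock helper returns 0 or a negative errno and
reports the pinned count through an out-parameter, so the caller can release
exactly those pages without testing each slot.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MOCK_MAX_PAGES 8

/* Hypothetical stand-in for the kernel's struct page: just a pin count. */
struct mock_page {
	int pincount;
};

/* Hypothetical pin helper: returns the pinned count or a negative errno. */
static int mock_pin_pages(struct mock_page *pool, int requested, int avail,
			  struct mock_page **pages)
{
	int i, n = requested < avail ? requested : avail;

	if (avail <= 0)
		return -EFAULT;
	for (i = 0; i < n; i++) {
		pool[i].pincount++;
		pages[i] = &pool[i];
	}
	return n;
}

/*
 * Returns 0 on success or a negative errno. *pinned always reports how
 * many pages hold a pin, so the caller can release exactly that many.
 */
static int mock_lock_pages(struct mock_page *pool, int requested, int avail,
			   int nr_pages, struct mock_page **pages, int *pinned)
{
	int ret;

	*pinned = 0;
	if (requested > nr_pages)
		return -ENOSPC;

	ret = mock_pin_pages(pool, requested, avail, pages);
	if (ret < 0)
		return ret;		/* pass the pin error along */

	*pinned = ret;
	if (ret != requested)
		return -EFAULT;		/* assumption: short pin is an error */
	return 0;
}

/* Lock, then release by pinned count; returns what the caller would see. */
int demo_lock_unlock(int requested, int avail, int nr_pages)
{
	struct mock_page pool[MOCK_MAX_PAGES] = { { 0 } };
	struct mock_page *pages[MOCK_MAX_PAGES] = { NULL };
	int i, pinned, ret;

	assert(nr_pages >= 0 && nr_pages <= MOCK_MAX_PAGES);

	ret = mock_lock_pages(pool, requested, avail, nr_pages, pages, &pinned);

	for (i = 0; i < pinned; i++)	/* no NULL check needed */
		pages[i]->pincount--;
	for (i = 0; i < MOCK_MAX_PAGES; i++)
		assert(pool[i].pincount == 0);
	return ret;
}
```

With this shape the caller never sees a positive page count as a return
value, which is the case described above as leaking out to the user ioctl().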
Boris Ostrovsky June 22, 2020, 7:25 p.m. UTC | #7
On 6/22/20 3:28 PM, Souptick Joarder wrote:
> On Tue, Jun 23, 2020 at 12:40 AM Boris Ostrovsky
> <boris.ostrovsky@oracle.com> wrote:
>> On 6/22/20 2:52 PM, Souptick Joarder wrote:
>>> I read the code again. I think, this check is needed to handle a scenario when
>>> lock_pages() returns -ENOSPC. Better to keep this check. Let me post v2 of this
>>> RFC for a clear view.
>>
>> Actually, error handling seems to be somewhat broken here. If
>> lock_pages() returns number of pinned pages then that's what we end up
>> returning from privcmd_ioctl_dm_op(), all the way to user ioctl(). Which
>> I don't think is right, we should return proper (negative) error.
>>
> What -ERRNO is more appropriate here ? -ENOSPC ?


You can simply pass along error code that get_user_pages_fast() returned.


-boris
Souptick Joarder June 22, 2020, 7:28 p.m. UTC | #8
On Tue, Jun 23, 2020 at 12:40 AM Boris Ostrovsky
<boris.ostrovsky@oracle.com> wrote:
>
> On 6/22/20 2:52 PM, Souptick Joarder wrote:
> >
> > I read the code again. I think, this check is needed to handle a scenario when
> > lock_pages() returns -ENOSPC. Better to keep this check. Let me post v2 of this
> > RFC for a clear view.
>
>
> Actually, error handling seems to be somewhat broken here. If
> lock_pages() returns number of pinned pages then that's what we end up
> returning from privcmd_ioctl_dm_op(), all the way to user ioctl(). Which
> I don't think is right, we should return proper (negative) error.
>

What -ERRNO is more appropriate here ? -ENOSPC ?

>
> Do you mind fixing that as well? Then you should be able to avoid
> testing pages in a loop.

Ok, let me try to fix it.
>
>
> -boris
>
Patch

diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
index a250d11..543739e 100644
--- a/drivers/xen/privcmd.c
+++ b/drivers/xen/privcmd.c
@@ -594,7 +594,7 @@  static int lock_pages(
 		if (requested > nr_pages)
 			return -ENOSPC;
 
-		pinned = get_user_pages_fast(
+		pinned = pin_user_pages_fast(
 			(unsigned long) kbufs[i].uptr,
 			requested, FOLL_WRITE, pages);
 		if (pinned < 0)
@@ -614,10 +614,7 @@  static void unlock_pages(struct page *pages[], unsigned int nr_pages)
 	if (!pages)
 		return;
 
-	for (i = 0; i < nr_pages; i++) {
-		if (pages[i])
-			put_page(pages[i]);
-	}
+	unpin_user_pages(pages, nr_pages);
 }
 
 static long privcmd_ioctl_dm_op(struct file *file, void __user *udata)