
userfaultfd: don't fail on unrecognized features

Message ID 20220722201513.1624158-1-axelrasmussen@google.com (mailing list archive)
State New, archived

Commit Message

Axel Rasmussen July 22, 2022, 8:15 p.m. UTC
The basic interaction for setting up a userfaultfd is, userspace issues
a UFFDIO_API ioctl, and passes in a set of zero or more feature flags,
indicating the features they would prefer to use.

Of course, different kernels may support different sets of features
(depending on kernel version, kconfig options, architecture, etc).
Userspace's expectations may also not match: perhaps it was built
against newer kernel headers, which defined some features the kernel
it's running on doesn't support.

Currently, if userspace passes in a flag we don't recognize, the
initialization fails and we return -EINVAL. This isn't great, though.
Userspace doesn't have an obvious way to react to this; sure, one of the
features I asked for was unavailable, but which one? The only option it
has is to turn off things "at random" and hope something works.

Instead, modify UFFDIO_API to just ignore any unrecognized feature
flags. The interaction is now that the initialization will succeed, and
as always we return the *subset* of feature flags that can actually be
used back to userspace.

Now userspace has an obvious way to react: it checks if any flags it
asked for are missing. If so, it can conclude this kernel doesn't
support those, and it can either resign itself to not using them, or
fail with an error on its own, or whatever else.

Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
---
 fs/userfaultfd.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

Comments

Peter Xu March 27, 2023, 9:01 p.m. UTC | #1
I think I overlooked this patch..

Axel, could you explain why this patch is correct?  Comments inline.

On Fri, Jul 22, 2022 at 01:15:13PM -0700, Axel Rasmussen wrote:
> The basic interaction for setting up a userfaultfd is, userspace issues
> a UFFDIO_API ioctl, and passes in a set of zero or more feature flags,
> indicating the features they would prefer to use.
> 
> Of course, different kernels may support different sets of features
> (depending on kernel version, kconfig options, architecture, etc).
> Userspace's expectations may also not match: perhaps it was built
> against newer kernel headers, which defined some features the kernel
> it's running on doesn't support.
> 
> Currently, if userspace passes in a flag we don't recognize, the
> initialization fails and we return -EINVAL. This isn't great, though.

Why?  IIUC that's the main way for a user app to detect a misconfigured
feature list so it can bail out early.

Quoting from man page (ioctl_userfaultfd(2)):

UFFDIO_API
       (Since Linux 4.3.)  Enable operation of the userfaultfd and perform API handshake.

       ...

           struct uffdio_api {
               __u64 api;        /* Requested API version (input) */
               __u64 features;   /* Requested features (input/output) */
               __u64 ioctls;     /* Available ioctl() operations (output) */
           };

       ...

       For Linux kernel versions before 4.11, the features field must be
       initialized to zero before the call to UFFDIO_API, and zero (i.e.,
       no feature bits) is placed in the features field by the kernel upon
       return from ioctl(2).

       ...

       To enable userfaultfd features the application should set a bit
       corresponding to each feature it wants to enable in the features
       field.  If the kernel supports all the requested features it will
       enable them.  Otherwise it will zero out the returned uffdio_api
       structure and return EINVAL.

IIUC the right way to use this API is to first probe with features==0, in
which case the kernel returns all the supported features.  The user app
should then enable only a subset (or all, but not a superset) of the
supported ones in the next UFFDIO_API, on a new uffd.
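
A minimal userspace sketch of that two-step probe, for illustration only.
The helper names (usable_features, uffd_handshake, uffd_open_two_step) are
made up here and are not part of any kernel or libc API; the sketch only
assumes the userfaultfd(2) syscall and the UFFDIO_API ioctl from
<linux/userfaultfd.h>.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: keep only the wanted bits the kernel reported. */
static uint64_t usable_features(uint64_t wanted, uint64_t supported)
{
	return wanted & supported;
}

/*
 * Hypothetical helper: create a uffd and run UFFDIO_API on it once.
 * Returns the fd (>= 0) and stores the feature bits the kernel wrote
 * back in *reported, or returns -1 with errno set.
 */
static int uffd_handshake(uint64_t features, uint64_t *reported)
{
	struct uffdio_api api = { .api = UFFD_API, .features = features };
	int fd = (int)syscall(SYS_userfaultfd, O_CLOEXEC | O_NONBLOCK);

	if (fd < 0)
		return -1;
	if (ioctl(fd, UFFDIO_API, &api) < 0) {
		int saved = errno;

		close(fd);
		errno = saved;
		return -1;
	}
	*reported = api.features;
	return fd;
}

/* Step 1: probe with features == 0.  Step 2: enable a subset on a new fd. */
static int uffd_open_two_step(uint64_t wanted, uint64_t *enabled)
{
	uint64_t supported = 0;
	int probe = uffd_handshake(0, &supported);

	if (probe < 0)
		return -1;
	close(probe);		/* this fd was only used for the probe */

	*enabled = usable_features(wanted, supported);
	return uffd_handshake(*enabled, &supported);
}
```

The second handshake can still fail for other reasons (for example the
-EPERM path for UFFD_FEATURE_EVENT_FORK visible in the diff below), so
callers would need to check its return value as well.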

> Userspace doesn't have an obvious way to react to this; sure, one of the
> features I asked for was unavailable, but which one? The only option it
> has is to turn off things "at random" and hope something works.
> 
> Instead, modify UFFDIO_API to just ignore any unrecognized feature
> flags. The interaction is now that the initialization will succeed, and
> as always we return the *subset* of feature flags that can actually be
> used back to userspace.
> 
> Now userspace has an obvious way to react: it checks if any flags it
> asked for are missing. If so, it can conclude this kernel doesn't
> support those, and it can either resign itself to not using them, or
> fail with an error on its own, or whatever else.
> 
> Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
> ---
>  fs/userfaultfd.c | 6 ++----
>  1 file changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> index e943370107d0..4974da1f620c 100644
> --- a/fs/userfaultfd.c
> +++ b/fs/userfaultfd.c
> @@ -1923,10 +1923,8 @@ static int userfaultfd_api(struct userfaultfd_ctx *ctx,
>  	ret = -EFAULT;
>  	if (copy_from_user(&uffdio_api, buf, sizeof(uffdio_api)))
>  		goto out;
> -	features = uffdio_api.features;
> -	ret = -EINVAL;
> -	if (uffdio_api.api != UFFD_API || (features & ~UFFD_API_FEATURES))
> -		goto err_out;

What's worse is that I think you removed the only UFFD_API check.  I'm not
sure whether the API version will ever be extended (it's very possible we
keep using 0xaa forever..), but removing the check means we won't be able
to introduce a new api version in the future, and a misconfigured
uffdio_api will wrongly succeed, I think:

	/* Test wrong UFFD_API */
	uffdio_api.api = 0xab;
	uffdio_api.features = 0;
	if (ioctl(uffd, UFFDIO_API, &uffdio_api) == 0)
		err("UFFDIO_API should fail but didn't");

> +	/* Ignore unsupported features (userspace built against newer kernel) */
> +	features = uffdio_api.features & UFFD_API_FEATURES;
>  	ret = -EPERM;
>  	if ((features & UFFD_FEATURE_EVENT_FORK) && !capable(CAP_SYS_PTRACE))
>  		goto err_out;
> -- 
> 2.37.1.359.gd136c6c3e2-goog
>
Axel Rasmussen March 28, 2023, 7:28 p.m. UTC | #2
On Mon, Mar 27, 2023 at 2:01 PM Peter Xu <peterx@redhat.com> wrote:
>
> I think I overlooked this patch..
>
> Axel, could you explain why this patch is correct?  Comments inline.
>
> On Fri, Jul 22, 2022 at 01:15:13PM -0700, Axel Rasmussen wrote:
> > The basic interaction for setting up a userfaultfd is, userspace issues
> > a UFFDIO_API ioctl, and passes in a set of zero or more feature flags,
> > indicating the features they would prefer to use.
> >
> > Of course, different kernels may support different sets of features
> > (depending on kernel version, kconfig options, architecture, etc).
> > Userspace's expectations may also not match: perhaps it was built
> > against newer kernel headers, which defined some features the kernel
> > it's running on doesn't support.
> >
> > Currently, if userspace passes in a flag we don't recognize, the
> > initialization fails and we return -EINVAL. This isn't great, though.
>
> Why?  IIUC that's the major way for user app to detect any misconfig of
> feature list so it can bail out early.
>
> Quoting from man page (ioctl_userfaultfd(2)):
>
> UFFDIO_API
>        (Since Linux 4.3.)  Enable operation of the userfaultfd and perform API handshake.
>
>        ...
>
>            struct uffdio_api {
>                __u64 api;        /* Requested API version (input) */
>                __u64 features;   /* Requested features (input/output) */
>                __u64 ioctls;     /* Available ioctl() operations (output) */
>            };
>
>        ...
>
>        For Linux kernel versions before 4.11, the features field must be
>        initialized to zero before the call to UFFDIO_API, and zero (i.e.,
>        no feature bits) is placed in the features field by the kernel upon
>        return from ioctl(2).
>
>        ...
>
>        To enable userfaultfd features the application should set a bit
>        corresponding to each feature it wants to enable in the features
>        field.  If the kernel supports all the requested features it will
>        enable them.  Otherwise it will zero out the returned uffdio_api
>        structure and return EINVAL.
>
> IIUC the right way to use this API is first probe with features==0, then
> the kernel will return all the supported features, then the user app should
> enable only a subset (or all, but not a superset) of supported ones in the
> next UFFDIO_API with a new uffd.

Hmm, I think doing a two-step handshake just overcomplicates things.

Isn't it simpler to just have userspace ask for the features it wants
up front, and then the kernel responds with the subset of features it
actually supports? In the common case (all features were supported),
there is nothing more to do. Userspace is free to detect the uncommon
case where some features it asked for are missing, and handle that
however it likes.

I think this patch is backwards compatible with the two-step approach, too.

I do agree the man page could use some work. I don't think it
describes the two-step handshake process correctly, either. It just
says, "ask for the features you want, and the kernel will either give
them to you or fail". If we really did want to keep the two-step
process, it should describe it (set features == 0 first, then ask only
for the ones you want which are supported), and the example program
should demonstrate it.

But, I think it's simpler to just have the kernel do what the man page
describes. Userspace asks for the features up front, kernel responds
with the subset that are actually supported. No need to return EINVAL
if unsupported features were requested.
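
As a rough illustration of what that looks like from userspace, here is a
minimal sketch. The helper names (missing_features, uffd_open_one_step)
are hypothetical, and the sketch assumes a kernel that ignores unknown
feature bits as this patch does; on a kernel that rejects them, the ioctl
fails and the helper just returns -1.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: bits that were requested but not reported back. */
static uint64_t missing_features(uint64_t requested, uint64_t reported)
{
	return requested & ~reported;
}

/*
 * Hypothetical helper: a single UFFDIO_API call asking for everything
 * we want.  Returns the uffd (>= 0) and stores the unsupported bits in
 * *missing, or returns -1 with errno set if the syscall or ioctl failed.
 */
static int uffd_open_one_step(uint64_t wanted, uint64_t *missing)
{
	struct uffdio_api api = { .api = UFFD_API, .features = wanted };
	int fd = (int)syscall(SYS_userfaultfd, O_CLOEXEC | O_NONBLOCK);

	if (fd < 0)
		return -1;
	if (ioctl(fd, UFFDIO_API, &api) < 0) {
		int saved = errno;

		close(fd);
		errno = saved;
		return -1;
	}
	/* The kernel wrote the features it supports into api.features. */
	*missing = missing_features(wanted, api.features);
	return fd;
}
```

If *missing comes back nonzero, the caller can decide for itself whether
to carry on without those features or to fail.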

>
> > Userspace doesn't have an obvious way to react to this; sure, one of the
> > features I asked for was unavailable, but which one? The only option it
> > has is to turn off things "at random" and hope something works.
> >
> > Instead, modify UFFDIO_API to just ignore any unrecognized feature
> > flags. The interaction is now that the initialization will succeed, and
> > as always we return the *subset* of feature flags that can actually be
> > used back to userspace.
> >
> > Now userspace has an obvious way to react: it checks if any flags it
> > asked for are missing. If so, it can conclude this kernel doesn't
> > support those, and it can either resign itself to not using them, or
> > fail with an error on its own, or whatever else.
> >
> > Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
> > ---
> >  fs/userfaultfd.c | 6 ++----
> >  1 file changed, 2 insertions(+), 4 deletions(-)
> >
> > diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> > index e943370107d0..4974da1f620c 100644
> > --- a/fs/userfaultfd.c
> > +++ b/fs/userfaultfd.c
> > @@ -1923,10 +1923,8 @@ static int userfaultfd_api(struct userfaultfd_ctx *ctx,
> >       ret = -EFAULT;
> >       if (copy_from_user(&uffdio_api, buf, sizeof(uffdio_api)))
> >               goto out;
> > -     features = uffdio_api.features;
> > -     ret = -EINVAL;
> > -     if (uffdio_api.api != UFFD_API || (features & ~UFFD_API_FEATURES))
> > -             goto err_out;
>
> What's worse is that I think you removed the only UFFD_API check.  Although
> I'm not sure whether it'll be extended in the future or not at all (very
> possible we keep using 0xaa forever..), but removing this means we won't be
> able to extend it to a new api version in the future, and misconfig of
> uffdio_api will wrongly succeed I think:
>
>         /* Test wrong UFFD_API */
>         uffdio_api.api = 0xab;
>         uffdio_api.features = 0;
>         if (ioctl(uffd, UFFDIO_API, &uffdio_api) == 0)
>                 err("UFFDIO_API should fail but didn't");

Agreed, we should add back the UFFD_API check - I am happy to send a
patch for this.

>
> > +     /* Ignore unsupported features (userspace built against newer kernel) */
> > +     features = uffdio_api.features & UFFD_API_FEATURES;
> >       ret = -EPERM;
> >       if ((features & UFFD_FEATURE_EVENT_FORK) && !capable(CAP_SYS_PTRACE))
> >               goto err_out;
> > --
> > 2.37.1.359.gd136c6c3e2-goog
> >
>
> --
> Peter Xu
>
Peter Xu March 28, 2023, 7:45 p.m. UTC | #3
On Tue, Mar 28, 2023 at 12:28:59PM -0700, Axel Rasmussen wrote:
> On Mon, Mar 27, 2023 at 2:01 PM Peter Xu <peterx@redhat.com> wrote:
> >
> > I think I overlooked this patch..
> >
> > Axel, could you explain why this patch is correct?  Comments inline.
> >
> > On Fri, Jul 22, 2022 at 01:15:13PM -0700, Axel Rasmussen wrote:
> > > The basic interaction for setting up a userfaultfd is, userspace issues
> > > a UFFDIO_API ioctl, and passes in a set of zero or more feature flags,
> > > indicating the features they would prefer to use.
> > >
> > > Of course, different kernels may support different sets of features
> > > (depending on kernel version, kconfig options, architecture, etc).
> > > Userspace's expectations may also not match: perhaps it was built
> > > against newer kernel headers, which defined some features the kernel
> > > it's running on doesn't support.
> > >
> > > Currently, if userspace passes in a flag we don't recognize, the
> > > initialization fails and we return -EINVAL. This isn't great, though.
> >
> > Why?  IIUC that's the major way for user app to detect any misconfig of
> > feature list so it can bail out early.
> >
> > Quoting from man page (ioctl_userfaultfd(2)):
> >
> > UFFDIO_API
> >        (Since Linux 4.3.)  Enable operation of the userfaultfd and perform API handshake.
> >
> >        ...
> >
> >            struct uffdio_api {
> >                __u64 api;        /* Requested API version (input) */
> >                __u64 features;   /* Requested features (input/output) */
> >                __u64 ioctls;     /* Available ioctl() operations (output) */
> >            };
> >
> >        ...
> >
> >        For Linux kernel versions before 4.11, the features field must be
> >        initialized to zero before the call to UFFDIO_API, and zero (i.e.,
> >        no feature bits) is placed in the features field by the kernel upon
> >        return from ioctl(2).
> >
> >        ...
> >
> >        To enable userfaultfd features the application should set a bit
> >        corresponding to each feature it wants to enable in the features
> >        field.  If the kernel supports all the requested features it will
> >        enable them.  Otherwise it will zero out the returned uffdio_api
> >        structure and return EINVAL.
> >
> > IIUC the right way to use this API is first probe with features==0, then
> > the kernel will return all the supported features, then the user app should
> > enable only a subset (or all, but not a superset) of supported ones in the
> > next UFFDIO_API with a new uffd.
> 
> Hmm, I think doing a two-step handshake just overcomplicates things.
> 
> Isn't it simpler to just have userspace ask for the features it wants
> up front, and then the kernel responds with the subset of features it
> actually supports? In the common case (all features were supported),
> there is nothing more to do. Userspace is free to detect the uncommon
> case where some features it asked for are missing, and handle that
> however it likes.
> 
> I think this patch is backwards compatible with the two-step approach, too.
> 
> I do agree the man page could use some work. I don't think it
> describes the two-step handshake process correctly, either. It just
> says, "ask for the features you want, and the kernel will either give
> them to you or fail". If we really did want to keep the two-step
> process, it should describe it (set features == 0 first, then ask only
> for the ones you want which are supported), and the example program
> should demonstrate it.
> 
> But, I think it's simpler to just have the kernel do what the man page
> describes. Userspace asks for the features up front, kernel responds
> with the subset that are actually supported. No need to return EINVAL
> if unsupported features were requested.

The uffdio_api.features passed into ioctl(UFFDIO_API) is a request for the
kernel to enable the specified features.  If the kernel doesn't support one
of the features in the list, IMHO it's very natural to fail the call, as
described in the man page.  That's also what most kernel apis do afaik:
they fail the request if a feature to be enabled is not supported.

> 
> >
> > > Userspace doesn't have an obvious way to react to this; sure, one of the
> > > features I asked for was unavailable, but which one? The only option it
> > > has is to turn off things "at random" and hope something works.
> > >
> > > Instead, modify UFFDIO_API to just ignore any unrecognized feature
> > > flags. The interaction is now that the initialization will succeed, and
> > > as always we return the *subset* of feature flags that can actually be
> > > used back to userspace.
> > >
> > > Now userspace has an obvious way to react: it checks if any flags it
> > > asked for are missing. If so, it can conclude this kernel doesn't
> > > support those, and it can either resign itself to not using them, or
> > > fail with an error on its own, or whatever else.
> > >
> > > Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
> > > ---
> > >  fs/userfaultfd.c | 6 ++----
> > >  1 file changed, 2 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> > > index e943370107d0..4974da1f620c 100644
> > > --- a/fs/userfaultfd.c
> > > +++ b/fs/userfaultfd.c
> > > @@ -1923,10 +1923,8 @@ static int userfaultfd_api(struct userfaultfd_ctx *ctx,
> > >       ret = -EFAULT;
> > >       if (copy_from_user(&uffdio_api, buf, sizeof(uffdio_api)))
> > >               goto out;
> > > -     features = uffdio_api.features;
> > > -     ret = -EINVAL;
> > > -     if (uffdio_api.api != UFFD_API || (features & ~UFFD_API_FEATURES))
> > > -             goto err_out;
> >
> > What's worse is that I think you removed the only UFFD_API check.  Although
> > I'm not sure whether it'll be extended in the future or not at all (very
> > possible we keep using 0xaa forever..), but removing this means we won't be
> > able to extend it to a new api version in the future, and misconfig of
> > uffdio_api will wrongly succeed I think:
> >
> >         /* Test wrong UFFD_API */
> >         uffdio_api.api = 0xab;
> >         uffdio_api.features = 0;
> >         if (ioctl(uffd, UFFDIO_API, &uffdio_api) == 0)
> >                 err("UFFDIO_API should fail but didn't");
> 
> Agreed, we should add back the UFFD_API check - I am happy to send a
> patch for this.

Do you plan to just revert the patch?  If so, please go ahead.  IMHO we
should just follow the man page.

What I do agree with is that the api isn't perfect, in that we need to
create a separate userfault file descriptor just to probe.  Currently the
features are returned by the initial call with features=0 passed in, but
that call also initializes the uffd handle, even if the handle is only
ever used for the probe.

However, since it has been that way since day one, I guess we'd better keep
it as-is.  And it's not so bad either: the user app does one extra
open/close, but only once in each app's lifecycle.

Thanks,
Axel Rasmussen March 28, 2023, 8:01 p.m. UTC | #4
On Tue, Mar 28, 2023 at 12:45 PM Peter Xu <peterx@redhat.com> wrote:
>
> On Tue, Mar 28, 2023 at 12:28:59PM -0700, Axel Rasmussen wrote:
> > On Mon, Mar 27, 2023 at 2:01 PM Peter Xu <peterx@redhat.com> wrote:
> > >
> > > I think I overlooked this patch..
> > >
> > > Axel, could you explain why this patch is correct?  Comments inline.
> > >
> > > On Fri, Jul 22, 2022 at 01:15:13PM -0700, Axel Rasmussen wrote:
> > > > The basic interaction for setting up a userfaultfd is, userspace issues
> > > > a UFFDIO_API ioctl, and passes in a set of zero or more feature flags,
> > > > indicating the features they would prefer to use.
> > > >
> > > > Of course, different kernels may support different sets of features
> > > > (depending on kernel version, kconfig options, architecture, etc).
> > > > Userspace's expectations may also not match: perhaps it was built
> > > > against newer kernel headers, which defined some features the kernel
> > > > it's running on doesn't support.
> > > >
> > > > Currently, if userspace passes in a flag we don't recognize, the
> > > > initialization fails and we return -EINVAL. This isn't great, though.
> > >
> > > Why?  IIUC that's the major way for user app to detect any misconfig of
> > > feature list so it can bail out early.
> > >
> > > Quoting from man page (ioctl_userfaultfd(2)):
> > >
> > > UFFDIO_API
> > >        (Since Linux 4.3.)  Enable operation of the userfaultfd and perform API handshake.
> > >
> > >        ...
> > >
> > >            struct uffdio_api {
> > >                __u64 api;        /* Requested API version (input) */
> > >                __u64 features;   /* Requested features (input/output) */
> > >                __u64 ioctls;     /* Available ioctl() operations (output) */
> > >            };
> > >
> > >        ...
> > >
> > >        For Linux kernel versions before 4.11, the features field must be
> > >        initialized to zero before the call to UFFDIO_API, and zero (i.e.,
> > >        no feature bits) is placed in the features field by the kernel upon
> > >        return from ioctl(2).
> > >
> > >        ...
> > >
> > >        To enable userfaultfd features the application should set a bit
> > >        corresponding to each feature it wants to enable in the features
> > >        field.  If the kernel supports all the requested features it will
> > >        enable them.  Otherwise it will zero out the returned uffdio_api
> > >        structure and return EINVAL.
> > >
> > > IIUC the right way to use this API is first probe with features==0, then
> > > the kernel will return all the supported features, then the user app should
> > > enable only a subset (or all, but not a superset) of supported ones in the
> > > next UFFDIO_API with a new uffd.
> >
> > Hmm, I think doing a two-step handshake just overcomplicates things.
> >
> > Isn't it simpler to just have userspace ask for the features it wants
> > up front, and then the kernel responds with the subset of features it
> > actually supports? In the common case (all features were supported),
> > there is nothing more to do. Userspace is free to detect the uncommon
> > case where some features it asked for are missing, and handle that
> > however it likes.
> >
> > I think this patch is backwards compatible with the two-step approach, too.
> >
> > I do agree the man page could use some work. I don't think it
> > describes the two-step handshake process correctly, either. It just
> > says, "ask for the features you want, and the kernel will either give
> > them to you or fail". If we really did want to keep the two-step
> > process, it should describe it (set features == 0 first, then ask only
> > for the ones you want which are supported), and the example program
> > should demonstrate it.
> >
> > But, I think it's simpler to just have the kernel do what the man page
> > describes. Userspace asks for the features up front, kernel responds
> > with the subset that are actually supported. No need to return EINVAL
> > if unsupported features were requested.
>
> The uffdio_api.features passed into the ioctl(UFFDIO_API) should be such
> request to enable features specified in the kernel.  If the kernel doesn't
> support any of the features in the list, IMHO it's very natural to fail it
> as described in the man page.  That's also most of the kernel apis do
> afaik, by failing any enablement of features if not supported.
>
> >
> > >
> > > > Userspace doesn't have an obvious way to react to this; sure, one of the
> > > > features I asked for was unavailable, but which one? The only option it
> > > > has is to turn off things "at random" and hope something works.
> > > >
> > > > Instead, modify UFFDIO_API to just ignore any unrecognized feature
> > > > flags. The interaction is now that the initialization will succeed, and
> > > > as always we return the *subset* of feature flags that can actually be
> > > > used back to userspace.
> > > >
> > > > Now userspace has an obvious way to react: it checks if any flags it
> > > > asked for are missing. If so, it can conclude this kernel doesn't
> > > > support those, and it can either resign itself to not using them, or
> > > > fail with an error on its own, or whatever else.
> > > >
> > > > Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
> > > > ---
> > > >  fs/userfaultfd.c | 6 ++----
> > > >  1 file changed, 2 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> > > > index e943370107d0..4974da1f620c 100644
> > > > --- a/fs/userfaultfd.c
> > > > +++ b/fs/userfaultfd.c
> > > > @@ -1923,10 +1923,8 @@ static int userfaultfd_api(struct userfaultfd_ctx *ctx,
> > > >       ret = -EFAULT;
> > > >       if (copy_from_user(&uffdio_api, buf, sizeof(uffdio_api)))
> > > >               goto out;
> > > > -     features = uffdio_api.features;
> > > > -     ret = -EINVAL;
> > > > -     if (uffdio_api.api != UFFD_API || (features & ~UFFD_API_FEATURES))
> > > > -             goto err_out;
> > >
> > > What's worse is that I think you removed the only UFFD_API check.  Although
> > > I'm not sure whether it'll be extended in the future or not at all (very
> > > possible we keep using 0xaa forever..), but removing this means we won't be
> > > able to extend it to a new api version in the future, and misconfig of
> > > uffdio_api will wrongly succeed I think:
> > >
> > >         /* Test wrong UFFD_API */
> > >         uffdio_api.api = 0xab;
> > >         uffdio_api.features = 0;
> > >         if (ioctl(uffd, UFFDIO_API, &uffdio_api) == 0)
> > >                 err("UFFDIO_API should fail but didn't");
> >
> > Agreed, we should add back the UFFD_API check - I am happy to send a
> > patch for this.
>
> Do you plan to just revert the patch?  If so, please go ahead.  IMHO we
> should just follow the man page.
>
> What I agree here is the api isn't that perfect, in that we need to create
> a separate userfault file descriptor just to probe.  Currently the features
> will be returned in the initial test with features=0 passed in, but it also
> initializes the uffd handle even if it'll never be used but for probe only.

Oh, I thought you could UFFDIO_API the same FD twice. Having to create
a whole separate FD just to probe features makes me dislike that
design even more.

>
> However since that existed in the 1st day I guess we'd better keep it
> as-is.  And it's not so bad either: user app does open/close one more time,
> but only once for each app's lifecycle.

I don't think just reverting would be enough. We'd also need to update
the man page to describe the two-step initialization, and we'd need to
update the man page's example program to demonstrate it. Our own
selftest also doesn't use that approach, so it would need to be
updated as well.

It also seems not unlikely that there exists some userspace code which
simply copied the example program from the man page, and as such
doesn't do the two-step handshake today. Hard to know for certain.

Once we've dealt with that, what we'll have accomplished is just
making the API harder to use. I don't see any downside to the
current state of things: it allows a much simpler way of configuring
userfaultfds, and it's backwards compatible with the more complicated
way.

I think we can set things right by just adding in the UFFD_API version
check by itself, and then updating the man page to describe the
current state of things?
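
To make the difference concrete, here is a small userspace model of the
two behaviors. It is not kernel code: the struct, the function names and
the supported-features mask are all made up for illustration, the
-EPERM check is left out, and only the 0xaa API value comes from this
thread.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define MODEL_UFFD_API           0xAAULL /* API value mentioned in this thread */
#define MODEL_SUPPORTED_FEATURES 0x0FULL /* made-up mask, illustration only */

/* Simplified stand-in for struct uffdio_api (no ioctls field). */
struct model_uffdio_api {
	uint64_t api;
	uint64_t features;
};

/* On failure, zero the structure and report -EINVAL. */
static int model_fail(struct model_uffdio_api *a)
{
	memset(a, 0, sizeof(*a));
	return -EINVAL;
}

/* Behavior before the patch: wrong version or any unknown bit fails. */
static int model_api_strict(struct model_uffdio_api *a)
{
	if (a->api != MODEL_UFFD_API ||
	    (a->features & ~MODEL_SUPPORTED_FEATURES))
		return model_fail(a);
	a->features = MODEL_SUPPORTED_FEATURES;
	return 0;
}

/* Proposal: keep the version check, ignore unknown feature bits. */
static int model_api_version_only(struct model_uffdio_api *a)
{
	if (a->api != MODEL_UFFD_API)
		return model_fail(a);
	/* Report what is supported; the caller compares with its request. */
	a->features = MODEL_SUPPORTED_FEATURES;
	return 0;
}
```

In this model a request with api = 0xab fails under both variants, while
a request carrying an unknown feature bit fails only under the strict one.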

>
> Thanks,
>
> --
> Peter Xu
>
Peter Xu March 28, 2023, 8:33 p.m. UTC | #5
On Tue, Mar 28, 2023 at 01:01:26PM -0700, Axel Rasmussen wrote:
> On Tue, Mar 28, 2023 at 12:45 PM Peter Xu <peterx@redhat.com> wrote:
> >
> > On Tue, Mar 28, 2023 at 12:28:59PM -0700, Axel Rasmussen wrote:
> > > On Mon, Mar 27, 2023 at 2:01 PM Peter Xu <peterx@redhat.com> wrote:
> > > >
> > > > I think I overlooked this patch..
> > > >
> > > > Axel, could you explain why this patch is correct?  Comments inline.
> > > >
> > > > On Fri, Jul 22, 2022 at 01:15:13PM -0700, Axel Rasmussen wrote:
> > > > > The basic interaction for setting up a userfaultfd is, userspace issues
> > > > > a UFFDIO_API ioctl, and passes in a set of zero or more feature flags,
> > > > > indicating the features they would prefer to use.
> > > > >
> > > > > Of course, different kernels may support different sets of features
> > > > > (depending on kernel version, kconfig options, architecture, etc).
> > > > > Userspace's expectations may also not match: perhaps it was built
> > > > > against newer kernel headers, which defined some features the kernel
> > > > > it's running on doesn't support.
> > > > >
> > > > > Currently, if userspace passes in a flag we don't recognize, the
> > > > > initialization fails and we return -EINVAL. This isn't great, though.
> > > >
> > > > Why?  IIUC that's the major way for user app to detect any misconfig of
> > > > feature list so it can bail out early.
> > > >
> > > > Quoting from man page (ioctl_userfaultfd(2)):
> > > >
> > > > UFFDIO_API
> > > >        (Since Linux 4.3.)  Enable operation of the userfaultfd and perform API handshake.
> > > >
> > > >        ...
> > > >
> > > >            struct uffdio_api {
> > > >                __u64 api;        /* Requested API version (input) */
> > > >                __u64 features;   /* Requested features (input/output) */
> > > >                __u64 ioctls;     /* Available ioctl() operations (output) */
> > > >            };
> > > >
> > > >        ...
> > > >
> > > >        For Linux kernel versions before 4.11, the features field must be
> > > >        initialized to zero before the call to UFFDIO_API, and zero (i.e.,
> > > >        no feature bits) is placed in the features field by the kernel upon
> > > >        return from ioctl(2).
> > > >
> > > >        ...
> > > >
> > > >        To enable userfaultfd features the application should set a bit
> > > >        corresponding to each feature it wants to enable in the features
> > > >        field.  If the kernel supports all the requested features it will
> > > >        enable them.  Otherwise it will zero out the returned uffdio_api
> > > >        structure and return EINVAL.
> > > >
> > > > IIUC the right way to use this API is first probe with features==0, then
> > > > the kernel will return all the supported features, then the user app should
> > > > enable only a subset (or all, but not a superset) of supported ones in the
> > > > next UFFDIO_API with a new uffd.
> > >
> > > Hmm, I think doing a two-step handshake just overcomplicates things.
> > >
> > > Isn't it simpler to just have userspace ask for the features it wants
> > > up front, and then the kernel responds with the subset of features it
> > > actually supports? In the common case (all features were supported),
> > > there is nothing more to do. Userspace is free to detect the uncommon
> > > case where some features it asked for are missing, and handle that
> > > however it likes.
> > >
> > > I think this patch is backwards compatible with the two-step approach, too.
> > >
> > > I do agree the man page could use some work. I don't think it
> > > describes the two-step handshake process correctly, either. It just
> > > says, "ask for the features you want, and the kernel will either give
> > > them to you or fail". If we really did want to keep the two-step
> > > process, it should describe it (set features == 0 first, then ask only
> > > for the ones you want which are supported), and the example program
> > > should demonstrate it.
> > >
> > > But, I think it's simpler to just have the kernel do what the man page
> > > describes. Userspace asks for the features up front, kernel responds
> > > with the subset that are actually supported. No need to return EINVAL
> > > if unsupported features were requested.
> >
> > The uffdio_api.features passed into the ioctl(UFFDIO_API) should be such
> > request to enable features specified in the kernel.  If the kernel doesn't
> > support any of the features in the list, IMHO it's very natural to fail it
> > as described in the man page.  That's also most of the kernel apis do
> > afaik, by failing any enablement of features if not supported.
> >
> > >
> > > >
> > > > > Userspace doesn't have an obvious way to react to this; sure, one of the
> > > > > features I asked for was unavailable, but which one? The only option it
> > > > > has is to turn off things "at random" and hope something works.
> > > > >
> > > > > Instead, modify UFFDIO_API to just ignore any unrecognized feature
> > > > > flags. The interaction is now that the initialization will succeed, and
> > > > > as always we return the *subset* of feature flags that can actually be
> > > > > used back to userspace.
> > > > >
> > > > > Now userspace has an obvious way to react: it checks if any flags it
> > > > > asked for are missing. If so, it can conclude this kernel doesn't
> > > > > support those, and it can either resign itself to not using them, or
> > > > > fail with an error on its own, or whatever else.
> > > > >
> > > > > Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
> > > > > ---
> > > > >  fs/userfaultfd.c | 6 ++----
> > > > >  1 file changed, 2 insertions(+), 4 deletions(-)
> > > > >
> > > > > diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> > > > > index e943370107d0..4974da1f620c 100644
> > > > > --- a/fs/userfaultfd.c
> > > > > +++ b/fs/userfaultfd.c
> > > > > @@ -1923,10 +1923,8 @@ static int userfaultfd_api(struct userfaultfd_ctx *ctx,
> > > > >       ret = -EFAULT;
> > > > >       if (copy_from_user(&uffdio_api, buf, sizeof(uffdio_api)))
> > > > >               goto out;
> > > > > -     features = uffdio_api.features;
> > > > > -     ret = -EINVAL;
> > > > > -     if (uffdio_api.api != UFFD_API || (features & ~UFFD_API_FEATURES))
> > > > > -             goto err_out;
> > > >
> > > > What's worse is that I think you removed the only UFFD_API check.  Although
> > > > I'm not sure whether it'll be extended in the future or not at all (very
> > > > possible we keep using 0xaa forever..), but removing this means we won't be
> > > > able to extend it to a new api version in the future, and misconfig of
> > > > uffdio_api will wrongly succeed I think:
> > > >
> > > >         /* Test wrong UFFD_API */
> > > >         uffdio_api.api = 0xab;
> > > >         uffdio_api.features = 0;
> > > >         if (ioctl(uffd, UFFDIO_API, &uffdio_api) == 0)
> > > >                 err("UFFDIO_API should fail but didn't");
> > >
> > > Agreed, we should add back the UFFD_API check - I am happy to send a
> > > patch for this.
> >
> > Do you plan to just revert the patch?  If so, please go ahead.  IMHO we
> > should just follow the man page.
> >
> > What I agree here is the api isn't that perfect, in that we need to create
> > a separate userfault file descriptor just to probe.  Currently the features
> > will be returned in the initial test with features=0 passed in, but it also
> > initializes the uffd handle even if it'll never be used but for probe only.
> 
> Oh, I thought you could UFFDIO_API the same FD twice. Having to create
> a whole separate FD just to probe features makes me dislike that
> design even more.
> 
> >
> > However since that existed in the 1st day I guess we'd better keep it
> > as-is.  And it's not so bad either: user app does open/close one more time,
> > but only once for each app's lifecycle.
> 
> I don't think just reverting would be enough. We'd also need to update
> the man page to describe the two-step initialization, and we'd need to
> update the man page's example program to demonstrate it. Our own
> selftest also doesn't use that approach, so it would need to be
> updated as well.

No worries on that; I've recently been cleaning up the selftest (mainly
splitting userfaultfd.c into two tests).  This is also on my radar, and yes,
it was broken.  I do plan to make sure the selftests can run on all old/new
kernels after the cleanup.  It's getting a bit chaotic with so many global
variables, and I've found it's becoming harder to maintain.

For this I blame myself for being lazy, starting from the uffd-wp selftests,
though..  It can be done better.

> 
> It also seems not unlikely that there exists some userspace code which
> simply copied the example program from the man page, and as such
> doesn't do the two-step handshake today. Hard to know for certain.

The example has no feature enabled, in which case it is fine.  It would
definitely be good to have another one that illustrates the features!=0 case.

> 
> Once we've dealt with that, what we'll have accomplished is just
> making the API harder to use. I don't see any downside from the
> current state of things, it allows a much simpler way of configuring
> userfaultfds, and it's backwards compatible with the more complicated
> way.
> 
> I think we can set things right by just adding in the UFFD_API version
> check by itself, and then updating the man page to describe the
> current state of things?

I still don't understand why you consider it right to have the kernel
succeed the ioctl even if some of the specified features are not
supported.  What's the benefit?

A user app will need to check the returned feature list and bit-check it
against what was requested, which is even more awkward to me than a
straightforward failure, isn't it?

QEMU definitely uses it with proper probing:

https://gitlab.com/qemu-project/qemu/-/blob/master/migration/postcopy-ram.c#L222

Meanwhile anyone can try to enable FEATURE_NEVER_EXISTED and ioctl will
return 0.  It just doesn't sound right to me in any case..
Axel Rasmussen March 28, 2023, 9:52 p.m. UTC | #6
On Tue, Mar 28, 2023 at 1:33 PM Peter Xu <peterx@redhat.com> wrote:
>
> On Tue, Mar 28, 2023 at 01:01:26PM -0700, Axel Rasmussen wrote:
> > On Tue, Mar 28, 2023 at 12:45 PM Peter Xu <peterx@redhat.com> wrote:
> > >
> > > On Tue, Mar 28, 2023 at 12:28:59PM -0700, Axel Rasmussen wrote:
> > > > On Mon, Mar 27, 2023 at 2:01 PM Peter Xu <peterx@redhat.com> wrote:
> > > > >
> > > > > I think I overlooked this patch..
> > > > >
> > > > > Axel, could you explain why this patch is correct?  Comments inline.
> > > > >
> > > > > On Fri, Jul 22, 2022 at 01:15:13PM -0700, Axel Rasmussen wrote:
> > > > > > The basic interaction for setting up a userfaultfd is, userspace issues
> > > > > > a UFFDIO_API ioctl, and passes in a set of zero or more feature flags,
> > > > > > indicating the features they would prefer to use.
> > > > > >
> > > > > > Of course, different kernels may support different sets of features
> > > > > > (depending on kernel version, kconfig options, architecture, etc).
> > > > > > Userspace's expectations may also not match: perhaps it was built
> > > > > > against newer kernel headers, which defined some features the kernel
> > > > > > it's running on doesn't support.
> > > > > >
> > > > > > Currently, if userspace passes in a flag we don't recognize, the
> > > > > > initialization fails and we return -EINVAL. This isn't great, though.
> > > > >
> > > > > Why?  IIUC that's the major way for user app to detect any misconfig of
> > > > > feature list so it can bail out early.
> > > > >
> > > > > Quoting from man page (ioctl_userfaultfd(2)):
> > > > >
> > > > > UFFDIO_API
> > > > >        (Since Linux 4.3.)  Enable operation of the userfaultfd and perform API handshake.
> > > > >
> > > > >        ...
> > > > >
> > > > >            struct uffdio_api {
> > > > >                __u64 api;        /* Requested API version (input) */
> > > > >                __u64 features;   /* Requested features (input/output) */
> > > > >                __u64 ioctls;     /* Available ioctl() operations (output) */
> > > > >            };
> > > > >
> > > > >        ...
> > > > >
> > > > >        For Linux kernel versions before 4.11, the features field must be
> > > > >        initialized to zero before the call to UFFDIO_API, and zero (i.e.,
> > > > >        no feature bits) is placed in the features field by the kernel upon
> > > > >        return from ioctl(2).
> > > > >
> > > > >        ...
> > > > >
> > > > >        To enable userfaultfd features the application should set a bit
> > > > >        corresponding to each feature it wants to enable in the features
> > > > >        field.  If the kernel supports all the requested features it will
> > > > >        enable them.  Otherwise it will zero out the returned uffdio_api
> > > > >        structure and return EINVAL.
> > > > >
> > > > > IIUC the right way to use this API is first probe with features==0, then
> > > > > the kernel will return all the supported features, then the user app should
> > > > > enable only a subset (or all, but not a superset) of supported ones in the
> > > > > next UFFDIO_API with a new uffd.
> > > >
> > > > Hmm, I think doing a two-step handshake just overcomplicates things.
> > > >
> > > > Isn't it simpler to just have userspace ask for the features it wants
> > > > up front, and then the kernel responds with the subset of features it
> > > > actually supports? In the common case (all features were supported),
> > > > there is nothing more to do. Userspace is free to detect the uncommon
> > > > case where some features it asked for are missing, and handle that
> > > > however it likes.
> > > >
> > > > I think this patch is backwards compatible with the two-step approach, too.
> > > >
> > > > I do agree the man page could use some work. I don't think it
> > > > describes the two-step handshake process correctly, either. It just
> > > > says, "ask for the features you want, and the kernel will either give
> > > > them to you or fail". If we really did want to keep the two-step
> > > > process, it should describe it (set features == 0 first, then ask only
> > > > for the ones you want which are supported), and the example program
> > > > should demonstrate it.
> > > >
> > > > But, I think it's simpler to just have the kernel do what the man page
> > > > describes. Userspace asks for the features up front, kernel responds
> > > > with the subset that are actually supported. No need to return EINVAL
> > > > if unsupported features were requested.
> > >
> > > The uffdio_api.features passed into the ioctl(UFFDIO_API) should be such
> > > request to enable features specified in the kernel.  If the kernel doesn't
> > > support any of the features in the list, IMHO it's very natural to fail it
> > > as described in the man page.  That's also most of the kernel apis do
> > > afaik, by failing any enablement of features if not supported.
> > >
> > > >
> > > > >
> > > > > > Userspace doesn't have an obvious way to react to this; sure, one of the
> > > > > > features I asked for was unavailable, but which one? The only option it
> > > > > > has is to turn off things "at random" and hope something works.
> > > > > >
> > > > > > Instead, modify UFFDIO_API to just ignore any unrecognized feature
> > > > > > flags. The interaction is now that the initialization will succeed, and
> > > > > > as always we return the *subset* of feature flags that can actually be
> > > > > > used back to userspace.
> > > > > >
> > > > > > Now userspace has an obvious way to react: it checks if any flags it
> > > > > > asked for are missing. If so, it can conclude this kernel doesn't
> > > > > > support those, and it can either resign itself to not using them, or
> > > > > > fail with an error on its own, or whatever else.
> > > > > >
> > > > > > Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
> > > > > > ---
> > > > > >  fs/userfaultfd.c | 6 ++----
> > > > > >  1 file changed, 2 insertions(+), 4 deletions(-)
> > > > > >
> > > > > > diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> > > > > > index e943370107d0..4974da1f620c 100644
> > > > > > --- a/fs/userfaultfd.c
> > > > > > +++ b/fs/userfaultfd.c
> > > > > > @@ -1923,10 +1923,8 @@ static int userfaultfd_api(struct userfaultfd_ctx *ctx,
> > > > > >       ret = -EFAULT;
> > > > > >       if (copy_from_user(&uffdio_api, buf, sizeof(uffdio_api)))
> > > > > >               goto out;
> > > > > > -     features = uffdio_api.features;
> > > > > > -     ret = -EINVAL;
> > > > > > -     if (uffdio_api.api != UFFD_API || (features & ~UFFD_API_FEATURES))
> > > > > > -             goto err_out;
> > > > >
> > > > > What's worse is that I think you removed the only UFFD_API check.  Although
> > > > > I'm not sure whether it'll be extended in the future or not at all (very
> > > > > possible we keep using 0xaa forever..), but removing this means we won't be
> > > > > able to extend it to a new api version in the future, and misconfig of
> > > > > uffdio_api will wrongly succeed I think:
> > > > >
> > > > >         /* Test wrong UFFD_API */
> > > > >         uffdio_api.api = 0xab;
> > > > >         uffdio_api.features = 0;
> > > > >         if (ioctl(uffd, UFFDIO_API, &uffdio_api) == 0)
> > > > >                 err("UFFDIO_API should fail but didn't");
> > > >
> > > > Agreed, we should add back the UFFD_API check - I am happy to send a
> > > > patch for this.
> > >
> > > Do you plan to just revert the patch?  If so, please go ahead.  IMHO we
> > > should just follow the man page.
> > >
> > > What I agree here is the api isn't that perfect, in that we need to create
> > > a separate userfault file descriptor just to probe.  Currently the features
> > > will be returned in the initial test with features=0 passed in, but it also
> > > initializes the uffd handle even if it'll never be used but for probe only.
> >
> > Oh, I thought you could UFFDIO_API the same FD twice. Having to create
> > a whole separate FD just to probe features makes me dislike that
> > design even more.
> >
> > >
> > > However since that existed in the 1st day I guess we'd better keep it
> > > as-is.  And it's not so bad either: user app does open/close one more time,
> > > but only once for each app's lifecycle.
> >
> > I don't think just reverting would be enough. We'd also need to update
> > the man page to describe the two-step initialization, and we'd need to
> > update the man page's example program to demonstrate it. Our own
> > selftest also doesn't use that approach, so it would need to be
> > updated as well.
>
> No worry on that, I'm recently cleaning up the selftest (majorly, split
> userfaultfd.c into two tests).  This is also on my radar, and yes it was
> broken.  I do plan to make sure the selftests can run on all old/new
> kernels after the cleanup.  It's getting a bit chaos by having so much
> global variables and I found it becomes harder to maintain.
>
> For this I blame myself on being lazy starting from the uffd-wp selftests,
> though..  It can do better.
>
> >
> > It also seems not unlikely that there exists some userspace code which
> > simply copied the example program from the man page, and as such
> > doesn't do the two-step handshake today. Hard to know for certain.
>
> The example has no feature enabled, in which case is fine.  Definitely good
> if there's another one illustrates the features!=0 case.
>
> >
> > Once we've dealt with that, what we'll have accomplished is just
> > making the API harder to use. I don't see any downside from the
> > current state of things, it allows a much simpler way of configuring
> > userfaultfds, and it's backwards compatible with the more complicated
> > way.
> >
> > I think we can set things right by just adding in the UFFD_API version
> > check by itself, and then updating the man page to describe the
> > current state of things?
>
> I still don't understand why you would consider it's right only by having
> the kernel succeed the ioctl even if some specified features are not
> supported.  What's the benefit?

For me the clear benefit is just that it's simpler to use. This way,
userspace only has to open + UFFDIO_API a userfaultfd once, instead of
twice.

This also means the example in the man page can be simpler, and our
selftest can be simpler.

I don't see being very strict here as useful. Another example might be
madvise() - for example trying to MADV_PAGEOUT on a kernel that
doesn't support it. There is no way the kernel can proceed here, since
it simply doesn't know how to do what you're asking for. In this case
an error makes sense.

In the userfaultfd case, these are optional features, and userfaultfds
are generally usable without them. If userspace asks for a feature and
it isn't available, it seems fairly likely userspace could degrade
gracefully, and just use the userfaultfd in a slightly different way
to compensate. Of course, userspace is free to consider this case
fatal if it prefers.

I think we should look at it the other way around. Let's prefer the
simpler approach, unless there is a clear benefit to the more complex
two-step handshake approach? Does the two-step handshake support a
case the simpler approach doesn't?

>
> An user app will need to check the returned feature list and bit-check with
> what was requested which is even more awkward to me than a straightforward
> failure, isn't it?

I don't see this as a downside, because it has to be done in either
design. Either we have to check the list of features after the first
(of two) handshake API ioctls, or we have to check them after our one
API ioctl where we requested a list of features.
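
The bit-check itself is small in either design.  A minimal sketch, assuming
the caller keeps the mask it requested and the mask the kernel wrote back
into uffdio_api.features; the helper names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Bits the caller asked for that the kernel did not report as supported. */
uint64_t uffd_missing_features(uint64_t requested, uint64_t reported)
{
    uint64_t missing = requested & ~reported;

    /* Sanity check: a missing bit can never be one of the reported bits. */
    assert((missing & reported) == 0);
    return missing;
}

/* Returns 1 when every required bit was reported, 0 otherwise. */
int uffd_features_acceptable(uint64_t required, uint64_t reported)
{
    return uffd_missing_features(required, reported) == 0;
}
```

A caller that can degrade gracefully would pass only its hard requirements
to uffd_features_acceptable() and treat any other missing bits as optional.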


>
> QEMU definitely uses it with a proper probing:
>
> https://gitlab.com/qemu-project/qemu/-/blob/master/migration/postcopy-ram.c#L222
>
> Meanwhile anyone can try to enable FEATURE_NEVER_EXISTED and ioctl will
> return 0.  It just doesn't sound right to me in any case..
>
> --
> Peter Xu
>
Peter Xu March 28, 2023, 10:34 p.m. UTC | #7
On Tue, Mar 28, 2023 at 02:52:35PM -0700, Axel Rasmussen wrote:
> I don't see being very strict here as useful. Another example might be
> madvise() - for example trying to MADV_PAGEOUT on a kernel that
> doesn't support it. There is no way the kernel can proceed here, since
> it simply doesn't know how to do what you're asking for. In this case
> an error makes sense.

IMHO, PAGEOUT is not a great example.  I wish we had a way to probe which
madvise() options the system supports, and I know many people want that too.
I even have a feeling that we'll have it some day.

So now, going back to look at this patch as if I were reviewing it, I'm
still not convinced the old API needs changing.

Userfaultfd allows probing with features=0 with or without this patch, so I
see this patch as something that doesn't bring a direct functional benefit,
but rather an API change driven by subjective preference, which I cannot
call right or wrong.  Now the patch is already merged.  If we need to change
either this patch or the man page to make them match again, I'd still
prefer we simply revert it to keep everything as before, and copy stable.

Thanks,
Axel Rasmussen March 29, 2023, 5:53 p.m. UTC | #8
On Tue, Mar 28, 2023 at 3:34 PM Peter Xu <peterx@redhat.com> wrote:
>
> On Tue, Mar 28, 2023 at 02:52:35PM -0700, Axel Rasmussen wrote:
> > I don't see being very strict here as useful. Another example might be
> > madvise() - for example trying to MADV_PAGEOUT on a kernel that
> > doesn't support it. There is no way the kernel can proceed here, since
> > it simply doesn't know how to do what you're asking for. In this case
> > an error makes sense.
>
> IMHO, PAGEOUT is not a great example.  I wished we can have a way to probe
> what madvise() the system supports, and I know many people wanted that too.
> I even had a feeling that we'll have it some day.
>
> So now I'm going back to look at this patch assuming I'm reviewing it, I'm
> still not convinced the old API needs changing.
>
> Userfaultfd allows probing with features=0 with/without this patch, so I
> see this patch as something that doesn't bring a direct functional benefit,

The benefit is we combine probing for features and creating a
userfaultfd into a single step, so userspace doesn't have to open +
manipulate a userfaultfd twice. In my mind, both approaches achieve
the same thing, it's just that one requires extra steps to get there.

To me, it's still unclear why there is any harm in supporting the
simpler way? And, I also don't see any way in which the more complex
way is better?

> but some kind of api change due to subjective preferences which I cannot
> say right or wrong.  Now the patch is already merged.  If we need to change
> either this patch or the man page to make them match again, again I'd
> prefer we simply revert it to keep everything like before and copy stable.

I think we need to change documentation either way. But, I think the
changes needed are actually bigger if we want to revert.

With the simpler behavior, the selftest and the example program in the
man page are ~correct as-is; otherwise we would need to modify those
to use the two-step probing method.

(By the way, I am excited about the selftest refactoring you talked
about! Thanks for doing that work. It definitely needs it, the
complexity there has gotten significantly worse as we've added more
things onto it [wp, minor faults].)

I think the man page description of how to use the API is incomplete
in either case. Right now it sort of alludes to the fact that you can
probe with features==0, but it doesn't explicitly say "you need to
probe first, then close that userfaultfd and open the real one you
want to use, with a subset of the features reported in the first
step". If we want to keep the old behavior, it should be more explicit
about the steps needed to get a userfaultfd.

You are right that it also doesn't describe "you can just ask for what
you want, and the kernel tells you what subset it can give you; you
need to check that the reported features are acceptable" - the new
behavior. That should be updated.
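
For reference, the two-step sequence described above could look roughly
like this.  It is a sketch under assumptions: the helper names are
hypothetical, error handling is reduced to returning -1, and it relies on
the probe with features == 0 reporting the supported set, as discussed in
this thread.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Open a userfaultfd; returns -1 on failure (e.g. ENOSYS or EPERM). */
static int uffd_open(void)
{
    return (int)syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
}

/*
 * Two-step handshake: probe on a throwaway fd with features == 0, then
 * enable only the supported subset of `wanted` on a second fd.
 * On success, returns the usable fd and stores the enabled bits in *enabled.
 */
int uffd_open_two_step(uint64_t wanted, uint64_t *enabled)
{
    struct uffdio_api api = { .api = UFFD_API, .features = 0 };
    int fd;

    assert(enabled != NULL);

    /* Step 1: probe.  The kernel writes the supported set to api.features. */
    fd = uffd_open();
    if (fd < 0)
        return -1;
    if (ioctl(fd, UFFDIO_API, &api) < 0) {
        close(fd);
        return -1;
    }
    close(fd);  /* per the thread, the probe fd is not reused */

    *enabled = wanted & api.features;

    /* Step 2: open the real fd and request only supported features. */
    fd = uffd_open();
    if (fd < 0)
        return -1;
    api = (struct uffdio_api){ .api = UFFD_API, .features = *enabled };
    if (ioctl(fd, UFFDIO_API, &api) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Comparing `wanted` against `*enabled` afterwards tells the caller which
features it has to do without.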

>
> Thanks,
>
> --
> Peter Xu
>
Peter Xu March 29, 2023, 7:41 p.m. UTC | #9
On Wed, Mar 29, 2023 at 10:53:38AM -0700, Axel Rasmussen wrote:
> On Tue, Mar 28, 2023 at 3:34 PM Peter Xu <peterx@redhat.com> wrote:
> >
> > On Tue, Mar 28, 2023 at 02:52:35PM -0700, Axel Rasmussen wrote:
> > > I don't see being very strict here as useful. Another example might be
> > > madvise() - for example trying to MADV_PAGEOUT on a kernel that
> > > doesn't support it. There is no way the kernel can proceed here, since
> > > it simply doesn't know how to do what you're asking for. In this case
> > > an error makes sense.
> >
> > IMHO, PAGEOUT is not a great example.  I wished we can have a way to probe
> > what madvise() the system supports, and I know many people wanted that too.
> > I even had a feeling that we'll have it some day.
> >
> > So now I'm going back to look at this patch assuming I'm reviewing it, I'm
> > still not convinced the old API needs changing.
> >
> > Userfaultfd allows probing with features=0 with/without this patch, so I
> > see this patch as something that doesn't bring a direct functional benefit,
> 
> The benefit is we combine probing for features and creating a
> userfaultfd into a single step, so userspace doesn't have to open +
> manipulate a userfaultfd twice. In my mind, both approaches achieve
> the same thing, it's just that one requires extra steps to get there.
> 
> To me, it's still unclear why there is any harm in supporting the
> simpler way? And, I also don't see any way in which the more complex
> way is better?

Because that's what the man page says? :)

> 
> > but some kind of api change due to subjective preferences which I cannot
> > say right or wrong.  Now the patch is already merged.  If we need to change
> > either this patch or the man page to make them match again, again I'd
> > prefer we simply revert it to keep everything like before and copy stable.
> 
> I think we need to change documentation either way. But, I think the
> changes needed are actually bigger if we want to revert.

IIUC the man page doesn't need updating if we revert this patch.

The man page describes clearly what will happen if we pass in feature
bits that are not supported:

       To enable userfaultfd features the application should set a bit
       corresponding to each feature it wants to enable in the features
       field.  If the kernel supports all the requested features it will
       enable them.  Otherwise it will zero out the returned uffdio_api
       structure and return EINVAL.

> With the simpler behavior, the selftest and the example program in the
> man page are ~correct as-is; otherwise we would need to modify those
> to use the two-step probing method.
> 
> (By the way, I am excited about the selftest refactoring you talked
> about! Thanks for doing that work. It definitely needs it, the
> complexity there has gotten significantly worse as we've added more
> things onto it [wp, minor faults].)

I'll definitely copy you when I post it.  It grew a bit larger than I
expected, so it'd be great if you could help take a look.

In the test cases I added a UFFDIO_API test as the first one, and that's
how I found this issue.  To let all tests pass currently I'll need to
revert this patch.  If you want, we can move the discussion there when I
post it; I think that may need to be the first patch of the test suite
change, so that the current test suite passes.

> I think the man page description of how to use the API is incomplete
> in either case. Right now it sort of alludes to the fact that you can
> probe with features==0, but it doesn't explicitly say "you need to
> probe first, then close that userfaultfd and open the real one you
> want to use, with a subset of the features reported in the first
> step". If we want to keep the old behavior, it should be more explicit
> about the steps needed to get a userfaultfd.

To tell the truth, if I were going to change the API anyway, I'd simply add
a UFFDIO_FEATURES ioctl() returning the supported features; that would be
much, much easier than either the old approach or the one this patch
proposes, IMHO.  Then we keep all the rest untouched.  That should work
perfectly, and it wouldn't require duplicated open()/close() either.

But as I mentioned before, I don't think UFFDIO_FEATURES justifies itself
either, because it introduces a new ioctl without any major benefit.  At
the very least we'll need to keep the old behavior working.

> 
> You are right that it also doesn't describe "you can just ask for what
> you want, and the kernel tells you what subset it can give you; you
> need to check that the reported features are acceptable" - the new
> behavior. That should be updated.

Patch

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index e943370107d0..4974da1f620c 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -1923,10 +1923,8 @@  static int userfaultfd_api(struct userfaultfd_ctx *ctx,
 	ret = -EFAULT;
 	if (copy_from_user(&uffdio_api, buf, sizeof(uffdio_api)))
 		goto out;
-	features = uffdio_api.features;
-	ret = -EINVAL;
-	if (uffdio_api.api != UFFD_API || (features & ~UFFD_API_FEATURES))
-		goto err_out;
+	/* Ignore unsupported features (userspace built against newer kernel) */
+	features = uffdio_api.features & UFFD_API_FEATURES;
 	ret = -EPERM;
 	if ((features & UFFD_FEATURE_EVENT_FORK) && !capable(CAP_SYS_PTRACE))
 		goto err_out;