diff mbox

[1/2] vfs: implement fchmodat2() syscall

Message ID 148830142269.7103.7429913851447595016.stgit@bahia (mailing list archive)
State New, archived
Headers show

Commit Message

Greg Kurz Feb. 28, 2017, 5:03 p.m. UTC
According to the POSIX.1-2008 manual page [1], the fchmodat() function has
a flag argument which may be passed the following value:

AT_SYMLINK_NOFOLLOW
    If path names a symbolic link, then the mode of the symbolic link is
    changed.

and the following error may be returned:

[EOPNOTSUPP]
    The AT_SYMLINK_NOFOLLOW bit is set in the flag argument, path names a
    symbolic link, and the system does not support changing the mode of a
    symbolic link.

The linux kernel doesn't support changing the mode of a symbolic link, but
the current implementation doesn't even have a flag argument. It is then
up to userspace to deal with that. Unfortunately, it is impossible to
implement the POSIX behavior in a race-free manner.

This patch introduces a new fchmodat2() syscall with a flag argument to
address the issue.

[1] http://pubs.opengroup.org/onlinepubs/9699919799/functions/chmod.html

Signed-off-by: Greg Kurz <groug@kaod.org>
---
 fs/open.c                         |   23 +++++++++++++++++++----
 include/linux/syscalls.h          |    2 ++
 include/uapi/asm-generic/unistd.h |    4 +++-
 scripts/checksyscalls.sh          |    3 ++-
 4 files changed, 26 insertions(+), 6 deletions(-)

Comments

Eric Blake Feb. 28, 2017, 6:23 p.m. UTC | #1
On 02/28/2017 11:03 AM, Greg Kurz wrote:
> According to the POSIX.1-2008 manual page [1], the fchmodat() function has
> a flag argument which may be passed the following value:
> 
> AT_SYMLINK_NOFOLLOW
>     If path names a symbolic link, then the mode of the symbolic link is
>     changed.
> 
> and the following error may be returned:
> 
> [EOPNOTSUPP]
>     The AT_SYMLINK_NOFOLLOW bit is set in the flag argument, path names a
>     symbolic link, and the system does not support changing the mode of a
>     symbolic link.
> 
> The linux kernel doesn't support changing the mode of a symbolic link, but
> the current implementation doesn't even have a flag argument. It is then
> up to userspace to deal with that. Unfortunately, it is impossible to
> implement the POSIX behavior in a race-free manner.
> 
> This patch introduces a new fchmodat2() syscall with a flag argument to
> address the issue.
> 
> [1] http://pubs.opengroup.org/onlinepubs/9699919799/functions/chmod.html
> 
> Signed-off-by: Greg Kurz <groug@kaod.org>
> ---

Might also be worth mentioning that this patch is required in order to
solve CVE-2016-9602, per discussion at
https://lists.gnu.org/archive/html/qemu-devel/2017-02/msg06089.html

> +++ b/include/linux/syscalls.h
> @@ -775,6 +775,8 @@ asmlinkage long sys_futimesat(int dfd, const char __user *filename,
>  asmlinkage long sys_faccessat(int dfd, const char __user *filename, int mode);
>  asmlinkage long sys_fchmodat(int dfd, const char __user * filename,
>  			     umode_t mode);
> +asmlinkage long sys_fchmodat2(int dfd, const char __user *filename,
> +			      umode_t mode, int flag);
>  asmlinkage long sys_fchownat(int dfd, const char __user *filename, uid_t user,
>  			     gid_t group, int flag);

Is the indentation off here?

Reviewed-by: Eric Blake <eblake@redhat.com>
Greg Kurz Feb. 28, 2017, 6:41 p.m. UTC | #2
On Tue, 28 Feb 2017 12:23:01 -0600
Eric Blake <eblake@redhat.com> wrote:

> On 02/28/2017 11:03 AM, Greg Kurz wrote:
> > According to the POSIX.1-2008 manual page [1], the fchmodat() function has
> > a flag argument which may be passed the following value:
> > 
> > AT_SYMLINK_NOFOLLOW
> >     If path names a symbolic link, then the mode of the symbolic link is
> >     changed.
> > 
> > and the following error may be returned:
> > 
> > [EOPNOTSUPP]
> >     The AT_SYMLINK_NOFOLLOW bit is set in the flag argument, path names a
> >     symbolic link, and the system does not support changing the mode of a
> >     symbolic link.
> > 
> > The linux kernel doesn't support changing the mode of a symbolic link, but
> > the current implementation doesn't even have a flag argument. It is then
> > up to userspace to deal with that. Unfortunately, it is impossible to
> > implement the POSIX behavior in a race-free manner.
> > 
> > This patch introduces a new fchmodat2() syscall with a flag argument to
> > address the issue.
> > 
> > [1] http://pubs.opengroup.org/onlinepubs/9699919799/functions/chmod.html
> > 
> > Signed-off-by: Greg Kurz <groug@kaod.org>
> > ---  
> 
> Might also be worth mentioning that this patch is required in order to
> solve CVE-2016-9602, per discussion at
> https://lists.gnu.org/archive/html/qemu-devel/2017-02/msg06089.html
> 

True. I'll add a reference to it if I have to send a v2.

> > +++ b/include/linux/syscalls.h
> > @@ -775,6 +775,8 @@ asmlinkage long sys_futimesat(int dfd, const char __user *filename,
> >  asmlinkage long sys_faccessat(int dfd, const char __user *filename, int mode);
> >  asmlinkage long sys_fchmodat(int dfd, const char __user * filename,
> >  			     umode_t mode);
> > +asmlinkage long sys_fchmodat2(int dfd, const char __user *filename,
> > +			      umode_t mode, int flag);
> >  asmlinkage long sys_fchownat(int dfd, const char __user *filename, uid_t user,
> >  			     gid_t group, int flag);  
> 
> Is the indentation off here?
> 

This is linux style indent with tabs+spaces. FWIW it is displayed correctly
in vi and emacs (I've simply copied the sys_fchmodat() declaration).

> Reviewed-by: Eric Blake <eblake@redhat.com>
> 
>
Eric Blake Feb. 28, 2017, 6:59 p.m. UTC | #3
On 02/28/2017 12:41 PM, Greg Kurz wrote:

>>> +++ b/include/linux/syscalls.h
>>> @@ -775,6 +775,8 @@ asmlinkage long sys_futimesat(int dfd, const char __user *filename,
>>>  asmlinkage long sys_faccessat(int dfd, const char __user *filename, int mode);
>>>  asmlinkage long sys_fchmodat(int dfd, const char __user * filename,
>>>  			     umode_t mode);
>>> +asmlinkage long sys_fchmodat2(int dfd, const char __user *filename,
>>> +			      umode_t mode, int flag);
>>>  asmlinkage long sys_fchownat(int dfd, const char __user *filename, uid_t user,
>>>  			     gid_t group, int flag);  
>>
>> Is the indentation off here?
>>
> 
> This is linux style indent with tabs+spaces. FWIW it is displayed correctly
> in vi and emacs (I've simply copied the sys_fchmodat() declaration).

Sorry for the noise; I see that it is correct now, since fchmodat2 is
one bye longer than fchmodat or fchownat.  Sometimes, I wish I could
convince my mailer to display leading tabs as exactly 8 spaces rather
than to the next 8-space tab stop, so that prefixed characters that
aren't tab-stop aligned don't mess with the actual visual output.
Michael Kerrisk (man-pages) March 1, 2017, 9:01 a.m. UTC | #4
[CC += linux-api@vger.kernel.org]

Hello Greg,

Since this is a kernel-user-space API change, please CC linux-api@.
The kernel source file Documentation/SubmitChecklist notes that all
Linux kernel patches that change userspace interfaces should be CCed
to linux-api@vger.kernel.org, so that the various parties who are
interested in API changes are informed. For further information, see
https://www.kernel.org/doc/man-pages/linux-api-ml.html

Thanks,

Michael


On Tue, Feb 28, 2017 at 6:03 PM, Greg Kurz <groug@kaod.org> wrote:
> According to the POSIX.1-2008 manual page [1], the fchmodat() function has
> a flag argument which may be passed the following value:
>
> AT_SYMLINK_NOFOLLOW
>     If path names a symbolic link, then the mode of the symbolic link is
>     changed.
>
> and the following error may be returned:
>
> [EOPNOTSUPP]
>     The AT_SYMLINK_NOFOLLOW bit is set in the flag argument, path names a
>     symbolic link, and the system does not support changing the mode of a
>     symbolic link.
>
> The linux kernel doesn't support changing the mode of a symbolic link, but
> the current implementation doesn't even have a flag argument. It is then
> up to userspace to deal with that. Unfortunately, it is impossible to
> implement the POSIX behavior in a race-free manner.
>
> This patch introduces a new fchmodat2() syscall with a flag argument to
> address the issue.
>
> [1] http://pubs.opengroup.org/onlinepubs/9699919799/functions/chmod.html
>
> Signed-off-by: Greg Kurz <groug@kaod.org>
> ---
>  fs/open.c                         |   23 +++++++++++++++++++----
>  include/linux/syscalls.h          |    2 ++
>  include/uapi/asm-generic/unistd.h |    4 +++-
>  scripts/checksyscalls.sh          |    3 ++-
>  4 files changed, 26 insertions(+), 6 deletions(-)
>
> diff --git a/fs/open.c b/fs/open.c
> index 9921f70bc5ca..66a8c19f72ca 100644
> --- a/fs/open.c
> +++ b/fs/open.c
> @@ -558,24 +558,39 @@ SYSCALL_DEFINE2(fchmod, unsigned int, fd, umode_t, mode)
>         return err;
>  }
>
> -SYSCALL_DEFINE3(fchmodat, int, dfd, const char __user *, filename, umode_t, mode)
> +SYSCALL_DEFINE4(fchmodat2, int, dfd, const char __user *, filename, umode_t,
> +               mode, int, flag)
>  {
>         struct path path;
> -       int error;
> -       unsigned int lookup_flags = LOOKUP_FOLLOW;
> +       int error = -EINVAL;
> +       unsigned int lookup_flags;
> +
> +       if ((flag & ~AT_SYMLINK_NOFOLLOW) != 0)
> +               goto out;
> +
> +       lookup_flags = (flag & AT_SYMLINK_NOFOLLOW) ? 0 : LOOKUP_FOLLOW;
>  retry:
>         error = user_path_at(dfd, filename, lookup_flags, &path);
>         if (!error) {
> -               error = chmod_common(&path, mode);
> +               error = -EOPNOTSUPP;
> +               if (!d_is_symlink(path.dentry))
> +                       error = chmod_common(&path, mode);
>                 path_put(&path);
>                 if (retry_estale(error, lookup_flags)) {
>                         lookup_flags |= LOOKUP_REVAL;
>                         goto retry;
>                 }
>         }
> +out:
>         return error;
>  }
>
> +SYSCALL_DEFINE3(fchmodat, int, dfd, const char __user *, filename, umode_t,
> +               mode)
> +{
> +       return sys_fchmodat2(dfd, filename, mode, 0);
> +}
> +
>  SYSCALL_DEFINE2(chmod, const char __user *, filename, umode_t, mode)
>  {
>         return sys_fchmodat(AT_FDCWD, filename, mode);
> diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
> index 91a740f6b884..982089d55b31 100644
> --- a/include/linux/syscalls.h
> +++ b/include/linux/syscalls.h
> @@ -775,6 +775,8 @@ asmlinkage long sys_futimesat(int dfd, const char __user *filename,
>  asmlinkage long sys_faccessat(int dfd, const char __user *filename, int mode);
>  asmlinkage long sys_fchmodat(int dfd, const char __user * filename,
>                              umode_t mode);
> +asmlinkage long sys_fchmodat2(int dfd, const char __user *filename,
> +                             umode_t mode, int flag);
>  asmlinkage long sys_fchownat(int dfd, const char __user *filename, uid_t user,
>                              gid_t group, int flag);
>  asmlinkage long sys_openat(int dfd, const char __user *filename, int flags,
> diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
> index 9b1462e38b82..e8b0a00908b1 100644
> --- a/include/uapi/asm-generic/unistd.h
> +++ b/include/uapi/asm-generic/unistd.h
> @@ -730,9 +730,11 @@ __SYSCALL(__NR_pkey_mprotect, sys_pkey_mprotect)
>  __SYSCALL(__NR_pkey_alloc,    sys_pkey_alloc)
>  #define __NR_pkey_free 290
>  __SYSCALL(__NR_pkey_free,     sys_pkey_free)
> +#define __NR_fchmodat2 291
> +__SYSCALL(__NR_fchmodat2,     sys_fchmodat2)
>
>  #undef __NR_syscalls
> -#define __NR_syscalls 291
> +#define __NR_syscalls 292
>
>  /*
>   * All syscalls below here should go away really,
> diff --git a/scripts/checksyscalls.sh b/scripts/checksyscalls.sh
> index 2c9082ba6137..2e7471a1d308 100755
> --- a/scripts/checksyscalls.sh
> +++ b/scripts/checksyscalls.sh
> @@ -19,7 +19,7 @@ cat << EOF
>  #define __IGNORE_link          /* linkat */
>  #define __IGNORE_unlink                /* unlinkat */
>  #define __IGNORE_mknod         /* mknodat */
> -#define __IGNORE_chmod         /* fchmodat */
> +#define __IGNORE_chmod         /* fchmodat2 */
>  #define __IGNORE_chown         /* fchownat */
>  #define __IGNORE_mkdir         /* mkdirat */
>  #define __IGNORE_rmdir         /* unlinkat */
> @@ -39,6 +39,7 @@ cat << EOF
>
>  /* Missing flags argument */
>  #define __IGNORE_renameat      /* renameat2 */
> +#define __IGNORE_fchmodat      /* fchmodat2 */
>
>  /* CLOEXEC flag */
>  #define __IGNORE_pipe          /* pipe2 */
>
Greg Kurz April 11, 2017, 11:39 a.m. UTC | #5
On Wed, 1 Mar 2017 10:01:53 +0100
Michael Kerrisk <mtk.manpages@gmail.com> wrote:

> [CC += linux-api@vger.kernel.org]
> 
> Hello Greg,
> 
> Since this is a kernel-user-space API change, please CC linux-api@.
> The kernel source file Documentation/SubmitChecklist notes that all
> Linux kernel patches that change userspace interfaces should be CCed
> to linux-api@vger.kernel.org, so that the various parties who are
> interested in API changes are informed. For further information, see
> https://www.kernel.org/doc/man-pages/linux-api-ml.html
> 
> Thanks,
> 
> Michael
> 

My bad :-\ 

Anyway, since I had no feedback so far, I'll send a v2 and this time I'll
rembember to Cc linux-api@vger.kernel.org.

Thanks,

--
Greg

> 
> On Tue, Feb 28, 2017 at 6:03 PM, Greg Kurz <groug@kaod.org> wrote:
> > According to the POSIX.1-2008 manual page [1], the fchmodat() function has
> > a flag argument which may be passed the following value:
> >
> > AT_SYMLINK_NOFOLLOW
> >     If path names a symbolic link, then the mode of the symbolic link is
> >     changed.
> >
> > and the following error may be returned:
> >
> > [EOPNOTSUPP]
> >     The AT_SYMLINK_NOFOLLOW bit is set in the flag argument, path names a
> >     symbolic link, and the system does not support changing the mode of a
> >     symbolic link.
> >
> > The linux kernel doesn't support changing the mode of a symbolic link, but
> > the current implementation doesn't even have a flag argument. It is then
> > up to userspace to deal with that. Unfortunately, it is impossible to
> > implement the POSIX behavior in a race-free manner.
> >
> > This patch introduces a new fchmodat2() syscall with a flag argument to
> > address the issue.
> >
> > [1] http://pubs.opengroup.org/onlinepubs/9699919799/functions/chmod.html
> >
> > Signed-off-by: Greg Kurz <groug@kaod.org>
> > ---
> >  fs/open.c                         |   23 +++++++++++++++++++----
> >  include/linux/syscalls.h          |    2 ++
> >  include/uapi/asm-generic/unistd.h |    4 +++-
> >  scripts/checksyscalls.sh          |    3 ++-
> >  4 files changed, 26 insertions(+), 6 deletions(-)
> >
> > diff --git a/fs/open.c b/fs/open.c
> > index 9921f70bc5ca..66a8c19f72ca 100644
> > --- a/fs/open.c
> > +++ b/fs/open.c
> > @@ -558,24 +558,39 @@ SYSCALL_DEFINE2(fchmod, unsigned int, fd, umode_t, mode)
> >         return err;
> >  }
> >
> > -SYSCALL_DEFINE3(fchmodat, int, dfd, const char __user *, filename, umode_t, mode)
> > +SYSCALL_DEFINE4(fchmodat2, int, dfd, const char __user *, filename, umode_t,
> > +               mode, int, flag)
> >  {
> >         struct path path;
> > -       int error;
> > -       unsigned int lookup_flags = LOOKUP_FOLLOW;
> > +       int error = -EINVAL;
> > +       unsigned int lookup_flags;
> > +
> > +       if ((flag & ~AT_SYMLINK_NOFOLLOW) != 0)
> > +               goto out;
> > +
> > +       lookup_flags = (flag & AT_SYMLINK_NOFOLLOW) ? 0 : LOOKUP_FOLLOW;
> >  retry:
> >         error = user_path_at(dfd, filename, lookup_flags, &path);
> >         if (!error) {
> > -               error = chmod_common(&path, mode);
> > +               error = -EOPNOTSUPP;
> > +               if (!d_is_symlink(path.dentry))
> > +                       error = chmod_common(&path, mode);
> >                 path_put(&path);
> >                 if (retry_estale(error, lookup_flags)) {
> >                         lookup_flags |= LOOKUP_REVAL;
> >                         goto retry;
> >                 }
> >         }
> > +out:
> >         return error;
> >  }
> >
> > +SYSCALL_DEFINE3(fchmodat, int, dfd, const char __user *, filename, umode_t,
> > +               mode)
> > +{
> > +       return sys_fchmodat2(dfd, filename, mode, 0);
> > +}
> > +
> >  SYSCALL_DEFINE2(chmod, const char __user *, filename, umode_t, mode)
> >  {
> >         return sys_fchmodat(AT_FDCWD, filename, mode);
> > diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
> > index 91a740f6b884..982089d55b31 100644
> > --- a/include/linux/syscalls.h
> > +++ b/include/linux/syscalls.h
> > @@ -775,6 +775,8 @@ asmlinkage long sys_futimesat(int dfd, const char __user *filename,
> >  asmlinkage long sys_faccessat(int dfd, const char __user *filename, int mode);
> >  asmlinkage long sys_fchmodat(int dfd, const char __user * filename,
> >                              umode_t mode);
> > +asmlinkage long sys_fchmodat2(int dfd, const char __user *filename,
> > +                             umode_t mode, int flag);
> >  asmlinkage long sys_fchownat(int dfd, const char __user *filename, uid_t user,
> >                              gid_t group, int flag);
> >  asmlinkage long sys_openat(int dfd, const char __user *filename, int flags,
> > diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
> > index 9b1462e38b82..e8b0a00908b1 100644
> > --- a/include/uapi/asm-generic/unistd.h
> > +++ b/include/uapi/asm-generic/unistd.h
> > @@ -730,9 +730,11 @@ __SYSCALL(__NR_pkey_mprotect, sys_pkey_mprotect)
> >  __SYSCALL(__NR_pkey_alloc,    sys_pkey_alloc)
> >  #define __NR_pkey_free 290
> >  __SYSCALL(__NR_pkey_free,     sys_pkey_free)
> > +#define __NR_fchmodat2 291
> > +__SYSCALL(__NR_fchmodat2,     sys_fchmodat2)
> >
> >  #undef __NR_syscalls
> > -#define __NR_syscalls 291
> > +#define __NR_syscalls 292
> >
> >  /*
> >   * All syscalls below here should go away really,
> > diff --git a/scripts/checksyscalls.sh b/scripts/checksyscalls.sh
> > index 2c9082ba6137..2e7471a1d308 100755
> > --- a/scripts/checksyscalls.sh
> > +++ b/scripts/checksyscalls.sh
> > @@ -19,7 +19,7 @@ cat << EOF
> >  #define __IGNORE_link          /* linkat */
> >  #define __IGNORE_unlink                /* unlinkat */
> >  #define __IGNORE_mknod         /* mknodat */
> > -#define __IGNORE_chmod         /* fchmodat */
> > +#define __IGNORE_chmod         /* fchmodat2 */
> >  #define __IGNORE_chown         /* fchownat */
> >  #define __IGNORE_mkdir         /* mkdirat */
> >  #define __IGNORE_rmdir         /* unlinkat */
> > @@ -39,6 +39,7 @@ cat << EOF
> >
> >  /* Missing flags argument */
> >  #define __IGNORE_renameat      /* renameat2 */
> > +#define __IGNORE_fchmodat      /* fchmodat2 */
> >
> >  /* CLOEXEC flag */
> >  #define __IGNORE_pipe          /* pipe2 */
> >  
> 
> 
>
Greg Kurz April 11, 2017, 12:04 p.m. UTC | #6
On Tue, 11 Apr 2017 13:39:37 +0200
Greg Kurz <groug@kaod.org> wrote:

> On Wed, 1 Mar 2017 10:01:53 +0100
> Michael Kerrisk <mtk.manpages@gmail.com> wrote:
> 
> > [CC += linux-api@vger.kernel.org]
> > 
> > Hello Greg,
> > 
> > Since this is a kernel-user-space API change, please CC linux-api@.
> > The kernel source file Documentation/SubmitChecklist notes that all
> > Linux kernel patches that change userspace interfaces should be CCed
> > to linux-api@vger.kernel.org, so that the various parties who are
> > interested in API changes are informed. For further information, see
> > https://www.kernel.org/doc/man-pages/linux-api-ml.html
> > 
> > Thanks,
> > 
> > Michael
> >   
> 
> My bad :-\ 
> 
> Anyway, since I had no feedback so far, I'll send a v2 and this time I'll
> rembember to Cc linux-api@vger.kernel.org.
> 
> Thanks,
> 

BTW, commit "186128f75392 docs-rst: add documents to development-process" moved
Documentation/SubmitChecklist to Documentation/process/submit-checklist.rst, but
the www.kernel.org page hasn't been updated to reflect that...

Cheers.

--
Greg

> --
> Greg
> 
> > 
> > On Tue, Feb 28, 2017 at 6:03 PM, Greg Kurz <groug@kaod.org> wrote:  
> > > According to the POSIX.1-2008 manual page [1], the fchmodat() function has
> > > a flag argument which may be passed the following value:
> > >
> > > AT_SYMLINK_NOFOLLOW
> > >     If path names a symbolic link, then the mode of the symbolic link is
> > >     changed.
> > >
> > > and the following error may be returned:
> > >
> > > [EOPNOTSUPP]
> > >     The AT_SYMLINK_NOFOLLOW bit is set in the flag argument, path names a
> > >     symbolic link, and the system does not support changing the mode of a
> > >     symbolic link.
> > >
> > > The linux kernel doesn't support changing the mode of a symbolic link, but
> > > the current implementation doesn't even have a flag argument. It is then
> > > up to userspace to deal with that. Unfortunately, it is impossible to
> > > implement the POSIX behavior in a race-free manner.
> > >
> > > This patch introduces a new fchmodat2() syscall with a flag argument to
> > > address the issue.
> > >
> > > [1] http://pubs.opengroup.org/onlinepubs/9699919799/functions/chmod.html
> > >
> > > Signed-off-by: Greg Kurz <groug@kaod.org>
> > > ---
> > >  fs/open.c                         |   23 +++++++++++++++++++----
> > >  include/linux/syscalls.h          |    2 ++
> > >  include/uapi/asm-generic/unistd.h |    4 +++-
> > >  scripts/checksyscalls.sh          |    3 ++-
> > >  4 files changed, 26 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/fs/open.c b/fs/open.c
> > > index 9921f70bc5ca..66a8c19f72ca 100644
> > > --- a/fs/open.c
> > > +++ b/fs/open.c
> > > @@ -558,24 +558,39 @@ SYSCALL_DEFINE2(fchmod, unsigned int, fd, umode_t, mode)
> > >         return err;
> > >  }
> > >
> > > -SYSCALL_DEFINE3(fchmodat, int, dfd, const char __user *, filename, umode_t, mode)
> > > +SYSCALL_DEFINE4(fchmodat2, int, dfd, const char __user *, filename, umode_t,
> > > +               mode, int, flag)
> > >  {
> > >         struct path path;
> > > -       int error;
> > > -       unsigned int lookup_flags = LOOKUP_FOLLOW;
> > > +       int error = -EINVAL;
> > > +       unsigned int lookup_flags;
> > > +
> > > +       if ((flag & ~AT_SYMLINK_NOFOLLOW) != 0)
> > > +               goto out;
> > > +
> > > +       lookup_flags = (flag & AT_SYMLINK_NOFOLLOW) ? 0 : LOOKUP_FOLLOW;
> > >  retry:
> > >         error = user_path_at(dfd, filename, lookup_flags, &path);
> > >         if (!error) {
> > > -               error = chmod_common(&path, mode);
> > > +               error = -EOPNOTSUPP;
> > > +               if (!d_is_symlink(path.dentry))
> > > +                       error = chmod_common(&path, mode);
> > >                 path_put(&path);
> > >                 if (retry_estale(error, lookup_flags)) {
> > >                         lookup_flags |= LOOKUP_REVAL;
> > >                         goto retry;
> > >                 }
> > >         }
> > > +out:
> > >         return error;
> > >  }
> > >
> > > +SYSCALL_DEFINE3(fchmodat, int, dfd, const char __user *, filename, umode_t,
> > > +               mode)
> > > +{
> > > +       return sys_fchmodat2(dfd, filename, mode, 0);
> > > +}
> > > +
> > >  SYSCALL_DEFINE2(chmod, const char __user *, filename, umode_t, mode)
> > >  {
> > >         return sys_fchmodat(AT_FDCWD, filename, mode);
> > > diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
> > > index 91a740f6b884..982089d55b31 100644
> > > --- a/include/linux/syscalls.h
> > > +++ b/include/linux/syscalls.h
> > > @@ -775,6 +775,8 @@ asmlinkage long sys_futimesat(int dfd, const char __user *filename,
> > >  asmlinkage long sys_faccessat(int dfd, const char __user *filename, int mode);
> > >  asmlinkage long sys_fchmodat(int dfd, const char __user * filename,
> > >                              umode_t mode);
> > > +asmlinkage long sys_fchmodat2(int dfd, const char __user *filename,
> > > +                             umode_t mode, int flag);
> > >  asmlinkage long sys_fchownat(int dfd, const char __user *filename, uid_t user,
> > >                              gid_t group, int flag);
> > >  asmlinkage long sys_openat(int dfd, const char __user *filename, int flags,
> > > diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
> > > index 9b1462e38b82..e8b0a00908b1 100644
> > > --- a/include/uapi/asm-generic/unistd.h
> > > +++ b/include/uapi/asm-generic/unistd.h
> > > @@ -730,9 +730,11 @@ __SYSCALL(__NR_pkey_mprotect, sys_pkey_mprotect)
> > >  __SYSCALL(__NR_pkey_alloc,    sys_pkey_alloc)
> > >  #define __NR_pkey_free 290
> > >  __SYSCALL(__NR_pkey_free,     sys_pkey_free)
> > > +#define __NR_fchmodat2 291
> > > +__SYSCALL(__NR_fchmodat2,     sys_fchmodat2)
> > >
> > >  #undef __NR_syscalls
> > > -#define __NR_syscalls 291
> > > +#define __NR_syscalls 292
> > >
> > >  /*
> > >   * All syscalls below here should go away really,
> > > diff --git a/scripts/checksyscalls.sh b/scripts/checksyscalls.sh
> > > index 2c9082ba6137..2e7471a1d308 100755
> > > --- a/scripts/checksyscalls.sh
> > > +++ b/scripts/checksyscalls.sh
> > > @@ -19,7 +19,7 @@ cat << EOF
> > >  #define __IGNORE_link          /* linkat */
> > >  #define __IGNORE_unlink                /* unlinkat */
> > >  #define __IGNORE_mknod         /* mknodat */
> > > -#define __IGNORE_chmod         /* fchmodat */
> > > +#define __IGNORE_chmod         /* fchmodat2 */
> > >  #define __IGNORE_chown         /* fchownat */
> > >  #define __IGNORE_mkdir         /* mkdirat */
> > >  #define __IGNORE_rmdir         /* unlinkat */
> > > @@ -39,6 +39,7 @@ cat << EOF
> > >
> > >  /* Missing flags argument */
> > >  #define __IGNORE_renameat      /* renameat2 */
> > > +#define __IGNORE_fchmodat      /* fchmodat2 */
> > >
> > >  /* CLOEXEC flag */
> > >  #define __IGNORE_pipe          /* pipe2 */
> > >    
> > 
> > 
> >   
>
Colin Walters April 11, 2017, 5:52 p.m. UTC | #7
On Tue, Feb 28, 2017, at 02:23 PM, Eric Blake wrote:

> Might also be worth mentioning that this patch is required in order to
> solve CVE-2016-9602, per discussion at
> https://lists.gnu.org/archive/html/qemu-devel/2017-02/msg06089.html

I only briefly looked at this, but can't `open(..., O_PATH)` be used to solve
this today?
Eric Blake April 11, 2017, 5:55 p.m. UTC | #8
On 04/11/2017 12:52 PM, Colin Walters wrote:
> 
> 
> On Tue, Feb 28, 2017, at 02:23 PM, Eric Blake wrote:
> 
>> Might also be worth mentioning that this patch is required in order to
>> solve CVE-2016-9602, per discussion at
>> https://lists.gnu.org/archive/html/qemu-devel/2017-02/msg06089.html
> 
> I only briefly looked at this, but can't `open(..., O_PATH)` be used to solve
> this today?

O_PATH was the fallback that qemu used - but that's non-POSIX, which
means we have to have a different solution for POSIX systems than for
Linux systems, while waiting for Linux to catch up to POSIX.
Eric Blake April 11, 2017, 6:07 p.m. UTC | #9
On 04/11/2017 12:55 PM, Eric Blake wrote:
> On 04/11/2017 12:52 PM, Colin Walters wrote:
>>
>>
>> On Tue, Feb 28, 2017, at 02:23 PM, Eric Blake wrote:
>>
>>> Might also be worth mentioning that this patch is required in order to
>>> solve CVE-2016-9602, per discussion at
>>> https://lists.gnu.org/archive/html/qemu-devel/2017-02/msg06089.html
>>
>> I only briefly looked at this, but can't `open(..., O_PATH)` be used to solve
>> this today?
> 
> O_PATH was the fallback that qemu used

Hmm - actually, qemu used O_PATH for the directory portion of *at
traversals:
git.qemu-project.org/?p=qemu.git;a=commitdiff;h=918112c

but did not use O_PATH for its chmod() fallback:
git.qemu-project.org/?p=qemu.git;a=commitdiff;h=e3187a4

A good idea on the surface. But reading the man page of openat(), the
section on O_PATH says:
   The  file
              itself  is not opened, and other file operations (e.g.,
read(2),
              write(2), fchmod(2), fchown(2), fgetxattr(2), mmap(2))
fail with
              the error EBADF.

> - but that's non-POSIX, which
> means we have to have a different solution for POSIX systems than for
> Linux systems, while waiting for Linux to catch up to POSIX.

But even if using open(O_PATH)/fchmod() works, it is not immediately
obvious whether it can catch all the same cases that chmodat(O_NOFOLLOW)
would cover, as there are cases where you have permissions to change
mode bits but not open() the file for reading or writing.  And even if
it gets rid of a TOCTTOU race, it still is a 2-syscall hit rather than
an atomic single syscall.
Eric Blake April 11, 2017, 6:08 p.m. UTC | #10
On 04/11/2017 01:07 PM, Eric Blake wrote:

> 
> But even if using open(O_PATH)/fchmod() works, it is not immediately
> obvious whether it can catch all the same cases that chmodat(O_NOFOLLOW)

Typo; I obviously meant fchmodat(AT_SYMLINK_NOFOLLOW)

> would cover, as there are cases where you have permissions to change
> mode bits but not open() the file for reading or writing.  And even if
> it gets rid of a TOCTTOU race, it still is a 2-syscall hit rather than
> an atomic single syscall.
>
Colin Walters April 11, 2017, 7:09 p.m. UTC | #11
On Tue, Apr 11, 2017, at 02:07 PM, Eric Blake wrote:
> 
> A good idea on the surface. But reading the man page of openat(), the
> section on O_PATH says:
>    The  file
>               itself  is not opened, and other file operations (e.g.,
> read(2),
>               write(2), fchmod(2), fchown(2), fgetxattr(2), mmap(2))
> fail with
>               the error EBADF.

Right, though more topically I'd have expected
fchmodat() (not fchmod()) to take AT_EMPTY_PATH,
just like fstatat() does.

But it doesn't appear to be supported...oh, even at
the syscall level, interesting.   Ah, I see, glibc does:

int
fchmodat (int fd, const char *file, mode_t mode, int flag)
{
  if (flag & ~AT_SYMLINK_NOFOLLOW)
    return INLINE_SYSCALL_ERROR_RETURN_VALUE (EINVAL);
...
}

And indeed the syscall doesn't have flags, bringing us back
to the start here.   Sorry, that seems obvious in retrospect,
but I was "working forwards" from the O_PATH userspace API
mindset.
Greg Kurz April 19, 2017, 12:45 p.m. UTC | #12
On Tue, 11 Apr 2017 15:09:40 -0400
Colin Walters <walters@verbum.org> wrote:

> On Tue, Apr 11, 2017, at 02:07 PM, Eric Blake wrote:
> > 
> > A good idea on the surface. But reading the man page of openat(), the
> > section on O_PATH says:
> >    The  file
> >               itself  is not opened, and other file operations (e.g.,
> > read(2),
> >               write(2), fchmod(2), fchown(2), fgetxattr(2), mmap(2))
> > fail with
> >               the error EBADF.  
> 
> Right, though more topically I'd have expected
> fchmodat() (not fchmod()) to take AT_EMPTY_PATH,
> just like fstatat() does.
> 

Like Eric said in another mail, this would still require to open() the file
first... ie, we cannot change mode if initial bits are 0000, whereas it
succeeds with chmod().

> But it doesn't appear to be supported...oh, even at
> the syscall level, interesting.   Ah, I see, glibc does:
> 
> int
> fchmodat (int fd, const char *file, mode_t mode, int flag)
> {
>   if (flag & ~AT_SYMLINK_NOFOLLOW)
>     return INLINE_SYSCALL_ERROR_RETURN_VALUE (EINVAL);
> ...
> }
> 
> And indeed the syscall doesn't have flags, bringing us back
> to the start here.   Sorry, that seems obvious in retrospect,
> but I was "working forwards" from the O_PATH userspace API
> mindset.
> 
> 

The use case is to fix CVE-2016-9602 in QEMU. We need to be able to change
the mode bits of a file that resides under a specific directory, which is
shared between the host and the guest.

Since untrusted code in a guest can create symlinks, we need to be sure
that the file isn't a symlink, otherwise the mode bit change could affect
an arbitrary file not residing under the shared directory.

This could be handled with chroot() or unshare()+chdir() but this isn't an
option because we want this to work even if QEMU is unprivileged.

According to POSIX, this is exactly how fchmodat(AT_SYMLINK_NOFOLLOW) should
behave on Linux:

[EOPNOTSUPP]
    The AT_SYMLINK_NOFOLLOW bit is set in the flag argument, path names a
    symbolic link, and the system does not support changing the mode of a
    symbolic link.

I hope this is clear enough.

--
Greg
diff mbox

Patch

diff --git a/fs/open.c b/fs/open.c
index 9921f70bc5ca..66a8c19f72ca 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -558,24 +558,39 @@  SYSCALL_DEFINE2(fchmod, unsigned int, fd, umode_t, mode)
 	return err;
 }
 
-SYSCALL_DEFINE3(fchmodat, int, dfd, const char __user *, filename, umode_t, mode)
+SYSCALL_DEFINE4(fchmodat2, int, dfd, const char __user *, filename, umode_t,
+		mode, int, flag)
 {
 	struct path path;
-	int error;
-	unsigned int lookup_flags = LOOKUP_FOLLOW;
+	int error = -EINVAL;
+	unsigned int lookup_flags;
+
+	if ((flag & ~AT_SYMLINK_NOFOLLOW) != 0)
+		goto out;
+
+	lookup_flags = (flag & AT_SYMLINK_NOFOLLOW) ? 0 : LOOKUP_FOLLOW;
 retry:
 	error = user_path_at(dfd, filename, lookup_flags, &path);
 	if (!error) {
-		error = chmod_common(&path, mode);
+		error = -EOPNOTSUPP;
+		if (!d_is_symlink(path.dentry))
+			error = chmod_common(&path, mode);
 		path_put(&path);
 		if (retry_estale(error, lookup_flags)) {
 			lookup_flags |= LOOKUP_REVAL;
 			goto retry;
 		}
 	}
+out:
 	return error;
 }
 
+SYSCALL_DEFINE3(fchmodat, int, dfd, const char __user *, filename, umode_t,
+		mode)
+{
+	return sys_fchmodat2(dfd, filename, mode, 0);
+}
+
 SYSCALL_DEFINE2(chmod, const char __user *, filename, umode_t, mode)
 {
 	return sys_fchmodat(AT_FDCWD, filename, mode);
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 91a740f6b884..982089d55b31 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -775,6 +775,8 @@  asmlinkage long sys_futimesat(int dfd, const char __user *filename,
 asmlinkage long sys_faccessat(int dfd, const char __user *filename, int mode);
 asmlinkage long sys_fchmodat(int dfd, const char __user * filename,
 			     umode_t mode);
+asmlinkage long sys_fchmodat2(int dfd, const char __user *filename,
+			      umode_t mode, int flag);
 asmlinkage long sys_fchownat(int dfd, const char __user *filename, uid_t user,
 			     gid_t group, int flag);
 asmlinkage long sys_openat(int dfd, const char __user *filename, int flags,
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index 9b1462e38b82..e8b0a00908b1 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -730,9 +730,11 @@  __SYSCALL(__NR_pkey_mprotect, sys_pkey_mprotect)
 __SYSCALL(__NR_pkey_alloc,    sys_pkey_alloc)
 #define __NR_pkey_free 290
 __SYSCALL(__NR_pkey_free,     sys_pkey_free)
+#define __NR_fchmodat2 291
+__SYSCALL(__NR_fchmodat2,     sys_fchmodat2)
 
 #undef __NR_syscalls
-#define __NR_syscalls 291
+#define __NR_syscalls 292
 
 /*
  * All syscalls below here should go away really,
diff --git a/scripts/checksyscalls.sh b/scripts/checksyscalls.sh
index 2c9082ba6137..2e7471a1d308 100755
--- a/scripts/checksyscalls.sh
+++ b/scripts/checksyscalls.sh
@@ -19,7 +19,7 @@  cat << EOF
 #define __IGNORE_link		/* linkat */
 #define __IGNORE_unlink		/* unlinkat */
 #define __IGNORE_mknod		/* mknodat */
-#define __IGNORE_chmod		/* fchmodat */
+#define __IGNORE_chmod		/* fchmodat2 */
 #define __IGNORE_chown		/* fchownat */
 #define __IGNORE_mkdir		/* mkdirat */
 #define __IGNORE_rmdir		/* unlinkat */
@@ -39,6 +39,7 @@  cat << EOF
 
 /* Missing flags argument */
 #define __IGNORE_renameat	/* renameat2 */
+#define __IGNORE_fchmodat	/* fchmodat2 */
 
 /* CLOEXEC flag */
 #define __IGNORE_pipe		/* pipe2 */