Message ID | 20180929103453.12025-1-cyphar@cyphar.com (mailing list archive) |
---|---|
Headers | show |
Series | namei: implement various scoping AT_* flags | expand |
> On Sep 29, 2018, at 3:34 AM, Aleksa Sarai <cyphar@cyphar.com> wrote: > > The need for some sort of control over VFS's path resolution (to avoid > malicious paths resulting in inadvertent breakouts) has been a very > long-standing desire of many userspace applications. This patchset is a > revival of Al Viro's old AT_NO_JUMPS[1] patchset with a few additions. > > The most obvious change is that AT_NO_JUMPS has been split as dicussed > in the original thread, along with a further split of AT_NO_PROCLINKS > which means that each individual property of AT_NO_JUMPS is now a > separate flag: > > * Path-based escapes from the starting-point using "/" or ".." are > blocked by AT_BENEATH. Seems useful. > * Mountpoint crossings are blocked by AT_XDEV. Seems useful. > * /proc/$pid/fd/$fd resolution is blocked by AT_NO_PROCLINKS (more > correctly it actually blocks any user of nd_jump_link() because it > allows out-of-VFS path resolution manipulation). > So how do I disable following symlinks? ISTM the most natural way would be to have AT_NO_SYMLINKS, and to have that flag disable proc links.
On Sat, Sep 29, 2018 at 08:34:50PM +1000, Aleksa Sarai wrote: > The need for some sort of control over VFS's path resolution (to avoid > malicious paths resulting in inadvertent breakouts) has been a very > long-standing desire of many userspace applications. This patchset is a > revival of Al Viro's old AT_NO_JUMPS[1] patchset with a few additions. > > The most obvious change is that AT_NO_JUMPS has been split as dicussed > in the original thread, along with a further split of AT_NO_PROCLINKS > which means that each individual property of AT_NO_JUMPS is now a > separate flag: > > * Path-based escapes from the starting-point using "/" or ".." are > blocked by AT_BENEATH. > * Mountpoint crossings are blocked by AT_XDEV. > * /proc/$pid/fd/$fd resolution is blocked by AT_NO_PROCLINKS (more > correctly it actually blocks any user of nd_jump_link() because it > allows out-of-VFS path resolution manipulation). > > AT_NO_JUMPS is now effectively (AT_BENEATH|AT_XDEV|AT_NO_PROCLINKS). At > Linus' suggestion in the original thread, I've also implemented > AT_NO_SYMLINKS which just denies _all_ symlink resolution (including > "proclink" resolution). > > An additional improvement was made to AT_XDEV. The original AT_NO_JUMPS > path didn't consider "/tmp/.." as a mountpoint crossing -- this patch > blocks this as well (feel free to ask me to remove it if you feel this > is not sane). Imho, these flags are very much needed and they all are pretty useful not just for container runtimes but in general. > > Currently I've only enabled these for openat(2) and the stat(2) family. > I would hope we could enable it for basically every *at(2) syscall -- > but many of them appear to not have a @flags argument and thus we'll > need to add several new syscalls to do this. I'm more than happy to send > those patches, but I'd prefer to know that this preliminary work is > acceptable before doing a bunch of copy-paste to add new sets of *at(2) > syscalls. We should really make sure that we can't make due with openat() alone before adding a bunch of new syscalls. So there's no need to rush into this. :) > > One additional feature I've implemented is AT_THIS_ROOT (I imagine this > is probably going to be more contentious than the refresh of > AT_NO_JUMPS, so I've included it in a separate patch). The patch itself > describes my reasoning, but the shortened version of the premise is that > continer runtimes need to have a way to resolve paths within a > potentially malicious rootfs. Container runtimes currently do this in > userspace[2] which has implicit race conditions that are not resolvable > in userspace (or use fork+exec+chroot and SCM_RIGHTS passing which is > inefficient). AT_THIS_ROOT allows for per-call chroot-like semantics for > path resolution, which would be invaluable for us -- and the > implementation is basically identical to AT_BENEATH (except that we > don't return errors when someone actually hits the root). > > I've added some selftests for this, but it's not clear to me whether > they should live here or in xfstests (as far as I can tell there are no > other VFS tests in selftests, while there are some tests that look like > generic VFS tests in xfstests). If you'd prefer them to be included in > xfstests, let me know. > > [1]: https://lore.kernel.org/patchwork/patch/784221/ > [2]: https://github.com/cyphar/filepath-securejoin > > Aleksa Sarai (3): > namei: implement O_BENEATH-style AT_* flags > namei: implement AT_THIS_ROOT chroot-like path resolution > selftests: vfs: add AT_* path resolution tests > > fs/fcntl.c | 2 +- > fs/namei.c | 158 ++++++++++++------ > fs/open.c | 10 ++ > fs/stat.c | 15 +- > include/linux/fcntl.h | 3 +- > include/linux/namei.h | 8 + > include/uapi/asm-generic/fcntl.h | 20 +++ > include/uapi/linux/fcntl.h | 10 ++ > tools/testing/selftests/Makefile | 1 + > tools/testing/selftests/vfs/.gitignore | 1 + > tools/testing/selftests/vfs/Makefile | 13 ++ > tools/testing/selftests/vfs/at_flags.h | 40 +++++ > tools/testing/selftests/vfs/common.sh | 37 ++++ > .../selftests/vfs/tests/0001_at_beneath.sh | 72 ++++++++ > .../selftests/vfs/tests/0002_at_xdev.sh | 54 ++++++ > .../vfs/tests/0003_at_no_proclinks.sh | 50 ++++++ > .../vfs/tests/0004_at_no_symlinks.sh | 49 ++++++ > .../selftests/vfs/tests/0005_at_this_root.sh | 66 ++++++++ > tools/testing/selftests/vfs/vfs_helper.c | 154 +++++++++++++++++ > 19 files changed, 707 insertions(+), 56 deletions(-) > create mode 100644 tools/testing/selftests/vfs/.gitignore > create mode 100644 tools/testing/selftests/vfs/Makefile > create mode 100644 tools/testing/selftests/vfs/at_flags.h > create mode 100644 tools/testing/selftests/vfs/common.sh > create mode 100755 tools/testing/selftests/vfs/tests/0001_at_beneath.sh > create mode 100755 tools/testing/selftests/vfs/tests/0002_at_xdev.sh > create mode 100755 tools/testing/selftests/vfs/tests/0003_at_no_proclinks.sh > create mode 100755 tools/testing/selftests/vfs/tests/0004_at_no_symlinks.sh > create mode 100755 tools/testing/selftests/vfs/tests/0005_at_this_root.sh > create mode 100644 tools/testing/selftests/vfs/vfs_helper.c > > -- > 2.19.0 >
On 2018-09-29, Andy Lutomirski <luto@amacapital.net> wrote: > > The most obvious change is that AT_NO_JUMPS has been split as dicussed > > in the original thread, along with a further split of AT_NO_PROCLINKS > > which means that each individual property of AT_NO_JUMPS is now a > > separate flag: > > > > * Path-based escapes from the starting-point using "/" or ".." are > > blocked by AT_BENEATH. > > Seems useful. > > > * Mountpoint crossings are blocked by AT_XDEV. > > Seems useful. > > > * /proc/$pid/fd/$fd resolution is blocked by AT_NO_PROCLINKS (more > > correctly it actually blocks any user of nd_jump_link() because it > > allows out-of-VFS path resolution manipulation). > > > > So how do I disable following symlinks? ISTM the most natural way > would be to have AT_NO_SYMLINKS, and to have that flag disable proc > links. So, this patchset has both AT_NO_SYMLINKS and AT_NO_PROCLINKS. * AT_NO_SYMLINKS blocks *all* symlinks (which is something Linus requested in the original thread[2] -- apparently this is something that would be useful to git even if wouldn't violate AT_BENEATH). This implies AT_NO_PROCLINKS. * AT_NO_PROCLINKS only blocks procfs-style "symlinks" (filesystem "symlinks" that call nd_jump_link() themselves -- currently only procfs and nsfs). The reason why we need AT_NO_PROCLINKS is that "proclinks"[*] allow for breaking-out of nd->root without a trivial way of detecting it (since the filesystem can manipulate nd->path almost arbitrarily outside of the control of VFS). Al Viro's original patchset[1] also blocked these but it was all included within AT_NO_JUMPS. Requiring you to block *all* symlinks in order to block "proclinks" seems to be a bit overkill to me (especially if consider that AT_THIS_ROOT|AT_NO_PROCLINKS is definitely a usecase most container runtimes would be _very_ interested in -- while AT_NO_SYMLINKS will cause issues with most distribution images). [*]: Sorry for the awful naming, I'm not sure what the correct name is (I've called them "super symlinks" in the past) -- if you have a better name please let me know! [1]: https://lwn.net/Articles/721443/ [2]: https://marc.info/?l=linux-kernel&m=149394765324531&w=2
> On Sep 29, 2018, at 8:45 AM, Aleksa Sarai <cyphar@cyphar.com> wrote: > > On 2018-09-29, Andy Lutomirski <luto@amacapital.net> wrote: >>> The most obvious change is that AT_NO_JUMPS has been split as dicussed >>> in the original thread, along with a further split of AT_NO_PROCLINKS >>> which means that each individual property of AT_NO_JUMPS is now a >>> separate flag: >>> >>> * Path-based escapes from the starting-point using "/" or ".." are >>> blocked by AT_BENEATH. >> >> Seems useful. >> >>> * Mountpoint crossings are blocked by AT_XDEV. >> >> Seems useful. >> >>> * /proc/$pid/fd/$fd resolution is blocked by AT_NO_PROCLINKS (more >>> correctly it actually blocks any user of nd_jump_link() because it >>> allows out-of-VFS path resolution manipulation). >>> >> >> So how do I disable following symlinks? ISTM the most natural way >> would be to have AT_NO_SYMLINKS, and to have that flag disable proc >> links. > > So, this patchset has both AT_NO_SYMLINKS and AT_NO_PROCLINKS. And AT_THIS_ROOT, which is neat. Want to update your cover letter to include all of this? Or at I just reading the wrong thing? > > * AT_NO_SYMLINKS blocks *all* symlinks (which is something Linus requested > in the original thread[2] -- apparently this is something that would > be useful to git even if wouldn't violate AT_BENEATH). This implies > AT_NO_PROCLINKS. > > * AT_NO_PROCLINKS only blocks procfs-style "symlinks" (filesystem > "symlinks" that call nd_jump_link() themselves -- currently only > procfs and nsfs). > Hmm. I’m not sure that blocking nsfs links is always what the container runtime wants, but the overall concept sounds quite useful. Maybe call it AT_NO_TELEPORT? Or AT_NO_MAGIC_LINKS? Also, as a perhaps-silly suggestion: if you end up adding a new syscall, I can see a use for a mode that does the path walk but, rather than failing on a disallowed link, stops early and indicates where it stopped. Then web servers, samba, etc can more efficiently implement custom behavior when links are encountered. And it may also be useful to have a variant of AT_THIS_ROOT where trying to escape is an error instead of having it just get stuck at the root.
On Sat, Sep 29, 2018 at 09:34:24AM -0700, Andy Lutomirski wrote: > Also, as a perhaps-silly suggestion: if you end up adding a new > syscall, I can see a use for a mode that does the path walk but, rather > than failing on a disallowed link, stops early and indicates where it > stopped. Then web servers, samba, etc can more efficiently implement > custom behavior when links are encountered. And it may also be useful > to have a variant of AT_THIS_ROOT where trying to escape is an error > instead of having it just get stuck at the root. AT_USER_LINKS indicating that userspace wants to resolve symlinks themselves?
On 2018-09-29, Christian Brauner <christian@brauner.io> wrote: > > Currently I've only enabled these for openat(2) and the stat(2) family. > > I would hope we could enable it for basically every *at(2) syscall -- > > but many of them appear to not have a @flags argument and thus we'll > > need to add several new syscalls to do this. I'm more than happy to send > > those patches, but I'd prefer to know that this preliminary work is > > acceptable before doing a bunch of copy-paste to add new sets of *at(2) > > syscalls. > > We should really make sure that we can't make due with openat() alone > before adding a bunch of new syscalls. So there's no need to rush into > this. :) Yeah, I think that we could (mostly) make do with openat(2). We might need to have renameat(2) and a few others, but if we had more support for AT_EMPTY_PATH you should be able to just O_PATH|O_{BENEATH,XDEV,...} and then operate on the O_PATH fd.
On Sat, Sep 29, 2018 at 12:35 PM Aleksa Sarai <cyphar@cyphar.com> wrote: > > The need for some sort of control over VFS's path resolution (to avoid > malicious paths resulting in inadvertent breakouts) has been a very > long-standing desire of many userspace applications. This patchset is a > revival of Al Viro's old AT_NO_JUMPS[1] patchset with a few additions. > > The most obvious change is that AT_NO_JUMPS has been split as dicussed > in the original thread, along with a further split of AT_NO_PROCLINKS > which means that each individual property of AT_NO_JUMPS is now a > separate flag: > > * Path-based escapes from the starting-point using "/" or ".." are > blocked by AT_BENEATH. > * Mountpoint crossings are blocked by AT_XDEV. > * /proc/$pid/fd/$fd resolution is blocked by AT_NO_PROCLINKS (more > correctly it actually blocks any user of nd_jump_link() because it > allows out-of-VFS path resolution manipulation). > > AT_NO_JUMPS is now effectively (AT_BENEATH|AT_XDEV|AT_NO_PROCLINKS). At > Linus' suggestion in the original thread, I've also implemented > AT_NO_SYMLINKS which just denies _all_ symlink resolution (including > "proclink" resolution). It seems quite useful to me. > An additional improvement was made to AT_XDEV. The original AT_NO_JUMPS > path didn't consider "/tmp/.." as a mountpoint crossing -- this patch > blocks this as well (feel free to ask me to remove it if you feel this > is not sane). > > Currently I've only enabled these for openat(2) and the stat(2) family. > I would hope we could enable it for basically every *at(2) syscall -- > but many of them appear to not have a @flags argument and thus we'll > need to add several new syscalls to do this. I'm more than happy to send > those patches, but I'd prefer to know that this preliminary work is > acceptable before doing a bunch of copy-paste to add new sets of *at(2) > syscalls. What do you think of an equivalent feature AT_NO_SYMLINKS flag for mount()? I guess that would have made the fix for CVE-2017-1002101 in Kubernetes easier to write: https://kubernetes.io/blog/2018/04/04/fixing-subpath-volume-vulnerability/ > One additional feature I've implemented is AT_THIS_ROOT (I imagine this > is probably going to be more contentious than the refresh of > AT_NO_JUMPS, so I've included it in a separate patch). The patch itself > describes my reasoning, but the shortened version of the premise is that > continer runtimes need to have a way to resolve paths within a > potentially malicious rootfs. Container runtimes currently do this in > userspace[2] which has implicit race conditions that are not resolvable > in userspace (or use fork+exec+chroot and SCM_RIGHTS passing which is > inefficient). AT_THIS_ROOT allows for per-call chroot-like semantics for > path resolution, which would be invaluable for us -- and the > implementation is basically identical to AT_BENEATH (except that we > don't return errors when someone actually hits the root). > > I've added some selftests for this, but it's not clear to me whether > they should live here or in xfstests (as far as I can tell there are no > other VFS tests in selftests, while there are some tests that look like > generic VFS tests in xfstests). If you'd prefer them to be included in > xfstests, let me know. > > [1]: https://lore.kernel.org/patchwork/patch/784221/ > [2]: https://github.com/cyphar/filepath-securejoin > > Aleksa Sarai (3): > namei: implement O_BENEATH-style AT_* flags > namei: implement AT_THIS_ROOT chroot-like path resolution > selftests: vfs: add AT_* path resolution tests > > fs/fcntl.c | 2 +- > fs/namei.c | 158 ++++++++++++------ > fs/open.c | 10 ++ > fs/stat.c | 15 +- > include/linux/fcntl.h | 3 +- > include/linux/namei.h | 8 + > include/uapi/asm-generic/fcntl.h | 20 +++ > include/uapi/linux/fcntl.h | 10 ++ > tools/testing/selftests/Makefile | 1 + > tools/testing/selftests/vfs/.gitignore | 1 + > tools/testing/selftests/vfs/Makefile | 13 ++ > tools/testing/selftests/vfs/at_flags.h | 40 +++++ > tools/testing/selftests/vfs/common.sh | 37 ++++ > .../selftests/vfs/tests/0001_at_beneath.sh | 72 ++++++++ > .../selftests/vfs/tests/0002_at_xdev.sh | 54 ++++++ > .../vfs/tests/0003_at_no_proclinks.sh | 50 ++++++ > .../vfs/tests/0004_at_no_symlinks.sh | 49 ++++++ > .../selftests/vfs/tests/0005_at_this_root.sh | 66 ++++++++ > tools/testing/selftests/vfs/vfs_helper.c | 154 +++++++++++++++++ > 19 files changed, 707 insertions(+), 56 deletions(-) > create mode 100644 tools/testing/selftests/vfs/.gitignore > create mode 100644 tools/testing/selftests/vfs/Makefile > create mode 100644 tools/testing/selftests/vfs/at_flags.h > create mode 100644 tools/testing/selftests/vfs/common.sh > create mode 100755 tools/testing/selftests/vfs/tests/0001_at_beneath.sh > create mode 100755 tools/testing/selftests/vfs/tests/0002_at_xdev.sh > create mode 100755 tools/testing/selftests/vfs/tests/0003_at_no_proclinks.sh > create mode 100755 tools/testing/selftests/vfs/tests/0004_at_no_symlinks.sh > create mode 100755 tools/testing/selftests/vfs/tests/0005_at_this_root.sh > create mode 100644 tools/testing/selftests/vfs/vfs_helper.c > > -- > 2.19.0
On September 30, 2018 3:54:31 PM GMT+02:00, Alban Crequy <alban@kinvolk.io> wrote: >On Sat, Sep 29, 2018 at 12:35 PM Aleksa Sarai <cyphar@cyphar.com> >wrote: >> >> The need for some sort of control over VFS's path resolution (to >avoid >> malicious paths resulting in inadvertent breakouts) has been a very >> long-standing desire of many userspace applications. This patchset is >a >> revival of Al Viro's old AT_NO_JUMPS[1] patchset with a few >additions. >> >> The most obvious change is that AT_NO_JUMPS has been split as >dicussed >> in the original thread, along with a further split of AT_NO_PROCLINKS >> which means that each individual property of AT_NO_JUMPS is now a >> separate flag: >> >> * Path-based escapes from the starting-point using "/" or ".." are >> blocked by AT_BENEATH. >> * Mountpoint crossings are blocked by AT_XDEV. >> * /proc/$pid/fd/$fd resolution is blocked by AT_NO_PROCLINKS (more >> correctly it actually blocks any user of nd_jump_link() >because it >> allows out-of-VFS path resolution manipulation). >> >> AT_NO_JUMPS is now effectively (AT_BENEATH|AT_XDEV|AT_NO_PROCLINKS). >At >> Linus' suggestion in the original thread, I've also implemented >> AT_NO_SYMLINKS which just denies _all_ symlink resolution (including >> "proclink" resolution). > >It seems quite useful to me. > >> An additional improvement was made to AT_XDEV. The original >AT_NO_JUMPS >> path didn't consider "/tmp/.." as a mountpoint crossing -- this patch >> blocks this as well (feel free to ask me to remove it if you feel >this >> is not sane). >> >> Currently I've only enabled these for openat(2) and the stat(2) >family. >> I would hope we could enable it for basically every *at(2) syscall -- >> but many of them appear to not have a @flags argument and thus we'll >> need to add several new syscalls to do this. I'm more than happy to >send >> those patches, but I'd prefer to know that this preliminary work is >> acceptable before doing a bunch of copy-paste to add new sets of >*at(2) >> syscalls. > >What do you think of an equivalent feature AT_NO_SYMLINKS flag for >mount()? That's something we discussed but that would need to be part of the new mount API work by David. The current mount API doesn't take AT_* flags since it doesn't operate on fds and we're (sort of) out of mount flags. > >I guess that would have made the fix for CVE-2017-1002101 in >Kubernetes easier to write: >https://kubernetes.io/blog/2018/04/04/fixing-subpath-volume-vulnerability/ > >> One additional feature I've implemented is AT_THIS_ROOT (I imagine >this >> is probably going to be more contentious than the refresh of >> AT_NO_JUMPS, so I've included it in a separate patch). The patch >itself >> describes my reasoning, but the shortened version of the premise is >that >> continer runtimes need to have a way to resolve paths within a >> potentially malicious rootfs. Container runtimes currently do this in >> userspace[2] which has implicit race conditions that are not >resolvable >> in userspace (or use fork+exec+chroot and SCM_RIGHTS passing which is >> inefficient). AT_THIS_ROOT allows for per-call chroot-like semantics >for >> path resolution, which would be invaluable for us -- and the >> implementation is basically identical to AT_BENEATH (except that we >> don't return errors when someone actually hits the root). >> >> I've added some selftests for this, but it's not clear to me whether >> they should live here or in xfstests (as far as I can tell there are >no >> other VFS tests in selftests, while there are some tests that look >like >> generic VFS tests in xfstests). If you'd prefer them to be included >in >> xfstests, let me know. >> >> [1]: https://lore.kernel.org/patchwork/patch/784221/ >> [2]: https://github.com/cyphar/filepath-securejoin >> >> Aleksa Sarai (3): >> namei: implement O_BENEATH-style AT_* flags >> namei: implement AT_THIS_ROOT chroot-like path resolution >> selftests: vfs: add AT_* path resolution tests >> >> fs/fcntl.c | 2 +- >> fs/namei.c | 158 >++++++++++++------ >> fs/open.c | 10 ++ >> fs/stat.c | 15 +- >> include/linux/fcntl.h | 3 +- >> include/linux/namei.h | 8 + >> include/uapi/asm-generic/fcntl.h | 20 +++ >> include/uapi/linux/fcntl.h | 10 ++ >> tools/testing/selftests/Makefile | 1 + >> tools/testing/selftests/vfs/.gitignore | 1 + >> tools/testing/selftests/vfs/Makefile | 13 ++ >> tools/testing/selftests/vfs/at_flags.h | 40 +++++ >> tools/testing/selftests/vfs/common.sh | 37 ++++ >> .../selftests/vfs/tests/0001_at_beneath.sh | 72 ++++++++ >> .../selftests/vfs/tests/0002_at_xdev.sh | 54 ++++++ >> .../vfs/tests/0003_at_no_proclinks.sh | 50 ++++++ >> .../vfs/tests/0004_at_no_symlinks.sh | 49 ++++++ >> .../selftests/vfs/tests/0005_at_this_root.sh | 66 ++++++++ >> tools/testing/selftests/vfs/vfs_helper.c | 154 >+++++++++++++++++ >> 19 files changed, 707 insertions(+), 56 deletions(-) >> create mode 100644 tools/testing/selftests/vfs/.gitignore >> create mode 100644 tools/testing/selftests/vfs/Makefile >> create mode 100644 tools/testing/selftests/vfs/at_flags.h >> create mode 100644 tools/testing/selftests/vfs/common.sh >> create mode 100755 >tools/testing/selftests/vfs/tests/0001_at_beneath.sh >> create mode 100755 tools/testing/selftests/vfs/tests/0002_at_xdev.sh >> create mode 100755 >tools/testing/selftests/vfs/tests/0003_at_no_proclinks.sh >> create mode 100755 >tools/testing/selftests/vfs/tests/0004_at_no_symlinks.sh >> create mode 100755 >tools/testing/selftests/vfs/tests/0005_at_this_root.sh >> create mode 100644 tools/testing/selftests/vfs/vfs_helper.c >> >> -- >> 2.19.0
As a side note, I'm still working on Landlock which can achieve the same goal but in a more flexible and dynamic way: https://landlock.io Regards, Mickaël On 9/29/18 12:34, Aleksa Sarai wrote: > The need for some sort of control over VFS's path resolution (to avoid > malicious paths resulting in inadvertent breakouts) has been a very > long-standing desire of many userspace applications. This patchset is a > revival of Al Viro's old AT_NO_JUMPS[1] patchset with a few additions. > > The most obvious change is that AT_NO_JUMPS has been split as dicussed > in the original thread, along with a further split of AT_NO_PROCLINKS > which means that each individual property of AT_NO_JUMPS is now a > separate flag: > > * Path-based escapes from the starting-point using "/" or ".." are > blocked by AT_BENEATH. > * Mountpoint crossings are blocked by AT_XDEV. > * /proc/$pid/fd/$fd resolution is blocked by AT_NO_PROCLINKS (more > correctly it actually blocks any user of nd_jump_link() because it > allows out-of-VFS path resolution manipulation). > > AT_NO_JUMPS is now effectively (AT_BENEATH|AT_XDEV|AT_NO_PROCLINKS). At > Linus' suggestion in the original thread, I've also implemented > AT_NO_SYMLINKS which just denies _all_ symlink resolution (including > "proclink" resolution). > > An additional improvement was made to AT_XDEV. The original AT_NO_JUMPS > path didn't consider "/tmp/.." as a mountpoint crossing -- this patch > blocks this as well (feel free to ask me to remove it if you feel this > is not sane). > > Currently I've only enabled these for openat(2) and the stat(2) family. > I would hope we could enable it for basically every *at(2) syscall -- > but many of them appear to not have a @flags argument and thus we'll > need to add several new syscalls to do this. I'm more than happy to send > those patches, but I'd prefer to know that this preliminary work is > acceptable before doing a bunch of copy-paste to add new sets of *at(2) > syscalls. > > One additional feature I've implemented is AT_THIS_ROOT (I imagine this > is probably going to be more contentious than the refresh of > AT_NO_JUMPS, so I've included it in a separate patch). The patch itself > describes my reasoning, but the shortened version of the premise is that > continer runtimes need to have a way to resolve paths within a > potentially malicious rootfs. Container runtimes currently do this in > userspace[2] which has implicit race conditions that are not resolvable > in userspace (or use fork+exec+chroot and SCM_RIGHTS passing which is > inefficient). AT_THIS_ROOT allows for per-call chroot-like semantics for > path resolution, which would be invaluable for us -- and the > implementation is basically identical to AT_BENEATH (except that we > don't return errors when someone actually hits the root). > > I've added some selftests for this, but it's not clear to me whether > they should live here or in xfstests (as far as I can tell there are no > other VFS tests in selftests, while there are some tests that look like > generic VFS tests in xfstests). If you'd prefer them to be included in > xfstests, let me know. > > [1]: https://lore.kernel.org/patchwork/patch/784221/ > [2]: https://github.com/cyphar/filepath-securejoin > > Aleksa Sarai (3): > namei: implement O_BENEATH-style AT_* flags > namei: implement AT_THIS_ROOT chroot-like path resolution > selftests: vfs: add AT_* path resolution tests > > fs/fcntl.c | 2 +- > fs/namei.c | 158 ++++++++++++------ > fs/open.c | 10 ++ > fs/stat.c | 15 +- > include/linux/fcntl.h | 3 +- > include/linux/namei.h | 8 + > include/uapi/asm-generic/fcntl.h | 20 +++ > include/uapi/linux/fcntl.h | 10 ++ > tools/testing/selftests/Makefile | 1 + > tools/testing/selftests/vfs/.gitignore | 1 + > tools/testing/selftests/vfs/Makefile | 13 ++ > tools/testing/selftests/vfs/at_flags.h | 40 +++++ > tools/testing/selftests/vfs/common.sh | 37 ++++ > .../selftests/vfs/tests/0001_at_beneath.sh | 72 ++++++++ > .../selftests/vfs/tests/0002_at_xdev.sh | 54 ++++++ > .../vfs/tests/0003_at_no_proclinks.sh | 50 ++++++ > .../vfs/tests/0004_at_no_symlinks.sh | 49 ++++++ > .../selftests/vfs/tests/0005_at_this_root.sh | 66 ++++++++ > tools/testing/selftests/vfs/vfs_helper.c | 154 +++++++++++++++++ > 19 files changed, 707 insertions(+), 56 deletions(-) > create mode 100644 tools/testing/selftests/vfs/.gitignore > create mode 100644 tools/testing/selftests/vfs/Makefile > create mode 100644 tools/testing/selftests/vfs/at_flags.h > create mode 100644 tools/testing/selftests/vfs/common.sh > create mode 100755 tools/testing/selftests/vfs/tests/0001_at_beneath.sh > create mode 100755 tools/testing/selftests/vfs/tests/0002_at_xdev.sh > create mode 100755 tools/testing/selftests/vfs/tests/0003_at_no_proclinks.sh > create mode 100755 tools/testing/selftests/vfs/tests/0004_at_no_symlinks.sh > create mode 100755 tools/testing/selftests/vfs/tests/0005_at_this_root.sh > create mode 100644 tools/testing/selftests/vfs/vfs_helper.c >
On Sun, Sep 30, 2018 at 10:39 PM Mickaël Salaün <mic@digikod.net> wrote: > As a side note, I'm still working on Landlock which can achieve the same > goal but in a more flexible and dynamic way: https://landlock.io Isn't Landlock mostly intended for userspace that wants to impose a custom Mandatory Access Control policy on itself, restricting the whole process? As far as I can tell, a major usecase for AT_BENEATH are privileged processes that do not want to restrict all filesystem operations they perform, but want to sometimes impose limits on filesystem traversal for the duration of a single system call. For example, a process might want to first open a file from an untrusted filesystem area with AT_BENEATH, and afterwards open a configuration file without AT_BENEATH. How would you do this in Landlock? Use a BPF map to store per-thread filesystem restrictions, and then do bpf() calls before and after every restricted filesystem access to set and unset the policy for the current syscall? > On 9/29/18 12:34, Aleksa Sarai wrote: > > The need for some sort of control over VFS's path resolution (to avoid > > malicious paths resulting in inadvertent breakouts) has been a very > > long-standing desire of many userspace applications. This patchset is a > > revival of Al Viro's old AT_NO_JUMPS[1] patchset with a few additions. > > > > The most obvious change is that AT_NO_JUMPS has been split as dicussed > > in the original thread, along with a further split of AT_NO_PROCLINKS > > which means that each individual property of AT_NO_JUMPS is now a > > separate flag: > > > > * Path-based escapes from the starting-point using "/" or ".." are > > blocked by AT_BENEATH. > > * Mountpoint crossings are blocked by AT_XDEV. > > * /proc/$pid/fd/$fd resolution is blocked by AT_NO_PROCLINKS (more > > correctly it actually blocks any user of nd_jump_link() because it > > allows out-of-VFS path resolution manipulation). > > > > AT_NO_JUMPS is now effectively (AT_BENEATH|AT_XDEV|AT_NO_PROCLINKS). At > > Linus' suggestion in the original thread, I've also implemented > > AT_NO_SYMLINKS which just denies _all_ symlink resolution (including > > "proclink" resolution). > > > > An additional improvement was made to AT_XDEV. The original AT_NO_JUMPS > > path didn't consider "/tmp/.." as a mountpoint crossing -- this patch > > blocks this as well (feel free to ask me to remove it if you feel this > > is not sane). > > > > Currently I've only enabled these for openat(2) and the stat(2) family. > > I would hope we could enable it for basically every *at(2) syscall -- > > but many of them appear to not have a @flags argument and thus we'll > > need to add several new syscalls to do this. I'm more than happy to send > > those patches, but I'd prefer to know that this preliminary work is > > acceptable before doing a bunch of copy-paste to add new sets of *at(2) > > syscalls. > > > > One additional feature I've implemented is AT_THIS_ROOT (I imagine this > > is probably going to be more contentious than the refresh of > > AT_NO_JUMPS, so I've included it in a separate patch). The patch itself > > describes my reasoning, but the shortened version of the premise is that > > continer runtimes need to have a way to resolve paths within a > > potentially malicious rootfs. Container runtimes currently do this in > > userspace[2] which has implicit race conditions that are not resolvable > > in userspace (or use fork+exec+chroot and SCM_RIGHTS passing which is > > inefficient). AT_THIS_ROOT allows for per-call chroot-like semantics for > > path resolution, which would be invaluable for us -- and the > > implementation is basically identical to AT_BENEATH (except that we > > don't return errors when someone actually hits the root). > > > > I've added some selftests for this, but it's not clear to me whether > > they should live here or in xfstests (as far as I can tell there are no > > other VFS tests in selftests, while there are some tests that look like > > generic VFS tests in xfstests). If you'd prefer them to be included in > > xfstests, let me know. > > > > [1]: https://lore.kernel.org/patchwork/patch/784221/ > > [2]: https://github.com/cyphar/filepath-securejoin > > > > Aleksa Sarai (3): > > namei: implement O_BENEATH-style AT_* flags > > namei: implement AT_THIS_ROOT chroot-like path resolution > > selftests: vfs: add AT_* path resolution tests > > > > fs/fcntl.c | 2 +- > > fs/namei.c | 158 ++++++++++++------ > > fs/open.c | 10 ++ > > fs/stat.c | 15 +- > > include/linux/fcntl.h | 3 +- > > include/linux/namei.h | 8 + > > include/uapi/asm-generic/fcntl.h | 20 +++ > > include/uapi/linux/fcntl.h | 10 ++ > > tools/testing/selftests/Makefile | 1 + > > tools/testing/selftests/vfs/.gitignore | 1 + > > tools/testing/selftests/vfs/Makefile | 13 ++ > > tools/testing/selftests/vfs/at_flags.h | 40 +++++ > > tools/testing/selftests/vfs/common.sh | 37 ++++ > > .../selftests/vfs/tests/0001_at_beneath.sh | 72 ++++++++ > > .../selftests/vfs/tests/0002_at_xdev.sh | 54 ++++++ > > .../vfs/tests/0003_at_no_proclinks.sh | 50 ++++++ > > .../vfs/tests/0004_at_no_symlinks.sh | 49 ++++++ > > .../selftests/vfs/tests/0005_at_this_root.sh | 66 ++++++++ > > tools/testing/selftests/vfs/vfs_helper.c | 154 +++++++++++++++++ > > 19 files changed, 707 insertions(+), 56 deletions(-) > > create mode 100644 tools/testing/selftests/vfs/.gitignore > > create mode 100644 tools/testing/selftests/vfs/Makefile > > create mode 100644 tools/testing/selftests/vfs/at_flags.h > > create mode 100644 tools/testing/selftests/vfs/common.sh > > create mode 100755 tools/testing/selftests/vfs/tests/0001_at_beneath.sh > > create mode 100755 tools/testing/selftests/vfs/tests/0002_at_xdev.sh > > create mode 100755 tools/testing/selftests/vfs/tests/0003_at_no_proclinks.sh > > create mode 100755 tools/testing/selftests/vfs/tests/0004_at_no_symlinks.sh > > create mode 100755 tools/testing/selftests/vfs/tests/0005_at_this_root.sh > > create mode 100644 tools/testing/selftests/vfs/vfs_helper.c > > >
On 9/30/18 23:46, Jann Horn wrote: > On Sun, Sep 30, 2018 at 10:39 PM Mickaël Salaün <mic@digikod.net> wrote: >> As a side note, I'm still working on Landlock which can achieve the same >> goal but in a more flexible and dynamic way: https://landlock.io > > Isn't Landlock mostly intended for userspace that wants to impose a > custom Mandatory Access Control policy on itself, restricting the > whole process? > > As far as I can tell, a major usecase for AT_BENEATH are privileged > processes that do not want to restrict all filesystem operations they > perform, but want to sometimes impose limits on filesystem traversal > for the duration of a single system call. For example, a process might > want to first open a file from an untrusted filesystem area with > AT_BENEATH, and afterwards open a configuration file without > AT_BENEATH. I didn't realized this was the main use case for AT_BENEATH. Landlock is indeed dedicated to apply a security policy on a set of processes. This set can be a process and its children (seccomp-like), or another set of processes that may be identified with a cgroup. > > How would you do this in Landlock? Use a BPF map to store per-thread > filesystem restrictions, and then do bpf() calls before and after > every restricted filesystem access to set and unset the policy for the > current syscall? Another way to apply a security policy could be to tied it to a file descriptor, similarly to Capsicum, which could enable to create programmable (real) capabilities. This way, it would be possible to "wrap" a file descriptor with a Landlock program and use it with FD-based syscalls or pass it to other processes. This would not require changes to the FS subsystem, but only the Landlock LSM code. This isn't done yet but I plan to add this new way to restrict operations on file descriptors. Anyway, for the use case you mentioned, the AT_BENEATH flag(s) should be simple to use and enough for now. We must be careful of the hardcoded policy though. > >> On 9/29/18 12:34, Aleksa Sarai wrote: >>> The need for some sort of control over VFS's path resolution (to avoid >>> malicious paths resulting in inadvertent breakouts) has been a very >>> long-standing desire of many userspace applications. This patchset is a >>> revival of Al Viro's old AT_NO_JUMPS[1] patchset with a few additions. >>> >>> The most obvious change is that AT_NO_JUMPS has been split as dicussed >>> in the original thread, along with a further split of AT_NO_PROCLINKS >>> which means that each individual property of AT_NO_JUMPS is now a >>> separate flag: >>> >>> * Path-based escapes from the starting-point using "/" or ".." are >>> blocked by AT_BENEATH. >>> * Mountpoint crossings are blocked by AT_XDEV. >>> * /proc/$pid/fd/$fd resolution is blocked by AT_NO_PROCLINKS (more >>> correctly it actually blocks any user of nd_jump_link() because it >>> allows out-of-VFS path resolution manipulation). >>> >>> AT_NO_JUMPS is now effectively (AT_BENEATH|AT_XDEV|AT_NO_PROCLINKS). At >>> Linus' suggestion in the original thread, I've also implemented >>> AT_NO_SYMLINKS which just denies _all_ symlink resolution (including >>> "proclink" resolution). >>> >>> An additional improvement was made to AT_XDEV. The original AT_NO_JUMPS >>> path didn't consider "/tmp/.." as a mountpoint crossing -- this patch >>> blocks this as well (feel free to ask me to remove it if you feel this >>> is not sane). >>> >>> Currently I've only enabled these for openat(2) and the stat(2) family. >>> I would hope we could enable it for basically every *at(2) syscall -- >>> but many of them appear to not have a @flags argument and thus we'll >>> need to add several new syscalls to do this. I'm more than happy to send >>> those patches, but I'd prefer to know that this preliminary work is >>> acceptable before doing a bunch of copy-paste to add new sets of *at(2) >>> syscalls. >>> >>> One additional feature I've implemented is AT_THIS_ROOT (I imagine this >>> is probably going to be more contentious than the refresh of >>> AT_NO_JUMPS, so I've included it in a separate patch). The patch itself >>> describes my reasoning, but the shortened version of the premise is that >>> continer runtimes need to have a way to resolve paths within a >>> potentially malicious rootfs. Container runtimes currently do this in >>> userspace[2] which has implicit race conditions that are not resolvable >>> in userspace (or use fork+exec+chroot and SCM_RIGHTS passing which is >>> inefficient). AT_THIS_ROOT allows for per-call chroot-like semantics for >>> path resolution, which would be invaluable for us -- and the >>> implementation is basically identical to AT_BENEATH (except that we >>> don't return errors when someone actually hits the root). >>> >>> I've added some selftests for this, but it's not clear to me whether >>> they should live here or in xfstests (as far as I can tell there are no >>> other VFS tests in selftests, while there are some tests that look like >>> generic VFS tests in xfstests). If you'd prefer them to be included in >>> xfstests, let me know. >>> >>> [1]: https://lore.kernel.org/patchwork/patch/784221/ >>> [2]: https://github.com/cyphar/filepath-securejoin >>> >>> Aleksa Sarai (3): >>> namei: implement O_BENEATH-style AT_* flags >>> namei: implement AT_THIS_ROOT chroot-like path resolution >>> selftests: vfs: add AT_* path resolution tests >>> >>> fs/fcntl.c | 2 +- >>> fs/namei.c | 158 ++++++++++++------ >>> fs/open.c | 10 ++ >>> fs/stat.c | 15 +- >>> include/linux/fcntl.h | 3 +- >>> include/linux/namei.h | 8 + >>> include/uapi/asm-generic/fcntl.h | 20 +++ >>> include/uapi/linux/fcntl.h | 10 ++ >>> tools/testing/selftests/Makefile | 1 + >>> tools/testing/selftests/vfs/.gitignore | 1 + >>> tools/testing/selftests/vfs/Makefile | 13 ++ >>> tools/testing/selftests/vfs/at_flags.h | 40 +++++ >>> tools/testing/selftests/vfs/common.sh | 37 ++++ >>> .../selftests/vfs/tests/0001_at_beneath.sh | 72 ++++++++ >>> .../selftests/vfs/tests/0002_at_xdev.sh | 54 ++++++ >>> .../vfs/tests/0003_at_no_proclinks.sh | 50 ++++++ >>> .../vfs/tests/0004_at_no_symlinks.sh | 49 ++++++ >>> .../selftests/vfs/tests/0005_at_this_root.sh | 66 ++++++++ >>> tools/testing/selftests/vfs/vfs_helper.c | 154 +++++++++++++++++ >>> 19 files changed, 707 insertions(+), 56 deletions(-) >>> create mode 100644 tools/testing/selftests/vfs/.gitignore >>> create mode 100644 tools/testing/selftests/vfs/Makefile >>> create mode 100644 tools/testing/selftests/vfs/at_flags.h >>> create mode 100644 tools/testing/selftests/vfs/common.sh >>> create mode 100755 tools/testing/selftests/vfs/tests/0001_at_beneath.sh >>> create mode 100755 tools/testing/selftests/vfs/tests/0002_at_xdev.sh >>> create mode 100755 tools/testing/selftests/vfs/tests/0003_at_no_proclinks.sh >>> create mode 100755 tools/testing/selftests/vfs/tests/0004_at_no_symlinks.sh >>> create mode 100755 tools/testing/selftests/vfs/tests/0005_at_this_root.sh >>> create mode 100644 tools/testing/selftests/vfs/vfs_helper.c >>> >> > >
On Sat, Sep 29, 2018 at 08:34:50PM +1000, Aleksa Sarai wrote: > I've added some selftests for this, but it's not clear to me whether > they should live here or in xfstests (as far as I can tell there are no > other VFS tests in selftests, while there are some tests that look like > generic VFS tests in xfstests). If you'd prefer them to be included in > xfstests, let me know. xfstests, please. That way the new functionality will get immediate coverage by all the main filesystem development and distro QA teams.... Cheers, Dave.
On 2018-10-01, Dave Chinner <david@fromorbit.com> wrote: > > I've added some selftests for this, but it's not clear to me whether > > they should live here or in xfstests (as far as I can tell there are no > > other VFS tests in selftests, while there are some tests that look like > > generic VFS tests in xfstests). If you'd prefer them to be included in > > xfstests, let me know. > > xfstests, please. That way the new functionality will get immediate > coverage by all the main filesystem development and distro QA > teams.... Sure, will do. Do you want me to submit them in parallel -- and what is the correct ML to submit changes to xfstests? Sorry for the silly questions. :P
On Mon, Oct 01, 2018 at 03:47:23PM +1000, Aleksa Sarai wrote: > On 2018-10-01, Dave Chinner <david@fromorbit.com> wrote: > > > I've added some selftests for this, but it's not clear to me whether > > > they should live here or in xfstests (as far as I can tell there are no > > > other VFS tests in selftests, while there are some tests that look like > > > generic VFS tests in xfstests). If you'd prefer them to be included in > > > xfstests, let me know. > > > > xfstests, please. That way the new functionality will get immediate > > coverage by all the main filesystem development and distro QA > > teams.... > > Sure, will do. Do you want me to submit them in parallel -- That's usually the way we do things, but we don't tend to commit the fstests changes until the thing it is testing has landed upstream. > and what is > the correct ML to submit changes to xfstests? fstests@vger.kernel.org > Sorry for the silly questions. :P You're going to have many more of them when you start moving the tests across to fstests :P Cheers, Dave.
From: Aleksa Sarai > Sent: 29 September 2018 11:35 > > The need for some sort of control over VFS's path resolution (to avoid > malicious paths resulting in inadvertent breakouts) has been a very > long-standing desire of many userspace applications. This patchset is a > revival of Al Viro's old AT_NO_JUMPS[1] patchset with a few additions. > > The most obvious change is that AT_NO_JUMPS has been split as dicussed > in the original thread, along with a further split of AT_NO_PROCLINKS > which means that each individual property of AT_NO_JUMPS is now a > separate flag: > > * Path-based escapes from the starting-point using "/" or ".." are > blocked by AT_BENEATH. You may need to allow absolute paths that refer to items inside the controlled area. (Even if done by a textual replacement based on the expected name of the base directory.) > * Mountpoint crossings are blocked by AT_XDEV. You might want a mountpoint flag that allows crossing into the mounted filesystem (you may need to get out in order to do pwd()). > * /proc/$pid/fd/$fd resolution is blocked by AT_NO_PROCLINKS (more > correctly it actually blocks any user of nd_jump_link() because it > allows out-of-VFS path resolution manipulation). Or 'fix' the /proc/$pid/fd/$fd code to open the actual vnode rather than being a symlink (although this might still let you get a directory vnode). FWIW this is what NetBSD does - you can link the open file back into the filesystem! > > AT_NO_JUMPS is now effectively (AT_BENEATH|AT_XDEV|AT_NO_PROCLINKS). At > Linus' suggestion in the original thread, I've also implemented > AT_NO_SYMLINKS which just denies _all_ symlink resolution (including > "proclink" resolution). What about allowing 'trivial' symlinks? ... > Currently I've only enabled these for openat(2) and the stat(2) family. > I would hope we could enable it for basically every *at(2) syscall -- > but many of them appear to not have a @flags argument and thus we'll > need to add several new syscalls to do this. I'm more than happy to send > those patches, but I'd prefer to know that this preliminary work is > acceptable before doing a bunch of copy-paste to add new sets of *at(2) > syscalls. If you make the flags a property of the directory vnode (perhaps as well as any syscall flags), and make it inherited by vnode lookup then it can be used to stop library functions (or entire binaries) using blocked paths. You'd then only need to add an fcntl() call to set the flags (but never clear them) to get the restriction applied to every lookup. ... David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
On 2018-10-01, David Laight <David.Laight@ACULAB.COM> wrote: > > The need for some sort of control over VFS's path resolution (to avoid > > malicious paths resulting in inadvertent breakouts) has been a very > > long-standing desire of many userspace applications. This patchset is a > > revival of Al Viro's old AT_NO_JUMPS[1] patchset with a few additions. > > > > The most obvious change is that AT_NO_JUMPS has been split as dicussed > > in the original thread, along with a further split of AT_NO_PROCLINKS > > which means that each individual property of AT_NO_JUMPS is now a > > separate flag: > > > > * Path-based escapes from the starting-point using "/" or ".." are > > blocked by AT_BENEATH. > > You may need to allow absolute paths that refer to items inside > the controlled area. > (Even if done by a textual replacement based on the expected name > of the base directory.) This is sort of what AT_THIS_ROOT does. I didn't want to include it for AT_BENEATH because it would be just as contentious as AT_THIS_ROOT currently is. :P > > * Mountpoint crossings are blocked by AT_XDEV. > > You might want a mountpoint flag that allows crossing into the mounted > filesystem (you may need to get out in order to do pwd()). Like a mount flag? I'm not sure how I feel about that. The intention is to allow for a process to have control over how path lookups are handled, and tying it to a mount flag means that it's no longer entirely up to the process. > > * /proc/$pid/fd/$fd resolution is blocked by AT_NO_PROCLINKS (more > > correctly it actually blocks any user of nd_jump_link() because it > > allows out-of-VFS path resolution manipulation). > > Or 'fix' the /proc/$pid/fd/$fd code to open the actual vnode rather than > being a symlink (although this might still let you get a directory vnode). > FWIW this is what NetBSD does - you can link the open file back into > the filesystem! Isn't this how it works currently? The /proc/$pid/fd/$fd "symlinks" are actually references to the underlying file (they can even escape a pivot_root()) -- you can re-open them or do any number of other dodgy things through /proc with them (we definitely abuse this in container runtimes -- and I'm sure plenty of other people do as well). > > AT_NO_JUMPS is now effectively (AT_BENEATH|AT_XDEV|AT_NO_PROCLINKS). At > > Linus' suggestion in the original thread, I've also implemented > > AT_NO_SYMLINKS which just denies _all_ symlink resolution (including > > "proclink" resolution). > > What about allowing 'trivial' symlinks? The use-case of AT_NO_SYMLINKS that Linus pitched[1] is that git wants to have a unique name for every object and so allowing trivial symlinks is a no-go. I assume "trivial" here means "no-'..' components"? > > Currently I've only enabled these for openat(2) and the stat(2) family. > > I would hope we could enable it for basically every *at(2) syscall -- > > but many of them appear to not have a @flags argument and thus we'll > > need to add several new syscalls to do this. I'm more than happy to send > > those patches, but I'd prefer to know that this preliminary work is > > acceptable before doing a bunch of copy-paste to add new sets of *at(2) > > syscalls. > > If you make the flags a property of the directory vnode (perhaps as > well as any syscall flags), and make it inherited by vnode lookup then > it can be used to stop library functions (or entire binaries) using > blocked paths. > You'd then only need to add an fcntl() call to set the flags (but never > clear them) to get the restriction applied to every lookup. This seems like it might be useful, but it could always be done as a follow-up patch by just setting LOOKUP_BLAH if the dirfd has the flag set. I'm also a little bit concerned that (because fd flags are set on the 'struct file') if you start sharing fds then you can no longer use the lookup scoping for security (a racing process could remove the flags while the management process resolves through it).
On Mon, 1 Oct 2018, Mickaël Salaün wrote: > Another way to apply a security policy could be to tied it to a file > descriptor, similarly to Capsicum, which could enable to create > programmable (real) capabilities. This way, it would be possible to > "wrap" a file descriptor with a Landlock program and use it with > FD-based syscalls or pass it to other processes. This would not require > changes to the FS subsystem, but only the Landlock LSM code. This isn't > done yet but I plan to add this new way to restrict operations on file > descriptors. Very interesting! This could possibly be an LSM which stacks/integrates with other LSMs to enforce MAC of object capabilities. > > Anyway, for the use case you mentioned, the AT_BENEATH flag(s) should be > simple to use and enough for now. We must be careful of the hardcoded > policy though. > > > > > >> On 9/29/18 12:34, Aleksa Sarai wrote: > >>> The need for some sort of control over VFS's path resolution (to avoid > >>> malicious paths resulting in inadvertent breakouts) has been a very > >>> long-standing desire of many userspace applications. This patchset is a > >>> revival of Al Viro's old AT_NO_JUMPS[1] patchset with a few additions. > >>> > >>> The most obvious change is that AT_NO_JUMPS has been split as dicussed > >>> in the original thread, along with a further split of AT_NO_PROCLINKS > >>> which means that each individual property of AT_NO_JUMPS is now a > >>> separate flag: > >>> > >>> * Path-based escapes from the starting-point using "/" or ".." are > >>> blocked by AT_BENEATH. > >>> * Mountpoint crossings are blocked by AT_XDEV. > >>> * /proc/$pid/fd/$fd resolution is blocked by AT_NO_PROCLINKS (more > >>> correctly it actually blocks any user of nd_jump_link() because it > >>> allows out-of-VFS path resolution manipulation). > >>> > >>> AT_NO_JUMPS is now effectively (AT_BENEATH|AT_XDEV|AT_NO_PROCLINKS). At > >>> Linus' suggestion in the original thread, I've also implemented > >>> AT_NO_SYMLINKS which just denies _all_ symlink resolution (including > >>> "proclink" resolution). > >>> > >>> An additional improvement was made to AT_XDEV. The original AT_NO_JUMPS > >>> path didn't consider "/tmp/.." as a mountpoint crossing -- this patch > >>> blocks this as well (feel free to ask me to remove it if you feel this > >>> is not sane). > >>> > >>> Currently I've only enabled these for openat(2) and the stat(2) family. > >>> I would hope we could enable it for basically every *at(2) syscall -- > >>> but many of them appear to not have a @flags argument and thus we'll > >>> need to add several new syscalls to do this. I'm more than happy to send > >>> those patches, but I'd prefer to know that this preliminary work is > >>> acceptable before doing a bunch of copy-paste to add new sets of *at(2) > >>> syscalls. > >>> > >>> One additional feature I've implemented is AT_THIS_ROOT (I imagine this > >>> is probably going to be more contentious than the refresh of > >>> AT_NO_JUMPS, so I've included it in a separate patch). The patch itself > >>> describes my reasoning, but the shortened version of the premise is that > >>> continer runtimes need to have a way to resolve paths within a > >>> potentially malicious rootfs. Container runtimes currently do this in > >>> userspace[2] which has implicit race conditions that are not resolvable > >>> in userspace (or use fork+exec+chroot and SCM_RIGHTS passing which is > >>> inefficient). AT_THIS_ROOT allows for per-call chroot-like semantics for > >>> path resolution, which would be invaluable for us -- and the > >>> implementation is basically identical to AT_BENEATH (except that we > >>> don't return errors when someone actually hits the root). > >>> > >>> I've added some selftests for this, but it's not clear to me whether > >>> they should live here or in xfstests (as far as I can tell there are no > >>> other VFS tests in selftests, while there are some tests that look like > >>> generic VFS tests in xfstests). If you'd prefer them to be included in > >>> xfstests, let me know. > >>> > >>> [1]: https://lore.kernel.org/patchwork/patch/784221/ > >>> [2]: https://github.com/cyphar/filepath-securejoin > >>> > >>> Aleksa Sarai (3): > >>> namei: implement O_BENEATH-style AT_* flags > >>> namei: implement AT_THIS_ROOT chroot-like path resolution > >>> selftests: vfs: add AT_* path resolution tests > >>> > >>> fs/fcntl.c | 2 +- > >>> fs/namei.c | 158 ++++++++++++------ > >>> fs/open.c | 10 ++ > >>> fs/stat.c | 15 +- > >>> include/linux/fcntl.h | 3 +- > >>> include/linux/namei.h | 8 + > >>> include/uapi/asm-generic/fcntl.h | 20 +++ > >>> include/uapi/linux/fcntl.h | 10 ++ > >>> tools/testing/selftests/Makefile | 1 + > >>> tools/testing/selftests/vfs/.gitignore | 1 + > >>> tools/testing/selftests/vfs/Makefile | 13 ++ > >>> tools/testing/selftests/vfs/at_flags.h | 40 +++++ > >>> tools/testing/selftests/vfs/common.sh | 37 ++++ > >>> .../selftests/vfs/tests/0001_at_beneath.sh | 72 ++++++++ > >>> .../selftests/vfs/tests/0002_at_xdev.sh | 54 ++++++ > >>> .../vfs/tests/0003_at_no_proclinks.sh | 50 ++++++ > >>> .../vfs/tests/0004_at_no_symlinks.sh | 49 ++++++ > >>> .../selftests/vfs/tests/0005_at_this_root.sh | 66 ++++++++ > >>> tools/testing/selftests/vfs/vfs_helper.c | 154 +++++++++++++++++ > >>> 19 files changed, 707 insertions(+), 56 deletions(-) > >>> create mode 100644 tools/testing/selftests/vfs/.gitignore > >>> create mode 100644 tools/testing/selftests/vfs/Makefile > >>> create mode 100644 tools/testing/selftests/vfs/at_flags.h > >>> create mode 100644 tools/testing/selftests/vfs/common.sh > >>> create mode 100755 tools/testing/selftests/vfs/tests/0001_at_beneath.sh > >>> create mode 100755 tools/testing/selftests/vfs/tests/0002_at_xdev.sh > >>> create mode 100755 tools/testing/selftests/vfs/tests/0003_at_no_proclinks.sh > >>> create mode 100755 tools/testing/selftests/vfs/tests/0004_at_no_symlinks.sh > >>> create mode 100755 tools/testing/selftests/vfs/tests/0005_at_this_root.sh > >>> create mode 100644 tools/testing/selftests/vfs/vfs_helper.c > >>> > >> > > > > > >
From: Aleksa Sarai > Sent: 01 October 2018 17:16 > > On 2018-10-01, David Laight <David.Laight@ACULAB.COM> wrote: ... > > > * Mountpoint crossings are blocked by AT_XDEV. > > > > You might want a mountpoint flag that allows crossing into the mounted > > filesystem (you may need to get out in order to do pwd()). > > Like a mount flag? I'm not sure how I feel about that. The intention is > to allow for a process to have control over how path lookups are > handled, and tying it to a mount flag means that it's no longer entirely > up to the process. Right, but you may have some mount points that you don't want to cross and others that it is perfectly fine to cross. For example you might want to be able to cross into a 'tmp' filesystem. ... > > If you make the flags a property of the directory vnode (perhaps as > > well as any syscall flags), and make it inherited by vnode lookup then > > it can be used to stop library functions (or entire binaries) using > > blocked paths. > > You'd then only need to add an fcntl() call to set the flags (but never > > clear them) to get the restriction applied to every lookup. > > This seems like it might be useful, but it could always be done as a > follow-up patch by just setting LOOKUP_BLAH if the dirfd has the flag > set. I'm also a little bit concerned that (because fd flags are set on > the 'struct file') if you start sharing fds then you can no longer use > the lookup scoping for security (a racing process could remove the > flags while the management process resolves through it). I was thinking that the flags would never be removable. A management process might have to flip its cwd back and forth in order to clear the flags (opendir(".") should give a different struct file). This all gets tied up with the slight requirement for per-thread cwd. I had another thought that the crudentials structure used for a file lookup could also be taken from the cwd (not sure how it would get there - especially if you need the correct group list). That would allow a 'management' process to open a file in the context of the target user process. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)