mbox series

[v2,0/6] Landlock: ioctl support

Message ID 20230623144329.136541-1-gnoack@google.com (mailing list archive)
Headers show
Series Landlock: ioctl support | expand

Message

Günther Noack June 23, 2023, 2:43 p.m. UTC
Hello!

These patches add simple ioctl(2) support to Landlock.

Objective
~~~~~~~~~

Make ioctl(2) requests restrictable with Landlock,
in a way that is useful for real-world applications.

Proposed approach
~~~~~~~~~~~~~~~~~

Introduce the LANDLOCK_ACCESS_FS_IOCTL right, which restricts the use
of ioctl(2) on file descriptors.

We attach the LANDLOCK_ACCESS_FS_IOCTL right to opened file
descriptors, as we already do for LANDLOCK_ACCESS_FS_TRUNCATE.

I believe that this approach works for the majority of use cases, and
offers a good trade-off between Landlock API and implementation
complexity and flexibility when the feature is used.

Current limitations
~~~~~~~~~~~~~~~~~~~

With this patch set, ioctl(2) requests can *not* be filtered based on
file type, device number (dev_t) or on the ioctl(2) request number.

On the initial RFC patch set [1], we have reached consensus to start
with this simpler coarse-grained approach, and build additional ioctl
restriction capabilities on top in subsequent steps.

[1] https://lore.kernel.org/linux-security-module/d4f1395c-d2d4-1860-3a02-2a0c023dd761@digikod.net/

Notable implications of this approach
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

* Existing inherited file descriptors stay unaffected
  when a program enables Landlock.

  This means in particular that in common scenarios,
  the terminal's ioctls (ioctl_tty(2)) continue to work.

* ioctl(2) continues to be available for file descriptors acquired
  through means other than open(2).  Example: Network sockets,
  memfd_create(2), file descriptors that are already open before the
  Landlock ruleset is enabled.

Examples
~~~~~~~~

Starting a sandboxed shell from $HOME with samples/landlock/sandboxer:

  LL_FS_RO=/ LL_FS_RW=. ./sandboxer /bin/bash

The LANDLOCK_ACCESS_FS_IOCTL right is part of the "read-write" rights
here, so we expect that newly opened files outside of $HOME don't work
with ioctl(2).

  * "stty" works: It probes terminal properties

  * "stty </dev/tty" fails: /dev/tty can be reopened, but the ioctl is
    denied.

  * "eject" fails: ioctls to use CD-ROM drive are denied.

  * "ls /dev" works: It uses ioctl to get the terminal size for
    columnar layout

  * The text editors "vim" and "mg" work.  (GNU Emacs fails because it
    attempts to reopen /dev/tty.)

Related Work
~~~~~~~~~~~~

OpenBSD's pledge(2) [2] restricts ioctl(2) independent of the file
descriptor which is used.  The implementers maintain multiple
allow-lists of predefined ioctl(2) operations required for different
application domains such as "audio", "bpf", "tty" and "inet".

OpenBSD does not guarantee ABI backwards compatibility to the same
extent as Linux does, so it's easier for them to update these lists in
later versions.  It might not be a feasible approach for Linux though.

[2] https://man.openbsd.org/OpenBSD-7.3/pledge.2

Changes
~~~~~~~

V2:
 * rebased on mic-next
 * added documentation
 * exercise ioctl in the memfd test
 * test: Use layout0 for the test

---

V1: https://lore.kernel.org/linux-security-module/20230502171755.9788-1-gnoack3000@gmail.com/

Günther Noack (6):
  landlock: Increment Landlock ABI version to 4
  landlock: Add LANDLOCK_ACCESS_FS_IOCTL access right
  selftests/landlock: Test ioctl support
  selftests/landlock: Test ioctl with memfds
  samples/landlock: Add support for LANDLOCK_ACCESS_FS_IOCTL
  landlock: Document ioctl support

 Documentation/userspace-api/landlock.rst     | 52 ++++++++-----
 include/uapi/linux/landlock.h                | 19 +++--
 samples/landlock/sandboxer.c                 | 12 ++-
 security/landlock/fs.c                       | 21 +++++-
 security/landlock/limits.h                   |  2 +-
 security/landlock/syscalls.c                 |  2 +-
 tools/testing/selftests/landlock/base_test.c |  2 +-
 tools/testing/selftests/landlock/fs_test.c   | 77 ++++++++++++++++++--
 8 files changed, 149 insertions(+), 38 deletions(-)


base-commit: 35ca4239929737bdc021ee923f97ebe7aff8fcc4

Comments

Günther Noack June 23, 2023, 3:20 p.m. UTC | #1
Hello!

On Fri, Jun 23, 2023 at 04:43:23PM +0200, Günther Noack wrote:
> Proposed approach
> ~~~~~~~~~~~~~~~~~
> 
> Introduce the LANDLOCK_ACCESS_FS_IOCTL right, which restricts the use
> of ioctl(2) on file descriptors.
> 
> We attach the LANDLOCK_ACCESS_FS_IOCTL right to opened file
> descriptors, as we already do for LANDLOCK_ACCESS_FS_TRUNCATE.
> 
> I believe that this approach works for the majority of use cases, and
> offers a good trade-off between Landlock API and implementation
> complexity and flexibility when the feature is used.
> 
> Current limitations
> ~~~~~~~~~~~~~~~~~~~
> 
> With this patch set, ioctl(2) requests can *not* be filtered based on
> file type, device number (dev_t) or on the ioctl(2) request number.
> 
> On the initial RFC patch set [1], we have reached consensus to start
> with this simpler coarse-grained approach, and build additional ioctl
> restriction capabilities on top in subsequent steps.

We *could* use this opportunity to blanket disable the TIOCSTI ioctl, if a
Landlock policy gets enabled and ioctls are handled.

TIOCSTI is a TTY ioctl which is commonly used as a sandbox escape, if the
sandboxes system can get a hold on a TTY file descriptor from outside the
sandbox (like STDOUT, hah).

An excellent summary of these problems was already written by Kees Cook on the
Linux patch which removes TIOCSTI depending on a kernel config option:
https://lore.kernel.org/lkml/20221022182828.give.717-kees@kernel.org/

Unfortunately on the distributions I have tested so far (Debian and Arch Linux),
TIOCSTI is still enabled.

***Proposal***:

  I am in favor of *disabling* TIOCSTI in all Landlocked processes,
  if the Landlock policy handles the LANDLOCK_ACCESS_FS_IOCTL right.

If we want to do it in a backwards-compatible way, now would be the time to add
it to the patch set. :)

As far as I can tell, there are no good legitimate use cases for TIOCSTI, and it
would fix the aforementioned sandbox escaping trick for a Landlocked process.
With the patch set as it is now, the only way to prevent that sandbox escape is
unfortunately to either (1) close the TTY file descriptors when enabling
Landlock, or (2) rely on an outside process to pass something else than a TTY
for FDs 0, 1 and 2.

Does that sound reasonable?  If yes, I'd send an update to this patch set which
forbids TIOCSTI.

Thanks,
—Günther
Mickaël Salaün June 23, 2023, 4:37 p.m. UTC | #2
Hi, thanks for the patches!


On 23/06/2023 17:20, Günther Noack wrote:
> Hello!
> 
> On Fri, Jun 23, 2023 at 04:43:23PM +0200, Günther Noack wrote:
>> Proposed approach
>> ~~~~~~~~~~~~~~~~~
>>
>> Introduce the LANDLOCK_ACCESS_FS_IOCTL right, which restricts the use
>> of ioctl(2) on file descriptors.
>>
>> We attach the LANDLOCK_ACCESS_FS_IOCTL right to opened file
>> descriptors, as we already do for LANDLOCK_ACCESS_FS_TRUNCATE.
>>
>> I believe that this approach works for the majority of use cases, and
>> offers a good trade-off between Landlock API and implementation
>> complexity and flexibility when the feature is used.
>>
>> Current limitations
>> ~~~~~~~~~~~~~~~~~~~
>>
>> With this patch set, ioctl(2) requests can *not* be filtered based on
>> file type, device number (dev_t) or on the ioctl(2) request number.
>>
>> On the initial RFC patch set [1], we have reached consensus to start
>> with this simpler coarse-grained approach, and build additional ioctl
>> restriction capabilities on top in subsequent steps.
> 
> We *could* use this opportunity to blanket disable the TIOCSTI ioctl, if a
> Landlock policy gets enabled and ioctls are handled.
> 
> TIOCSTI is a TTY ioctl which is commonly used as a sandbox escape, if the
> sandboxes system can get a hold on a TTY file descriptor from outside the
> sandbox (like STDOUT, hah).
> 
> An excellent summary of these problems was already written by Kees Cook on the
> Linux patch which removes TIOCSTI depending on a kernel config option:
> https://lore.kernel.org/lkml/20221022182828.give.717-kees@kernel.org/
> 
> Unfortunately on the distributions I have tested so far (Debian and Arch Linux),
> TIOCSTI is still enabled.
> 
> ***Proposal***:
> 
>    I am in favor of *disabling* TIOCSTI in all Landlocked processes,
>    if the Landlock policy handles the LANDLOCK_ACCESS_FS_IOCTL right.
> 
> If we want to do it in a backwards-compatible way, now would be the time to add
> it to the patch set. :)

What would that not be backward compatible?

> 
> As far as I can tell, there are no good legitimate use cases for TIOCSTI, and it
> would fix the aforementioned sandbox escaping trick for a Landlocked process.
> With the patch set as it is now, the only way to prevent that sandbox escape is
> unfortunately to either (1) close the TTY file descriptors when enabling
> Landlock, or (2) rely on an outside process to pass something else than a TTY
> for FDs 0, 1 and 2.

What about calling setsid()?

Alternatively, seccomp could be used, even if it could block overlapping 
IOCTLs as well…


> 
> Does that sound reasonable?  If yes, I'd send an update to this patch set which
> forbids TIOCSTI.

I agree that TIOCSTI is dangerous, but I don't see the rationale to add 
an exception for this specific IOCTL. I'm sure there are a lot of 
potentially dangerous IOCTLs out there, but from a kernel point of view, 
why should Landlock handle this one in a specific way?

Landlock should not define specific policies itself but let users manage 
that. Landlock should only restrict kernel features that *directly* 
enable to bypass its own restrictions (e.g. ptrace scope, FS topology 
changes when FS restrictions are in place).

I think we should instead focus on adding something like a 
landlock_inode_attr rule type to restrict IOCTLs defined by 
users/developers, and then extend it to make it possible to restrict 
already opened FDs as well.

> 
> Thanks,
> —Günther
>
Günther Noack June 26, 2023, 6:13 p.m. UTC | #3
Hello!

On Fri, Jun 23, 2023 at 06:37:25PM +0200, Mickaël Salaün wrote:
> On 23/06/2023 17:20, Günther Noack wrote:
> > ***Proposal***:
> > 
> >    I am in favor of *disabling* TIOCSTI in all Landlocked processes,
> >    if the Landlock policy handles the LANDLOCK_ACCESS_FS_IOCTL right.
> > 
> > If we want to do it in a backwards-compatible way, now would be the time to add
> > it to the patch set. :)
> 
> What would that not be backward compatible?

What I meant is that disabling TIOCSTI for Landlocked processes would
only be backwards compatible for Landlock users if they did a
conscious step for opting in to that feature, such as specifying that
"ioctl" should be a handled right.


> > As far as I can tell, there are no good legitimate use cases for TIOCSTI, and it
> > would fix the aforementioned sandbox escaping trick for a Landlocked process.
> > With the patch set as it is now, the only way to prevent that sandbox escape is
> > unfortunately to either (1) close the TTY file descriptors when enabling
> > Landlock, or (2) rely on an outside process to pass something else than a TTY
> > for FDs 0, 1 and 2.
> 
> What about calling setsid()?
> 
> Alternatively, seccomp could be used, even if it could block overlapping
> IOCTLs as well…

The possible approaches that I have seen kicked around are:

setsid()

  This does not work reliably from all processes, unfortunately.
  In particular, it does not work in the common case where a process
  gets started from a shell, without pipes or other bells and
  whistles.
  
  From the man page:

    setsid() creates a new session if the calling process is not a
    process group leader.

  Shells run subprocesses in their own process groups and if it is
  only one, it is its own process group leader (see credentials(7)).

  Such a process *could* of course run a subprocess and do it there,
  but it would potentially require architectural changes in some
  programs to do that, which de

seccomp-bpf

  That's a fallback solution, yes, although I am still hoping in the
  long run that we can get away without it.  The problem of tracking
  the available syscall numbers as I previously discussed it at [1]
  still exists: If the compiled program runs on a new kernel with a
  new syscall number, and is linked against a new version of glibc,
  that program might start doing this syscall but it can't tell apart
  in the seccomp filter whether this is to be blocked or not.

  It's a fundamentally flawed approach when linking against shared
  objects and running on unknown (future) Linux versions.

  [1] https://blog.gnoack.org/post/pledge-on-linux/

creating a new pseudo-terminal

  The cleaner solution suggested by Alan Cox in [2] is to create a new
  pseudo terminal and run the program within that pseudo terminal.

  This is also the technique used by programs like su (with the --pty
  flag) and sudo.  The man pages of both programs talk about it.

  Implementing this approach unfortunately also requires some
  architectural changes to the program doing that - it would also
  involve two processes again, one which keeps a reference to the tty
  which shovels data between the old and new terminal, and one which
  is sandboxed which only sees the pseudo-terminal.  The details are
  described in "Advanced Programming in the UNIX Environment", Chapter
  11.

  [2] https://lore.kernel.org/all/20170510212920.7f6bc5e6@alans-desktop/


I dislike all three of them for the Landlock use case:

 - Involving additional processes is a bigger change that the programs
   using Landlock would have to deal with.

 - Seccomp-BPF is a hack as well, due to the problem of having to
   track syscall numbers across platforms.

It might not be worth doing these things just to bridge the gap until
we have a more proper solution in Landlock.

For more entertaining/scary background reading, this search query in
the CVE database has links where you can see how the issue has been
dealt with in the past:

https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=TIOCSTI

The most recent one of these is in opendoas, the discussion in that
thread also has some interesting pointers (and the team ended up stuck
with the same three potential fixes which have similar problems for
them and which they have not implemented so far):
https://github.com/Duncaen/OpenDoas/issues/106



> > Does that sound reasonable?  If yes, I'd send an update to this patch set which
> > forbids TIOCSTI.
> 
> I agree that TIOCSTI is dangerous, but I don't see the rationale to add an
> exception for this specific IOCTL. I'm sure there are a lot of potentially
> dangerous IOCTLs out there, but from a kernel point of view, why should
> Landlock handle this one in a specific way?
> 
> Landlock should not define specific policies itself but let users manage
> that. Landlock should only restrict kernel features that *directly* enable
> to bypass its own restrictions (e.g. ptrace scope, FS topology changes when
> FS restrictions are in place).
> 
> I think we should instead focus on adding something like a
> landlock_inode_attr rule type to restrict IOCTLs defined by
> users/developers, and then extend it to make it possible to restrict already
> opened FDs as well.

After researching all the stuff above, I believe you are right - it's
better to leave this up to userspace and let them define the ioctls
that they actually need with an allow-listing approach, to reduce the
risk from obscure TTY ioctls.

Another point that also came up in the mail thread in [2] above was:
TIOCSTI is not the only way to do harm through the tty FD.  Another
one is TIOCLINUX (ioctl_console(2)) through its cut&paste
functionality, and who knows what other ioctls there might be.

So in that sense, I'm coming around to your approach of letting the
user define it in the next iteration of this ioctl patch set -- but
it will really be necessary to do that. o_O

–Günther
Mickaël Salaün July 12, 2023, 10:55 a.m. UTC | #4
Hi,

Thinking more about this, this first step is too restrictive, which
might lead to dangerous situations.

My main concern is that this approach will deny innocuous or even "good"
IOCTL. For instance, if FIOCLEX is denied, this could leak file
descriptors and then introduce vulnerabilities.

As we discussed before, we cannot categorize all IOCTLs, but I think we
need to add exceptions for a subset of them, and maintain this list.
SELinux has some special handling within selinux_file_ioctl(), and we
should use that as a starting point. The do_vfs_ioctl() function is
another important place to look at. The main thing to keep in mind is
that Landlock's goal is to restrict access to data (e.g. FS, network,
IPC, bypass through other processes), not to restrict innocuous (at
least in theory) kernel features.

I think, at least all IOCTLs targeting file descriptors themselves 
should always be allowed, similar to fcntl(2)'s F_SETFD and F_SETFL 
commands:
- FIOCLEX
- FIONCLEX
- FIONBIO
- FIOASYNC

Some others may not be a good idea:
- FIONREAD should be OK in theory but the VFS part only target regular
files and there is no access check according to the read right, which is
weird.
- FICLONE, FICLONERANGE, FIDEDUPRANGE: read/write actions.

We should add a built-time or run-time safeguard to be sure that future
FD IOCTLs will be added to this list. I'm not sure how to efficiently
implement such protection though.


I'm also wondering if we should not split the IOCTL access right into
three: mostly read, mostly write, and misc. _IOC_READ and _IOC_WRITE are
definitely not perfect, but tied to specific drivers (i.e. not a file
hierarchy but a block or character device) this might help until we get
a more fine-grained IOCTL access control. We should check if it's worth
it according to commonly used drivers. Looking at the TTY driver, most
IOCTLs are without read or write markers. Using this split could induce
a false sense of security, so it should be well motivated.



On 23/06/2023 16:43, Günther Noack wrote:
> Hello!
> 
> These patches add simple ioctl(2) support to Landlock.
> 
> Objective ~~~~~~~~~
> 
> Make ioctl(2) requests restrictable with Landlock, in a way that is 
> useful for real-world applications.
> 
> Proposed approach ~~~~~~~~~~~~~~~~~
> 
> Introduce the LANDLOCK_ACCESS_FS_IOCTL right, which restricts the
> use of ioctl(2) on file descriptors.
> 
> We attach the LANDLOCK_ACCESS_FS_IOCTL right to opened file 
> descriptors, as we already do for LANDLOCK_ACCESS_FS_TRUNCATE.
> 
> I believe that this approach works for the majority of use cases,
> and offers a good trade-off between Landlock API and implementation 
> complexity and flexibility when the feature is used.
> 
> Current limitations ~~~~~~~~~~~~~~~~~~~
> 
> With this patch set, ioctl(2) requests can *not* be filtered based
> on file type, device number (dev_t) or on the ioctl(2) request
> number.
> 
> On the initial RFC patch set [1], we have reached consensus to start 
> with this simpler coarse-grained approach, and build additional ioctl
> restriction capabilities on top in subsequent steps.
> 
> [1] 
> https://lore.kernel.org/linux-security-module/d4f1395c-d2d4-1860-3a02-2a0c023dd761@digikod.net/
>
>
>
>
>
> 
Notable implications of this approach
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 
> * Existing inherited file descriptors stay unaffected when a program
>  enables Landlock.
> 
> This means in particular that in common scenarios, the terminal's 
> ioctls (ioctl_tty(2)) continue to work.
> 
> * ioctl(2) continues to be available for file descriptors acquired 
> through means other than open(2).  Example: Network sockets, 
> memfd_create(2), file descriptors that are already open before the 
> Landlock ruleset is enabled.
> 
> Examples ~~~~~~~~
> 
> Starting a sandboxed shell from $HOME with 
> samples/landlock/sandboxer:
> 
> LL_FS_RO=/ LL_FS_RW=. ./sandboxer /bin/bash
> 
> The LANDLOCK_ACCESS_FS_IOCTL right is part of the "read-write"
> rights here, so we expect that newly opened files outside of $HOME
> don't work with ioctl(2).
> 
> * "stty" works: It probes terminal properties
> 
> * "stty </dev/tty" fails: /dev/tty can be reopened, but the ioctl is 
> denied.
> 
> * "eject" fails: ioctls to use CD-ROM drive are denied.
> 
> * "ls /dev" works: It uses ioctl to get the terminal size for 
> columnar layout
> 
> * The text editors "vim" and "mg" work.  (GNU Emacs fails because it 
> attempts to reopen /dev/tty.)
> 
> Related Work ~~~~~~~~~~~~
> 
> OpenBSD's pledge(2) [2] restricts ioctl(2) independent of the file 
> descriptor which is used.  The implementers maintain multiple 
> allow-lists of predefined ioctl(2) operations required for different 
> application domains such as "audio", "bpf", "tty" and "inet".
> 
> OpenBSD does not guarantee ABI backwards compatibility to the same 
> extent as Linux does, so it's easier for them to update these lists 
> in later versions.  It might not be a feasible approach for Linux 
> though.
> 
> [2] https://man.openbsd.org/OpenBSD-7.3/pledge.2
> 
> Changes ~~~~~~~
> 
> V2: * rebased on mic-next * added documentation * exercise ioctl in 
> the memfd test * test: Use layout0 for the test
> 
> ---
> 
> V1: 
> https://lore.kernel.org/linux-security-module/20230502171755.9788-1-gnoack3000@gmail.com/
>
>
>
>
>
> 
Günther Noack (6): landlock: Increment Landlock ABI version to 4
> landlock: Add LANDLOCK_ACCESS_FS_IOCTL access right 
> selftests/landlock: Test ioctl support selftests/landlock: Test ioctl
> with memfds samples/landlock: Add support for 
> LANDLOCK_ACCESS_FS_IOCTL landlock: Document ioctl support
> 
> Documentation/userspace-api/landlock.rst     | 52 ++++++++----- 
> include/uapi/linux/landlock.h                | 19 +++-- 
> samples/landlock/sandboxer.c                 | 12 ++- 
> security/landlock/fs.c                       | 21 +++++- 
> security/landlock/limits.h                   |  2 +- 
> security/landlock/syscalls.c                 |  2 +- 
> tools/testing/selftests/landlock/base_test.c |  2 +- 
> tools/testing/selftests/landlock/fs_test.c   | 77 
> ++++++++++++++++++-- 8 files changed, 149 insertions(+), 38 
> deletions(-)
> 
> 
> base-commit: 35ca4239929737bdc021ee923f97ebe7aff8fcc4
Günther Noack July 12, 2023, 2:56 p.m. UTC | #5
Hi!

On Wed, Jul 12, 2023 at 12:55:19PM +0200, Mickaël Salaün wrote:
> Thinking more about this, this first step is too restrictive, which
> might lead to dangerous situations.
> 
> My main concern is that this approach will deny innocuous or even "good"
> IOCTL. For instance, if FIOCLEX is denied, this could leak file
> descriptors and then introduce vulnerabilities.

This is a good point.

> As we discussed before, we cannot categorize all IOCTLs, but I think we
> need to add exceptions for a subset of them, and maintain this list.
> SELinux has some special handling within selinux_file_ioctl(), and we
> should use that as a starting point. The do_vfs_ioctl() function is
> another important place to look at. The main thing to keep in mind is
> that Landlock's goal is to restrict access to data (e.g. FS, network,
> IPC, bypass through other processes), not to restrict innocuous (at
> least in theory) kernel features.
> 
> I think, at least all IOCTLs targeting file descriptors themselves should
> always be allowed, similar to fcntl(2)'s F_SETFD and F_SETFL commands:
> - FIOCLEX
> - FIONCLEX
> - FIONBIO
> - FIOASYNC
> 
> Some others may not be a good idea:
> - FIONREAD should be OK in theory but the VFS part only target regular
> files and there is no access check according to the read right, which is
> weird.
> - FICLONE, FICLONERANGE, FIDEDUPRANGE: read/write actions.
> 
> We should add a built-time or run-time safeguard to be sure that future
> FD IOCTLs will be added to this list. I'm not sure how to efficiently
> implement such protection though.

I need to ponder it a bit.. :)  I also don't see an obvious solution yet how to
tie these lists of ioctls together.


> I'm also wondering if we should not split the IOCTL access right into
> three: mostly read, mostly write, and misc. _IOC_READ and _IOC_WRITE are
> definitely not perfect, but tied to specific drivers (i.e. not a file
> hierarchy but a block or character device) this might help until we get
> a more fine-grained IOCTL access control. We should check if it's worth
> it according to commonly used drivers. Looking at the TTY driver, most
> IOCTLs are without read or write markers. Using this split could induce
> a false sense of security, so it should be well motivated.

As it was pointed out by the LWN article that Jeff Xu pointed to [1], this
read/write bit in the ioctl command number is only referring to whether the
*argument pointer* to the ioctl is being read or written, but you can not use
this bit to infer whether the ioctl itself performs a "reading" or "writing"
access to the underlying file.

As the LWN article explains, SELinux has fallen for the same trap in the past,
the post [2] has an example for an ioctl where the read/write bits for the
argument are not related to what the underlying operation does.

It might be that you could potentially use the _IOC_READ and _IOC_WRITE bits to
group smaller subsets of the ioctl cmd space, such as for a single device type.
But then, the users would have to go through the list of supported ioctls one by
one anyway, to ensure that this works for that subset.  If they are going
through them one by one anyway, they might maybe just as well list them out in
the filter rule...?

[1] https://lwn.net/Articles/428140
[2] https://lwn.net/Articles/428142/


I've also pondered more about the ioctl support. I have a work-in-progress patch
set which filters ioctls according to various criteria, but it's not fully
fleshed out yet.

In the big picture: I think that the main ways how we can build this differently
are (a) the criteria on which to decide whether an ioctl is permitted, and (b)
the time at which we evaluate these criteria (time of open() vs. time of
ioctl()).  We can also evaluate the criteria at different times, depending on
which criterium it is.

So:

(a) The criteria on which to decide that an ioctl is permitted:

    We have discussed the followowing ones so far:

    * The *ioctl cmd* (request) itself
       - needs to be taken into account, obviously.
       - ioctl cmds do not have an obvious ordering exposed to userspace,
         so asking users to specify ranges is potentially difficult
       - asking users to list all individual ioctls they do might result in
         lists that are larger than I had thought. I've straced Firefox and
         found that it did about 20-30 direct-rendering related ioctls, and most
         of them were specific to my graphics card... o_O so I assume that there
         are more of these for other graphics cards.

    * The *file device ID* (major / minor)
       - specifying ranges is a good idea - ranges of device IDs are logically
         grouped and the order is also exposed and documented to user space.

    * The *file type*, read from filp->f_mode
       - includes regular files, directories, char devices, block devices,
         fifos and sockets
       - BUT this list of types in non-exhaustive:
         - there are valid struct file objects which have special types and are
           not distinguishable. They might not have a file type set in f_mode,
           even.  Examples include pidfds, or the Landlock ruleset FD. -- so: we
           do need a way to ignore file type altogether in an ioctl rule, so
           that such "special" file types can still be matched in the rule.

    * The *file path*
       - This can only really be checked against at open() time, imho.
         Doing it during the ioctl is too late, because the file might
         have been created in a different mount namespace, and then the
         current thread can't really make that file work with ioctls.
       - Not all open files *have* a file path (i.e. sockets, Landlock ruleset)

(b) The time at which the criteria are checked:

    * During open():
       - A check at this time is necessary to match against file paths, imho,
         as we already to in the ioctl patch set I've sent.

    * During ioctl():
       - A check at this time is *also* necessary, because without that, we will
         not be able to restrict ioctls on TTYs and other file descriptors that
         are obtained from other processes.

The tentative approach I've taken in my own patch set and the WIP part so far is:

 (1) Do file path checks at open() time (modeled as a access_fs right)
 (2) Do cmds, device ID and file type checks at ioctl() time.
     This is modeled independently of the file path check. -- both checks need to
     pass independently for an ioctl invocation to be permitted.

The API of that approach is:
 * The ruleset attribute gets a new handled_misc field,
   and when setting the first bit in it, it'll deny all ioctls
   unless there is a special ioctl rule added for them
   (and the path of the file was OK for ioctl at open time).
 * A new rule type with an associated struct landlock_ioctl_attr -
   that struct lets users define:
     - the desired mask of file types (or 0 for "all")
     - the designed device ID range
     - the list of ioctl cmds to be permitted for such files

An open question is whether the ruleset attr's "handled_misc" field should
rather be a "handled_ioctl_cmds" field, a set of restricted ioctl cmds
(potentially [0, 2^32)).  I think that would be more consistent conceptually
with how it was done for file system access rights, but obviously we can't model
it as a bit field any more - it would have to be some other suitable
representation of a set of integers, which also lets people say "all ioctls".

The upside of that approach would be that it could also be used to selectively
restrict specific known-evil ioctls, and letting all others continue to work.
For example, sandboxing or sudo-like programs could filter out TIOCSTI and
TIOCLINUX.

I'd be interested in hearing your opinion about this (also from the Chromium
side).

Thanks,
—Günther
Mickaël Salaün July 12, 2023, 5:48 p.m. UTC | #6
On 12/07/2023 16:56, Günther Noack wrote:
> Hi!
> 
> On Wed, Jul 12, 2023 at 12:55:19PM +0200, Mickaël Salaün wrote:
>> Thinking more about this, this first step is too restrictive, which
>> might lead to dangerous situations.
>>
>> My main concern is that this approach will deny innocuous or even "good"
>> IOCTL. For instance, if FIOCLEX is denied, this could leak file
>> descriptors and then introduce vulnerabilities.
> 
> This is a good point.
> 
>> As we discussed before, we cannot categorize all IOCTLs, but I think we
>> need to add exceptions for a subset of them, and maintain this list.
>> SELinux has some special handling within selinux_file_ioctl(), and we
>> should use that as a starting point. The do_vfs_ioctl() function is
>> another important place to look at. The main thing to keep in mind is
>> that Landlock's goal is to restrict access to data (e.g. FS, network,
>> IPC, bypass through other processes), not to restrict innocuous (at
>> least in theory) kernel features.
>>
>> I think, at least all IOCTLs targeting file descriptors themselves should
>> always be allowed, similar to fcntl(2)'s F_SETFD and F_SETFL commands:
>> - FIOCLEX
>> - FIONCLEX
>> - FIONBIO
>> - FIOASYNC
>>
>> Some others may not be a good idea:
>> - FIONREAD should be OK in theory but the VFS part only target regular
>> files and there is no access check according to the read right, which is
>> weird.
>> - FICLONE, FICLONERANGE, FIDEDUPRANGE: read/write actions.
>>
>> We should add a built-time or run-time safeguard to be sure that future
>> FD IOCTLs will be added to this list. I'm not sure how to efficiently
>> implement such protection though.
> 
> I need to ponder it a bit.. :)  I also don't see an obvious solution yet how to
> tie these lists of ioctls together.

I guess it should be ok to manually watch the do_vfs_ioctl() changes, 
but definitely not optimal.

> 
> 
>> I'm also wondering if we should not split the IOCTL access right into
>> three: mostly read, mostly write, and misc. _IOC_READ and _IOC_WRITE are
>> definitely not perfect, but tied to specific drivers (i.e. not a file
>> hierarchy but a block or character device) this might help until we get
>> a more fine-grained IOCTL access control. We should check if it's worth
>> it according to commonly used drivers. Looking at the TTY driver, most
>> IOCTLs are without read or write markers. Using this split could induce
>> a false sense of security, so it should be well motivated.
> 
> As it was pointed out by the LWN article that Jeff Xu pointed to [1], this
> read/write bit in the ioctl command number is only referring to whether the
> *argument pointer* to the ioctl is being read or written, but you can not use
> this bit to infer whether the ioctl itself performs a "reading" or "writing"
> access to the underlying file.
> 
> As the LWN article explains, SELinux has fallen for the same trap in the past,
> the post [2] has an example for an ioctl where the read/write bits for the
> argument are not related to what the underlying operation does.
> 
> It might be that you could potentially use the _IOC_READ and _IOC_WRITE bits to
> group smaller subsets of the ioctl cmd space, such as for a single device type.
> But then, the users would have to go through the list of supported ioctls one by
> one anyway, to ensure that this works for that subset.  If they are going
> through them one by one anyway, they might maybe just as well list them out in
> the filter rule...?
> 
> [1] https://lwn.net/Articles/428140
> [2] https://lwn.net/Articles/428142/

Right, I fell again in this trap, !_IOC_READ cannot even guarantee 
non-write actions.

A useful split would be at least between devices and regular 
files/directories, something like this:
- LANDLOCK_ACCESS_FS_IOCTL_DEV: allows IOCTLs on character or block 
devices, which should be targeted on specific paths.
- LANDLOCK_ACCESS_FS_IOCTL_NODEV: allows IOCTLs on regular files, 
directories, unix sockets, pipes, and symlinks. These are targeting 
filesystems (e.g. ext4's fsverity) or common Linux file types.

I think it makes sense because the underlying filesystems should already 
check for read/write access, which is not the case for block/char 
devices. Pipe and unix socket IOCTLs are quite specific but don't touch 
the underlying filesystem, and it should be allowed to properly use 
them. It should be noted that the pipe and socket IOCTL implementations 
don't care about their file mode though; I guess the rationale might be 
that IOCTLs may be required to (efficiently) either read or write.

Reading your following comments, this dev/nodev classification would be 
like the *file type* item, but simpler and only for file descriptors 
accessible through the filesystem, which I guess could be everything 
because of procfs…

This split might also help for the landlock_inode_attr properties, but 
it would also be a bit redundant with the file type match…


> 
> 
> I've also pondered more about the ioctl support. I have a work-in-progress patch
> set which filters ioctls according to various criteria, but it's not fully
> fleshed out yet.
> 
> In the big picture: I think that the main ways how we can build this differently
> are (a) the criteria on which to decide whether an ioctl is permitted, and (b)
> the time at which we evaluate these criteria (time of open() vs. time of
> ioctl()).  We can also evaluate the criteria at different times, depending on
> which criterium it is.
> 
> So:
> 
> (a) The criteria on which to decide that an ioctl is permitted:
> 
>      We have discussed the followowing ones so far:
> 
>      * The *ioctl cmd* (request) itself
>         - needs to be taken into account, obviously.
>         - ioctl cmds do not have an obvious ordering exposed to userspace,
>           so asking users to specify ranges is potentially difficult
>         - asking users to list all individual ioctls they do might result in
>           lists that are larger than I had thought. I've straced Firefox and
>           found that it did about 20-30 direct-rendering related ioctls, and most
>           of them were specific to my graphics card... o_O so I assume that there
>           are more of these for other graphics cards.
> 
>      * The *file device ID* (major / minor)
>         - specifying ranges is a good idea - ranges of device IDs are logically
>           grouped and the order is also exposed and documented to user space.
> 
>      * The *file type*, read from filp->f_mode
>         - includes regular files, directories, char devices, block devices,
>           fifos and sockets
>         - BUT this list of types in non-exhaustive:
>           - there are valid struct file objects which have special types and are
>             not distinguishable. They might not have a file type set in f_mode,
>             even.  Examples include pidfds, or the Landlock ruleset FD. -- so: we
>             do need a way to ignore file type altogether in an ioctl rule, so
>             that such "special" file types can still be matched in the rule.
> 
>      * The *file path*
>         - This can only really be checked against at open() time, imho.
>           Doing it during the ioctl is too late, because the file might
>           have been created in a different mount namespace, and then the
>           current thread can't really make that file work with ioctls.
>         - Not all open files *have* a file path (i.e. sockets, Landlock ruleset)

I think we can reach a lot through /proc/self/fd/

> 
> (b) The time at which the criteria are checked:
> 
>      * During open():
>         - A check at this time is necessary to match against file paths, imho,
>           as we already to in the ioctl patch set I've sent.
> 
>      * During ioctl():
>         - A check at this time is *also* necessary, because without that, we will
>           not be able to restrict ioctls on TTYs and other file descriptors that
>           are obtained from other processes.

As I explained before, I don't think we should care about inherited or 
passed FDs. Other ways to get FDs (e.g. landlock_create_ruleset) should 
probably not be a priority for now.


> 
> The tentative approach I've taken in my own patch set and the WIP part so far is:
> 
>   (1) Do file path checks at open() time (modeled as a access_fs right)
>   (2) Do cmds, device ID and file type checks at ioctl() time.
>       This is modeled independently of the file path check. -- both checks need to
>       pass independently for an ioctl invocation to be permitted.

This looks good! However, see below an alternative approach for the 
rules combination.

> 
> The API of that approach is:
>   * The ruleset attribute gets a new handled_misc field,
>     and when setting the first bit in it, it'll deny all ioctls
>     unless there is a special ioctl rule added for them
>     (and the path of the file was OK for ioctl at open time).
>   * A new rule type with an associated struct landlock_ioctl_attr -
>     that struct lets users define:
>       - the desired mask of file types (or 0 for "all")
>       - the designed device ID range
>       - the list of ioctl cmds to be permitted for such files
> 
> An open question is whether the ruleset attr's "handled_misc" field should
> rather be a "handled_ioctl_cmds" field, a set of restricted ioctl cmds
> (potentially [0, 2^32)).  I think that would be more consistent conceptually
> with how it was done for file system access rights, but obviously we can't model
> it as a bit field any more - it would have to be some other suitable
> representation of a set of integers, which also lets people say "all ioctls".

We might not need another ruleset's field because we can reuse the 
existing FS access rights, including the new IOCTL one(s). The trick is 
just to define a new way to match files (and optionally specific IOCTL 
commands), hence the landlock_inode_attr proposal. As you explained, 
this type of rule could match device IDs and file types. An alternative 
way to identify such properties would be to pass an FD and specify a 
subset of these properties to match on. This would avoid some side 
channel issues, and could be used to check for directory or file (as 
done for path_beneath) to avoid irrelevant access rights.

I suggest to first handle path_beneath and inode rules as a binary OR 
set of rights (i.e. the sandboxed processes only needs one rule to match 
for the defined action to be allowed). Then, with a way to identify 
inodes rules, we could treat them as synthetic access rights and add a 
new allowed_inode field to path_beneath rules and remove the IOCTL 
access rights from their allowed_access field. This way, sandboxed 
processes would need both rules to match.


> 
> The upside of that approach would be that it could also be used to selectively
> restrict specific known-evil ioctls, and letting all others continue to work.
> For example, sandboxing or sudo-like programs could filter out TIOCSTI and
> TIOCLINUX.
> 
> I'd be interested in hearing your opinion about this (also from the Chromium
> side).
> 
> Thanks,
> —Günther
>
Günther Noack July 13, 2023, 10:38 p.m. UTC | #7
Hi!

On Wed, Jul 12, 2023 at 07:48:29PM +0200, Mickaël Salaün wrote:
> On 12/07/2023 16:56, Günther Noack wrote:
> > On Wed, Jul 12, 2023 at 12:55:19PM +0200, Mickaël Salaün wrote:
> > > Thinking more about this, this first step is too restrictive, which
> > > might lead to dangerous situations.
> > > 
> > > My main concern is that this approach will deny innocuous or even "good"
> > > IOCTL. For instance, if FIOCLEX is denied, this could leak file
> > > descriptors and then introduce vulnerabilities.
> > 
> > This is a good point.
> > 
> > > As we discussed before, we cannot categorize all IOCTLs, but I think we
> > > need to add exceptions for a subset of them, and maintain this list.
> > > SELinux has some special handling within selinux_file_ioctl(), and we
> > > should use that as a starting point. The do_vfs_ioctl() function is
> > > another important place to look at. The main thing to keep in mind is
> > > that Landlock's goal is to restrict access to data (e.g. FS, network,
> > > IPC, bypass through other processes), not to restrict innocuous (at
> > > least in theory) kernel features.
> > > 
> > > I think, at least all IOCTLs targeting file descriptors themselves should
> > > always be allowed, similar to fcntl(2)'s F_SETFD and F_SETFL commands:
> > > - FIOCLEX
> > > - FIONCLEX
> > > - FIONBIO
> > > - FIOASYNC
> > > 
> > > Some others may not be a good idea:
> > > - FIONREAD should be OK in theory but the VFS part only target regular
> > > files and there is no access check according to the read right, which is
> > > weird.
> > > - FICLONE, FICLONERANGE, FIDEDUPRANGE: read/write actions.
> > > 
> > > We should add a built-time or run-time safeguard to be sure that future
> > > FD IOCTLs will be added to this list. I'm not sure how to efficiently
> > > implement such protection though.
> > 
> > I need to ponder it a bit.. :)  I also don't see an obvious solution yet how to
> > tie these lists of ioctls together.
> 
> I guess it should be ok to manually watch the do_vfs_ioctl() changes, but
> definitely not optimal.
> 
> > 
> > 
> > > I'm also wondering if we should not split the IOCTL access right into
> > > three: mostly read, mostly write, and misc. _IOC_READ and _IOC_WRITE are
> > > definitely not perfect, but tied to specific drivers (i.e. not a file
> > > hierarchy but a block or character device) this might help until we get
> > > a more fine-grained IOCTL access control. We should check if it's worth
> > > it according to commonly used drivers. Looking at the TTY driver, most
> > > IOCTLs are without read or write markers. Using this split could induce
> > > a false sense of security, so it should be well motivated.
> > 
> > As it was pointed out by the LWN article that Jeff Xu pointed to [1], this
> > read/write bit in the ioctl command number is only referring to whether the
> > *argument pointer* to the ioctl is being read or written, but you can not use
> > this bit to infer whether the ioctl itself performs a "reading" or "writing"
> > access to the underlying file.
> > 
> > As the LWN article explains, SELinux has fallen for the same trap in the past,
> > the post [2] has an example for an ioctl where the read/write bits for the
> > argument are not related to what the underlying operation does.
> > 
> > It might be that you could potentially use the _IOC_READ and _IOC_WRITE bits to
> > group smaller subsets of the ioctl cmd space, such as for a single device type.
> > But then, the users would have to go through the list of supported ioctls one by
> > one anyway, to ensure that this works for that subset.  If they are going
> > through them one by one anyway, they might maybe just as well list them out in
> > the filter rule...?
> > 
> > [1] https://lwn.net/Articles/428140
> > [2] https://lwn.net/Articles/428142/
> 
> Right, I fell again in this trap, !_IOC_READ cannot even guarantee non-write
> actions.
> 
> A useful split would be at least between devices and regular
> files/directories, something like this:
> - LANDLOCK_ACCESS_FS_IOCTL_DEV: allows IOCTLs on character or block devices,
> which should be targeted on specific paths.
> - LANDLOCK_ACCESS_FS_IOCTL_NODEV: allows IOCTLs on regular files,
> directories, unix sockets, pipes, and symlinks. These are targeting
> filesystems (e.g. ext4's fsverity) or common Linux file types.

To make sure we are on the same page, let me paraphrase:

You are suggesting that we should split the LANDLOCK_ACCESS_FS_IOCTL
right into a LANDLOCK_ACCESS_FS_IOCTL_DEV part (for block and
character devices) and a LANDLOCK_ACCESS_FS_IOCTL_NODEV part (for
regular files, directories, named(!) unix sockets, named(!) pipes and
symlinks)?  The check would presumably be done during the open(2) call
and then store the access right on the freshly opened struct file?

If Landlock only checks the ioctl criteria during open(2), that would
mean that file descriptors created through other means would be
unaffected.

In particular, the protection would only apply to named pipes and Unix
sockets which get newly opened through the file system, but it would
not apply to pipes created through pipe(2) and Unix sockets created
through socketpair(2)?

(It is more clearly a philosophy of "protecting resources", rather
than a philosophy of limiting access to the thousands of potentially
buggy ioctl implementations. - But I think it might be reasonable to
permit unnamed pipes and socketpairs - they are useful mechanisms and
seem harmless as long as their implementations don't have bugs.)


> I think it makes sense because the underlying filesystems should already
> check for read/write access, which is not the case for block/char devices.
> Pipe and unix socket IOCTLs are quite specific but don't touch the
> underlying filesystem, and it should be allowed to properly use them. It
> should be noted that the pipe and socket IOCTL implementations don't care
> about their file mode though; I guess the rationale might be that IOCTLs may
> be required to (efficiently) either read or write.
> 

I don't understand your remark about the read/write access.

Pipes have a read end and a write end, where only one of the two
operations should work.  Unix sockets are always bidirectional, if I
remember this correctly.

> Reading your following comments, this dev/nodev classification would be like
> the *file type* item, but simpler and only for file descriptors accessible
> through the filesystem, which I guess could be everything because of procfs…
> 
> This split might also help for the landlock_inode_attr properties, but it
> would also be a bit redundant with the file type match…

I agree that this dev/nodev classification seems like a simpler
version of the *file type* item from below.

> 
> 
> > 
> > 
> > I've also pondered more about the ioctl support. I have a work-in-progress patch
> > set which filters ioctls according to various criteria, but it's not fully
> > fleshed out yet.
> > 
> > In the big picture: I think that the main ways how we can build this differently
> > are (a) the criteria on which to decide whether an ioctl is permitted, and (b)
> > the time at which we evaluate these criteria (time of open() vs. time of
> > ioctl()).  We can also evaluate the criteria at different times, depending on
> > which criterium it is.
> > 
> > So:
> > 
> > (a) The criteria on which to decide that an ioctl is permitted:
> > 
> >      We have discussed the followowing ones so far:
> > 
> >      * The *ioctl cmd* (request) itself
> >         - needs to be taken into account, obviously.
> >         - ioctl cmds do not have an obvious ordering exposed to userspace,
> >           so asking users to specify ranges is potentially difficult
> >         - asking users to list all individual ioctls they do might result in
> >           lists that are larger than I had thought. I've straced Firefox and
> >           found that it did about 20-30 direct-rendering related ioctls, and most
> >           of them were specific to my graphics card... o_O so I assume that there
> >           are more of these for other graphics cards.
> > 
> >      * The *file device ID* (major / minor)
> >         - specifying ranges is a good idea - ranges of device IDs are logically
> >           grouped and the order is also exposed and documented to user space.
> > 
> >      * The *file type*, read from filp->f_mode
> >         - includes regular files, directories, char devices, block devices,
> >           fifos and sockets
> >         - BUT this list of types in non-exhaustive:
> >           - there are valid struct file objects which have special types and are
> >             not distinguishable. They might not have a file type set in f_mode,
> >             even.  Examples include pidfds, or the Landlock ruleset FD. -- so: we
> >             do need a way to ignore file type altogether in an ioctl rule, so
> >             that such "special" file types can still be matched in the rule.
> > 
> >      * The *file path*
> >         - This can only really be checked against at open() time, imho.
> >           Doing it during the ioctl is too late, because the file might
> >           have been created in a different mount namespace, and then the
> >           current thread can't really make that file work with ioctls.
> >         - Not all open files *have* a file path (i.e. sockets, Landlock ruleset)
> 
> I think we can reach a lot through /proc/self/fd/

What I meant to say is: The struct file for some files does not refer
to a path on the file system that the file was opened from ==> Using
the file path as criterium does not cover all existing ioctl use
cases.

The thing that struck me about the above list of criteria is that each
of them seems to have gaps.  As an example, take timerfds
(timerfd_create(2)):

 * these do not get opened through a file system path, so the *file
   path* can not restrict them.
 * they are not character or block devices and do not have a device ID.
 * they don't match any of the file types in filp->f_mode.

So in order to permit the TFD_IOC_SET_TICKS ioctl on them, these three
criteria can't be used to describe a timerfd.

This is more important in an implementation where the criteria are
checked in security_file_ioctl, rather than in security_file_open.  In
an implementation where the criteria are only checked in
security_file_open, it would anyway not be possible to restrict ioctls
on the timerfd, and all files for which it would be possible, they
must have a path in the file system when they end up in that hook, I
suspect?

> > (b) The time at which the criteria are checked:
> > 
> >      * During open():
> >         - A check at this time is necessary to match against file paths, imho,
> >           as we already to in the ioctl patch set I've sent.
> > 
> >      * During ioctl():
> >         - A check at this time is *also* necessary, because without that, we will
> >           not be able to restrict ioctls on TTYs and other file descriptors that
> >           are obtained from other processes.

For completeness: I forgot to list here: The other reason where a
check during ioctl() is needed is the case as for the timerfd, the
pipe(2) and socketpair(2), where a file is created through a simple
syscall, but without spelling out a path.  If these kinds of files are
in scope for ioctl protection, it can't be done during the open()
check alone, I suspect?

> As I explained before, I don't think we should care about inherited or
> passed FDs. Other ways to get FDs (e.g. landlock_create_ruleset) should
> probably not be a priority for now.

I don't know what we should do about the "basic Unix tool" and
TIOCSTI/TIOCLINUX case, where it is possible to gain control over the
shell running in the tty that we get as stdout fd.

I'm in that situation with the little web application I run at home,
but the patch that you have sent for GNU tar at some point (and which
we should really revive :)) has the same problem: If an attacker
manages to do a Remote Code Execution in that tar process, they can
ioctl(1, TIOCSTI, ...) their way out into the shell which invoked tar,
and which is not restricted with tar's Landlock policy.

(I don't really see tar create a pty/tty pair either and shovel data
between them in a sidecar process or thread, just to protect against
that.)

Remark: For the specific TIOCSTI problem, I'm seeing a glimmer of
light with this patch set which has appeared in the meantime:
https://lore.kernel.org/all/20230710002645.v565c7xq5iddruse@begin/
(This will still require that distributions flip that Kconfig option
off, but the only(?) known user of TIOCSTI, BRLTTY, would continue
working.)

I would be more comfortable with doing the checks only at open(2) time
if the above patch landed in distributions so that you would need to
have CAP_SYS_ADMIN in order to use TIOCSTI.

Do you think this is realistic?  If this does not get flipped by
distributions, Landlock would continue to have these TIOCSTI problems
on these platforms (unless its users do the pty/tty pair thing, but
that seems like an unrealistic demand).


> > The tentative approach I've taken in my own patch set and the WIP part so far is:
> > 
> >   (1) Do file path checks at open() time (modeled as a access_fs right)
> >   (2) Do cmds, device ID and file type checks at ioctl() time.
> >       This is modeled independently of the file path check. -- both checks need to
> >       pass independently for an ioctl invocation to be permitted.
> 
> This looks good! However, see below an alternative approach for the rules
> combination.
> 
> > 
> > The API of that approach is:
> >   * The ruleset attribute gets a new handled_misc field,
> >     and when setting the first bit in it, it'll deny all ioctls
> >     unless there is a special ioctl rule added for them
> >     (and the path of the file was OK for ioctl at open time).
> >   * A new rule type with an associated struct landlock_ioctl_attr -
> >     that struct lets users define:
> >       - the desired mask of file types (or 0 for "all")
> >       - the designed device ID range
> >       - the list of ioctl cmds to be permitted for such files
> > 
> > An open question is whether the ruleset attr's "handled_misc" field should
> > rather be a "handled_ioctl_cmds" field, a set of restricted ioctl cmds
> > (potentially [0, 2^32)).  I think that would be more consistent conceptually
> > with how it was done for file system access rights, but obviously we can't model
> > it as a bit field any more - it would have to be some other suitable
> > representation of a set of integers, which also lets people say "all ioctls".
> 
> We might not need another ruleset's field because we can reuse the existing
> FS access rights, including the new IOCTL one(s). The trick is just to
> define a new way to match files (and optionally specific IOCTL commands),
> hence the landlock_inode_attr proposal. As you explained, this type of rule
> could match device IDs and file types.

I have to think about it and maybe try it out in code.  This might be
a better option if we go for doing the checks only at open(2) time.

I do think that device IDs are often a better way to specify device
files than their paths are.  Device IDs are a stable numbering scheme
that won't change, whereas the structure of /dev can be defined by
user space and is also often dynamically adding and removing devices.

> An alternative way to identify such
> properties would be to pass an FD and specify a subset of these properties
> to match on. This would avoid some side channel issues, and could be used to
> check for directory or file (as done for path_beneath) to avoid irrelevant
> access rights.

I don't fully understand what you mean here.  Do you mean to use an
open device file as example for what to match?  I don't see how
specifying the file type and device ID range as plain numbers could
lead to a race condition.


> I suggest to first handle path_beneath and inode rules as a binary OR set of
> rights (i.e. the sandboxed processes only needs one rule to match for the
> defined action to be allowed). Then, with a way to identify inodes rules, we
> could treat them as synthetic access rights and add a new allowed_inode
> field to path_beneath rules and remove the IOCTL access rights from their
> allowed_access field. This way, sandboxed processes would need both rules to
> match.

I'm not sure what the inode rule is.  Do you mean "ioctl rule"?

If yes, I do agree that a list of permitted ioctls is similar to the
access rights flags that we already have, and it would have to get
passed around in a similar fashion (as "synthetic access rights"),
albeit using a different data structure.

I'm still skeptical of the API approach where we tie previously
unrelated rules together, if that is what you mean here.  I find this
difficult to explain and reason about.  But in doubt we'll see in the
implementation how unwieldy it actually gets.


> > The upside of that approach would be that it could also be used to selectively
> > restrict specific known-evil ioctls, and letting all others continue to work.
> > For example, sandboxing or sudo-like programs could filter out TIOCSTI and
> > TIOCLINUX.

By the way, selectively restricting known-bad ioctls is still not
possible with the approach we discussed now, I think.  Maybe TIOCSTI
is the only bad one... I hope.

> > 
> > I'd be interested in hearing your opinion about this (also from the Chromium
> > side).
> > 
> > Thanks,
> > —Günther
> > 

Thanks,
–Günther
Mickaël Salaün July 24, 2023, 7:03 p.m. UTC | #8
On 14/07/2023 00:38, Günther Noack wrote:
> Hi!
> 
> On Wed, Jul 12, 2023 at 07:48:29PM +0200, Mickaël Salaün wrote:
>> On 12/07/2023 16:56, Günther Noack wrote:
>>> On Wed, Jul 12, 2023 at 12:55:19PM +0200, Mickaël Salaün wrote:
>>>> Thinking more about this, this first step is too restrictive, which
>>>> might lead to dangerous situations.
>>>>
>>>> My main concern is that this approach will deny innocuous or even "good"
>>>> IOCTL. For instance, if FIOCLEX is denied, this could leak file
>>>> descriptors and then introduce vulnerabilities.
>>>
>>> This is a good point.
>>>
>>>> As we discussed before, we cannot categorize all IOCTLs, but I think we
>>>> need to add exceptions for a subset of them, and maintain this list.
>>>> SELinux has some special handling within selinux_file_ioctl(), and we
>>>> should use that as a starting point. The do_vfs_ioctl() function is
>>>> another important place to look at. The main thing to keep in mind is
>>>> that Landlock's goal is to restrict access to data (e.g. FS, network,
>>>> IPC, bypass through other processes), not to restrict innocuous (at
>>>> least in theory) kernel features.
>>>>
>>>> I think, at least all IOCTLs targeting file descriptors themselves should
>>>> always be allowed, similar to fcntl(2)'s F_SETFD and F_SETFL commands:
>>>> - FIOCLEX
>>>> - FIONCLEX
>>>> - FIONBIO
>>>> - FIOASYNC
>>>>
>>>> Some others may not be a good idea:
>>>> - FIONREAD should be OK in theory but the VFS part only target regular
>>>> files and there is no access check according to the read right, which is
>>>> weird.
>>>> - FICLONE, FICLONERANGE, FIDEDUPRANGE: read/write actions.
>>>>
>>>> We should add a built-time or run-time safeguard to be sure that future
>>>> FD IOCTLs will be added to this list. I'm not sure how to efficiently
>>>> implement such protection though.
>>>
>>> I need to ponder it a bit.. :)  I also don't see an obvious solution yet how to
>>> tie these lists of ioctls together.
>>
>> I guess it should be ok to manually watch the do_vfs_ioctl() changes, but
>> definitely not optimal.
>>
>>>
>>>
>>>> I'm also wondering if we should not split the IOCTL access right into
>>>> three: mostly read, mostly write, and misc. _IOC_READ and _IOC_WRITE are
>>>> definitely not perfect, but tied to specific drivers (i.e. not a file
>>>> hierarchy but a block or character device) this might help until we get
>>>> a more fine-grained IOCTL access control. We should check if it's worth
>>>> it according to commonly used drivers. Looking at the TTY driver, most
>>>> IOCTLs are without read or write markers. Using this split could induce
>>>> a false sense of security, so it should be well motivated.
>>>
>>> As it was pointed out by the LWN article that Jeff Xu pointed to [1], this
>>> read/write bit in the ioctl command number is only referring to whether the
>>> *argument pointer* to the ioctl is being read or written, but you can not use
>>> this bit to infer whether the ioctl itself performs a "reading" or "writing"
>>> access to the underlying file.
>>>
>>> As the LWN article explains, SELinux has fallen for the same trap in the past,
>>> the post [2] has an example for an ioctl where the read/write bits for the
>>> argument are not related to what the underlying operation does.
>>>
>>> It might be that you could potentially use the _IOC_READ and _IOC_WRITE bits to
>>> group smaller subsets of the ioctl cmd space, such as for a single device type.
>>> But then, the users would have to go through the list of supported ioctls one by
>>> one anyway, to ensure that this works for that subset.  If they are going
>>> through them one by one anyway, they might maybe just as well list them out in
>>> the filter rule...?
>>>
>>> [1] https://lwn.net/Articles/428140
>>> [2] https://lwn.net/Articles/428142/
>>
>> Right, I fell again in this trap, !_IOC_READ cannot even guarantee non-write
>> actions.
>>
>> A useful split would be at least between devices and regular
>> files/directories, something like this:
>> - LANDLOCK_ACCESS_FS_IOCTL_DEV: allows IOCTLs on character or block devices,
>> which should be targeted on specific paths.
>> - LANDLOCK_ACCESS_FS_IOCTL_NODEV: allows IOCTLs on regular files,
>> directories, unix sockets, pipes, and symlinks. These are targeting
>> filesystems (e.g. ext4's fsverity) or common Linux file types.
> 
> To make sure we are on the same page, let me paraphrase:
> 
> You are suggesting that we should split the LANDLOCK_ACCESS_FS_IOCTL
> right into a LANDLOCK_ACCESS_FS_IOCTL_DEV part (for block and
> character devices) and a LANDLOCK_ACCESS_FS_IOCTL_NODEV part (for
> regular files, directories, named(!) unix sockets, named(!) pipes and
> symlinks)?  The check would presumably be done during the open(2) call
> and then store the access right on the freshly opened struct file?

Correct

> 
> If Landlock only checks the ioctl criteria during open(2), that would
> mean that file descriptors created through other means would be
> unaffected.

Correct

> 
> In particular, the protection would only apply to named pipes and Unix
> sockets which get newly opened through the file system, but it would
> not apply to pipes created through pipe(2) and Unix sockets created
> through socketpair(2)?

Correct

> 
> (It is more clearly a philosophy of "protecting resources", rather
> than a philosophy of limiting access to the thousands of potentially
> buggy ioctl implementations. - But I think it might be reasonable to
> permit unnamed pipes and socketpairs - they are useful mechanisms and
> seem harmless as long as their implementations don't have bugs.)

The goal of Landlock is to limit access to new resources (e.g. new FD 
obtained from an existing FD or a path).

Unnamed pipes and socketpairs are not a way to (directly) access new 
kernel resources/data, hence out of scope for Landlock. Abstract unix 
sockets will need to be restricted though, but not with a path_beneath 
rule (and not right now with this patch series).


> 
> 
>> I think it makes sense because the underlying filesystems should already
>> check for read/write access, which is not the case for block/char devices.
>> Pipe and unix socket IOCTLs are quite specific but don't touch the
>> underlying filesystem, and it should be allowed to properly use them. It
>> should be noted that the pipe and socket IOCTL implementations don't care
>> about their file mode though; I guess the rationale might be that IOCTLs may
>> be required to (efficiently) either read or write.
>>
> 
> I don't understand your remark about the read/write access.

I meant that regular file/directory IOCTLs (e.g. fscrypt) should already 
check for read or write access according to the IOCTL command. This 
doesn't seem to be the case for devices because they don't modify and 
are unrelated to the underlying filesystem.


> 
> Pipes have a read end and a write end, where only one of the two
> operations should work.  Unix sockets are always bidirectional, if I
> remember this correctly.

Yes, but the pipe and socket IOCTL commands don't check if the related 
FD is opened with read nor write.

> 
>> Reading your following comments, this dev/nodev classification would be like
>> the *file type* item, but simpler and only for file descriptors accessible
>> through the filesystem, which I guess could be everything because of procfs…
>>
>> This split might also help for the landlock_inode_attr properties, but it
>> would also be a bit redundant with the file type match…
> 
> I agree that this dev/nodev classification seems like a simpler
> version of the *file type* item from below.
> 
>>
>>
>>>
>>>
>>> I've also pondered more about the ioctl support. I have a work-in-progress patch
>>> set which filters ioctls according to various criteria, but it's not fully
>>> fleshed out yet.
>>>
>>> In the big picture: I think that the main ways how we can build this differently
>>> are (a) the criteria on which to decide whether an ioctl is permitted, and (b)
>>> the time at which we evaluate these criteria (time of open() vs. time of
>>> ioctl()).  We can also evaluate the criteria at different times, depending on
>>> which criterium it is.
>>>
>>> So:
>>>
>>> (a) The criteria on which to decide that an ioctl is permitted:
>>>
>>>       We have discussed the followowing ones so far:
>>>
>>>       * The *ioctl cmd* (request) itself
>>>          - needs to be taken into account, obviously.
>>>          - ioctl cmds do not have an obvious ordering exposed to userspace,
>>>            so asking users to specify ranges is potentially difficult
>>>          - asking users to list all individual ioctls they do might result in
>>>            lists that are larger than I had thought. I've straced Firefox and
>>>            found that it did about 20-30 direct-rendering related ioctls, and most
>>>            of them were specific to my graphics card... o_O so I assume that there
>>>            are more of these for other graphics cards.
>>>
>>>       * The *file device ID* (major / minor)
>>>          - specifying ranges is a good idea - ranges of device IDs are logically
>>>            grouped and the order is also exposed and documented to user space.
>>>
>>>       * The *file type*, read from filp->f_mode
>>>          - includes regular files, directories, char devices, block devices,
>>>            fifos and sockets
>>>          - BUT this list of types in non-exhaustive:
>>>            - there are valid struct file objects which have special types and are
>>>              not distinguishable. They might not have a file type set in f_mode,
>>>              even.  Examples include pidfds, or the Landlock ruleset FD. -- so: we
>>>              do need a way to ignore file type altogether in an ioctl rule, so
>>>              that such "special" file types can still be matched in the rule.
>>>
>>>       * The *file path*
>>>          - This can only really be checked against at open() time, imho.
>>>            Doing it during the ioctl is too late, because the file might
>>>            have been created in a different mount namespace, and then the
>>>            current thread can't really make that file work with ioctls.
>>>          - Not all open files *have* a file path (i.e. sockets, Landlock ruleset)
>>
>> I think we can reach a lot through /proc/self/fd/
> 
> What I meant to say is: The struct file for some files does not refer
> to a path on the file system that the file was opened from ==> Using
> the file path as criterium does not cover all existing ioctl use
> cases.

Correct

> 
> The thing that struck me about the above list of criteria is that each
> of them seems to have gaps.  As an example, take timerfds
> (timerfd_create(2)):
> 
>   * these do not get opened through a file system path, so the *file
>     path* can not restrict them.

>   * they are not character or block devices and do not have a device ID.
>   * they don't match any of the file types in filp->f_mode.

Indeed, we may need a way to identify this kind of FD in the future but 
it should not be an issue for now with the path_beneath rules. I guess 
we could match the anon_inode:[*] name, but I would prefer to avoid 
using strings. A better way to identify this kind of FD would be to pass 
a similar one in a rule, in the same way as for path_beneath (as I 
suggested in a previous email).


> 
> So in order to permit the TFD_IOC_SET_TICKS ioctl on them, these three
> criteria can't be used to describe a timerfd.

Correct, and timerfd don't give access to (FS-related) data.

If we want to restrict this kind of FD (and if it's worth it), we can 
follow the same approach as for restricting new socket creation. 
Restricting specific IOCTLs on these FDs would require a 
capability-based approach (cf. Capsicum): explicitly attach restrictions 
to a FD, not a process, and the mechanism is almost there thanks to the 
truncate access right patches. It would make sense for Landlock to 
support this kind of FD capabilities, but maybe not right now.


> 
> This is more important in an implementation where the criteria are
> checked in security_file_ioctl, rather than in security_file_open.  In
> an implementation where the criteria are only checked in
> security_file_open, it would anyway not be possible to restrict ioctls
> on the timerfd, and all files for which it would be possible, they
> must have a path in the file system when they end up in that hook, I
> suspect?
> 
>>> (b) The time at which the criteria are checked:
>>>
>>>       * During open():
>>>          - A check at this time is necessary to match against file paths, imho,
>>>            as we already to in the ioctl patch set I've sent.
>>>
>>>       * During ioctl():
>>>          - A check at this time is *also* necessary, because without that, we will
>>>            not be able to restrict ioctls on TTYs and other file descriptors that
>>>            are obtained from other processes.
> 
> For completeness: I forgot to list here: The other reason where a
> check during ioctl() is needed is the case as for the timerfd, the
> pipe(2) and socketpair(2), where a file is created through a simple
> syscall, but without spelling out a path.  If these kinds of files are
> in scope for ioctl protection, it can't be done during the open()
> check alone, I suspect?

Correct, but we don't need this kind of restriction for now.

> 
>> As I explained before, I don't think we should care about inherited or
>> passed FDs. Other ways to get FDs (e.g. landlock_create_ruleset) should
>> probably not be a priority for now.
> 
> I don't know what we should do about the "basic Unix tool" and
> TIOCSTI/TIOCLINUX case, where it is possible to gain control over the
> shell running in the tty that we get as stdout fd.
> 
> I'm in that situation with the little web application I run at home,
> but the patch that you have sent for GNU tar at some point (and which
> we should really revive :)) has the same problem: If an attacker
> manages to do a Remote Code Execution in that tar process, they can
> ioctl(1, TIOCSTI, ...) their way out into the shell which invoked tar,
> and which is not restricted with tar's Landlock policy.
> 
> (I don't really see tar create a pty/tty pair either and shovel data
> between them in a sidecar process or thread, just to protect against
> that.)

Indeed, and that's why sandboxing an application might raise some 
challenges. We should note that a sandboxed application might only be 
safely used in some cases (e.g. pipe stdio and close other FDs), but I 
agree that this is not satisfactory for now, and there are still gaps.


> 
> Remark: For the specific TIOCSTI problem, I'm seeing a glimmer of
> light with this patch set which has appeared in the meantime:
> https://lore.kernel.org/all/20230710002645.v565c7xq5iddruse@begin/
> (This will still require that distributions flip that Kconfig option
> off, but the only(?) known user of TIOCSTI, BRLTTY, would continue
> working.)

I hope most distros are disabling CONFIG_LEGACY_TIOCSTI, otherwise users 
should still be able to tweak the related sysctl.


> 
> I would be more comfortable with doing the checks only at open(2) time
> if the above patch landed in distributions so that you would need to
> have CAP_SYS_ADMIN in order to use TIOCSTI.
> 
> Do you think this is realistic?  If this does not get flipped by
> distributions, Landlock would continue to have these TIOCSTI problems
> on these platforms (unless its users do the pty/tty pair thing, but
> that seems like an unrealistic demand).

What if we use an FD to identify an inode with landlock_inode_attr rule? 
We could have some flag to "pin" the restriction on this specific 
FD/inode, or only match the device type, or the file type… We need to 
think a bit more about the implications though.


> 
> 
>>> The tentative approach I've taken in my own patch set and the WIP part so far is:
>>>
>>>    (1) Do file path checks at open() time (modeled as a access_fs right)
>>>    (2) Do cmds, device ID and file type checks at ioctl() time.
>>>        This is modeled independently of the file path check. -- both checks need to
>>>        pass independently for an ioctl invocation to be permitted.
>>
>> This looks good! However, see below an alternative approach for the rules
>> combination.
>>
>>>
>>> The API of that approach is:
>>>    * The ruleset attribute gets a new handled_misc field,
>>>      and when setting the first bit in it, it'll deny all ioctls
>>>      unless there is a special ioctl rule added for them
>>>      (and the path of the file was OK for ioctl at open time).
>>>    * A new rule type with an associated struct landlock_ioctl_attr -
>>>      that struct lets users define:
>>>        - the desired mask of file types (or 0 for "all")
>>>        - the designed device ID range
>>>        - the list of ioctl cmds to be permitted for such files
>>>
>>> An open question is whether the ruleset attr's "handled_misc" field should
>>> rather be a "handled_ioctl_cmds" field, a set of restricted ioctl cmds
>>> (potentially [0, 2^32)).  I think that would be more consistent conceptually
>>> with how it was done for file system access rights, but obviously we can't model
>>> it as a bit field any more - it would have to be some other suitable
>>> representation of a set of integers, which also lets people say "all ioctls".
>>
>> We might not need another ruleset's field because we can reuse the existing
>> FS access rights, including the new IOCTL one(s). The trick is just to
>> define a new way to match files (and optionally specific IOCTL commands),
>> hence the landlock_inode_attr proposal. As you explained, this type of rule
>> could match device IDs and file types.
> 
> I have to think about it and maybe try it out in code.  This might be
> a better option if we go for doing the checks only at open(2) time.
> 
> I do think that device IDs are often a better way to specify device
> files than their paths are.  Device IDs are a stable numbering scheme
> that won't change, whereas the structure of /dev can be defined by
> user space and is also often dynamically adding and removing devices.
> 
>> An alternative way to identify such
>> properties would be to pass an FD and specify a subset of these properties
>> to match on. This would avoid some side channel issues, and could be used to
>> check for directory or file (as done for path_beneath) to avoid irrelevant
>> access rights.
> 
> I don't fully understand what you mean here.  Do you mean to use an
> open device file as example for what to match?

Yes

>  I don't see how
> specifying the file type and device ID range as plain numbers could
> lead to a race condition.

No race condition but side channel issues. For instance, a landlocked 
process could infer some file properties by adding and testing a 
Landlock rule even if it is not allowed to read such file properties 
(because of potential Landlock or other restrictions). Using a FD 
enables Landlock to check that the process is indeed allowed to read 
such properties, or we may decide that it is not necessary to do so.


> 
> 
>> I suggest to first handle path_beneath and inode rules as a binary OR set of
>> rights (i.e. the sandboxed processes only needs one rule to match for the
>> defined action to be allowed). Then, with a way to identify inodes rules, we
>> could treat them as synthetic access rights and add a new allowed_inode
>> field to path_beneath rules and remove the IOCTL access rights from their
>> allowed_access field. This way, sandboxed processes would need both rules to
>> match.
> 
> I'm not sure what the inode rule is.  Do you mean "ioctl rule"?

I mean a landlock_inode_attr rule that identify an inode with a device 
type, or a file type, or a specific inode (i.e. object)… and an IOCTL 
command as well (i.e. action).

> 
> If yes, I do agree that a list of permitted ioctls is similar to the
> access rights flags that we already have, and it would have to get
> passed around in a similar fashion (as "synthetic access rights"),
> albeit using a different data structure.
> 
> I'm still skeptical of the API approach where we tie previously
> unrelated rules together, if that is what you mean here.  I find this
> difficult to explain and reason about.  But in doubt we'll see in the
> implementation how unwieldy it actually gets.

Right, we don't need to implement the synthetic access rights with the 
current step though.

> 
> 
>>> The upside of that approach would be that it could also be used to selectively
>>> restrict specific known-evil ioctls, and letting all others continue to work.
>>> For example, sandboxing or sudo-like programs could filter out TIOCSTI and
>>> TIOCLINUX.
> 
> By the way, selectively restricting known-bad ioctls is still not
> possible with the approach we discussed now, I think.  Maybe TIOCSTI
> is the only bad one... I hope.

It would not be possible with the landlock_path_beneath_attr, but the 
landlock_inode_attr could be enough.

> 
>>>
>>> I'd be interested in hearing your opinion about this (also from the Chromium
>>> side).
>>>
>>> Thanks,
>>> —Günther
>>>
> 
> Thanks,
> –Günther
Günther Noack July 27, 2023, 9:32 a.m. UTC | #9
Hello Mickaël!

Overall, I believe that I thought too far ahead here and now we've
been mixing the discussions for different steps from the three-step
approach that we discussed in [1].

In order to make progress here, let me try to disentangle in this mail
which parts we need for the current step (1) and which parts are only
needed for later steps.

[1] https://lore.kernel.org/linux-security-module/d4f1395c-d2d4-1860-3a02-2a0c023dd761@digikod.net/


On Mon, Jul 24, 2023 at 09:03:42PM +0200, Mickaël Salaün wrote:
> On 14/07/2023 00:38, Günther Noack wrote:
> > On Wed, Jul 12, 2023 at 07:48:29PM +0200, Mickaël Salaün wrote:
> > > A useful split would be at least between devices and regular
> > > files/directories, something like this:
> > > - LANDLOCK_ACCESS_FS_IOCTL_DEV: allows IOCTLs on character or block devices,
> > > which should be targeted on specific paths.
> > > - LANDLOCK_ACCESS_FS_IOCTL_NODEV: allows IOCTLs on regular files,
> > > directories, unix sockets, pipes, and symlinks. These are targeting
> > > filesystems (e.g. ext4's fsverity) or common Linux file types.
> > 
> > To make sure we are on the same page, let me paraphrase:
> > 
> > You are suggesting that we should split the LANDLOCK_ACCESS_FS_IOCTL
> > right into a LANDLOCK_ACCESS_FS_IOCTL_DEV part (for block and
> > character devices) and a LANDLOCK_ACCESS_FS_IOCTL_NODEV part (for
> > regular files, directories, named(!) unix sockets, named(!) pipes and
> > symlinks)?  The check would presumably be done during the open(2) call
> > and then store the access right on the freshly opened struct file?
> 
> Correct

OK, I'll add this to step (1) then.


> > (It is more clearly a philosophy of "protecting resources", rather
> > than a philosophy of limiting access to the thousands of potentially
> > buggy ioctl implementations. - But I think it might be reasonable to
> > permit unnamed pipes and socketpairs - they are useful mechanisms and
> > seem harmless as long as their implementations don't have bugs.)
> 
> The goal of Landlock is to limit access to new resources (e.g. new FD
> obtained from an existing FD or a path).
> 
> Unnamed pipes and socketpairs are not a way to (directly) access new kernel
> resources/data, hence out of scope for Landlock. Abstract unix sockets will
> need to be restricted though, but not with a path_beneath rule (and not
> right now with this patch series).

OK, fair enough.  I can see that this is conceptually cleaner within
Landlock.

Let's go for that approach then, where Landlock only restricts newly
opened path-based files, and where we leave inherited file descriptors
and the ones created through pipe(2), socketpair(2) and timerfds as
they are for now.

It feels like TIOCSTI (and TIOCLINUX) are probably the biggest
problems when it comes to these inherited files.  With some luck,
TIOCSTI will be turned off by distributions soon (and TIOCLINUX only
works on the text console, which is luckily not in use that much).
Fingers crossed.


> > > I think it makes sense because the underlying filesystems should already
> > > check for read/write access, which is not the case for block/char devices.
> > > Pipe and unix socket IOCTLs are quite specific but don't touch the
> > > underlying filesystem, and it should be allowed to properly use them. It
> > > should be noted that the pipe and socket IOCTL implementations don't care
> > > about their file mode though; I guess the rationale might be that IOCTLs may
> > > be required to (efficiently) either read or write.
> > > 
> > 
> > I don't understand your remark about the read/write access.
> 
> I meant that regular file/directory IOCTLs (e.g. fscrypt) should already
> check for read or write access according to the IOCTL command. This doesn't
> seem to be the case for devices because they don't modify and are unrelated
> to the underlying filesystem.
> 
> 
> > 
> > Pipes have a read end and a write end, where only one of the two
> > operations should work.  Unix sockets are always bidirectional, if I
> > remember this correctly.
> 
> Yes, but the pipe and socket IOCTL commands don't check if the related FD is
> opened with read nor write.

I still don't understand your remarks about ioctl commands not
checking read and write flags.  To clarify: What you are talking about
is that the implementations of individual ioctl commands should all
check the read and write mode flags on the struct file, is that right?

I'm puzzled how you come to the conclusion that devices don't do such
checks - did you read some ioctl command implementations, or is it a
more underlying principle that I was not aware of so far?

So far, I've always been under the impression that the modes on device
files are also reflected on the associated struct file after they are
opened.  As in, you open a block device for reading to read its
contents, but you need to open it for writing to modify it.  Are these
rights not respected by the ioctl commands?


In any case - I believe the only reason why we are discussing this is
to justify the DEV/NODEV split, and that one in itself is not
controversial to me, even when I admittedly don't fully follow your
reasoning.


> > The thing that struck me about the above list of criteria is that each
> > of them seems to have gaps.  As an example, take timerfds
> > (timerfd_create(2)):
> > 
> >   * these do not get opened through a file system path, so the *file
> >     path* can not restrict them.
> 
> >   * they are not character or block devices and do not have a device ID.
> >   * they don't match any of the file types in filp->f_mode.
> 
> Indeed, we may need a way to identify this kind of FD in the future but it
> should not be an issue for now with the path_beneath rules. I guess we could
> match the anon_inode:[*] name, but I would prefer to avoid using strings. A
> better way to identify this kind of FD would be to pass a similar one in a
> rule, in the same way as for path_beneath (as I suggested in a previous
> email).

(In retrospect, the timerfd was a bad example, because it is acquired
through something else than open(2).  I don't currently have an
immediate example at hand for an anon_inode which is reachable through
the file system (except /proc/.../fd).)

Agreed that matching the "anon_inode:*" name would be a hack.  That
does not look like it was meant as a reliable interface.

This discussion seems like it belongs to step (2) and later though, so
wouldn't block the first patch set, I think.


> > So in order to permit the TFD_IOC_SET_TICKS ioctl on them, these three
> > criteria can't be used to describe a timerfd.
> 
> Correct, and timerfd don't give access to (FS-related) data.
> 
> If we want to restrict this kind of FD (and if it's worth it), we can follow
> the same approach as for restricting new socket creation. Restricting
> specific IOCTLs on these FDs would require a capability-based approach (cf.
> Capsicum): explicitly attach restrictions to a FD, not a process, and the
> mechanism is almost there thanks to the truncate access right patches. It
> would make sense for Landlock to support this kind of FD capabilities, but
> maybe not right now.

+1 let's discuss in a follow-up patch set.


> > For completeness: I forgot to list here: The other reason where a
> > check during ioctl() is needed is the case as for the timerfd, the
> > pipe(2) and socketpair(2), where a file is created through a simple
> > syscall, but without spelling out a path.  If these kinds of files are
> > in scope for ioctl protection, it can't be done during the open()
> > check alone, I suspect?
> 
> Correct, but we don't need this kind of restriction for now.

OK


> > > As I explained before, I don't think we should care about inherited or
> > > passed FDs. Other ways to get FDs (e.g. landlock_create_ruleset) should
> > > probably not be a priority for now.
> > 
> > I don't know what we should do about the "basic Unix tool" and
> > TIOCSTI/TIOCLINUX case, where it is possible to gain control over the
> > shell running in the tty that we get as stdout fd.
> > 
> > I'm in that situation with the little web application I run at home,
> > but the patch that you have sent for GNU tar at some point (and which
> > we should really revive :)) has the same problem: If an attacker
> > manages to do a Remote Code Execution in that tar process, they can
> > ioctl(1, TIOCSTI, ...) their way out into the shell which invoked tar,
> > and which is not restricted with tar's Landlock policy.
> > 
> > (I don't really see tar create a pty/tty pair either and shovel data
> > between them in a sidecar process or thread, just to protect against
> > that.)
> 
> Indeed, and that's why sandboxing an application might raise some
> challenges. We should note that a sandboxed application might only be safely
> used in some cases (e.g. pipe stdio and close other FDs), but I agree that
> this is not satisfactory for now, and there are still gaps.

OK, I'll make sure it shows up in the documentation.


> > Remark: For the specific TIOCSTI problem, I'm seeing a glimmer of
> > light with this patch set which has appeared in the meantime:
> > https://lore.kernel.org/all/20230710002645.v565c7xq5iddruse@begin/
> > (This will still require that distributions flip that Kconfig option
> > off, but the only(?) known user of TIOCSTI, BRLTTY, would continue
> > working.)
> 
> I hope most distros are disabling CONFIG_LEGACY_TIOCSTI, otherwise users
> should still be able to tweak the related sysctl.

+1


> > I would be more comfortable with doing the checks only at open(2) time
> > if the above patch landed in distributions so that you would need to
> > have CAP_SYS_ADMIN in order to use TIOCSTI.
> > 
> > Do you think this is realistic?  If this does not get flipped by
> > distributions, Landlock would continue to have these TIOCSTI problems
> > on these platforms (unless its users do the pty/tty pair thing, but
> > that seems like an unrealistic demand).
> 
> What if we use an FD to identify an inode with landlock_inode_attr rule? We
> could have some flag to "pin" the restriction on this specific FD/inode, or
> only match the device type, or the file type… We need to think a bit more
> about the implications though.

This would also be a later step and not be part of step (1).

The idea of using an FD as an "example" is interesting, but I'm not
fully sold on it yet.  I need to ponder it more.  Some specific
points:

* Creating such FDs might have unwanted side effects, or be
  disproportionally difficult, when they are just created for the
  purpose of defining a Landlock ruleset.

* The matching is unclear to me.  In particular, we've discussed
  before to restrict dev_t ranges (fixed major, range on minor) - I
  don't know how that would be done with this approach.


> >  I don't see how
> > specifying the file type and device ID range as plain numbers could
> > lead to a race condition.
> 
> No race condition but side channel issues. For instance, a landlocked
> process could infer some file properties by adding and testing a Landlock
> rule even if it is not allowed to read such file properties (because of
> potential Landlock or other restrictions). Using a FD enables Landlock to
> check that the process is indeed allowed to read such properties, or we may
> decide that it is not necessary to do so.

Understood - so IIUC the scenario is that a process is not permitted
to read file attributes, but it'll be able to infer the device ID by
defining a dev_t-based Landlock rule and then observing whether ioctl
still works.

(I'll also postpone it to step (2) or later then)


> > If yes, I do agree that a list of permitted ioctls is similar to the
> > access rights flags that we already have, and it would have to get
> > passed around in a similar fashion (as "synthetic access rights"),
> > albeit using a different data structure.
> > 
> > I'm still skeptical of the API approach where we tie previously
> > unrelated rules together, if that is what you mean here.  I find this
> > difficult to explain and reason about.  But in doubt we'll see in the
> > implementation how unwieldy it actually gets.
> 
> Right, we don't need to implement the synthetic access rights with the
> current step though.

+1 OK

> 
> > 
> > 
> > > > The upside of that approach would be that it could also be used to selectively
> > > > restrict specific known-evil ioctls, and letting all others continue to work.
> > > > For example, sandboxing or sudo-like programs could filter out TIOCSTI and
> > > > TIOCLINUX.
> > 
> > By the way, selectively restricting known-bad ioctls is still not
> > possible with the approach we discussed now, I think.  Maybe TIOCSTI
> > is the only bad one... I hope.
> 
> It would not be possible with the landlock_path_beneath_attr, but the
> landlock_inode_attr could be enough.

Will ponder it -- it has the limitation as I said above that it can't
restrict device ranges, but it could be used to apply restrictions to
specific opened inodes.

One difference I see with this approach is that the rights would not
transfer along with the opened file when the files get passed between
processes.  So the policy for that inode would apply to the enforcing
process and its new children, but it would not apply to other
processes which the file is given to.


---

Summarizing this, I believe that the parts that we need for step (1)
are the following ones:

(1) Identify and blanket-permit a small list of ioctl cmds which work
    on all file descriptors (FIOCLEX, FIONCLEX, FIONBIO, FIOASYNC, and
    others?)

    Compare
    https://lore.kernel.org/linux-security-module/6dfc0198-9010-7c54-2699-d3b867249850@digikod.net/

(2) Split into LANDLOCK_ACCESS_FS_IOCTL into a ..._DEV and a ..._NODEV part.

(3) Point out in documentation that this IOCTL restriction scheme only
    applies to newly opened FDs and in particular point out the common
    use case where that is a TTY, and what to do in this case.

If you agree, I'd go ahead and implement that as step (1) and we can
discuss the more advanced ideas in the context of a follow-up.

Thanks,
—Günther
Mickaël Salaün July 28, 2023, 1:52 p.m. UTC | #10
On 27/07/2023 11:32, Günther Noack wrote:
> Hello Mickaël!
> 
> Overall, I believe that I thought too far ahead here and now we've
> been mixing the discussions for different steps from the three-step
> approach that we discussed in [1].
> 
> In order to make progress here, let me try to disentangle in this mail
> which parts we need for the current step (1) and which parts are only
> needed for later steps.
> 
> [1] https://lore.kernel.org/linux-security-module/d4f1395c-d2d4-1860-3a02-2a0c023dd761@digikod.net/
> 
> 
> On Mon, Jul 24, 2023 at 09:03:42PM +0200, Mickaël Salaün wrote:
>> On 14/07/2023 00:38, Günther Noack wrote:
>>> On Wed, Jul 12, 2023 at 07:48:29PM +0200, Mickaël Salaün wrote:
>>>> A useful split would be at least between devices and regular
>>>> files/directories, something like this:
>>>> - LANDLOCK_ACCESS_FS_IOCTL_DEV: allows IOCTLs on character or block devices,
>>>> which should be targeted on specific paths.
>>>> - LANDLOCK_ACCESS_FS_IOCTL_NODEV: allows IOCTLs on regular files,
>>>> directories, unix sockets, pipes, and symlinks. These are targeting
>>>> filesystems (e.g. ext4's fsverity) or common Linux file types.
>>>
>>> To make sure we are on the same page, let me paraphrase:
>>>
>>> You are suggesting that we should split the LANDLOCK_ACCESS_FS_IOCTL
>>> right into a LANDLOCK_ACCESS_FS_IOCTL_DEV part (for block and
>>> character devices) and a LANDLOCK_ACCESS_FS_IOCTL_NODEV part (for
>>> regular files, directories, named(!) unix sockets, named(!) pipes and
>>> symlinks)?  The check would presumably be done during the open(2) call
>>> and then store the access right on the freshly opened struct file?
>>
>> Correct
> 
> OK, I'll add this to step (1) then.
> 
> 
>>> (It is more clearly a philosophy of "protecting resources", rather
>>> than a philosophy of limiting access to the thousands of potentially
>>> buggy ioctl implementations. - But I think it might be reasonable to
>>> permit unnamed pipes and socketpairs - they are useful mechanisms and
>>> seem harmless as long as their implementations don't have bugs.)
>>
>> The goal of Landlock is to limit access to new resources (e.g. new FD
>> obtained from an existing FD or a path).
>>
>> Unnamed pipes and socketpairs are not a way to (directly) access new kernel
>> resources/data, hence out of scope for Landlock. Abstract unix sockets will
>> need to be restricted though, but not with a path_beneath rule (and not
>> right now with this patch series).
> 
> OK, fair enough.  I can see that this is conceptually cleaner within
> Landlock.
> 
> Let's go for that approach then, where Landlock only restricts newly
> opened path-based files, and where we leave inherited file descriptors
> and the ones created through pipe(2), socketpair(2) and timerfds as
> they are for now.
> 
> It feels like TIOCSTI (and TIOCLINUX) are probably the biggest
> problems when it comes to these inherited files.  With some luck,
> TIOCSTI will be turned off by distributions soon (and TIOCLINUX only
> works on the text console, which is luckily not in use that much).
> Fingers crossed.
> 
> 
>>>> I think it makes sense because the underlying filesystems should already
>>>> check for read/write access, which is not the case for block/char devices.
>>>> Pipe and unix socket IOCTLs are quite specific but don't touch the
>>>> underlying filesystem, and it should be allowed to properly use them. It
>>>> should be noted that the pipe and socket IOCTL implementations don't care
>>>> about their file mode though; I guess the rationale might be that IOCTLs may
>>>> be required to (efficiently) either read or write.
>>>>
>>>
>>> I don't understand your remark about the read/write access.
>>
>> I meant that regular file/directory IOCTLs (e.g. fscrypt) should already
>> check for read or write access according to the IOCTL command. This doesn't
>> seem to be the case for devices because they don't modify and are unrelated
>> to the underlying filesystem.
>>
>>
>>>
>>> Pipes have a read end and a write end, where only one of the two
>>> operations should work.  Unix sockets are always bidirectional, if I
>>> remember this correctly.
>>
>> Yes, but the pipe and socket IOCTL commands don't check if the related FD is
>> opened with read nor write.
> 
> I still don't understand your remarks about ioctl commands not
> checking read and write flags.  To clarify: What you are talking about
> is that the implementations of individual ioctl commands should all
> check the read and write mode flags on the struct file, is that right?
> 
> I'm puzzled how you come to the conclusion that devices don't do such
> checks - did you read some ioctl command implementations, or is it a
> more underlying principle that I was not aware of so far?

I took a look at fscrypt IOCTLs for instance, and other FS IOCTLs seems 
to correctly check for FD's read/write mode according to the IOCTL 
behavior. From what I've seen, IOCTLs implemented by device drivers 
don't care about file mode.


> 
> So far, I've always been under the impression that the modes on device
> files are also reflected on the associated struct file after they are
> opened.  As in, you open a block device for reading to read its
> contents, but you need to open it for writing to modify it.  Are these
> rights not respected by the ioctl commands?

File mode is correctly tracked but ignore by most IOCTL handlers, 
probably because file mode is dedicated to raw files and directories.

> 
> 
> In any case - I believe the only reason why we are discussing this is
> to justify the DEV/NODEV split, and that one in itself is not
> controversial to me, even when I admittedly don't fully follow your
> reasoning.

The main reason is than I don't want applications/users to not be 
allowed to use "legitimate" IOCTL, for instance to correctly encrypt 
their own files. If we only have one IOCTL right, we cannot easily 
differentiate between the targeted file types. However, this split might 
be too costly, cf. my comment in the below summary.


> 
> 
>>> The thing that struck me about the above list of criteria is that each
>>> of them seems to have gaps.  As an example, take timerfds
>>> (timerfd_create(2)):
>>>
>>>    * these do not get opened through a file system path, so the *file
>>>      path* can not restrict them.
>>
>>>    * they are not character or block devices and do not have a device ID.
>>>    * they don't match any of the file types in filp->f_mode.
>>
>> Indeed, we may need a way to identify this kind of FD in the future but it
>> should not be an issue for now with the path_beneath rules. I guess we could
>> match the anon_inode:[*] name, but I would prefer to avoid using strings. A
>> better way to identify this kind of FD would be to pass a similar one in a
>> rule, in the same way as for path_beneath (as I suggested in a previous
>> email).
> 
> (In retrospect, the timerfd was a bad example, because it is acquired
> through something else than open(2).  I don't currently have an
> immediate example at hand for an anon_inode which is reachable through
> the file system (except /proc/.../fd).)
> 
> Agreed that matching the "anon_inode:*" name would be a hack.  That
> does not look like it was meant as a reliable interface.
> 
> This discussion seems like it belongs to step (2) and later though, so
> wouldn't block the first patch set, I think.

Yes :)

> 
> 
>>> So in order to permit the TFD_IOC_SET_TICKS ioctl on them, these three
>>> criteria can't be used to describe a timerfd.
>>
>> Correct, and timerfd don't give access to (FS-related) data.
>>
>> If we want to restrict this kind of FD (and if it's worth it), we can follow
>> the same approach as for restricting new socket creation. Restricting
>> specific IOCTLs on these FDs would require a capability-based approach (cf.
>> Capsicum): explicitly attach restrictions to a FD, not a process, and the
>> mechanism is almost there thanks to the truncate access right patches. It
>> would make sense for Landlock to support this kind of FD capabilities, but
>> maybe not right now.
> 
> +1 let's discuss in a follow-up patch set.
> 
> 
>>> For completeness: I forgot to list here: The other reason where a
>>> check during ioctl() is needed is the case as for the timerfd, the
>>> pipe(2) and socketpair(2), where a file is created through a simple
>>> syscall, but without spelling out a path.  If these kinds of files are
>>> in scope for ioctl protection, it can't be done during the open()
>>> check alone, I suspect?
>>
>> Correct, but we don't need this kind of restriction for now.
> 
> OK
> 
> 
>>>> As I explained before, I don't think we should care about inherited or
>>>> passed FDs. Other ways to get FDs (e.g. landlock_create_ruleset) should
>>>> probably not be a priority for now.
>>>
>>> I don't know what we should do about the "basic Unix tool" and
>>> TIOCSTI/TIOCLINUX case, where it is possible to gain control over the
>>> shell running in the tty that we get as stdout fd.
>>>
>>> I'm in that situation with the little web application I run at home,
>>> but the patch that you have sent for GNU tar at some point (and which
>>> we should really revive :)) has the same problem: If an attacker
>>> manages to do a Remote Code Execution in that tar process, they can
>>> ioctl(1, TIOCSTI, ...) their way out into the shell which invoked tar,
>>> and which is not restricted with tar's Landlock policy.
>>>
>>> (I don't really see tar create a pty/tty pair either and shovel data
>>> between them in a sidecar process or thread, just to protect against
>>> that.)
>>
>> Indeed, and that's why sandboxing an application might raise some
>> challenges. We should note that a sandboxed application might only be safely
>> used in some cases (e.g. pipe stdio and close other FDs), but I agree that
>> this is not satisfactory for now, and there are still gaps.
> 
> OK, I'll make sure it shows up in the documentation.
> 
> 
>>> Remark: For the specific TIOCSTI problem, I'm seeing a glimmer of
>>> light with this patch set which has appeared in the meantime:
>>> https://lore.kernel.org/all/20230710002645.v565c7xq5iddruse@begin/
>>> (This will still require that distributions flip that Kconfig option
>>> off, but the only(?) known user of TIOCSTI, BRLTTY, would continue
>>> working.)
>>
>> I hope most distros are disabling CONFIG_LEGACY_TIOCSTI, otherwise users
>> should still be able to tweak the related sysctl.
> 
> +1
> 
> 
>>> I would be more comfortable with doing the checks only at open(2) time
>>> if the above patch landed in distributions so that you would need to
>>> have CAP_SYS_ADMIN in order to use TIOCSTI.
>>>
>>> Do you think this is realistic?  If this does not get flipped by
>>> distributions, Landlock would continue to have these TIOCSTI problems
>>> on these platforms (unless its users do the pty/tty pair thing, but
>>> that seems like an unrealistic demand).
>>
>> What if we use an FD to identify an inode with landlock_inode_attr rule? We
>> could have some flag to "pin" the restriction on this specific FD/inode, or
>> only match the device type, or the file type… We need to think a bit more
>> about the implications though.
> 
> This would also be a later step and not be part of step (1).
> 
> The idea of using an FD as an "example" is interesting, but I'm not
> fully sold on it yet.  I need to ponder it more.  Some specific
> points:
> 
> * Creating such FDs might have unwanted side effects, or be
>    disproportionally difficult, when they are just created for the
>    purpose of defining a Landlock ruleset.
> 
> * The matching is unclear to me.  In particular, we've discussed
>    before to restrict dev_t ranges (fixed major, range on minor) - I
>    don't know how that would be done with this approach.

Those are valid concerns, but yes, let's get back to this discussion 
after the first step. :)


> 
> 
>>>   I don't see how
>>> specifying the file type and device ID range as plain numbers could
>>> lead to a race condition.
>>
>> No race condition but side channel issues. For instance, a landlocked
>> process could infer some file properties by adding and testing a Landlock
>> rule even if it is not allowed to read such file properties (because of
>> potential Landlock or other restrictions). Using a FD enables Landlock to
>> check that the process is indeed allowed to read such properties, or we may
>> decide that it is not necessary to do so.
> 
> Understood - so IIUC the scenario is that a process is not permitted
> to read file attributes, but it'll be able to infer the device ID by
> defining a dev_t-based Landlock rule and then observing whether ioctl
> still works.

Right. I think it should be possible to still check if this kind of file 
attribute would be allowed to be read by the process, when performing 
the IOCTL on it. We need to think about the implications, and if it's 
worth it.

It would be interesting to see if there are similar cover channels with 
other Linux interfaces.


> 
> (I'll also postpone it to step (2) or later then)
> 
> 
>>> If yes, I do agree that a list of permitted ioctls is similar to the
>>> access rights flags that we already have, and it would have to get
>>> passed around in a similar fashion (as "synthetic access rights"),
>>> albeit using a different data structure.
>>>
>>> I'm still skeptical of the API approach where we tie previously
>>> unrelated rules together, if that is what you mean here.  I find this
>>> difficult to explain and reason about.  But in doubt we'll see in the
>>> implementation how unwieldy it actually gets.
>>
>> Right, we don't need to implement the synthetic access rights with the
>> current step though.
> 
> +1 OK
> 
>>
>>>
>>>
>>>>> The upside of that approach would be that it could also be used to selectively
>>>>> restrict specific known-evil ioctls, and letting all others continue to work.
>>>>> For example, sandboxing or sudo-like programs could filter out TIOCSTI and
>>>>> TIOCLINUX.
>>>
>>> By the way, selectively restricting known-bad ioctls is still not
>>> possible with the approach we discussed now, I think.  Maybe TIOCSTI
>>> is the only bad one... I hope.
>>
>> It would not be possible with the landlock_path_beneath_attr, but the
>> landlock_inode_attr could be enough.
> 
> Will ponder it -- it has the limitation as I said above that it can't
> restrict device ranges, but it could be used to apply restrictions to
> specific opened inodes.
> 
> One difference I see with this approach is that the rights would not
> transfer along with the opened file when the files get passed between
> processes.  So the policy for that inode would apply to the enforcing
> process and its new children, but it would not apply to other
> processes which the file is given to.

Good point. We need to come up with something consistent.

> 
> 
> ---
> 
> Summarizing this, I believe that the parts that we need for step (1)
> are the following ones:
> 
> (1) Identify and blanket-permit a small list of ioctl cmds which work
>      on all file descriptors (FIOCLEX, FIONCLEX, FIONBIO, FIOASYNC, and
>      others?)
> 
>      Compare
>      https://lore.kernel.org/linux-security-module/6dfc0198-9010-7c54-2699-d3b867249850@digikod.net/
> 
> (2) Split into LANDLOCK_ACCESS_FS_IOCTL into a ..._DEV and a ..._NODEV part.

I'm a bit annoyed that this access rights mix action and object property 
and I'm worried that this might be a pain for future FS-related rule 
types (e.g. reusing all/most ACCESS_FS rights).

At first glance, a cleaner way would be to add a file_type field to 
landlock_path_beneath_attr (i.e. a subset of the landlock_inode_attr) 
but that would make struct landlock_rule and landlock_layer too complex, 
so not a good approach.

Unless someone has a better idea, let's stick to your first proposal and 
only implement LANDLOCK_ACCESS_FS_IOCTL (with the FIOCLEX-like 
exceptions). We should clearly explain that IOCTLs should be allowed for 
non-device file hierarchies: containing regular/data file (e.g. /home 
with the fscrypt use case), pipe, socket…

I'm still trying to convince myself which approach is the best, but for 
now the simplest one wins.

> 
> (3) Point out in documentation that this IOCTL restriction scheme only
>      applies to newly opened FDs and in particular point out the common
>      use case where that is a TTY, and what to do in this case.
> 
> If you agree, I'd go ahead and implement that as step (1) and we can
> discuss the more advanced ideas in the context of a follow-up.

Sounds good!

> 
> Thanks,
> —Günther
>
Günther Noack July 31, 2023, 8:31 a.m. UTC | #11
Hello!

On Fri, Jul 28, 2023 at 03:52:03PM +0200, Mickaël Salaün wrote:
> On 27/07/2023 11:32, Günther Noack wrote:
> > I'm puzzled how you come to the conclusion that devices don't do such
> > checks - did you read some ioctl command implementations, or is it a
> > more underlying principle that I was not aware of so far?
> 
> I took a look at fscrypt IOCTLs for instance, and other FS IOCTLs seems to
> correctly check for FD's read/write mode according to the IOCTL behavior.
> From what I've seen, IOCTLs implemented by device drivers don't care about
> file mode.

OK - That's surprising.


> > In any case - I believe the only reason why we are discussing this is
> > to justify the DEV/NODEV split, and that one in itself is not
> > controversial to me, even when I admittedly don't fully follow your
> > reasoning.
> 
> The main reason is than I don't want applications/users to not be allowed to
> use "legitimate" IOCTL, for instance to correctly encrypt their own files.
> If we only have one IOCTL right, we cannot easily differentiate between the
> targeted file types. However, this split might be too costly, cf. my comment
> in the below summary.

I believe that for "leaf processes" like most command line utilities, the
fscrypt use case is not a problem -- these IOCTLs are only needed for setting up
fscrypt directories and unlocking them.  Processes which merely access the files
in the unlocked directories can just transparently access them without using
ioctls.

The close-on-exec ioctls might be more relevant for leaf processes, but these
ones are uncontroversial to permit.

For "parent processes" like shells, where the exact operations done underneath
are not fully known in advance, it is more difficult to define an appropriate
policy using only the step (1) patch set, but I think it's a reasonable
trade-off for now.

When a policy distinguishes between /dev and other directories based on path,
that's already a good approximation for the ..._DEV and ..._NODEV access rights
which you proposed previously.  In addition to the nodev mount flag, mknod(2)
requires CAP_MKNOD, so it should not be possible for an attacker to place these
where they want (at least not through the Landlocked process, as we have the
NO_NEW_PRIVS flag).

Step (2) discussions though :)


> > Understood - so IIUC the scenario is that a process is not permitted
> > to read file attributes, but it'll be able to infer the device ID by
> > defining a dev_t-based Landlock rule and then observing whether ioctl
> > still works.
> 
> Right. I think it should be possible to still check if this kind of file
> attribute would be allowed to be read by the process, when performing the
> IOCTL on it. We need to think about the implications, and if it's worth it.
> 
> It would be interesting to see if there are similar cover channels with
> other Linux interfaces.

OK, noted it down for step (2).


> > Summarizing this, I believe that the parts that we need for step (1)
> > are the following ones:
> > 
> > (1) Identify and blanket-permit a small list of ioctl cmds which work
> >      on all file descriptors (FIOCLEX, FIONCLEX, FIONBIO, FIOASYNC, and
> >      others?)
> > 
> >      Compare
> >      https://lore.kernel.org/linux-security-module/6dfc0198-9010-7c54-2699-d3b867249850@digikod.net/
> > 
> > (2) Split into LANDLOCK_ACCESS_FS_IOCTL into a ..._DEV and a ..._NODEV part.
> 
> I'm a bit annoyed that this access rights mix action and object property and
> I'm worried that this might be a pain for future FS-related rule types (e.g.
> reusing all/most ACCESS_FS rights).
> 
> At first glance, a cleaner way would be to add a file_type field to
> landlock_path_beneath_attr (i.e. a subset of the landlock_inode_attr) but
> that would make struct landlock_rule and landlock_layer too complex, so not
> a good approach.
> 
> Unless someone has a better idea, let's stick to your first proposal and
> only implement LANDLOCK_ACCESS_FS_IOCTL (with the FIOCLEX-like exceptions).
> We should clearly explain that IOCTLs should be allowed for non-device file
> hierarchies: containing regular/data file (e.g. /home with the fscrypt use
> case), pipe, socket…
> 
> I'm still trying to convince myself which approach is the best, but for now
> the simplest one wins.

OK, sounds good.  I'll go for the LANDLOCK_ACCESS_FS_IOCTL approach then.


> > (3) Point out in documentation that this IOCTL restriction scheme only
> >      applies to newly opened FDs and in particular point out the common
> >      use case where that is a TTY, and what to do in this case.
> > 
> > If you agree, I'd go ahead and implement that as step (1) and we can
> > discuss the more advanced ideas in the context of a follow-up.

—Günther