Message ID | 20231010092133.4093612-1-hi@alyssa.is (mailing list archive) |
---|---|
State | New, archived |
Series | exec: allow executing block devices |
On Tue, Oct 10, 2023 at 09:21:33AM +0000, Alyssa Ross wrote:
> As far as I can tell, the S_ISREG() check is there to prevent
> executing files where that would be nonsensical, like directories,
> fifos, or sockets. But the semantics for executing a block device are
> quite obvious: the block device acts just like a regular file.
>
> My use case is having a common VM image that takes a configurable
> payload to run. The payload will always be a single ELF file.
>
> I could share the file with virtio-fs, or I could create a disk image
> containing a filesystem containing the payload, but both of those add
> unnecessary layers of indirection when all I need to do is share a
> single executable blob with the VM. Sharing it as a block device is
> the most natural thing to do, aside from the (arbitrary, as far as I
> can tell) restriction on executing block devices. (The only slight
> complexity is that I need to ensure that my payload size is rounded up
> to a whole number of sectors, but that's trivial and fast in
> comparison to e.g. generating a filesystem image.)
>
> Signed-off-by: Alyssa Ross <hi@alyssa.is>

Hi,

Thanks for the suggestion! I would prefer to not change this rather core
behavior in the kernel for a few reasons, but it mostly revolves around
both user and developer expectations and the resulting fragility.

For users, this hasn't been possible in the past, so if we make it
possible, what situations are suddenly exposed on systems that are trying
to very carefully control their execution environments?

For developers, this ends up exercising code areas that have never been
tested, and could lead to unexpected conditions. For example,
deny_write_access() is explicitly documented as "for regular files".
Perhaps it accidentally works with block devices, but this would need
much more careful examination, etc.

And while looking at this from a design perspective, it looks like a
layering violation: roughly speaking, the kernel executes files, from
filesystems, from block devices. Bypassing layers tends to lead to
troublesome bugs and other weird problems.

I wonder, though, if you can already get what you need through other
existing mechanisms that aren't too much more hassle? For example,
what about having a tool that creates a memfd from a block device and
executes that? The memfd code has been used in a lot of odd exec corner
cases in the past...

-Kees

Kees Cook <keescook@chromium.org> writes:

> On Tue, Oct 10, 2023 at 09:21:33AM +0000, Alyssa Ross wrote:
>> As far as I can tell, the S_ISREG() check is there to prevent
>> executing files where that would be nonsensical, like directories,
>> fifos, or sockets. But the semantics for executing a block device are
>> quite obvious: the block device acts just like a regular file.
>>
>> My use case is having a common VM image that takes a configurable
>> payload to run. The payload will always be a single ELF file.
>>
>> I could share the file with virtio-fs, or I could create a disk image
>> containing a filesystem containing the payload, but both of those add
>> unnecessary layers of indirection when all I need to do is share a
>> single executable blob with the VM. Sharing it as a block device is
>> the most natural thing to do, aside from the (arbitrary, as far as I
>> can tell) restriction on executing block devices. (The only slight
>> complexity is that I need to ensure that my payload size is rounded up
>> to a whole number of sectors, but that's trivial and fast in
>> comparison to e.g. generating a filesystem image.)
>>
>> Signed-off-by: Alyssa Ross <hi@alyssa.is>
>
> Hi,
>
> Thanks for the suggestion! I would prefer to not change this rather core
> behavior in the kernel for a few reasons, but it mostly revolves around
> both user and developer expectations and the resulting fragility.
>
> For users, this hasn't been possible in the past, so if we make it
> possible, what situations are suddenly exposed on systems that are trying
> to very carefully control their execution environments?

I expect very few, considering it's still necessary to have root chmod
the block device to make it executable.

> For developers, this ends up exercising code areas that have never been
> tested, and could lead to unexpected conditions. For example,
> deny_write_access() is explicitly documented as "for regular files".
> Perhaps it accidentally works with block devices, but this would need
> much more careful examination, etc.
>
> And while looking at this from a design perspective, it looks like a
> layering violation: roughly speaking, the kernel executes files, from
> filesystems, from block devices. Bypassing layers tends to lead to
> troublesome bugs and other weird problems.
>
> I wonder, though, if you can already get what you need through other
> existing mechanisms that aren't too much more hassle? For example,
> what about having a tool that creates a memfd from a block device and
> executes that? The memfd code has been used in a lot of odd exec corner
> cases in the past...

Is it possible to have a file-backed memfd? Strange name if so!

On Wed, Oct 11, 2023 at 07:38:39AM +0000, Alyssa Ross wrote:
> Is it possible to have a file-backed memfd? Strange name if so!
Not that I'm aware, but a program could just read the ELF from the block
device and stick it in a memfd and execute the result.
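
The read-into-a-memfd approach suggested above could look roughly like the following. This is only a sketch under assumptions, not code from the thread: the function names copy_to_memfd() and exec_from_blockdev() are hypothetical, the payload is assumed to be an ELF binary (so MFD_CLOEXEC is fine), and the whole device is copied, including any trailing sector padding.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy everything readable from src_fd into a fresh
 * memfd. Returns the memfd on success, -1 on failure with errno set.
 */
int copy_to_memfd(int src_fd)
{
	char buf[65536];
	ssize_t n;
	int mfd, saved;

	/* MFD_CLOEXEC keeps the descriptor from leaking into the new program. */
	mfd = memfd_create("payload", MFD_CLOEXEC);
	if (mfd < 0)
		return -1;

	while ((n = read(src_fd, buf, sizeof(buf))) != 0) {
		if (n < 0) {
			if (errno == EINTR)
				continue;
			goto fail;
		}
		/* write() may be short, so loop until the chunk is written. */
		for (ssize_t off = 0; off < n; ) {
			ssize_t w = write(mfd, buf + off, (size_t)(n - off));
			if (w < 0) {
				if (errno == EINTR)
					continue;
				goto fail;
			}
			off += w;
		}
	}
	return mfd;

fail:
	saved = errno;
	close(mfd);
	errno = saved;
	return -1;
}

/*
 * Hypothetical helper: open the block device at dev_path, copy it into a
 * memfd, and execute the copy. Returns -1 on failure; does not return on
 * success because the process image is replaced.
 */
int exec_from_blockdev(const char *dev_path, char *const argv[],
		       char *const envp[])
{
	int dev_fd, mfd, saved;

	dev_fd = open(dev_path, O_RDONLY | O_CLOEXEC);
	if (dev_fd < 0)
		return -1;

	mfd = copy_to_memfd(dev_fd);
	saved = errno;
	close(dev_fd);
	if (mfd < 0) {
		errno = saved;
		return -1;
	}

	fexecve(mfd, argv, envp);	/* only returns on error */
	saved = errno;
	close(mfd);
	errno = saved;
	return -1;
}
```

A small init-style tool in the VM could call exec_from_blockdev() with the device path it was given (for example a virtio block device, the exact path being an assumption about the guest setup). The cost relative to the patch is one full copy of the payload into memory before exec.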
Hello,

kernel test robot noticed "kernel-selftests.exec.non-regular.fail" on:

commit: f086dcc88a64a2022314af666bd15d64c6748d27 ("[PATCH] exec: allow executing block devices")
url: https://github.com/intel-lab-lkp/linux/commits/Alyssa-Ross/exec-allow-executing-block-devices/20231010-172704
patch link: https://lore.kernel.org/all/20231010092133.4093612-1-hi@alyssa.is/
patch subject: [PATCH] exec: allow executing block devices

in testcase: kernel-selftests
version: kernel-selftests-x86_64-60acb023-1_20230329
with following parameters:

	group: group-01

compiler: gcc-12
test machine: 36 threads 1 sockets Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz (Cascade Lake) with 32G memory

(please refer to attached dmesg/kmsg for entire log/backtrace)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202310201132.ec34d76b-oliver.sang@intel.com

# timeout set to 300
# selftests: exec: non-regular
# TAP version 13
# 1..6
# # Starting 6 tests from 6 test cases.
# # RUN file.S_IFLNK.exec_errno ...
# # OK file.S_IFLNK.exec_errno
# ok 1 file.S_IFLNK.exec_errno
# # RUN file.S_IFDIR.exec_errno ...
# # OK file.S_IFDIR.exec_errno
# ok 2 file.S_IFDIR.exec_errno
# # RUN file.S_IFBLK.exec_errno ...
# # non-regular.c:166:exec_errno:Expected errno (6) == variant->expected (13)
# # exec_errno: Test failed at step #4
# # FAIL file.S_IFBLK.exec_errno
# not ok 3 file.S_IFBLK.exec_errno
# # RUN file.S_IFCHR.exec_errno ...
# # OK file.S_IFCHR.exec_errno
# ok 4 file.S_IFCHR.exec_errno
# # RUN file.S_IFIFO.exec_errno ...
# # OK file.S_IFIFO.exec_errno
# ok 5 file.S_IFIFO.exec_errno
# # RUN sock.exec_errno ...
# # OK sock.exec_errno
# ok 6 sock.exec_errno
# # FAILED: 5 / 6 tests passed.
# # Totals: pass:5 fail:1 xfail:0 xpass:0 skip:0 error:0
not ok 5 selftests: exec: non-regular # exit=1

The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20231020/202310201132.ec34d76b-oliver.sang@intel.com

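
For readers decoding the failure above: on Linux, errno 13 is EACCES and errno 6 is ENXIO, so the selftest expected the exec of a block device node to be refused with EACCES and saw ENXIO instead. A plausible reading (an assumption, not stated in the report) is that the open now proceeds past the file-type check and fails because no device backs the test's node. The sketch below is not the selftest itself; probe_exec_errno() is a hypothetical helper showing one way to observe which errno execve() returns for a given path without replacing the calling process.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: try execve() on path in a child process.
 * Returns 0 if the exec succeeded, the errno from execve() if it failed,
 * or -1 if the probe itself could not be set up.
 */
int probe_exec_errno(const char *path)
{
	char *const argv[] = { (char *)path, NULL };
	char *const envp[] = { NULL };
	int pipefd[2];
	int err = 0;
	ssize_t n;
	pid_t pid;

	/* O_CLOEXEC: a successful exec closes the write end, so we read EOF. */
	if (pipe2(pipefd, O_CLOEXEC) < 0)
		return -1;

	pid = fork();
	if (pid < 0) {
		close(pipefd[0]);
		close(pipefd[1]);
		return -1;
	}

	if (pid == 0) {
		/* Child: report the exec errno through the pipe, then exit. */
		close(pipefd[0]);
		execve(path, argv, envp);
		err = errno;
		if (write(pipefd[1], &err, sizeof(err)) < 0)
			_exit(126);
		_exit(127);
	}

	/* Parent: EOF means exec succeeded, otherwise we get the errno. */
	close(pipefd[1]);
	do {
		n = read(pipefd[0], &err, sizeof(err));
	} while (n < 0 && errno == EINTR);
	close(pipefd[0]);
	waitpid(pid, NULL, 0);

	if (n == 0)
		return 0;
	return n == (ssize_t)sizeof(err) ? err : -1;
}
```

On an unpatched kernel, calling this on a directory such as "/" should yield EACCES, which is the behaviour the selftest asserts for every non-regular file type.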
diff --git a/fs/exec.c b/fs/exec.c
index 6518e33ea813..e29a9f16da5f 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -148,7 +148,8 @@ SYSCALL_DEFINE1(uselib, const char __user *, library)
 	 * and check again at the very end too.
 	 */
 	error = -EACCES;
-	if (WARN_ON_ONCE(!S_ISREG(file_inode(file)->i_mode) ||
+	if (WARN_ON_ONCE((!S_ISREG(file_inode(file)->i_mode) &&
+			  !S_ISBLK(file_inode(file)->i_mode)) ||
 			 path_noexec(&file->f_path)))
 		goto exit;
 
@@ -931,7 +932,8 @@ static struct file *do_open_execat(int fd, struct filename *name, int flags)
 	 * and check again at the very end too.
 	 */
 	err = -EACCES;
-	if (WARN_ON_ONCE(!S_ISREG(file_inode(file)->i_mode) ||
+	if (WARN_ON_ONCE((!S_ISREG(file_inode(file)->i_mode) &&
+			  !S_ISBLK(file_inode(file)->i_mode)) ||
 			 path_noexec(&file->f_path)))
 		goto exit;
 
diff --git a/fs/namei.c b/fs/namei.c
index 567ee547492b..60c89321604a 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3254,7 +3254,7 @@ static int may_open(struct mnt_idmap *idmap, const struct path *path,
 		fallthrough;
 	case S_IFIFO:
 	case S_IFSOCK:
-		if (acc_mode & MAY_EXEC)
+		if ((inode->i_mode & S_IFMT) != S_IFBLK && (acc_mode & MAY_EXEC))
 			return -EACCES;
 		flag &= ~O_TRUNC;
 		break;

As far as I can tell, the S_ISREG() check is there to prevent
executing files where that would be nonsensical, like directories,
fifos, or sockets. But the semantics for executing a block device are
quite obvious: the block device acts just like a regular file.

My use case is having a common VM image that takes a configurable
payload to run. The payload will always be a single ELF file.

I could share the file with virtio-fs, or I could create a disk image
containing a filesystem containing the payload, but both of those add
unnecessary layers of indirection when all I need to do is share a
single executable blob with the VM. Sharing it as a block device is
the most natural thing to do, aside from the (arbitrary, as far as I
can tell) restriction on executing block devices. (The only slight
complexity is that I need to ensure that my payload size is rounded up
to a whole number of sectors, but that's trivial and fast in
comparison to e.g. generating a filesystem image.)

Signed-off-by: Alyssa Ross <hi@alyssa.is>
---
 fs/exec.c  | 6 ++++--
 fs/namei.c | 2 +-
 2 files changed, 5 insertions(+), 3 deletions(-)

base-commit: 94f6f0550c625fab1f373bb86a6669b45e9748b3
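
The sector rounding mentioned in the commit message is simple arithmetic. The sketch below assumes a caller-supplied sector size (512 bytes is common, but the real value depends on the device); round_up_to_sector() is a hypothetical name, not something from the patch.

```c
#include <stdint.h>

/*
 * Round size up to the next multiple of sector_size.
 * sector_size must be nonzero; it is an assumption supplied by the caller.
 */
uint64_t round_up_to_sector(uint64_t size, uint64_t sector_size)
{
	uint64_t rem = size % sector_size;

	return rem ? size + (sector_size - rem) : size;
}
```

For example, with 512-byte sectors a 1000-byte payload would be padded to 1024 bytes, while a payload that is already 512 bytes long is left unchanged.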