exec: allow executing block devices

Message ID 20231010092133.4093612-1-hi@alyssa.is (mailing list archive)
State New, archived

Commit Message

Alyssa Ross Oct. 10, 2023, 9:21 a.m. UTC
As far as I can tell, the S_ISREG() check is there to prevent
executing files where that would be nonsensical, like directories,
fifos, or sockets.  But the semantics for executing a block device are
quite obvious — the block device acts just like a regular file.

My use case is having a common VM image that takes a configurable
payload to run.  The payload will always be a single ELF file.

I could share the file with virtio-fs, or I could create a disk image
containing a filesystem containing the payload, but both of those add
unnecessary layers of indirection when all I need to do is share a
single executable blob with the VM.  Sharing it as a block device is
the most natural thing to do, aside from the (arbitrary, as far as I
can tell) restriction on executing block devices.  (The only slight
complexity is that I need to ensure that my payload size is rounded up
to a whole number of sectors, but that's trivial and fast in
comparison to e.g. generating a filesystem image.)
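The sector rounding mentioned above is simple arithmetic. A minimal sketch, assuming a 512-byte sector size (the actual logical block size depends on the device) and a hypothetical helper name round_up_to_sector that does not come from the patch:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed sector size; a real device may report a different logical block size. */
#define SECTOR_SIZE 512u

/* Round `size` up to the next multiple of `sector_size` (which must be non-zero). */
uint64_t round_up_to_sector(uint64_t size, uint64_t sector_size)
{
	assert(sector_size != 0);

	uint64_t rem = size % sector_size;

	/* Already aligned: nothing to add. Otherwise pad up to the next boundary. */
	return rem == 0 ? size : size + (sector_size - rem);
}
```

For instance, under the 512-byte assumption a 1,000,000-byte payload would be padded to 1,000,448 bytes (1954 sectors).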

Signed-off-by: Alyssa Ross <hi@alyssa.is>
---
 fs/exec.c  | 6 ++++--
 fs/namei.c | 2 +-
 2 files changed, 5 insertions(+), 3 deletions(-)


base-commit: 94f6f0550c625fab1f373bb86a6669b45e9748b3

Comments

Kees Cook Oct. 10, 2023, 10:48 p.m. UTC | #1
On Tue, Oct 10, 2023 at 09:21:33AM +0000, Alyssa Ross wrote:
> As far as I can tell, the S_ISREG() check is there to prevent
> executing files where that would be nonsensical, like directories,
> fifos, or sockets.  But the semantics for executing a block device are
> quite obvious — the block device acts just like a regular file.
> 
> My use case is having a common VM image that takes a configurable
> payload to run.  The payload will always be a single ELF file.
> 
> I could share the file with virtio-fs, or I could create a disk image
> containing a filesystem containing the payload, but both of those add
> unnecessary layers of indirection when all I need to do is share a
> single executable blob with the VM.  Sharing it as a block device is
> the most natural thing to do, aside from the (arbitrary, as far as I
> can tell) restriction on executing block devices.  (The only slight
> complexity is that I need to ensure that my payload size is rounded up
> to a whole number of sectors, but that's trivial and fast in
> comparison to e.g. generating a filesystem image.)
> 
> Signed-off-by: Alyssa Ross <hi@alyssa.is>

Hi,

Thanks for the suggestion! I would prefer to not change this rather core
behavior in the kernel for a few reasons, but it mostly revolves around
both user and developer expectations and the resulting fragility.

For users, this hasn't been possible in the past, so if we make it
possible, what situations are suddenly exposed on systems that are trying
to very carefully control their execution environments?

For developers, this ends up exercising code areas that have never been
tested, and could lead to unexpected conditions. For example,
deny_write_access() is explicitly documented as "for regular files".
Perhaps it accidentally works with block devices, but this would need
much more careful examination, etc.

And while looking at this from a design perspective, it looks like a
layering violation: roughly speaking, the kernel executes files, from
filesystems, from block devices. Bypassing layers tends to lead to
troublesome bugs and other weird problems.

I wonder, though, if you can already get what you need through other
existing mechanisms that aren't too much more hassle? For example,
what about having a tool that creates a memfd from a block device and
executes that? The memfd code has been used in a lot of odd exec corner
cases in the past...

-Kees

Alyssa Ross Oct. 11, 2023, 7:38 a.m. UTC | #2
Kees Cook <keescook@chromium.org> writes:

> On Tue, Oct 10, 2023 at 09:21:33AM +0000, Alyssa Ross wrote:
>> As far as I can tell, the S_ISREG() check is there to prevent
>> executing files where that would be nonsensical, like directories,
>> fifos, or sockets.  But the semantics for executing a block device are
>> quite obvious — the block device acts just like a regular file.
>> 
>> My use case is having a common VM image that takes a configurable
>> payload to run.  The payload will always be a single ELF file.
>> 
>> I could share the file with virtio-fs, or I could create a disk image
>> containing a filesystem containing the payload, but both of those add
>> unnecessary layers of indirection when all I need to do is share a
>> single executable blob with the VM.  Sharing it as a block device is
>> the most natural thing to do, aside from the (arbitrary, as far as I
>> can tell) restriction on executing block devices.  (The only slight
>> complexity is that I need to ensure that my payload size is rounded up
>> to a whole number of sectors, but that's trivial and fast in
>> comparison to e.g. generating a filesystem image.)
>> 
>> Signed-off-by: Alyssa Ross <hi@alyssa.is>
>
> Hi,
>
> Thanks for the suggestion! I would prefer to not change this rather core
> behavior in the kernel for a few reasons, but it mostly revolves around
> both user and developer expectations and the resulting fragility.
>
> For users, this hasn't been possible in the past, so if we make it
> possible, what situations are suddenly exposed on systems that are trying
> to very carefully control their execution environments?

I expect very few, considering it's still necessary to have root chmod
the block device to make it executable.

> For developers, this ends up exercising code areas that have never been
> tested, and could lead to unexpected conditions. For example,
> deny_write_access() is explicitly documented as "for regular files".
> Perhaps it accidentally works with block devices, but this would need
> much more careful examination, etc.
>
> And while looking at this from a design perspective, it looks like a
> layering violation: roughly speaking, the kernel executes files, from
> filesystems, from block devices. Bypassing layers tends to lead to
> troublesome bugs and other weird problems.
>
> I wonder, though, if you can already get what you need through other
> existing mechanisms that aren't too much more hassle? For example,
> what about having a tool that creates a memfd from a block device and
> executes that? The memfd code has been used in a lot of odd exec corner
> cases in the past...

Is it possible to have a file-backed memfd?  Strange name if so!

Kees Cook Oct. 11, 2023, 3:59 p.m. UTC | #3
On Wed, Oct 11, 2023 at 07:38:39AM +0000, Alyssa Ross wrote:
> Is it possible to have a file-backed memfd?  Strange name if so! 

Not that I'm aware of, but a program could just read the ELF from the block
device and stick it in a memfd and execute the result.
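
The read-into-memfd approach described here could look roughly like the sketch below. It is not code from the thread: the helper names load_into_memfd and exec_memfd are hypothetical, error handling is minimal (no EINTR retry), and it assumes a Linux system whose libc exposes memfd_create() and fexecve().

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Copy everything readable from `path` (for example a block device node)
 * into a newly created memfd. Returns the memfd on success, -1 on failure.
 */
int load_into_memfd(const char *path)
{
	assert(path != NULL);

	int src = open(path, O_RDONLY | O_CLOEXEC);
	if (src < 0)
		return -1;

	/* The name is only a label, visible under /proc/<pid>/fd. */
	int mfd = memfd_create("payload", MFD_CLOEXEC);
	if (mfd < 0) {
		close(src);
		return -1;
	}

	char buf[65536];
	ssize_t n;

	while ((n = read(src, buf, sizeof(buf))) > 0) {
		/* write() may be short, so loop until the chunk is flushed. */
		for (ssize_t off = 0; off < n; ) {
			ssize_t w = write(mfd, buf + off, (size_t)(n - off));
			if (w < 0) {
				close(src);
				close(mfd);
				return -1;
			}
			off += w;
		}
	}
	close(src);

	if (n < 0) {
		close(mfd);
		return -1;
	}
	return mfd;
}

/*
 * Execute the memfd contents. Returns only on failure. MFD_CLOEXEC is
 * fine for an ELF payload, but a "#!" script would need the descriptor
 * to stay open across the exec.
 */
int exec_memfd(int mfd, char *const argv[], char *const envp[])
{
	return fexecve(mfd, argv, envp);
}
```

A caller would do something like `int fd = load_into_memfd("/dev/vdb");` (the device path is only an example) followed by `exec_memfd(fd, argv, environ);`. Note that this keeps a full in-memory copy of the device contents, including any sector padding after the ELF data.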

kernel test robot Oct. 20, 2023, 6:06 a.m. UTC | #4
Hello,

kernel test robot noticed "kernel-selftests.exec.non-regular.fail" on:

commit: f086dcc88a64a2022314af666bd15d64c6748d27 ("[PATCH] exec: allow executing block devices")
url: https://github.com/intel-lab-lkp/linux/commits/Alyssa-Ross/exec-allow-executing-block-devices/20231010-172704
patch link: https://lore.kernel.org/all/20231010092133.4093612-1-hi@alyssa.is/
patch subject: [PATCH] exec: allow executing block devices

in testcase: kernel-selftests
version: kernel-selftests-x86_64-60acb023-1_20230329
with following parameters:

	group: group-01



compiler: gcc-12
test machine: 36 threads 1 sockets Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz (Cascade Lake) with 32G memory

(please refer to attached dmesg/kmsg for entire log/backtrace)




If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202310201132.ec34d76b-oliver.sang@intel.com



# timeout set to 300
# selftests: exec: non-regular
# TAP version 13
# 1..6
# # Starting 6 tests from 6 test cases.
# #  RUN           file.S_IFLNK.exec_errno ...
# #            OK  file.S_IFLNK.exec_errno
# ok 1 file.S_IFLNK.exec_errno
# #  RUN           file.S_IFDIR.exec_errno ...
# #            OK  file.S_IFDIR.exec_errno
# ok 2 file.S_IFDIR.exec_errno
# #  RUN           file.S_IFBLK.exec_errno ...
# # non-regular.c:166:exec_errno:Expected errno (6) == variant->expected (13)
# # exec_errno: Test failed at step #4
# #          FAIL  file.S_IFBLK.exec_errno
# not ok 3 file.S_IFBLK.exec_errno
# #  RUN           file.S_IFCHR.exec_errno ...
# #            OK  file.S_IFCHR.exec_errno
# ok 4 file.S_IFCHR.exec_errno
# #  RUN           file.S_IFIFO.exec_errno ...
# #            OK  file.S_IFIFO.exec_errno
# ok 5 file.S_IFIFO.exec_errno
# #  RUN           sock.exec_errno ...
# #            OK  sock.exec_errno
# ok 6 sock.exec_errno
# # FAILED: 5 / 6 tests passed.
# # Totals: pass:5 fail:1 xfail:0 xpass:0 skip:0 error:0
not ok 5 selftests: exec: non-regular # exit=1



The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20231020/202310201132.ec34d76b-oliver.sang@intel.com
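
For readers decoding the failure above: on Linux, errno 6 is ENXIO and errno 13 is EACCES, so the S_IFBLK case expected "permission denied" and instead saw "no such device or address", presumably because the test's device node has no backing device once the exec path actually tries to open it. The check the selftest performs can be approximated with the sketch below. It is not the selftest source, and exec_errno is a hypothetical helper name.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/*
 * Try to execve() `path` and return the errno of the failed attempt.
 * If the exec succeeds, the process image is replaced and this
 * function never returns.
 */
int exec_errno(const char *path)
{
	assert(path != NULL);

	char *const argv[] = { (char *)path, NULL };
	char *const envp[] = { NULL };

	errno = 0;
	execve(path, argv, envp);
	return errno;
}
```

On an unpatched kernel, calling this on a directory such as "/" should yield EACCES, which is the same expectation the selftest has for every non-regular file type.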
Patch

diff --git a/fs/exec.c b/fs/exec.c
index 6518e33ea813..e29a9f16da5f 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -148,7 +148,8 @@  SYSCALL_DEFINE1(uselib, const char __user *, library)
 	 * and check again at the very end too.
 	 */
 	error = -EACCES;
-	if (WARN_ON_ONCE(!S_ISREG(file_inode(file)->i_mode) ||
+	if (WARN_ON_ONCE((!S_ISREG(file_inode(file)->i_mode) &&
+			  !S_ISBLK(file_inode(file)->i_mode)) ||
 			 path_noexec(&file->f_path)))
 		goto exit;
 
@@ -931,7 +932,8 @@  static struct file *do_open_execat(int fd, struct filename *name, int flags)
 	 * and check again at the very end too.
 	 */
 	err = -EACCES;
-	if (WARN_ON_ONCE(!S_ISREG(file_inode(file)->i_mode) ||
+	if (WARN_ON_ONCE((!S_ISREG(file_inode(file)->i_mode) &&
+			  !S_ISBLK(file_inode(file)->i_mode)) ||
 			 path_noexec(&file->f_path)))
 		goto exit;
 
diff --git a/fs/namei.c b/fs/namei.c
index 567ee547492b..60c89321604a 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3254,7 +3254,7 @@  static int may_open(struct mnt_idmap *idmap, const struct path *path,
 		fallthrough;
 	case S_IFIFO:
 	case S_IFSOCK:
-		if (acc_mode & MAY_EXEC)
+		if ((inode->i_mode & S_IFMT) != S_IFBLK && (acc_mode & MAY_EXEC))
 			return -EACCES;
 		flag &= ~O_TRUNC;
 		break;