Message ID | 20240424105248.189032-3-stsp2@yandex.ru (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | implement OA2_INHERIT_CRED flag for openat2() | expand |
On Wed, Apr 24, 2024 at 01:52:48PM +0300, Stas Sergeev wrote: > @@ -3793,8 +3828,23 @@ static struct file *path_openat(struct nameidata *nd, > error = do_o_path(nd, flags, file); > } else { > const char *s = path_init(nd, flags); > - file = alloc_empty_file(open_flags, current_cred()); > - error = PTR_ERR_OR_ZERO(file); > + const struct cred *old_cred = NULL; > + > + error = 0; > + if (open_flags & OA2_INHERIT_CRED) { > + /* Only work with O_CLOEXEC dirs. */ > + if (!get_close_on_exec(nd->dfd)) > + error = -EPERM; > + > + if (!error) > + old_cred = openat2_override_creds(nd); > + } > + if (!error) { > + file = alloc_empty_file(open_flags, current_cred()); Consider the following, currently absolutely harmless situation: * process is owned by luser:students. * descriptor 69 refers to root-opened root directory (O_RDONLY) What's the expected result of fcntl(69, F_SEFTD, O_CLOEXEC); opening "etc/shadow" with dirfd equal to 69 and your flag given subsequent read() from the resulting descriptor? At which point will the kernel say "go fuck yourself, I'm not letting you read that file", provided that attacker passes that new flag of yours? As a bonus question, how about opening it for _write_, seeing that this is an obvious instant roothole? Again, currently the setup that has a root-opened directory in descriptor table of a non-root process is safe. Incidentally, suppose you have the same process run with stdin opened (r/o) by root. F_SETFD it to O_CLOEXEC, then use your open with dirfd being 0, pathname - "" and flags - O_RDWR. AFAICS, without an explicit opt-in by the original opener it's a non-starter, and TBH I doubt that even with such opt-in (FMODE_CRED, whatever) it would be a good idea - it gives too much. NAKed-by: Al Viro <viro@zeniv.linux.org.uk>
25.04.2024 05:31, Al Viro пишет: > Consider the following, currently absolutely harmless situation: > * process is owned by luser:students. > * descriptor 69 refers to root-opened root directory (O_RDONLY) > What's the expected result of > fcntl(69, F_SEFTD, O_CLOEXEC); > opening "etc/shadow" with dirfd equal to 69 and your flag given > subsequent read() from the resulting descriptor? > > At which point will the kernel say "go fuck yourself, I'm not letting you > read that file", provided that attacker passes that new flag of yours? > > As a bonus question, how about opening it for _write_, seeing that this > is an obvious instant roothole? > > Again, currently the setup that has a root-opened directory in descriptor > table of a non-root process is safe. > > Incidentally, suppose you have the same process run with stdin opened > (r/o) by root. F_SETFD it to O_CLOEXEC, then use your open with > dirfd being 0, pathname - "" and flags - O_RDWR. Ok, F_SETFD, how simple. :( > AFAICS, without an explicit opt-in by the original opener it's > a non-starter, and TBH I doubt that even with such opt-in (FMODE_CRED, > whatever) it would be a good idea - it gives too much. Yes, which is why I am quite sceptical to this FMODE_CRED idea. Please note that my O_CLOEXEC check actually meant to check that exactly this process have opened the dir. It just didn't happen that way, as you pointed. Can I replace the O_CLOEXEC check with some explicit check that makes sure the fd was opened by exactly that process?
25.04.2024 05:31, Al Viro пишет: > Incidentally, suppose you have the same process run with stdin opened > (r/o) by root. F_SETFD it to O_CLOEXEC, then use your open with > dirfd being 0, pathname - "" and flags - O_RDWR. I actually checked this with the test-case. It seems to return ENOENT: Breakpoint 1, openat2 (dirfd=0, pathname=0x7fffffffdbee "", how=0x7fffffffd5e0, size=24) at tst.c:13 13 return syscall(SYS_openat2, dirfd, pathname, how, size); (gdb) fin Run till exit from #0 openat2 (dirfd=0, pathname=0x7fffffffdbee "", how=0x7fffffffd5e0, size=24) at tst.c:13 0x000000000040167b in main (argc=3, argv=0x7fffffffd7b8) at tst.c:140 140 fd = openat2(0, efile, &how1, sizeof(how1)); Value returned is $1 = -1 (gdb) list 135 err = fcntl(0, F_SETFD, O_CLOEXEC); 136 if (err) { 137 perror("fcntl(F_SETFD)"); 138 return EXIT_FAILURE; 139 } 140 fd = openat2(0, efile, &how1, sizeof(how1)); 141 if (fd == -1) { 142 perror("openat2(1)"); 143 // return EXIT_FAILURE; 144 } else { (gdb) p errno $2 = 2 So it seems the creds can't be stolen from a non-dir fd, but I wonder why ENOENT is returned instead of ENOTDIR. Such ENOENT is not dicumented in a man page of openat2(), so I guess there is some problem here even w/o my patch. :)
Hello, kernel test robot noticed "BUG:KASAN:wild-memory-access_in_terminate_walk" on: commit: 97bb54b42b1d6150e9ae11a7bf7833ed9f8c471d ("[PATCH 2/2] openat2: add OA2_INHERIT_CRED flag") url: https://github.com/intel-lab-lkp/linux/commits/Stas-Sergeev/fs-reorganize-path_openat/20240424-185527 base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git 9d1ddab261f3e2af7c384dc02238784ce0cf9f98 patch link: https://lore.kernel.org/all/20240424105248.189032-3-stsp2@yandex.ru/ patch subject: [PATCH 2/2] openat2: add OA2_INHERIT_CRED flag in testcase: boot compiler: clang-17 test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G (please refer to attached dmesg/kmsg for entire log/backtrace) +---------------------------------------------------------------------------------------+------------+------------+ | | 831d3c6cc6 | 97bb54b42b | +---------------------------------------------------------------------------------------+------------+------------+ | BUG:KASAN:wild-memory-access_in_terminate_walk | 0 | 12 | | canonical_address#:#[##] | 0 | 12 | | RIP:terminate_walk | 0 | 12 | | Kernel_panic-not_syncing:Fatal_exception | 0 | 12 | +---------------------------------------------------------------------------------------+------------+------------+ If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <oliver.sang@intel.com> | Closes: https://lore.kernel.org/oe-lkp/202404252107.3c18eed2-lkp@intel.com [ 2.555857][ T16] BUG: KASAN: wild-memory-access in terminate_walk (include/linux/instrumented.h:? include/linux/atomic/atomic-instrumented.h:400 include/linux/refcount.h:264 include/linux/refcount.h:307 include/linux/refcount.h:325 fs/namei.c:702) [ 2.556181][ T16] Write of size 4 at addr aaaaaaaaaaaaaaaa by task kdevtmpfs/16 [ 2.556181][ T16] [ 2.556181][ T16] CPU: 0 PID: 16 Comm: kdevtmpfs Tainted: G T 6.9.0-rc5-00038-g97bb54b42b1d #1 c90cc2d91176f38ca16e85ead0a72934082854cd [ 2.556181][ T16] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 [ 2.556181][ T16] Call Trace: [ 2.556181][ T16] <TASK> [ 2.556181][ T16] dump_stack_lvl (lib/dump_stack.c:116) [ 2.556181][ T16] print_report (mm/kasan/report.c:?) [ 2.556181][ T16] ? kasan_report (mm/kasan/report.c:214 mm/kasan/report.c:590) [ 2.556181][ T16] ? terminate_walk (include/linux/instrumented.h:? include/linux/atomic/atomic-instrumented.h:400 include/linux/refcount.h:264 include/linux/refcount.h:307 include/linux/refcount.h:325 fs/namei.c:702) [ 2.556181][ T16] kasan_report (mm/kasan/report.c:603) [ 2.556181][ T16] ? terminate_walk (include/linux/instrumented.h:? include/linux/atomic/atomic-instrumented.h:400 include/linux/refcount.h:264 include/linux/refcount.h:307 include/linux/refcount.h:325 fs/namei.c:702) [ 2.556181][ T16] kasan_check_range (mm/kasan/generic.c:?) [ 2.556181][ T16] terminate_walk (include/linux/instrumented.h:? include/linux/atomic/atomic-instrumented.h:400 include/linux/refcount.h:264 include/linux/refcount.h:307 include/linux/refcount.h:325 fs/namei.c:702) [ 2.556181][ T16] path_lookupat (fs/namei.c:2515) [ 2.556181][ T16] filename_lookup (fs/namei.c:2526) [ 2.556181][ T16] kern_path (fs/namei.c:2634) [ 2.556181][ T16] init_mount (fs/init.c:22) [ 2.556181][ T16] devtmpfs_setup (drivers/base/devtmpfs.c:419) [ 2.556181][ T16] devtmpfsd (drivers/base/devtmpfs.c:436) [ 2.556181][ T16] kthread (kernel/kthread.c:390) [ 2.556181][ T16] ? vclkdev_alloc (drivers/base/devtmpfs.c:435) [ 2.556181][ T16] ? kthread_unuse_mm (kernel/kthread.c:341) [ 2.556181][ T16] ret_from_fork (arch/x86/kernel/process.c:153) [ 2.556181][ T16] ? kthread_unuse_mm (kernel/kthread.c:341) [ 2.556181][ T16] ret_from_fork_asm (arch/x86/entry/entry_64.S:257) [ 2.556181][ T16] </TASK> [ 2.556181][ T16] ================================================================== [ 2.556184][ T16] Disabling lock debugging due to kernel taint [ 2.556901][ T16] general protection fault, probably for non-canonical address 0xaaaaaaaaaaaaaaaa: 0000 [#1] KASAN PTI [ 2.558131][ T16] CPU: 0 PID: 16 Comm: kdevtmpfs Tainted: G B T 6.9.0-rc5-00038-g97bb54b42b1d #1 c90cc2d91176f38ca16e85ead0a72934082854cd [ 2.559653][ T16] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 [ 2.560181][ T16] RIP: 0010:terminate_walk (arch/x86/include/asm/atomic.h:103 include/linux/atomic/atomic-arch-fallback.h:949 include/linux/atomic/atomic-instrumented.h:401 include/linux/refcount.h:264 include/linux/refcount.h:307 include/linux/refcount.h:325 fs/namei.c:702) [ 2.560181][ T16] Code: 03 43 80 3c 2e 00 74 08 4c 89 ff e8 01 61 f4 ff 49 8b 1f 48 85 db 74 41 48 89 df be 04 00 00 00 e8 dc 61 f4 ff b8 ff ff ff ff <0f> c1 03 83 f8 01 75 25 43 80 3c 2e 00 74 08 4c 89 ff e8 d0 60 f4 All code ======== 0: 03 43 80 add -0x80(%rbx),%eax 3: 3c 2e cmp $0x2e,%al 5: 00 74 08 4c add %dh,0x4c(%rax,%rcx,1) 9: 89 ff mov %edi,%edi b: e8 01 61 f4 ff call 0xfffffffffff46111 10: 49 8b 1f mov (%r15),%rbx 13: 48 85 db test %rbx,%rbx 16: 74 41 je 0x59 18: 48 89 df mov %rbx,%rdi 1b: be 04 00 00 00 mov $0x4,%esi 20: e8 dc 61 f4 ff call 0xfffffffffff46201 25: b8 ff ff ff ff mov $0xffffffff,%eax 2a:* 0f c1 03 xadd %eax,(%rbx) <-- trapping instruction 2d: 83 f8 01 cmp $0x1,%eax 30: 75 25 jne 0x57 32: 43 80 3c 2e 00 cmpb $0x0,(%r14,%r13,1) 37: 74 08 je 0x41 39: 4c 89 ff mov %r15,%rdi 3c: e8 .byte 0xe8 3d: d0 60 f4 shlb -0xc(%rax) Code starting with the faulting instruction =========================================== 0: 0f c1 03 xadd %eax,(%rbx) 3: 83 f8 01 cmp $0x1,%eax 6: 75 25 jne 0x2d 8: 43 80 3c 2e 00 cmpb $0x0,(%r14,%r13,1) d: 74 08 je 0x17 f: 4c 89 ff mov %r15,%rdi 12: e8 .byte 0xe8 13: d0 60 f4 shlb -0xc(%rax) [ 2.560181][ T16] RSP: 0000:ffffc9000010fc40 EFLAGS: 00010246 [ 2.560181][ T16] RAX: 00000000ffffffff RBX: aaaaaaaaaaaaaaaa RCX: ffffffff811e4a0f [ 2.560181][ T16] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffffffff8792adc0 [ 2.560181][ T16] RBP: 0000000000000011 R08: ffffffff8792adc7 R09: 1ffffffff0f255b8 [ 2.560181][ T16] R10: dffffc0000000000 R11: fffffbfff0f255b9 R12: 1ffff92000021fc4 [ 2.560181][ T16] R13: dffffc0000000000 R14: 1ffff92000021fc1 R15: ffffc9000010fe08 [ 2.560181][ T16] FS: 0000000000000000(0000) GS:ffffffff878dc000(0000) knlGS:0000000000000000 [ 2.560181][ T16] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 2.560181][ T16] CR2: ffff88843ffff000 CR3: 000000000789c000 CR4: 00000000000406f0 [ 2.560181][ T16] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 2.560181][ T16] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 2.560181][ T16] Call Trace: [ 2.560181][ T16] <TASK> [ 2.560181][ T16] ? __die_body (arch/x86/kernel/dumpstack.c:421) [ 2.560181][ T16] ? die_addr (arch/x86/kernel/dumpstack.c:?) [ 2.560181][ T16] ? exc_general_protection (arch/x86/kernel/traps.c:?) [ 2.560181][ T16] ? end_report (arch/x86/include/asm/current.h:49 mm/kasan/report.c:240) [ 2.560181][ T16] ? asm_exc_general_protection (arch/x86/include/asm/idtentry.h:617) [ 2.560181][ T16] ? add_taint (arch/x86/include/asm/bitops.h:60 include/asm-generic/bitops/instrumented-atomic.h:29 kernel/panic.c:555) [ 2.560181][ T16] ? terminate_walk (arch/x86/include/asm/atomic.h:103 include/linux/atomic/atomic-arch-fallback.h:949 include/linux/atomic/atomic-instrumented.h:401 include/linux/refcount.h:264 include/linux/refcount.h:307 include/linux/refcount.h:325 fs/namei.c:702) [ 2.560181][ T16] path_lookupat (fs/namei.c:2515) [ 2.560181][ T16] filename_lookup (fs/namei.c:2526) [ 2.560181][ T16] kern_path (fs/namei.c:2634) [ 2.560181][ T16] init_mount (fs/init.c:22) [ 2.560181][ T16] devtmpfs_setup (drivers/base/devtmpfs.c:419) [ 2.560181][ T16] devtmpfsd (drivers/base/devtmpfs.c:436) [ 2.560181][ T16] kthread (kernel/kthread.c:390) [ 2.560181][ T16] ? vclkdev_alloc (drivers/base/devtmpfs.c:435) [ 2.560181][ T16] ? kthread_unuse_mm (kernel/kthread.c:341) [ 2.560181][ T16] ret_from_fork (arch/x86/kernel/process.c:153) [ 2.560181][ T16] ? kthread_unuse_mm (kernel/kthread.c:341) [ 2.560181][ T16] ret_from_fork_asm (arch/x86/entry/entry_64.S:257) [ 2.560181][ T16] </TASK> [ 2.560181][ T16] Modules linked in: [ 2.560183][ T16] ---[ end trace 0000000000000000 ]--- [ 2.560820][ T16] RIP: 0010:terminate_walk (arch/x86/include/asm/atomic.h:103 include/linux/atomic/atomic-arch-fallback.h:949 include/linux/atomic/atomic-instrumented.h:401 include/linux/refcount.h:264 include/linux/refcount.h:307 include/linux/refcount.h:325 fs/namei.c:702) [ 2.561462][ T16] Code: 03 43 80 3c 2e 00 74 08 4c 89 ff e8 01 61 f4 ff 49 8b 1f 48 85 db 74 41 48 89 df be 04 00 00 00 e8 dc 61 f4 ff b8 ff ff ff ff <0f> c1 03 83 f8 01 75 25 43 80 3c 2e 00 74 08 4c 89 ff e8 d0 60 f4 All code ======== 0: 03 43 80 add -0x80(%rbx),%eax 3: 3c 2e cmp $0x2e,%al 5: 00 74 08 4c add %dh,0x4c(%rax,%rcx,1) 9: 89 ff mov %edi,%edi b: e8 01 61 f4 ff call 0xfffffffffff46111 10: 49 8b 1f mov (%r15),%rbx 13: 48 85 db test %rbx,%rbx 16: 74 41 je 0x59 18: 48 89 df mov %rbx,%rdi 1b: be 04 00 00 00 mov $0x4,%esi 20: e8 dc 61 f4 ff call 0xfffffffffff46201 25: b8 ff ff ff ff mov $0xffffffff,%eax 2a:* 0f c1 03 xadd %eax,(%rbx) <-- trapping instruction 2d: 83 f8 01 cmp $0x1,%eax 30: 75 25 jne 0x57 32: 43 80 3c 2e 00 cmpb $0x0,(%r14,%r13,1) 37: 74 08 je 0x41 39: 4c 89 ff mov %r15,%rdi 3c: e8 .byte 0xe8 3d: d0 60 f4 shlb -0xc(%rax) Code starting with the faulting instruction =========================================== 0: 0f c1 03 xadd %eax,(%rbx) 3: 83 f8 01 cmp $0x1,%eax 6: 75 25 jne 0x2d 8: 43 80 3c 2e 00 cmpb $0x0,(%r14,%r13,1) d: 74 08 je 0x17 f: 4c 89 ff mov %r15,%rdi 12: e8 .byte 0xe8 13: d0 60 f4 shlb -0xc(%rax) The kernel config and materials to reproduce are available at: https://download.01.org/0day-ci/archive/20240425/202404252107.3c18eed2-lkp@intel.com
> struct open_flags { > - int open_flag; > + u64 open_flag; Btw, this change taken together with > +#define VALID_OPENAT2_FLAGS (VALID_OPEN_FLAGS | OA2_INHERIT_CRED) is also ripe to causes subtle bugs and security issues. This new VALID_OPENAT2_FLAGS define bypasses BUILD_BUG_ON_MSG(upper_32_bits(VALID_OPEN_FLAGS), "struct open_flags doesn't yet handle flags > 32 bits"); in build_open_flags(). And right now lookup_open(), open_last_lookups(), and do_open() just do: int open_flag = op->open_flag; Because op->open_flag was 32bit that was fine. But now ->open_flag is 64bit which means we truncate the upper 32bit including OA2_INHERIT_CRED or any other new flag in the upper 32bits in those functions. So as soon as there's an additional check in e.g., do_open() for OA2_INHERIT_CRED or in any of the other helpers that's security relevant we're fscked because that flag is never seen and no bot will help us here. And it's super easy to miss during review...
25.04.2024 17:02, Christian Brauner пишет: >> struct open_flags { >> - int open_flag; >> + u64 open_flag; > Btw, this change taken together with All fixed in v5. I dropped u64 use. Other comments are addressed as well. Please let me know if I missed some. Thank you.
diff --git a/fs/internal.h b/fs/internal.h index 7ca738904e34..692b53b19aad 100644 --- a/fs/internal.h +++ b/fs/internal.h @@ -169,7 +169,7 @@ static inline void sb_end_ro_state_change(struct super_block *sb) * open.c */ struct open_flags { - int open_flag; + u64 open_flag; umode_t mode; int acc_mode; int intent; diff --git a/fs/namei.c b/fs/namei.c index 413eef134234..aeb9f504538e 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -586,6 +586,9 @@ struct nameidata { int dfd; vfsuid_t dir_vfsuid; umode_t dir_mode; + kuid_t dir_open_fsuid; + kgid_t dir_open_fsgid; + struct group_info *dir_open_groups; } __randomize_layout; #define ND_ROOT_PRESET 1 @@ -695,6 +698,8 @@ static void terminate_walk(struct nameidata *nd) nd->depth = 0; nd->path.mnt = NULL; nd->path.dentry = NULL; + if (nd->dir_open_groups) + put_group_info(nd->dir_open_groups); } /* path_put is needed afterwards regardless of success or failure */ @@ -2414,6 +2419,9 @@ static const char *path_init(struct nameidata *nd, unsigned flags) get_fs_pwd(current->fs, &nd->path); nd->inode = nd->path.dentry->d_inode; } + nd->dir_open_fsuid = current_cred()->fsuid; + nd->dir_open_fsgid = current_cred()->fsgid; + nd->dir_open_groups = get_current_groups(); } else { /* Caller must check execute permissions on the starting path component */ struct fd f = fdget_raw(nd->dfd); @@ -2437,6 +2445,10 @@ static const char *path_init(struct nameidata *nd, unsigned flags) path_get(&nd->path); nd->inode = nd->path.dentry->d_inode; } + nd->dir_open_fsuid = f.file->f_cred->fsuid; + nd->dir_open_fsgid = f.file->f_cred->fsgid; + nd->dir_open_groups = get_group_info( + f.file->f_cred->group_info); fdput(f); } @@ -3776,6 +3788,29 @@ static int do_o_path(struct nameidata *nd, unsigned flags, struct file *file) return error; } +static const struct cred *openat2_override_creds(struct nameidata *nd) +{ + const struct cred *old_cred; + struct cred *override_cred; + + override_cred = prepare_creds(); + if (!override_cred) + return NULL; + + override_cred->fsuid = nd->dir_open_fsuid; + override_cred->fsgid = nd->dir_open_fsgid; + override_cred->group_info = nd->dir_open_groups; + + override_cred->non_rcu = 1; + + old_cred = override_creds(override_cred); + + /* override_cred() gets its own ref */ + put_cred(override_cred); + + return old_cred; +} + static struct file *path_openat(struct nameidata *nd, const struct open_flags *op, unsigned flags) { @@ -3793,8 +3828,23 @@ static struct file *path_openat(struct nameidata *nd, error = do_o_path(nd, flags, file); } else { const char *s = path_init(nd, flags); - file = alloc_empty_file(open_flags, current_cred()); - error = PTR_ERR_OR_ZERO(file); + const struct cred *old_cred = NULL; + + error = 0; + if (open_flags & OA2_INHERIT_CRED) { + /* Only work with O_CLOEXEC dirs. */ + if (!get_close_on_exec(nd->dfd)) + error = -EPERM; + + if (!error) + old_cred = openat2_override_creds(nd); + } + if (!error) { + file = alloc_empty_file(open_flags, current_cred()); + error = PTR_ERR_OR_ZERO(file); + } else { + file = ERR_PTR(error); + } if (!error) { while (!(error = link_path_walk(s, nd)) && (s = open_last_lookups(nd, file, op)) != NULL) @@ -3802,6 +3852,8 @@ static struct file *path_openat(struct nameidata *nd, } if (!error) error = do_open(nd, file, op); + if (old_cred) + revert_creds(old_cred); terminate_walk(nd); if (IS_ERR(file)) return file; diff --git a/fs/open.c b/fs/open.c index ee8460c83c77..c871ff8fc6e3 100644 --- a/fs/open.c +++ b/fs/open.c @@ -1225,7 +1225,7 @@ inline int build_open_flags(const struct open_how *how, struct open_flags *op) * values before calling build_open_flags(), but openat2(2) checks all * of its arguments. */ - if (flags & ~VALID_OPEN_FLAGS) + if (flags & ~VALID_OPENAT2_FLAGS) return -EINVAL; if (how->resolve & ~VALID_RESOLVE_FLAGS) return -EINVAL; @@ -1326,6 +1326,14 @@ inline int build_open_flags(const struct open_how *how, struct open_flags *op) lookup_flags |= LOOKUP_CACHED; } + if (flags & OA2_INHERIT_CRED) { + /* Inherit creds only with scoped look-up modes. */ + if (!(lookup_flags & LOOKUP_IS_SCOPED)) + return -EPERM; + /* Reject /proc "magic" links if inheriting creds. */ + lookup_flags |= LOOKUP_NO_MAGICLINKS; + } + op->lookup_flags = lookup_flags; return 0; } diff --git a/include/linux/fcntl.h b/include/linux/fcntl.h index a332e79b3207..b71f8b162102 100644 --- a/include/linux/fcntl.h +++ b/include/linux/fcntl.h @@ -12,6 +12,8 @@ FASYNC | O_DIRECT | O_LARGEFILE | O_DIRECTORY | O_NOFOLLOW | \ O_NOATIME | O_CLOEXEC | O_PATH | __O_TMPFILE) +#define VALID_OPENAT2_FLAGS (VALID_OPEN_FLAGS | OA2_INHERIT_CRED) + /* List of all valid flags for the how->resolve argument: */ #define VALID_RESOLVE_FLAGS \ (RESOLVE_NO_XDEV | RESOLVE_NO_MAGICLINKS | RESOLVE_NO_SYMLINKS | \ diff --git a/include/uapi/linux/openat2.h b/include/uapi/linux/openat2.h index a5feb7604948..cdd676a10b62 100644 --- a/include/uapi/linux/openat2.h +++ b/include/uapi/linux/openat2.h @@ -40,4 +40,7 @@ struct open_how { return -EAGAIN if that's not possible. */ +/* openat2-specific flags go to upper 4 bytes. */ +#define OA2_INHERIT_CRED (1ULL << 32) + #endif /* _UAPI_LINUX_OPENAT2_H */
This flag performs the open operation with the fs credentials (fsuid, fsgid, group_info) that were in effect when dir_fd was opened. This allows the process to pre-open some directories and then change eUID (and all other UIDs/GIDs) to a less-privileged user, retaining the ability to open/create files within these directories. Design goal: The idea is to provide a very light-weight sandboxing, where the process, without the use of any heavy-weight techniques like chroot within namespaces, can restrict the access to the set of pre-opened directories. This patch is just a first step to such sandboxing. If things go well, in the future the same extension can be added to more syscalls. These should include at least unlinkat(), renameat2() and the not-yet-upstreamed setxattrat(). Security considerations: - Only the bare minimal set of credentials is overridden: fsuid, fsgid and group_info. The rest, for example capabilities, are not overridden to avoid unneeded security risks. - To avoid sandboxing escape, this patch makes sure the restricted lookup modes are used. Namely, RESOLVE_BENEATH or RESOLVE_IN_ROOT. - To avoid leaking creds across exec, this patch requires O_CLOEXEC flag on a directory. - Magic /proc symlinks are discarded, as suggested by Andy Lutomirski <luto@kernel.org> Use cases: Virtual machines that deal with untrusted code, can use that instead of a more heavy-weighted approaches. Currently the approach is being tested on a dosemu2 VM. Signed-off-by: Stas Sergeev <stsp2@yandex.ru> CC: Stefan Metzmacher <metze@samba.org> CC: Eric Biederman <ebiederm@xmission.com> CC: Alexander Viro <viro@zeniv.linux.org.uk> CC: Andy Lutomirski <luto@kernel.org> CC: Christian Brauner <brauner@kernel.org> CC: Jan Kara <jack@suse.cz> CC: Jeff Layton <jlayton@kernel.org> CC: Chuck Lever <chuck.lever@oracle.com> CC: Alexander Aring <alex.aring@gmail.com> CC: linux-fsdevel@vger.kernel.org CC: linux-kernel@vger.kernel.org CC: Paolo Bonzini <pbonzini@redhat.com> CC: Christian Göttsche <cgzones@googlemail.com> --- fs/internal.h | 2 +- fs/namei.c | 56 ++++++++++++++++++++++++++++++++++-- fs/open.c | 10 ++++++- include/linux/fcntl.h | 2 ++ include/uapi/linux/openat2.h | 3 ++ 5 files changed, 69 insertions(+), 4 deletions(-)