Message ID | 87eerszyim.fsf_-_@x220.int.ebiederm.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | exec: Control flow simplifications | expand |
On Sat, May 9, 2020 at 12:45 PM Eric W. Biederman <ebiederm@xmission.com> wrote: > > Instead of recursing in search_binary_handler have the methods that > would recurse return a positive value, and simply loop in exec_binprm. > > This is a trivial change as all of the methods that would recurse do > so as effectively the last thing they do. Making this a trivial code > change. Looks good. I'd suggest doing that loop slightly differently: > - ret = search_binary_handler(bprm); > + do { > + depth++; > + ret = search_binary_handler(bprm); > + /* This allows 4 levels of binfmt rewrites before failing hard. */ > + if ((ret > 0) && (depth > 5)) > + ret = -ELOOP; > + } while (ret > 0); > if (ret >= 0) { That's really an odd way to write this. So honestly, if "ret < 0", then we can just return directly. So I think would make much more sense to do this loop something like for (depth = 0; depth < 5; depth++) { int ret; ret = search_binary_handler(bprm); if (ret < 0) return ret; /* Continue searching for the next binary handler? */ if (ret > 0) continue; /* Success! */ audit_bprm(bprm); trace_sched_process_exec(current, old_pid, bprm); ptrace_event(PTRACE_EVENT_EXEC, old_vpid); proc_exec_connector(current); return 0; } return -ELOOP; (if I read the logic of exec_binprm() right - I might have missed something). Linus
On 2020/05/10 4:41, Eric W. Biederman wrote: > --- a/fs/binfmt_misc.c > +++ b/fs/binfmt_misc.c > @@ -234,10 +234,7 @@ static int load_misc_binary(struct linux_binprm *bprm) > if (retval < 0) > goto error; > > - retval = search_binary_handler(bprm); > - if (retval < 0) > - goto error; > - > + retval = 1; /* Search for the interpreter */ > ret: > dput(fmt->dentry); > return retval; Wouldn't this change cause if (fd_binary > 0) ksys_close(fd_binary); bprm->interp_flags = 0; bprm->interp_data = 0; not to be called when "Search for the interpreter" failed?
On Sat, May 9, 2020 at 9:30 PM Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> wrote: > > Wouldn't this change cause > > if (fd_binary > 0) > ksys_close(fd_binary); > bprm->interp_flags = 0; > bprm->interp_data = 0; > > not to be called when "Search for the interpreter" failed? Good catch. We seem to have some subtle magic wrt the fd_binary file descriptor, which depends on the recursive behavior. I'm not seeing how to fix it cleanly with the "turn it into a loop". Basically, that binfmt_misc use-case isn't really a tail-call. Eric, ideas? Linus
Linus Torvalds <torvalds@linux-foundation.org> writes: > On Sat, May 9, 2020 at 9:30 PM Tetsuo Handa > <penguin-kernel@i-love.sakura.ne.jp> wrote: >> >> Wouldn't this change cause >> >> if (fd_binary > 0) >> ksys_close(fd_binary); >> bprm->interp_flags = 0; >> bprm->interp_data = 0; >> >> not to be called when "Search for the interpreter" failed? > > Good catch. We seem to have some subtle magic wrt the fd_binary file > descriptor, which depends on the recursive behavior. Yes. I Tetsuo I really appreciate you noticing this. This is exactly the kind of behavior I am trying to flush out and keep from being hidden. > I'm not seeing how to fix it cleanly with the "turn it into a loop". > Basically, that binfmt_misc use-case isn't really a tail-call. I have reservations about installing a new file descriptor before we process the close on exec logic and the related security modules closing file descriptors that your new credentials no longer give you access to logic. I haven't yet figured out how opening a file descriptor during exec should fit into all of that. What I do see is that interp_data is just a parameter that is smuggled into the call of search binary handler. And the next binary handler needs to be binfmt_elf for it to make much sense, as only binfmt_elf (and binfmt_elf_fdpic) deals with BINPRM_FLAGS_EXECFD. So I think what needs to happen is to rename bprm->interp_data to bprm->execfd, remove BINPRM_FLAGS_EXECFD and make closing that file descriptor free_bprm's responsiblity. I hope such a change will make it easier to see all of the pieces that are intereacting during exec. I am still asking: is the installation of that file descriptor useful if it is not exported passed to userspace as an AT_EXECFD note? I will dig in and see what I can come up with. Eric
On 5/11/20 9:33 AM, Eric W. Biederman wrote: > What I do see is that interp_data is just a parameter that is smuggled > into the call of search binary handler. And the next binary handler > needs to be binfmt_elf for it to make much sense, as only binfmt_elf > (and binfmt_elf_fdpic) deals with BINPRM_FLAGS_EXECFD. The binfmt_elf_fdpic driver is separate from binfmt_elf for the same reason ext2/ext3/ext4 used to have 3 drivers: fdpic is really just binfmt_elf with the 4 main sections (text, data, bss, rodata) able to move independently of each other (each tracked with its own base pointer). It's kind of -fPIE on steroids, and various security people have sniffed at it over the years to give ASLR more degrees of freedom on with-MMU systems. Many moons ago Rich Felker proposed teaching the fdpic loader how to load normal ELF binaries so there's just the one loader (there's a flag in the ELF header to say whether the sections are independent or not). Rob
On Mon, May 11, 2020 at 09:33:21AM -0500, Eric W. Biederman wrote: > Linus Torvalds <torvalds@linux-foundation.org> writes: > > > On Sat, May 9, 2020 at 9:30 PM Tetsuo Handa > > <penguin-kernel@i-love.sakura.ne.jp> wrote: > >> > >> Wouldn't this change cause > >> > >> if (fd_binary > 0) > >> ksys_close(fd_binary); > >> bprm->interp_flags = 0; > >> bprm->interp_data = 0; > >> > >> not to be called when "Search for the interpreter" failed? > > > > Good catch. We seem to have some subtle magic wrt the fd_binary file > > descriptor, which depends on the recursive behavior. > > Yes. I Tetsuo I really appreciate you noticing this. This is exactly > the kind of behavior I am trying to flush out and keep from being > hidden. > > > I'm not seeing how to fix it cleanly with the "turn it into a loop". > > Basically, that binfmt_misc use-case isn't really a tail-call. > > I have reservations about installing a new file descriptor before > we process the close on exec logic and the related security modules > closing file descriptors that your new credentials no longer give > you access to logic. Hm, this does feel odd. In looking at this, it seems like this file never gets close-on-exec set, and doesn't have its flags changed from its original open: .open_flag = O_LARGEFILE | O_RDONLY | __FMODE_EXEC, only the UMH path through exec doesn't explicitly open a file by name from what I can see, so we'll only have these flags. > I haven't yet figured out how opening a file descriptor during exec > should fit into all of that. > > What I do see is that interp_data is just a parameter that is smuggled > into the call of search binary handler. And the next binary handler > needs to be binfmt_elf for it to make much sense, as only binfmt_elf > (and binfmt_elf_fdpic) deals with BINPRM_FLAGS_EXECFD. > > So I think what needs to happen is to rename bprm->interp_data to > bprm->execfd, remove BINPRM_FLAGS_EXECFD and make closing that file > descriptor free_bprm's responsiblity. Yeah, I would agree. As far as the close handling, I don't think there is a difference here: it interp_data was closed on the binfmt_misc.c error path, and in the new world it would be the exec error path -- both would be under the original credentials. > I hope such a change will make it easier to see all of the pieces that > are intereacting during exec. Right -- I'm not sure which piece should "consume" bprm->execfd though, which I think is what you're asking next... > I am still asking: is the installation of that file descriptor useful if > it is not exported passed to userspace as an AT_EXECFD note? > > I will dig in and see what I can come up with. Should binfmt_misc do the install, or can the consuming binfmt do it? i.e. when binfmt_elf sees bprm->execfd, does it perform the install instead?
Kees Cook <keescook@chromium.org> writes: > On Mon, May 11, 2020 at 09:33:21AM -0500, Eric W. Biederman wrote: >> Linus Torvalds <torvalds@linux-foundation.org> writes: >> >> > On Sat, May 9, 2020 at 9:30 PM Tetsuo Handa >> > <penguin-kernel@i-love.sakura.ne.jp> wrote: >> >> >> >> Wouldn't this change cause >> >> >> >> if (fd_binary > 0) >> >> ksys_close(fd_binary); >> >> bprm->interp_flags = 0; >> >> bprm->interp_data = 0; >> >> >> >> not to be called when "Search for the interpreter" failed? >> > >> > Good catch. We seem to have some subtle magic wrt the fd_binary file >> > descriptor, which depends on the recursive behavior. >> >> Yes. I Tetsuo I really appreciate you noticing this. This is exactly >> the kind of behavior I am trying to flush out and keep from being >> hidden. >> >> > I'm not seeing how to fix it cleanly with the "turn it into a loop". >> > Basically, that binfmt_misc use-case isn't really a tail-call. >> >> I have reservations about installing a new file descriptor before >> we process the close on exec logic and the related security modules >> closing file descriptors that your new credentials no longer give >> you access to logic. > > Hm, this does feel odd. In looking at this, it seems like this file > never gets close-on-exec set, and doesn't have its flags changed from > its original open: > .open_flag = O_LARGEFILE | O_RDONLY | __FMODE_EXEC, > only the UMH path through exec doesn't explicitly open a file by name > from what I can see, so we'll only have these flags. > >> I haven't yet figured out how opening a file descriptor during exec >> should fit into all of that. >> >> What I do see is that interp_data is just a parameter that is smuggled >> into the call of search binary handler. And the next binary handler >> needs to be binfmt_elf for it to make much sense, as only binfmt_elf >> (and binfmt_elf_fdpic) deals with BINPRM_FLAGS_EXECFD. >> >> So I think what needs to happen is to rename bprm->interp_data to >> bprm->execfd, remove BINPRM_FLAGS_EXECFD and make closing that file >> descriptor free_bprm's responsiblity. > > Yeah, I would agree. As far as the close handling, I don't think there > is a difference here: it interp_data was closed on the binfmt_misc.c > error path, and in the new world it would be the exec error path -- both > would be under the original credentials. > >> I hope such a change will make it easier to see all of the pieces that >> are intereacting during exec. > > Right -- I'm not sure which piece should "consume" bprm->execfd though, > which I think is what you're asking next... > >> I am still asking: is the installation of that file descriptor useful if >> it is not exported passed to userspace as an AT_EXECFD note? >> >> I will dig in and see what I can come up with. > > Should binfmt_misc do the install, or can the consuming binfmt do it? > i.e. when binfmt_elf sees bprm->execfd, does it perform the install > instead? I am still thinking about this one, but here is where I am at. At a practical level passing the file descriptor of the script to interpreter seems like something we should encourage in the long term. It removes races and it is cheaper because then the interpreter does not have to turn around and open the script itself. Strictly speaking binfmt_misc should not need to close the file descriptor in binfmt_misc because we have already unshared the files struct and reset_files_struct should handle restoring it. Calling fd_install in binfmt_misc still seems wrong, as that exposes the new file descriptor to user space with the old creds. It is possible although unlikely for userspace to find the file descriptor without consulting AT_EXECFD so just to be conservative I think we should install the file descriptor in begin_new_exec even if the next interpreter does not support AT_EXECFD. I am still working on how to handle recursive binfmts but I suspect it is just a matter of having an array of struct files in struct linux_binprm. Eric
On Tue, May 12, 2020 at 01:42:53PM -0500, Eric W. Biederman wrote: > Kees Cook <keescook@chromium.org> writes: > > Should binfmt_misc do the install, or can the consuming binfmt do it? > > i.e. when binfmt_elf sees bprm->execfd, does it perform the install > > instead? > > I am still thinking about this one, but here is where I am at. At a > practical level passing the file descriptor of the script to interpreter > seems like something we should encourage in the long term. It removes > races and it is cheaper because then the interpreter does not have to > turn around and open the script itself. Yeah, this does sounds pretty good, though I have concerns about doing it for a process that isn't expecting it. I've seen a lot of bad code make assumptions about initial fd numbers. :( > Strictly speaking binfmt_misc should not need to close the file > descriptor in binfmt_misc because we have already unshared the files > struct and reset_files_struct should handle restoring it. If I get what you mean, I agree. The error case is fine. > Calling fd_install in binfmt_misc still seems wrong, as that exposes > the new file descriptor to user space with the old creds. I haven't dug into the details here -- is there a real risk here? The old creds are what opened the file originally for the exec. Are you thinking about executable-but-not-readable files? > It is possible although unlikely for userspace to find the file > descriptor without consulting AT_EXECFD so just to be conservative I > think we should install the file descriptor in begin_new_exec even if > the next interpreter does not support AT_EXECFD. I think universally installing the fd needs to be a distinct patch -- it's going to have a lot of consequences, IMO. We can certainly deal with them, but I don't think it should be part of this clean-up series. > I am still working on how to handle recursive binfmts but I suspect it > is just a matter of having an array of struct files in struct > linux_binprm. If install is left if binfmt_misc, then the recursive problem goes away, yes?
Kees Cook <keescook@chromium.org> writes: > On Tue, May 12, 2020 at 01:42:53PM -0500, Eric W. Biederman wrote: >> Kees Cook <keescook@chromium.org> writes: >> > Should binfmt_misc do the install, or can the consuming binfmt do it? >> > i.e. when binfmt_elf sees bprm->execfd, does it perform the install >> > instead? >> >> I am still thinking about this one, but here is where I am at. At a >> practical level passing the file descriptor of the script to interpreter >> seems like something we should encourage in the long term. It removes >> races and it is cheaper because then the interpreter does not have to >> turn around and open the script itself. > > Yeah, this does sounds pretty good, though I have concerns about doing > it for a process that isn't expecting it. I've seen a lot of bad code > make assumptions about initial fd numbers. :( Yes. That is definitely a concern. >> Strictly speaking binfmt_misc should not need to close the file >> descriptor in binfmt_misc because we have already unshared the files >> struct and reset_files_struct should handle restoring it. > > If I get what you mean, I agree. The error case is fine. > >> Calling fd_install in binfmt_misc still seems wrong, as that exposes >> the new file descriptor to user space with the old creds. > > I haven't dug into the details here -- is there a real risk here? The > old creds are what opened the file originally for the exec. Are you > thinking about executable-but-not-readable files? I am thinking about looking in proc/<pid>/fd and maybe opening those files. That access is gated by ptrace_may_access which is gated by the process credentials. So I know strictly speaking it is wrong. I think you are correct that it would only allow access to a file that could be accessed another way. Even execveat at a quick glance appears to go through the orinary permission checks of open. The current code is definitely a maintenance pitfall as it install state into the process early. >> It is possible although unlikely for userspace to find the file >> descriptor without consulting AT_EXECFD so just to be conservative I >> think we should install the file descriptor in begin_new_exec even if >> the next interpreter does not support AT_EXECFD. > > I think universally installing the fd needs to be a distinct patch -- > it's going to have a lot of consequences, IMO. We can certainly deal > with them, but I don't think it should be part of this clean-up series. I meant generically installing the fd not universally installing it. >> I am still working on how to handle recursive binfmts but I suspect it >> is just a matter of having an array of struct files in struct >> linux_binprm. > > If install is left if binfmt_misc, then the recursive problem goes away, > yes? I don't think leaving the install in binfmt_misc is responsible at this point. Eric
On Tue, May 12, 2020 at 03:31:57PM -0500, Eric W. Biederman wrote: > >> It is possible although unlikely for userspace to find the file > >> descriptor without consulting AT_EXECFD so just to be conservative I > >> think we should install the file descriptor in begin_new_exec even if > >> the next interpreter does not support AT_EXECFD. > > > > I think universally installing the fd needs to be a distinct patch -- > > it's going to have a lot of consequences, IMO. We can certainly deal > > with them, but I don't think it should be part of this clean-up series. > > I meant generically installing the fd not universally installing it. > > >> I am still working on how to handle recursive binfmts but I suspect it > >> is just a matter of having an array of struct files in struct > >> linux_binprm. > > > > If install is left if binfmt_misc, then the recursive problem goes away, > > yes? > > I don't think leaving the install in binfmt_misc is responsible at this > point. I'm nearly certain the answer is "yes", but I wonder if we should stop for a moment and ask "does anything still use MISC_FMT_OPEN_BINARY ? It looks like either "O" or "C" binfmt_misc registration flag. My installed binfmts on Ubuntu don't use them... I'm currently pulling a list of all the packages in Debian than depend on the binfmt-support package and checking their flags.
On Tue, May 12, 2020 at 04:08:56PM -0700, Kees Cook wrote: > I'm nearly certain the answer is "yes", but I wonder if we should stop > for a moment and ask "does anything still use MISC_FMT_OPEN_BINARY ? It > looks like either "O" or "C" binfmt_misc registration flag. My installed > binfmts on Ubuntu don't use them... > > I'm currently pulling a list of all the packages in Debian than depend > on the binfmt-support package and checking their flags. So, binfmt-support in Debian doesn't in _support_ MISC_FMT_OPEN_BINARY ("O"): credentials = (binfmt->credentials && !strcmp (binfmt->credentials, "yes")) ? "C" : ""; preserve = (binfmt->preserve && !strcmp (binfmt->preserve, "yes")) ? "P" : ""; fix_binary = (binfmt->fix_binary && !strcmp (binfmt->fix_binary, "yes")) ? "F" : ""; ... regstring = xasprintf (":%s:%c:%s:%s:%s:%s:%s%s%s\n", name, type, binfmt->offset, binfmt->magic, binfmt->mask, interpreter, credentials, preserve, fix_binary); However, "credentials" ("C") does imply MISC_FMT_OPEN_BINARY. I looked at every Debian package using binfmt-support, and "only" qemu uses "credential". And now I wonder if qemu actually uses the resulting AT_EXECFD ...
On Tue, May 12, 2020 at 04:47:14PM -0700, Kees Cook wrote:
> And now I wonder if qemu actually uses the resulting AT_EXECFD ...
It does, though I'm not sure if this is to support crossing mount points,
dropping privileges, or something else, since it does fall back to just
trying to open the file.
execfd = qemu_getauxval(AT_EXECFD);
if (execfd == 0) {
execfd = open(filename, O_RDONLY);
if (execfd < 0) {
printf("Error while loading %s: %s\n", filename, strerror(errno));
_exit(EXIT_FAILURE);
}
}
On Tue, May 12, 2020 at 11:46 AM Eric W. Biederman <ebiederm@xmission.com> wrote: > > I am still thinking about this one, but here is where I am at. At a > practical level passing the file descriptor of the script to interpreter > seems like something we should encourage in the long term. It removes > races and it is cheaper because then the interpreter does not have to > turn around and open the script itself. Yeah, I think we should continue to support it, because I think it's the right thing to do (and we might just end up having compatibility issues if we don't). How about trying to move the logic to the common code, out of binfmt_misc? IOW, how about something very similar to your "brpm->preserve_creds" thing that you did for the credentials (also for binfmt_misc, which shouldn't surprise anybody: binfmt_misc is simply the "this is the generic thing for letting user mode do the final details"). > Calling fd_install in binfmt_misc still seems wrong, as that exposes > the new file descriptor to user space with the old creds. Right. And it really would be good to simply not have these kinds of very special cases inside the low-level binfmt code: I'd much rather have the special cases in the generic code, so that we see what the ordering is etc. One of the big problems with all these binfmt callbacks has been the fact that it makes it so hard to think about and change the generic code, because the low-level binfmt handlers all do their own special thing. So moving it to generic code would likely simplify things from that angle, even if the actual complexity of the feature itself remains. Besides, we really have exposed this to other code anyway thanks to that whole bprm->interp_data thing, and the AT_EXECFD AUX entries that we have. So it's not really "internal" to binfmt_misc _anyway_. So how about we just move the fd_binary logic to the generic execve code, and just binfmt_misc set the flag for "yes, please do this", exactly like "preserve_creds"? > It is possible although unlikely for userspace to find the file > descriptor without consulting AT_EXECFD so just to be conservative I > think we should install the file descriptor in begin_new_exec even if > the next interpreter does not support AT_EXECFD. Ack. I think the AT_EXECFD thing is a sign that this isn't internal to binfmt_misc, but it also shouldn't be gating this issue. In reality, ELF is the only real binary format that matters - the script/misc binfmts are just indirection entries - and it supports AT_EXECFD, so let's just ignore the theoretical case of "maybe nobody exposes it". So yes, just make it part of begin_new_exec(), and there's no reason to support more than a single fd. No stacks or arrays of these things required, I feel. It's not like AT_EXECFD supports the notion of multiple fd's being reported anyway, nor does it make any sense to have some kind of nested misc->misc binfmt nesting. So making that whole interp_data and fd_binary thing be a generic layer thing would make the search_binary_handler() code in binfmt_misc be a pure tailcall too, and then the conversion to a loop ends up working and being the right thing. No? Linus
On 5/12/20 7:20 PM, Linus Torvalds wrote: > On Tue, May 12, 2020 at 11:46 AM Eric W. Biederman > <ebiederm@xmission.com> wrote: >> >> I am still thinking about this one, but here is where I am at. At a >> practical level passing the file descriptor of the script to interpreter >> seems like something we should encourage in the long term. It removes >> races and it is cheaper because then the interpreter does not have to >> turn around and open the script itself. > > Yeah, I think we should continue to support it, because I think it's > the right thing to do (and we might just end up having compatibility > issues if we don't). ... >> It is possible although unlikely for userspace to find the file >> descriptor without consulting AT_EXECFD so just to be conservative I >> think we should install the file descriptor in begin_new_exec even if >> the next interpreter does not support AT_EXECFD. > > Ack. I think the AT_EXECFD thing is a sign that this isn't internal to > binfmt_misc, but it also shouldn't be gating this issue. In reality, > ELF is the only real binary format that matters - the script/misc > binfmts are just indirection entries - and it supports AT_EXECFD, so > let's just ignore the theoretical case of "maybe nobody exposes it". Would this potentially make the re-exec-yourself case easier to do at some point? (Which nommu needs to do, and /proc/self/exe isn't always available.) Here's the first time I asked about that: https://lore.kernel.org/lkml/200612261823.07927.rob@landley.net/ Here's the most recent: https://lkml.org/lkml/2017/9/5/246 Here's someone else asking and being basically told "chroot isn't a thing": http://lkml.iu.edu/hypermail/linux/kernel/0906.3/00584.html (See also "CVE-2019-5736" and the workarounds thereto.) Rob P.S. Yes I'm aware it would only work properly with static binaries. Not the first thing that's true for.
On Tue, May 12, 2020 at 7:32 PM Rob Landley <rob@landley.net> wrote: > > On 5/12/20 7:20 PM, Linus Torvalds wrote: > > Ack. I think the AT_EXECFD thing is a sign that this isn't internal to > > binfmt_misc, but it also shouldn't be gating this issue. In reality, > > ELF is the only real binary format that matters - the script/misc > > binfmts are just indirection entries - and it supports AT_EXECFD, so > > let's just ignore the theoretical case of "maybe nobody exposes it". > > Would this potentially make the re-exec-yourself case easier to do at some > point? (Which nommu needs to do, and /proc/self/exe isn't always available.) AT_EXECFD may be an ELF thing, but normal ELF binaries don't do that "we have a fd". So it only triggers for binfmt_misc (and only when the flag is set for "I want the fd"). So no, this wouldn't help re-exec-yourself in general. Although I guess we could add an ELF section note that does that whole "executable fd" thing for other things too. Everything is possible in theory.. Linus
Rob Landley <rob@landley.net> writes: > On 5/11/20 9:33 AM, Eric W. Biederman wrote: >> What I do see is that interp_data is just a parameter that is smuggled >> into the call of search binary handler. And the next binary handler >> needs to be binfmt_elf for it to make much sense, as only binfmt_elf >> (and binfmt_elf_fdpic) deals with BINPRM_FLAGS_EXECFD. > > The binfmt_elf_fdpic driver is separate from binfmt_elf for the same reason > ext2/ext3/ext4 used to have 3 drivers: fdpic is really just binfmt_elf with the > 4 main sections (text, data, bss, rodata) able to move independently of each > other (each tracked with its own base pointer). > > It's kind of -fPIE on steroids, and various security people have sniffed at it > over the years to give ASLR more degrees of freedom on with-MMU systems. Many > moons ago Rich Felker proposed teaching the fdpic loader how to load normal ELF > binaries so there's just the one loader (there's a flag in the ELF header to say > whether the sections are independent or not). Careful with your terminology. ELF sections are for .o's For executables ELF have segments. And reading through the code it is the program segments that are independently relocatable. There is a flag but it is defined per architecture and I don't think one of the architectures define it. I looked at ARM and apparently with an MMU ARM turns fdpic binaries into PIE executables. I am not certain why. The registers passed to the entry point are also different for both cases. I think it would have been nice if the fdpic support had used a different ELF type, instead of a different depending on using a different architecture. All that aside the core dumping code looks to be essentially the same between binfmt_elf.c and binfmt_elf_fdpic.c. Do you think people would be interested in refactoring binfmt_elf.c and binfmt_elf_fdpic.c so that they could share the same core dumping code? Eric
Kees Cook <keescook@chromium.org> writes: > On Tue, May 12, 2020 at 04:47:14PM -0700, Kees Cook wrote: >> And now I wonder if qemu actually uses the resulting AT_EXECFD ... > > It does, though I'm not sure if this is to support crossing mount points, > dropping privileges, or something else, since it does fall back to just > trying to open the file. > > execfd = qemu_getauxval(AT_EXECFD); > if (execfd == 0) { > execfd = open(filename, O_RDONLY); > if (execfd < 0) { > printf("Error while loading %s: %s\n", filename, strerror(errno)); > _exit(EXIT_FAILURE); > } > } My hunch is that the fallback exists from a time when the kernel did not implement AT_EXECFD, or so that qemu can run on kernels that don't implement AT_EXECFD. It doesn't really matter unless the executable is suid, or otherwise changes privileges. I looked into this a bit to remind myself why exec works the way it works, with changing privileges. The classic attack is pointing a symlink at a #! script that is suid or otherwise changes privileges. The kernel will open the script and set the privileges, read the interpreter from the first line, and proceed to exec the interpreter. The interpreter will then open the script using the pathname supplied by the kernel. The name of the symlink. Before the interpreter reopens the script the attack would replace the symlink with a script that does something else, but gets to run with the privileges of the script. Defending against that time of check vs time of use attack is why bprm_fill_uid, and cap_bprm_set_creds use the credentials derived from the interpreter instead of the credentials derived from the script. The other defense is to replace the pathname of the executable that the intepreter will open with /dev/fd/N. All of this predates Linux entirely. I do remember this was fixed at some point in Linux but I don't remember the details. I can just read the solution that was picked in the code. All of this makes me wonder how are the LSMs protected against this attack. Let's see the following LSMS implement brpm_set_creds: tomoyo - Abuses bprm_set_creds to call tomoyo_load_policy [ safe ] smack - Requires CAP_MAC_ADMIN to smack setxattrs [ vulnerable? ] Uses those xattrs in smack_bprm_set_creds apparmor - Everything is based on names so the symlink [ safe? ] attack won't work as it has the wrong name. As long as the trusted names can't be renamed apparmor appears good. selinux - Appears to let anyone set selinux xattrs [ safe? ] Requires permission for a sid transfer As the attack appears not to allow anything that would not be allowed anyway it looks like selinux is safe. LSM folks, especially Casey am I reading this correctly? Did I correctly infer how your LSMs deal with the time of check to time of use attack on the script name? Eric
Linus Torvalds <torvalds@linux-foundation.org> writes: > On Tue, May 12, 2020 at 11:46 AM Eric W. Biederman > <ebiederm@xmission.com> wrote: >> >> I am still thinking about this one, but here is where I am at. At a >> practical level passing the file descriptor of the script to interpreter >> seems like something we should encourage in the long term. It removes >> races and it is cheaper because then the interpreter does not have to >> turn around and open the script itself. > > Yeah, I think we should continue to support it, because I think it's > the right thing to do (and we might just end up having compatibility > issues if we don't). > > How about trying to move the logic to the common code, out of binfmt_misc? > > IOW, how about something very similar to your "brpm->preserve_creds" > thing that you did for the credentials (also for binfmt_misc, which > shouldn't surprise anybody: binfmt_misc is simply the "this is the > generic thing for letting user mode do the final details"). > >> Calling fd_install in binfmt_misc still seems wrong, as that exposes >> the new file descriptor to user space with the old creds. > > Right. And it really would be good to simply not have these kinds of > very special cases inside the low-level binfmt code: I'd much rather > have the special cases in the generic code, so that we see what the > ordering is etc. One of the big problems with all these binfmt > callbacks has been the fact that it makes it so hard to think about > and change the generic code, because the low-level binfmt handlers all > do their own special thing. > > So moving it to generic code would likely simplify things from that > angle, even if the actual complexity of the feature itself remains. > > Besides, we really have exposed this to other code anyway thanks to > that whole bprm->interp_data thing, and the AT_EXECFD AUX entries that > we have. So it's not really "internal" to binfmt_misc _anyway_. > > So how about we just move the fd_binary logic to the generic execve > code, and just binfmt_misc set the flag for "yes, please do this", > exactly like "preserve_creds"? > >> It is possible although unlikely for userspace to find the file >> descriptor without consulting AT_EXECFD so just to be conservative I >> think we should install the file descriptor in begin_new_exec even if >> the next interpreter does not support AT_EXECFD. > > Ack. I think the AT_EXECFD thing is a sign that this isn't internal to > binfmt_misc, but it also shouldn't be gating this issue. In reality, > ELF is the only real binary format that matters - the script/misc > binfmts are just indirection entries - and it supports AT_EXECFD, so > let's just ignore the theoretical case of "maybe nobody exposes it". > > So yes, just make it part of begin_new_exec(), and there's no reason > to support more than a single fd. No stacks or arrays of these things > required, I feel. It's not like AT_EXECFD supports the notion of > multiple fd's being reported anyway, nor does it make any sense to > have some kind of nested misc->misc binfmt nesting. > > So making that whole interp_data and fd_binary thing be a generic > layer thing would make the search_binary_handler() code in binfmt_misc > be a pure tailcall too, and then the conversion to a loop ends up > working and being the right thing. That is pretty much what I have been thinking. I have just been taking it slow so I find as many funny corner cases as I can. Nothing ever clears the BINPRM_FLAGS_EXECFD so the current code can not support nesting. Now I do think a nested misc->misc binfmt thing can make sense in principal. I have an old dos spectrum emulator that I use to play some of the games that I grew up with. Running that emulator makes me two emulators deep. I can also imagine writting a domain specific language in python or perl, and setting things up so scripts in the domain specific language can be run directly. So I think I need to deliberately test and prevent a nested misc->misc, just so data structures don't get stomped. If the cases where it could useful prove sufficiently interesting we can enable them later. Eric
On 5/14/2020 7:56 AM, Eric W. Biederman wrote: > Kees Cook <keescook@chromium.org> writes: > >> On Tue, May 12, 2020 at 04:47:14PM -0700, Kees Cook wrote: >>> And now I wonder if qemu actually uses the resulting AT_EXECFD ... >> It does, though I'm not sure if this is to support crossing mount points, >> dropping privileges, or something else, since it does fall back to just >> trying to open the file. >> >> execfd = qemu_getauxval(AT_EXECFD); >> if (execfd == 0) { >> execfd = open(filename, O_RDONLY); >> if (execfd < 0) { >> printf("Error while loading %s: %s\n", filename, strerror(errno)); >> _exit(EXIT_FAILURE); >> } >> } > My hunch is that the fallback exists from a time when the kernel did not > implement AT_EXECFD, or so that qemu can run on kernels that don't > implement AT_EXECFD. It doesn't really matter unless the executable is > suid, or otherwise changes privileges. > > > I looked into this a bit to remind myself why exec works the way it > works, with changing privileges. > > The classic attack is pointing a symlink at a #! script that is suid or > otherwise changes privileges. The kernel will open the script and set > the privileges, read the interpreter from the first line, and proceed to > exec the interpreter. The interpreter will then open the script using > the pathname supplied by the kernel. The name of the symlink. > Before the interpreter reopens the script the attack would replace > the symlink with a script that does something else, but gets to run > with the privileges of the script. > > > Defending against that time of check vs time of use attack is why > bprm_fill_uid, and cap_bprm_set_creds use the credentials derived from > the interpreter instead of the credentials derived from the script. > > > The other defense is to replace the pathname of the executable that the > intepreter will open with /dev/fd/N. > > All of this predates Linux entirely. I do remember this was fixed at > some point in Linux but I don't remember the details. I can just read > the solution that was picked in the code. > > > > All of this makes me wonder how are the LSMs protected against this > attack. > > Let's see the following LSMS implement brpm_set_creds: > tomoyo - Abuses bprm_set_creds to call tomoyo_load_policy [ safe ] > smack - Requires CAP_MAC_ADMIN to smack setxattrs [ vulnerable? ] > Uses those xattrs in smack_bprm_set_creds What is the concern? If the xattrs change after the check, the behavior should still be consistent. > apparmor - Everything is based on names so the symlink [ safe? ] > attack won't work as it has the wrong name. > As long as the trusted names can't be renamed > apparmor appears good. > selinux - Appears to let anyone set selinux xattrs [ safe? ] > Requires permission for a sid transfer > As the attack appears not to allow anything that > would not be allowed anyway it looks like selinux > is safe. > > LSM folks, especially Casey am I reading this correctly? Did I > correctly infer how your LSMs deal with the time of check to time of use > attack on the script name? > > Eric >
Casey Schaufler <casey@schaufler-ca.com> writes: > On 5/14/2020 7:56 AM, Eric W. Biederman wrote: >> Kees Cook <keescook@chromium.org> writes: >> >>> On Tue, May 12, 2020 at 04:47:14PM -0700, Kees Cook wrote: >>>> And now I wonder if qemu actually uses the resulting AT_EXECFD ... >>> It does, though I'm not sure if this is to support crossing mount points, >>> dropping privileges, or something else, since it does fall back to just >>> trying to open the file. >>> >>> execfd = qemu_getauxval(AT_EXECFD); >>> if (execfd == 0) { >>> execfd = open(filename, O_RDONLY); >>> if (execfd < 0) { >>> printf("Error while loading %s: %s\n", filename, strerror(errno)); >>> _exit(EXIT_FAILURE); >>> } >>> } >> My hunch is that the fallback exists from a time when the kernel did not >> implement AT_EXECFD, or so that qemu can run on kernels that don't >> implement AT_EXECFD. It doesn't really matter unless the executable is >> suid, or otherwise changes privileges. >> >> >> I looked into this a bit to remind myself why exec works the way it >> works, with changing privileges. >> >> The classic attack is pointing a symlink at a #! script that is suid or >> otherwise changes privileges. The kernel will open the script and set >> the privileges, read the interpreter from the first line, and proceed to >> exec the interpreter. The interpreter will then open the script using >> the pathname supplied by the kernel. The name of the symlink. >> Before the interpreter reopens the script the attack would replace >> the symlink with a script that does something else, but gets to run >> with the privileges of the script. >> >> >> Defending against that time of check vs time of use attack is why >> bprm_fill_uid, and cap_bprm_set_creds use the credentials derived from >> the interpreter instead of the credentials derived from the script. >> >> >> The other defense is to replace the pathname of the executable that the >> intepreter will open with /dev/fd/N. >> >> All of this predates Linux entirely. I do remember this was fixed at >> some point in Linux but I don't remember the details. I can just read >> the solution that was picked in the code. >> >> >> >> All of this makes me wonder how are the LSMs protected against this >> attack. >> >> Let's see the following LSMS implement brpm_set_creds: >> tomoyo - Abuses bprm_set_creds to call tomoyo_load_policy [ safe ] >> smack - Requires CAP_MAC_ADMIN to smack setxattrs [ vulnerable? ] >> Uses those xattrs in smack_bprm_set_creds > > What is the concern? If the xattrs change after the check, > the behavior should still be consistent. The concern is that there are xattrs set on a #! script. Someone replaces the script after smack reads the xattr and sets bprm->cred but before the interpreter reopens the script. In short if there is one script with xattrs set. I can run any script as if those xattrs were set on it. I don't know the smack security model well enough to know if that is a problem or not. It looks like it may be a concern because smack limits who can mess with it's security xattrs. Eric
On 5/13/20 4:59 PM, Eric W. Biederman wrote: > Careful with your terminology. ELF sections are for .o's For > executables ELF have segments. And reading through the code it is the > program segments that are independently relocatable. Sorry, I have trouble keeping this stuff straight when it's not in front of me. (I have a paperback copy of the old "linkers and loaders" book and it was the driest thing I have _ever_ slogged through. Back before the Linux Foundation ate the FSG I was pushing https://refspecs.linuxbase.org/ to include missing ABI supplement, I have copies of ones it doesn't collected from now long-dead sites...) But more recently I've just made puppy eyes at Rich Felker to have him fix this stuff for me, because I do _not_ retain the terminology here. REL vs RELA vs PLT, can you have a PLT without a GOT...? > There is a flag but it is defined per architecture and I don't think one > of the architectures define it. They all check for one, but I don't remember there being a #define. I have a todo item to check more architectures' fdpic binaries, this was from sh2eb (ala j-core): https://github.com/landley/toybox/commit/d61aeaf9e#diff-4442ddbb8949R65 There was the out of tree arm fdpic toolchain from the french guys for cortex-m, and the original frv paper, and in theory blackfin but nothing they touched ever got merged upstream anywhere: In _theory_ you could do fdpic for x86, but as with u-boot for x86 nobody ever bothers because it's got an x86-only solution. (And then the x86 version of stuff gets pushed to other platforms because all our device tree files were GPLed so of course acpi for arm became a thing. Sigh...) > I looked at ARM and apparently with an MMU ARM turns fdpic binaries into > PIE executables. I am not certain why. Falling back to a more widely tested codepath, I expect. Also maybe it saves 3 registers if all 4 are using the same base register? Map them linearly and it becomes "single base + offset"? Which of course looses the extra ASLR benefits the security people wanted, but "undoing what the security people want in the name of an unmeasurable microbenchmark optimization" is a proud tradition. Just because the 4 segments are compiled as independently relocatable doesn't mean they HAVE to be. (You'd think the code would be using different register numbers to index stuff so you'd STILL be using 4 registers, but I haven't looked at what arm's doing...) > The registers passed to the entry point are also different for both > cases. From the same machine code chunks? I boggle at what the ld.so fixup is doing then... > I think it would have been nice if the fdpic support had used a > different ELF type, instead of a different depending on using a > different architecture. This is what you get when a blackfin developer talks to the gnu/binutils developers: https://sourceware.org/legacy-ml/binutils/2008-04/msg00350.html > All that aside the core dumping code looks to be essentially the same > between binfmt_elf.c and binfmt_elf_fdpic.c. Do you think people would > be interested in refactoring binfmt_elf.c and binfmt_elf_fdpic.c so that > they could share the same core dumping code? I think merging the two of them together entirely would be a good idea, and anything that can collapse together I'm happy to regression test on sh2. I also note that qemu-sh4eb can run these binaries, maybe I can whip up a qemu-system-sh4eb that runs a nommu fdpic userspace... [hours later] Ok, here's me asking Rich Felker a question: >>> So fdpic binaries run under qemu-sh2eb and there's a qemu-system-sh2eb that >>> SHOULD also be able to run them under the r2d board emulation, and the kernel >>> builds fine under the sh2eb compiler but I can't enable fdpic support without >>> CONFIG_NOMMU, and if I yank that dependency from Kconfig (which only sh2 has, >>> arm and such do fdpic with or without mmu) the build breaks with: >>> >>> /home/landley/toybox/clean/ccc/sh2eb-linux-muslfdpic-cross/bin/sh2eb-linux-muslfdpic-ld: >>> fs/binfmt_elf_fdpic.o: in function `load_elf_fdpic_binary': >>> binfmt_elf_fdpic.c:(.text+0x1734): undefined reference to >>> `elf_fdpic_arch_lay_out_mm' >>> >>> The problem is if I switch off CONFIG_MMU in the kernel, buckets of stuff in the >>> r2d board kernel config changes and suddenly I don't get serial output from the >>> qemu-system-sh2eb -M r2d boot anymore. Before it was running the kernel but just >>> failing to run init... And his response: >> I don't think qemu-system-sh4eb can boot a nommu kernel. But you don't >> need to in order to do userspace-only testing. Just build a normal >> sh4eb kernel. It doesn't need CONFIG_BINFMT_ELF_FDPIC. The normal ELF >> loader can load FDPIC just fine, because a valid FDPIC ELF file is a >> valid ELF file, just with more constraints (in same sense a square is >> a rectangle). The normal ELF loader won't independently float the text >> and data segments, but that's okay because your emulated system has an >> MMU and can just map them adjacently like they show up in the ELF file >> with their untransformed addresses. >> >> Now that I think about it, it's possible that the ARM folks broke this >> when adding support for enabling CONFIG_BINFMT_ELF_FDPIC with MMU. If >> so, and you find you really do need the FDPIC loader now because they >> made the normal ELF loader refuse to do it, I think it will suffice to >> copy the ARM version of elf_fdpic_arch_lay_out_mm from >> arch/arm/kernel/elf.c to somewhere it will be compiled on SH. I.E. testing the kernel fdpic loader under qemu is NOT EASY (because the fdpic loader refuses to build in a with-mmu context, and the relevant board emulations refuse to build without), but it can fall back to the conventional ELF loader which collates the segments and treats fdpic as PIE? (Which... is how qemu-sh2eb application emulation is loading them...?) Which was news to me... > Eric Rob
diff --git a/arch/alpha/kernel/binfmt_loader.c b/arch/alpha/kernel/binfmt_loader.c index a8d0d6e06526..a90c8b1d5498 100644 --- a/arch/alpha/kernel/binfmt_loader.c +++ b/arch/alpha/kernel/binfmt_loader.c @@ -38,7 +38,7 @@ static int load_binary(struct linux_binprm *bprm) retval = prepare_binprm(bprm); if (retval < 0) return retval; - return search_binary_handler(bprm); + return 1; /* Search for the interpreter */ } static struct linux_binfmt loader_format = { diff --git a/fs/binfmt_em86.c b/fs/binfmt_em86.c index 466497860c62..a9b9ac7f9bb0 100644 --- a/fs/binfmt_em86.c +++ b/fs/binfmt_em86.c @@ -95,7 +95,7 @@ static int load_em86(struct linux_binprm *bprm) if (retval < 0) return retval; - return search_binary_handler(bprm); + return 1; /* Search for the interpreter */ } static struct linux_binfmt em86_format = { diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c index cdb45829354d..127fae9c21ab 100644 --- a/fs/binfmt_misc.c +++ b/fs/binfmt_misc.c @@ -234,10 +234,7 @@ static int load_misc_binary(struct linux_binprm *bprm) if (retval < 0) goto error; - retval = search_binary_handler(bprm); - if (retval < 0) - goto error; - + retval = 1; /* Search for the interpreter */ ret: dput(fmt->dentry); return retval; diff --git a/fs/binfmt_script.c b/fs/binfmt_script.c index e9e6a6f4a35f..76a05696d376 100644 --- a/fs/binfmt_script.c +++ b/fs/binfmt_script.c @@ -146,7 +146,7 @@ static int load_script(struct linux_binprm *bprm) retval = prepare_binprm(bprm); if (retval < 0) return retval; - return search_binary_handler(bprm); + return 1; /* Search for the interpreter */ } static struct linux_binfmt script_format = { diff --git a/fs/exec.c b/fs/exec.c index 635b5085050c..8bbf5fa785a6 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -1690,16 +1690,12 @@ EXPORT_SYMBOL(remove_arg_zero); /* * cycle the list of binary formats handler, until one recognizes the image */ -int search_binary_handler(struct linux_binprm *bprm) +static int search_binary_handler(struct linux_binprm *bprm) { bool need_retry = IS_ENABLED(CONFIG_MODULES); struct linux_binfmt *fmt; int retval; - /* This allows 4 levels of binfmt rewrites before failing hard. */ - if (bprm->recursion_depth > 5) - return -ELOOP; - retval = security_bprm_check(bprm); if (retval) return retval; @@ -1712,10 +1708,7 @@ int search_binary_handler(struct linux_binprm *bprm) continue; read_unlock(&binfmt_lock); - bprm->recursion_depth++; retval = fmt->load_binary(bprm); - bprm->recursion_depth--; - read_lock(&binfmt_lock); put_binfmt(fmt); if (bprm->point_of_no_return || !bprm->file || @@ -1738,12 +1731,11 @@ int search_binary_handler(struct linux_binprm *bprm) return retval; } -EXPORT_SYMBOL(search_binary_handler); static int exec_binprm(struct linux_binprm *bprm) { pid_t old_pid, old_vpid; - int ret; + int ret, depth = 0; /* Need to fetch pid before load_binary changes it */ old_pid = current->pid; @@ -1751,7 +1743,13 @@ static int exec_binprm(struct linux_binprm *bprm) old_vpid = task_pid_nr_ns(current, task_active_pid_ns(current->parent)); rcu_read_unlock(); - ret = search_binary_handler(bprm); + do { + depth++; + ret = search_binary_handler(bprm); + /* This allows 4 levels of binfmt rewrites before failing hard. */ + if ((ret > 0) && (depth > 5)) + ret = -ELOOP; + } while (ret > 0); if (ret >= 0) { audit_bprm(bprm); trace_sched_process_exec(current, old_pid, bprm); diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h index 42f760acfc2c..89f1135dcb75 100644 --- a/include/linux/binfmts.h +++ b/include/linux/binfmts.h @@ -47,7 +47,6 @@ struct linux_binprm { #ifdef __alpha__ unsigned int taso:1; #endif - unsigned int recursion_depth; /* only for search_binary_handler() */ struct file * file; struct cred *cred; /* new credentials */ int unsafe; /* how unsafe this exec is (mask of LSM_UNSAFE_*) */ @@ -118,7 +117,6 @@ extern void unregister_binfmt(struct linux_binfmt *); extern int prepare_binprm(struct linux_binprm *); extern int __must_check remove_arg_zero(struct linux_binprm *); -extern int search_binary_handler(struct linux_binprm *); extern int begin_new_exec(struct linux_binprm * bprm); extern void setup_new_exec(struct linux_binprm * bprm); extern void finalize_exec(struct linux_binprm *bprm);
Instead of recursing in search_binary_handler have the methods that would recurse return a positive value, and simply loop in exec_binprm. This is a trivial change as all of the methods that would recurse do so as effectively the last thing they do. Making this a trivial code change. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> --- arch/alpha/kernel/binfmt_loader.c | 2 +- fs/binfmt_em86.c | 2 +- fs/binfmt_misc.c | 5 +---- fs/binfmt_script.c | 2 +- fs/exec.c | 20 +++++++++----------- include/linux/binfmts.h | 2 -- 6 files changed, 13 insertions(+), 20 deletions(-)