Message ID | 20230907204256.3700336-1-gpiccoli@igalia.com (mailing list archive) |
---|---|
Series | Introduce a way to expose the interpreted file with binfmt_misc |
On 07/09/2023 22:24, Guilherme G. Piccoli wrote:
> Currently the kernel provides a symlink to the executable binary, in the
> form of procfs file exe_file (/proc/self/exe_file for example). But what
> happens in interpreted scenarios (like binfmt_misc) is that such link
> always points to the *interpreter*. For cases of Linux binary emulators,
> like FEX [0] for example, it's then necessary to somehow mask that and
> emulate the true binary path.
>
> We hereby propose a way to expose such interpreted binary as exe_file if
> the flag 'I' is selected on binfmt_misc. When that flag is set, the file
> /proc/self/exe_file points to the *interpreted* file, be it ELF or not.
> In order to allow users to distinguish if such flag is used or not without
> checking the binfmt_misc filesystem, we propose also the /proc/self/interpreter
> file, which always points to the *interpreter* in scenarios where
> interpretation is set, like binfmt_misc. This file is empty / points to
> nothing in the case of regular ELF execution, though we could consider
> implementing a way to point to the LD preloader if that makes sense...
>
> This was sent as RFC because of course it's a very core change, affecting
> multiple areas and there are design choices (and questions) in each patch
> so we could discuss and check the best way to implement the solution as
> well as the corner cases handling. This is a very useful feature for
> emulators and such, like FEX and Wine, which usually need to circumvent
> this kernel limitation in order to expose the true emulated file name
> (more examples at [1][2][3]).
>
> This patchset is based on the currently v6.6-rc1 candidate (Linus tree
> from yesterday) and was tested under QEMU as well as using FEX.
> Thanks in advance for comments, any feedback is greatly appreciated!
> Cheers,
>
> Guilherme
>
>
> [0] https://github.com/FEX-Emu/FEX
>
> [1] Using an environment variable trick to override exe_file:
> https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/util/u_process.c#L209
>
> [2] https://github.com/baldurk/renderdoc/pull/2694
>
> [3] FEX handling of the exe_file parsing:
> https://github.com/FEX-Emu/FEX/blob/main/Source/Tools/FEXLoader/LinuxSyscalls/FileManagement.cpp#L499
>
>

Hi folks, gentle monthly ping.
Any opinions / suggestions on that?

Thanks in advance,

Guilherme
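For readers following along: the procfs link under discussion is /proc/<pid>/exe (exe_file is the name of the kernel-internal mm_struct field behind it). The following is a minimal Python sketch, assuming a Linux system with procfs mounted, of what a process sees when it inspects that link. The helper name current_exe is made up for illustration. A script run by an interpreter sees the interpreter's path here rather than its own, which is the same kind of mismatch the cover letter describes for binfmt_misc.

```python
import os
from typing import Optional


def current_exe() -> Optional[str]:
    """Return the target of /proc/self/exe, or None if it cannot be read."""
    try:
        return os.readlink("/proc/self/exe")
    except OSError:
        # No procfs (non-Linux system) or the link is not readable.
        return None


if __name__ == "__main__":
    # Run as a script, this prints the path of the Python binary,
    # not the path of the script being interpreted.
    print(current_exe())
```

Any consumer that uses this value to identify "the application" (for relaunching itself or for per-application workarounds) will pick up the interpreter instead.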
On 07.09.23 22:24, Guilherme G. Piccoli wrote:
> Currently the kernel provides a symlink to the executable binary, in the
> form of procfs file exe_file (/proc/self/exe_file for example). But what
> happens in interpreted scenarios (like binfmt_misc) is that such link
> always points to the *interpreter*. For cases of Linux binary emulators,
> like FEX [0] for example, it's then necessary to somehow mask that and
> emulate the true binary path.

I'm absolutely no expert on that, but I'm wondering if, instead of modifying
exe_file and adding an interpreter file, you'd want to leave exe_file alone
and instead provide an easier way to obtain the interpreted file.

Can you maybe describe why modifying exe_file is desired (about which
consumers are we worrying?) and what exactly FEX does to handle that (how
does it mask that?).

So a bit more background on the challenges without this change would be
appreciated.
On Fri, Oct 06, 2023 at 02:07:16PM +0200, David Hildenbrand wrote:
> On 07.09.23 22:24, Guilherme G. Piccoli wrote:
> > [...]
>
> I'm absolutely no expert on that, but I'm wondering if, instead of modifying
> exe_file and adding an interpreter file, you'd want to leave exe_file alone
> and instead provide an easier way to obtain the interpreted file.
>
> Can you maybe describe why modifying exe_file is desired (about which
> consumers are we worrying?) and what exactly FEX does to handle that (how
> does it mask that?).
>
> So a bit more background on the challenges without this change would be
> appreciated.

Yeah, it sounds like you're dealing with a process that examines
/proc/self/exe_file for itself only to find the binfmt_misc interpreter
when it was run via binfmt_misc?

What actually breaks? Or rather, why does the process examine
exe_file? I'm just trying to see if there are other solutions here that
would avoid creating an ambiguous interface...
On Mon, Oct 9, 2023 at 10:37 AM Kees Cook <keescook@chromium.org> wrote:
>
> On Fri, Oct 06, 2023 at 02:07:16PM +0200, David Hildenbrand wrote:
> > [...]
>
> Yeah, it sounds like you're dealing with a process that examines
> /proc/self/exe_file for itself only to find the binfmt_misc interpreter
> when it was run via binfmt_misc?
>
> What actually breaks? Or rather, why does the process examine
> exe_file? I'm just trying to see if there are other solutions here that
> would avoid creating an ambiguous interface...
>
> --
> Kees Cook

Hey there, FEX-Emu developer here. I can try and explain some of the issues.

First thing is that we should set the stage here that there is a
fundamental discrepancy between how ELF interpreters are represented
versus binfmt_misc interpreters when it comes to procfs exe.

An ELF file today can either be static or dynamic, with the dynamic
ELF files having a program header called PT_INTERP which will tell the
kernel where its interpreter executable lives. In an x86-64
environment this is likely to be something like
/lib64/ld-linux-x86-64.so.2. Today, the kernel doesn't put the
PT_INTERP handle into procfs exe, it instead uses the dynamic ELF that
was originally launched.

In contrast to how this behaviour works, a binfmt_misc interpreter
file getting launched through execve may or may not have ELF header
sections. But it is left up to the binfmt_misc handler to do whatever
it may need. The kernel sets procfs exe to the binfmt_misc interpreter
instead of the executable.

This is fundamentally the contrasting behaviour that this series is
trying to improve. It seems like this behaviour is an oversight of the
original binfmt_misc implementation rather than any sort of ambition to
ensure there is a difference. It's already ambiguous that the interface
changes when executing an executable through binfmt_misc.

Some simple ways applications break:
- Applications like chrome tend to relaunch themselves through execve
  with `/proc/self/exe`
  - Chrome does this. I think Flatpaks or AppImage applications do this?
  - There are definitely more that do this that I have noticed.
- In the cover letter there was a link to Mesa, the OSS OpenGL/Vulkan
  drivers using this
  - This library uses this interface to find out what application is
    running for applying workarounds for application bugs. Plenty of
    historical applications use the API badly or incorrectly and
    need specific driver workarounds for them.
- Some applications may use this path to open their own executable
  path and then mmap back in for doing tricky memory mirroring or
  dynamic linking of themselves.
  - Saw some old abandoned emulator software doing this.

There's likely more uses that I haven't noticed from software using
this interface.

Onward to what FEX-Emu is and how it tries working around the issue
with a fairly naive hack.

FEX-Emu is an x86 and x86-64 CPU emulator that gets installed as a
binfmt_misc interpreter. It then executes x86 and x86-64 ELF files on
an Arm64 device in what is effectively a multi-arch capable fashion.
It's lightweight in that all application processes and threads are
just regular Arm64 processes and threads. This is similar to how
qemu-user operates.

When processing system calls, FEX will intercept any call that
consumes a pathname, it will then inspect that path name and if it is
one of the ways it is possible to access procfs/exe then it redirects
to the true x86/x86-64 executable. This is an attempt to behave as if
the ELF had been executed without a binfmt_misc handler.

Pathnames captured in FEX-Emu today:
- /proc/self/exe
- /proc/<pid>/exe
- /proc/thread-self/exe

This is very fragile and doesn't cover the full range of how
applications could access procfs. Applications could end up using the
*at variants of syscalls with an FD that has /proc/self/ open. They
could do simple tricks like `/proc/self/../self/exe` and it would
side-step this check. It's a game of whack-a-mole and escalating
overhead to try and close the gap purely due to, what appears to be,
an oversight in how binfmt_misc and PT_INTERP are handled.

Hopefully this explains why this is necessary and that reducing the
differences between how PT_INTERP and binfmt_misc are represented is
desired.
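The fragility of pathname interception described above can be illustrated with a small Python sketch. This is a hypothetical model, not FEX-Emu's actual code: the function names and the regular expression are assumptions made for illustration, and a real filter would also have to check that a numeric pid refers to the emulated process itself. It compares a matcher that only recognises the exact spellings with one that first normalises the path lexically.

```python
import os
import re

# Exact spellings of the exe link that a naive filter might recognise
# (hypothetical model; this is not FEX-Emu's actual code).
_EXE_LINK = re.compile(r"^/proc/(self|thread-self|\d+)/exe$")


def naive_match(path: str) -> bool:
    """Match only the exact spellings of the exe link."""
    return _EXE_LINK.match(path) is not None


def normalized_match(path: str) -> bool:
    """Collapse '.', '..' and duplicate slashes lexically before matching."""
    return _EXE_LINK.match(os.path.normpath(path)) is not None


if __name__ == "__main__":
    for p in ("/proc/self/exe", "/proc/self/../self/exe", "/proc//self/./exe"):
        print(p, naive_match(p), normalized_match(p))
```

Even the normalising variant only closes part of the gap: lexical normalisation knows nothing about symlinks, and a path passed relative to a directory FD (the *at syscalls mentioned above) never contains the /proc prefix at all, so each extra case needs yet another check in the emulator.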
On 09/10/2023 14:37, Kees Cook wrote:
> On Fri, Oct 06, 2023 at 02:07:16PM +0200, David Hildenbrand wrote:
>> [...]
>
> Yeah, it sounds like you're dealing with a process that examines
> /proc/self/exe_file for itself only to find the binfmt_misc interpreter
> when it was run via binfmt_misc?
>
> What actually breaks? Or rather, why does the process examine
> exe_file? I'm just trying to see if there are other solutions here that
> would avoid creating an ambiguous interface...
>

Thanks Kees and David! Did Ryan's thorough comment address your
questions? Do you have any take on the TODOs?

I can maybe rebase against 6.7-rc1 and resubmit, if that makes sense!
But would be better having the TODOs addressed, I guess.

Thanks in advance for reviews and feedback on this.
Cheers,

Guilherme
"Guilherme G. Piccoli" <gpiccoli@igalia.com> writes:

> On 09/10/2023 14:37, Kees Cook wrote:
>> [...]
>
> Thanks Kees and David! Did Ryan's thorough comment address your
> questions? Do you have any take on the TODOs?
>
> I can maybe rebase against 6.7-rc1 and resubmit, if that makes sense!
> But would be better having the TODOs addressed, I guess.

Currently there is a mechanism in the kernel for changing
/proc/self/exe. Would that be reasonable to use in this case?

It came from the checkpoint/restart work, but given that it is already
implemented it seems like the path of least resistance to get your
binfmt_misc that wants to look like binfmt_elf to use that mechanism.

Eric
On 13.11.23 19:29, Eric W. Biederman wrote:
> "Guilherme G. Piccoli" <gpiccoli@igalia.com> writes:
>
>> [...]
>
> Currently there is a mechanism in the kernel for changing
> /proc/self/exe. Would that be reasonable to use in this case?
>
> It came from the checkpoint/restart work, but given that it is already
> implemented it seems like the path of least resistance to get your
> binfmt_misc that wants to look like binfmt_elf to use that mechanism.

I had that in mind as well, but
prctl_set_mm_exe_file()->replace_mm_exe_file() fails if the executable
is still mmaped (due to denywrite handling); that should be the case
for the emulator I strongly assume.
On 13/11/2023 15:29, Eric W. Biederman wrote:
> [...]
> Currently there is a mechanism in the kernel for changing
> /proc/self/exe. Would that be reasonable to use in this case?
>
> It came from the checkpoint/restart work, but given that it is already
> implemented it seems like the path of least resistance to get your
> binfmt_misc that wants to look like binfmt_elf to use that mechanism.
>
> Eric
>

Thanks Eric! I'm curious about how that would work: we'd change the
symlink of the emulator? So, the *emulated* software, when reading that,
would see the correct symlink?

Also, just to fully clarify: are you suggesting we hook the new
binfmt_misc flag proposed here to the internal kernel way of changing
the /proc/self/exe symlink, or are you suggesting we use the prctl() tune
from the emulator, like the userspace changing its own symlink?

One of the biggest concerns I have with this kind of approach is that
changing the symlink actually...changes it - the binary mapping itself,
I mean. Whereas my way was a "fake" change, just expose one thing for
the emulated app, but changes nothing else...

Cheers,

Guilherme
David Hildenbrand <david@redhat.com> writes:

> On 13.11.23 19:29, Eric W. Biederman wrote:
>> [...]
>
> I had that in mind as well, but
> prctl_set_mm_exe_file()->replace_mm_exe_file() fails if the executable
> is still mmaped (due to denywrite handling); that should be the case
> for the emulator I strongly assume.

Bah yes. The sanity check that the old executable is no longer
mapped does make it so that we can't trivially change the /proc/self/exe
using prctl(PR_SET_MM_EXE_FILE).

Eric
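For context, the userspace side of the mechanism discussed here is prctl(PR_SET_MM, PR_SET_MM_EXE_FILE, fd, 0, 0). The sketch below, in Python via ctypes and assuming a Linux system with glibc or musl, simply attempts the call and reports the errno. The helper name try_set_exe_file is made up for illustration. The call typically fails with EPERM for a process without CAP_SYS_RESOURCE, and, per the discussion above, it is also expected to fail while the old executable is still mapped; the sketch does not assume which error is returned.

```python
import ctypes
import errno
import os
import sys

# Values from <linux/prctl.h>.
PR_SET_MM = 35
PR_SET_MM_EXE_FILE = 13


def try_set_exe_file(path: str) -> int:
    """Ask the kernel to point /proc/self/exe at `path`.

    Returns 0 on success, otherwise the errno reported by prctl().
    """
    libc = ctypes.CDLL(None, use_errno=True)
    prctl = getattr(libc, "prctl", None)
    if prctl is None:
        # Not a Linux libc; treat the call as unsupported.
        return errno.ENOSYS
    fd = os.open(path, os.O_RDONLY)
    try:
        ctypes.set_errno(0)
        # prctl() is variadic; pass the extra arguments as unsigned long.
        rc = prctl(
            ctypes.c_int(PR_SET_MM),
            ctypes.c_ulong(PR_SET_MM_EXE_FILE),
            ctypes.c_ulong(fd),
            ctypes.c_ulong(0),
            ctypes.c_ulong(0),
        )
        return 0 if rc == 0 else ctypes.get_errno()
    finally:
        os.close(fd)


if __name__ == "__main__":
    err = try_set_exe_file(sys.executable)
    print("prctl result:", "ok" if err == 0 else errno.errorcode.get(err, err))
```

An emulator is in the same position as this script: its own binary stays mapped for the lifetime of the process, so it cannot use this call as-is to repoint the link at the emulated program.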
On 14.11.23 17:11, Eric W. Biederman wrote:
> David Hildenbrand <david@redhat.com> writes:
>
>> [...]
>
> Bah yes. The sanity check that the old executable is no longer
> mapped does make it so that we can't trivially change the /proc/self/exe
> using prctl(PR_SET_MM_EXE_FILE).

I was wondering if we should have a new file (yet have to come up with a
fitting name) that defaults to /proc/self/exe as long as that new file
doesn't explicitly get set via a prctl.

So /proc/self/exe would indeed always show the emulator (executable),
but the new file could be adjusted to something that is being executed
by the emulator.

Just a thought ... I'd rather leave /proc/self/exe alone.
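The semantics suggested in this last message can be modelled in a few lines of Python. This is a toy model only, not kernel code: no name for the new file or for the prctl was settled in the thread, so the class, attribute, and method names as well as the example paths are purely hypothetical.

```python
from typing import Optional


class Task:
    """Toy model of the per-process state involved (not kernel code)."""

    def __init__(self, exe: str) -> None:
        self.exe = exe                        # what /proc/<pid>/exe reports
        self.emulated: Optional[str] = None   # hypothetical override set via a prctl

    def read_exe(self) -> str:
        # /proc/<pid>/exe is left alone: it always shows the emulator.
        return self.exe

    def read_new_file(self) -> str:
        # The proposed new file falls back to exe until explicitly set.
        return self.emulated if self.emulated is not None else self.exe


if __name__ == "__main__":
    task = Task(exe="/usr/bin/emulator")      # hypothetical paths
    print(task.read_new_file())
    task.emulated = "/opt/app/game"
    print(task.read_exe(), task.read_new_file())
```

The trade-off in this variant is that existing consumers of /proc/self/exe keep their current behaviour, while software that wants the emulated program's path has to be taught to read the new file.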