
[RFC,0/2] Introduce a way to expose the interpreted file with binfmt_misc

Message ID 20230907204256.3700336-1-gpiccoli@igalia.com

Message

Guilherme G. Piccoli Sept. 7, 2023, 8:24 p.m. UTC
Currently the kernel provides a symlink to the executable binary in the
form of the procfs file exe (/proc/self/exe, for example), backed by
mm->exe_file. In interpreted scenarios (like binfmt_misc), however, that
link always points to the *interpreter*. Linux binary emulators such as
FEX [0] therefore have to somehow mask that and emulate the true binary
path.

We propose exposing the interpreted binary as exe_file when the flag 'I'
is selected in binfmt_misc. When that flag is set, /proc/self/exe points
to the *interpreted* file, be it ELF or not. So that users can tell
whether the flag is in use without checking the binfmt_misc filesystem,
we also propose a /proc/self/interpreter file, which always points to
the *interpreter* when interpretation is in place, as with binfmt_misc.
For regular ELF execution this file is empty / points to nothing, though
we could consider making it point to the ELF program interpreter (the
dynamic loader, ld.so) if that makes sense...
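A minimal userspace sketch of how a program might consume the proposed interface, assuming the semantics described above. /proc/self/interpreter exists only with this RFC applied, so on mainline kernels the lookup is expected to fail with ENOENT, which the sketch treats as "feature not present". The helper names are hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Read a procfs symlink into buf and NUL-terminate it (readlink() does not).
 * Returns the target length, or -1 with errno set.
 */
ssize_t read_proc_link(const char *path, char *buf, size_t len)
{
    if (len == 0) {
        errno = EINVAL;
        return -1;
    }
    ssize_t n = readlink(path, buf, len - 1);
    if (n < 0)
        return -1;
    buf[n] = '\0';
    return n;
}

/*
 * HYPOTHETICAL: /proc/self/interpreter only exists with this RFC applied.
 * Returns 1 if it names an interpreter (binfmt_misc case), 0 if it is
 * missing (mainline kernel) or empty (regular ELF execution), -1 otherwise.
 */
int probe_interpreter(char *buf, size_t len)
{
    ssize_t n = read_proc_link("/proc/self/interpreter", buf, len);
    if (n > 0)
        return 1;
    if (n == 0 || errno == ENOENT)
        return 0;
    return -1;
}
```

With the series applied and 'I' set, /proc/self/exe would name the interpreted program while probe_interpreter() reports the emulator. On mainline, probe_interpreter() returns 0 in every case and /proc/self/exe names the interpreter.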

This is sent as an RFC because it is a very core change affecting
multiple areas, and each patch involves design choices (and open
questions), so we can discuss the best way to implement the solution
and handle the corner cases. The feature is very useful for emulators
such as FEX and Wine, which usually need to work around this kernel
limitation in order to expose the true emulated file name (more
examples at [1][2][3]).

This patchset is based on the current v6.6-rc1 candidate (Linus' tree
from yesterday) and was tested under QEMU as well as with FEX.
Thanks in advance for comments; any feedback is greatly appreciated!
Cheers,

Guilherme


[0] https://github.com/FEX-Emu/FEX

[1] Using an environment variable trick to override exe_file:
https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/util/u_process.c#L209 

[2] https://github.com/baldurk/renderdoc/pull/2694

[3] FEX handling of the exe_file parsing:
https://github.com/FEX-Emu/FEX/blob/main/Source/Tools/FEXLoader/LinuxSyscalls/FileManagement.cpp#L499


Guilherme G. Piccoli (2):
  binfmt_misc, fork, proc: Introduce flag to expose the interpreted binary in procfs
  fork, procfs: Introduce /proc/self/interpreter symlink

 Documentation/admin-guide/binfmt-misc.rst |  11 ++
 arch/arc/kernel/troubleshoot.c            |   5 +
 fs/binfmt_elf.c                           |   7 ++
 fs/binfmt_misc.c                          |  11 ++
 fs/coredump.c                             |   5 +
 fs/exec.c                                 |  26 ++++-
 fs/proc/base.c                            |  48 +++++---
 include/linux/binfmts.h                   |   4 +
 include/linux/mm.h                        |   7 +-
 include/linux/mm_types.h                  |   2 +
 kernel/audit.c                            |   5 +
 kernel/audit_watch.c                      |   7 +-
 kernel/fork.c                             | 131 +++++++++++++++++-----
 kernel/signal.c                           |   7 +-
 kernel/sys.c                              |   5 +
 kernel/taskstats.c                        |   7 +-
 security/tomoyo/util.c                    |   5 +
 17 files changed, 241 insertions(+), 52 deletions(-)

Comments

Guilherme G. Piccoli Oct. 6, 2023, 7:51 a.m. UTC | #1
On 07/09/2023 22:24, Guilherme G. Piccoli wrote:
> [...]

Hi folks, gentle monthly ping.
Any opinions / suggestions on that?

Thanks in advance,


Guilherme
David Hildenbrand Oct. 6, 2023, 12:07 p.m. UTC | #2
On 07.09.23 22:24, Guilherme G. Piccoli wrote:
> Currently the kernel provides a symlink to the executable binary, in the
> form of procfs file exe_file (/proc/self/exe_file for example). But what
> happens in interpreted scenarios (like binfmt_misc) is that such link
> always points to the *interpreter*. For cases of Linux binary emulators,
> like FEX [0] for example, it's then necessary to somehow mask that and
> emulate the true binary path.

I'm absolutely no expert on that, but I'm wondering if, instead of 
modifying exe_file and adding an interpreter file, you'd want to leave 
exe_file alone and instead provide an easier way to obtain the 
interpreted file.

Can you maybe describe why modifying exe_file is desired (which
consumers are we worrying about?) and what exactly FEX does to handle
that (how does it mask that?).

So a bit more background on the challenges without this change would be 
appreciated.
Kees Cook Oct. 9, 2023, 5:37 p.m. UTC | #3
On Fri, Oct 06, 2023 at 02:07:16PM +0200, David Hildenbrand wrote:
> [...]
>
> I'm absolutely no expert on that, but I'm wondering if, instead of modifying
> exe_file and adding an interpreter file, you'd want to leave exe_file alone
> and instead provide an easier way to obtain the interpreted file.
> 
> Can you maybe describe why modifying exe_file is desired (about which
> consumers are we worrying? ) and what exactly FEX does to handle that (how
> does it mask that?).
> 
> So a bit more background on the challenges without this change would be
> appreciated.

Yeah, it sounds like you're dealing with a process that examines
/proc/self/exe for itself, only to find the binfmt_misc interpreter
when it was run via binfmt_misc?

What actually breaks? Or rather, why does the process need to examine
exe_file? I'm just trying to see if there are other solutions here that
would avoid creating an ambiguous interface...
Ryan Houdek Oct. 11, 2023, 11:53 p.m. UTC | #4
On Mon, Oct 9, 2023 at 10:37 AM Kees Cook <keescook@chromium.org> wrote:
>
> On Fri, Oct 06, 2023 at 02:07:16PM +0200, David Hildenbrand wrote:
> > On 07.09.23 22:24, Guilherme G. Piccoli wrote:
> > > Currently the kernel provides a symlink to the executable binary, in the
> > > form of procfs file exe_file (/proc/self/exe_file for example). But what
> > > happens in interpreted scenarios (like binfmt_misc) is that such link
> > > always points to the *interpreter*. For cases of Linux binary emulators,
> > > like FEX [0] for example, it's then necessary to somehow mask that and
> > > emulate the true binary path.
> >
> > I'm absolutely no expert on that, but I'm wondering if, instead of modifying
> > exe_file and adding an interpreter file, you'd want to leave exe_file alone
> > and instead provide an easier way to obtain the interpreted file.
> >
> > Can you maybe describe why modifying exe_file is desired (about which
> > consumers are we worrying? ) and what exactly FEX does to handle that (how
> > does it mask that?).
> >
> > So a bit more background on the challenges without this change would be
> > appreciated.
>
> Yeah, it sounds like you're dealing with a process that examines
> /proc/self/exe_file for itself only to find the binfmt_misc interpreter
> when it was run via binfmt_misc?
>
> What actually breaks? Or rather, why does the process to examine
> exe_file? I'm just trying to see if there are other solutions here that
> would avoid creating an ambiguous interface...

Hey there, FEX-Emu developer here. I can try and explain some of the issues.

First, we should set the stage: there is a fundamental discrepancy
between how ELF interpreters and binfmt_misc interpreters are
represented in procfs exe. An ELF file today can be either static or
dynamic; dynamic ELF files have a program header called PT_INTERP that
tells the kernel where their interpreter executable lives. In an x86-64
environment this is likely to be something like
/lib64/ld-linux-x86-64.so.2. Today, the kernel doesn't put the PT_INTERP
file into procfs exe; instead it uses the dynamic ELF that was
originally launched.
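For reference, a sketch of locating PT_INTERP in a 64-bit ELF image already loaded into memory, using the <elf.h> types. It assumes native byte order and skips 32-bit ELF. The function name is hypothetical. The code only illustrates where the interpreter path lives in the file and does not reproduce how binfmt_elf parses it.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/*
 * Return the PT_INTERP path of an in-memory ELF64 image, or NULL if there
 * is none (static binary) or the image is malformed. Native byte order only.
 */
const char *elf64_find_interp(const unsigned char *buf, size_t len)
{
    Elf64_Ehdr eh;
    if (len < sizeof eh)
        return NULL;
    memcpy(&eh, buf, sizeof eh);          /* copy out to avoid unaligned reads */
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64 ||
        eh.e_phentsize != sizeof(Elf64_Phdr) ||
        eh.e_phoff > len)
        return NULL;

    for (size_t i = 0; i < eh.e_phnum; i++) {
        size_t off = (size_t)eh.e_phoff + i * sizeof(Elf64_Phdr);
        Elf64_Phdr ph;
        if (off > len || len - off < sizeof ph)
            return NULL;
        memcpy(&ph, buf + off, sizeof ph);
        if (ph.p_type != PT_INTERP)
            continue;
        /* The segment holds a NUL-terminated path such as ld.so's. */
        if (ph.p_filesz == 0 || ph.p_offset > len ||
            len - ph.p_offset < ph.p_filesz)
            return NULL;
        const char *path = (const char *)buf + ph.p_offset;
        return path[ph.p_filesz - 1] == '\0' ? path : NULL;
    }
    return NULL;  /* no PT_INTERP: statically linked */
}
```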

In contrast, a binfmt_misc interpreted file launched through execve may
or may not have ELF headers, and it is left up to the binfmt_misc
handler to do whatever it needs. The kernel sets procfs exe to the
binfmt_misc interpreter instead of the executable.
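Some background on David's question about an easier way to obtain the interpreted file. Documentation/admin-guide/binfmt-misc.rst says the interpreter receives the binary's path as argv[1]. With the 'O' flag, the interpreter also gets an open fd to the binary through the AT_EXECFD auxv entry. The sketch below recovers the path under those assumptions. The function name is hypothetical, and this is not FEX's actual code.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/auxv.h>
#include <unistd.h>

/*
 * Recover, from inside a binfmt_misc interpreter, the path of the binary it
 * was asked to run. Sketch under two documented behaviours:
 *  - with the 'O' flag the kernel passes an open fd to the binary as AT_EXECFD;
 *  - argv[1] carries the path of the binary.
 * Returns 0 on success, -1 if neither source is usable.
 */
int interpreted_binary_path(int argc, char **argv, char *buf, size_t len)
{
    if (len == 0)
        return -1;

    /* getauxval() returns 0 and sets errno to ENOENT if the entry is absent. */
    errno = 0;
    unsigned long fd = getauxval(AT_EXECFD);
    if (errno == 0) {
        char link[64];
        snprintf(link, sizeof link, "/proc/self/fd/%lu", fd);
        ssize_t n = readlink(link, buf, len - 1);
        if (n >= 0) {
            buf[n] = '\0';
            return 0;
        }
    }

    /* Fall back to argv[1]; refuse rather than silently truncate. */
    if (argc < 2 || argv[1] == NULL)
        return -1;
    size_t plen = strlen(argv[1]);
    if (plen >= len)
        return -1;
    memcpy(buf, argv[1], plen + 1);
    return 0;
}
```

This only helps the interpreter itself. The emulated program, and any library it loads, still sees the interpreter in /proc/self/exe.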

This contrasting behaviour is fundamentally what we are trying to
improve. It seems like an oversight in the original binfmt_misc
implementation rather than any deliberate intent to make the two
differ. The interface is already ambiguous, since what it reports
changes when an executable is run through binfmt_misc.

Some simple ways applications break:
- Applications like Chrome tend to relaunch themselves through execve
  with `/proc/self/exe`.
  - Chrome does this. I think Flatpak or AppImage applications do this too?
  - There are definitely more that I have noticed doing this.
- The cover letter linked to Mesa, the OSS OpenGL/Vulkan drivers, which
  use this interface.
  - Mesa uses it to find out which application is running in order to
    apply workarounds for application bugs. Plenty of historical
    applications use the API badly or incorrectly and need specific
    driver workarounds.
- Some applications may use this path to open their own executable and
  then mmap it back in for tricky memory mirroring or dynamic linking of
  themselves.
  - I saw some old abandoned emulator software doing this.

There are likely more uses of this interface that I haven't noticed.
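The pattern behind the first two items can be sketched as follows: follow /proc/self/exe and key behaviour off the basename. This is a simplified illustration, not Mesa's actual implementation (reference [1] in the cover letter points to that). The function names are hypothetical. Under binfmt_misc the returned name belongs to the interpreter (e.g. the emulator), so any per-application lookup keyed on it misses.

```c
#include <string.h>
#include <unistd.h>

/* Return the component after the last '/', or the whole string if none. */
const char *path_basename(const char *path)
{
    const char *slash = strrchr(path, '/');
    return slash ? slash + 1 : path;
}

/*
 * Follow /proc/self/exe and return the basename of its target, as many
 * applications do to identify themselves. Under binfmt_misc this is the
 * interpreter's name, not the emulated program's. Returns NULL on failure.
 */
const char *current_exe_name(char *buf, size_t len)
{
    if (len == 0)
        return NULL;
    ssize_t n = readlink("/proc/self/exe", buf, len - 1);
    if (n < 0)
        return NULL;
    buf[n] = '\0';  /* readlink() does not NUL-terminate */
    return path_basename(buf);
}
```

Likewise, the relaunch idiom `execv("/proc/self/exe", argv)` re-executes the interpreter binary rather than the emulated program.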

Onward to what FEX-Emu is and how it tries to work around the issue
with a fairly naive hack. FEX-Emu is an x86 and x86-64 CPU emulator that
gets installed as a binfmt_misc interpreter. It executes x86 and x86-64
ELF files on an Arm64 device in an effectively multi-arch fashion. It's
lightweight in that all application processes and threads are just
regular Arm64 processes and threads, similar to how qemu-user operates.

When processing system calls, FEX intercepts any call that consumes a
pathname and inspects that pathname. If it is one of the ways to access
procfs exe, FEX redirects it to the true x86/x86-64 executable. This is
an attempt to behave as if the ELF had been executed without a
binfmt_misc handler.

Pathnames captured in FEX-Emu today:
- /proc/self/exe
- /proc/<pid>/exe
- /proc/thread-self/exe
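A naive version of the string match described above might look like the sketch below. It assumes exact comparison against the three listed forms. Whether FEX matches only its own pid for /proc/<pid>/exe is an assumption, and the function name is hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/*
 * Naive check: does this pathname argument name the procfs exe link?
 * Only exact spellings are recognised, which is what makes it fragile.
 */
bool is_procfs_exe_path(const char *path, pid_t self)
{
    if (strcmp(path, "/proc/self/exe") == 0 ||
        strcmp(path, "/proc/thread-self/exe") == 0)
        return true;

    char own[32];
    snprintf(own, sizeof own, "/proc/%ld/exe", (long)self);
    return strcmp(path, own) == 0;
}
```

A path such as `/proc/self/../self/exe` resolves to the same link but does not match.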

This is very fragile and doesn't cover the full range of ways
applications could access procfs. Applications could use the *at
variants of syscalls with an FD that has /proc/self/ open. They could
use simple tricks like `/proc/self/../self/exe` that side-step this
check. Closing the gap is a game of whack-a-mole with escalating
overhead, purely due to what appears to be an oversight in how
binfmt_misc and PT_INTERP are handled.

Hopefully this explains why this is necessary and why reducing the
differences between how PT_INTERP and binfmt_misc are represented is
desirable.
Guilherme G. Piccoli Nov. 13, 2023, 5:33 p.m. UTC | #5
On 09/10/2023 14:37, Kees Cook wrote:
> [...]
>
> Yeah, it sounds like you're dealing with a process that examines
> /proc/self/exe_file for itself only to find the binfmt_misc interpreter
> when it was run via binfmt_misc?
> 
> What actually breaks? Or rather, why does the process to examine
> exe_file? I'm just trying to see if there are other solutions here that
> would avoid creating an ambiguous interface...
> 

Thanks Kees and David! Did Ryan's thorough comment address your
questions? Do you have any take on the TODOs?

I could rebase against 6.7-rc1 and resubmit, if that makes sense!
But it would be better to have the TODOs addressed first, I guess.

Thanks in advance for reviews and feedback on this.
Cheers,


Guilherme
Eric W. Biederman Nov. 13, 2023, 6:29 p.m. UTC | #6
"Guilherme G. Piccoli" <gpiccoli@igalia.com> writes:

> [...]
>
> Thanks Kees and David! Did Ryan's thorough comment addressed your
> questions? Do you have any take on the TODOs?
>
> I can maybe rebase against 6.7-rc1 and resubmit , if that makes sense!
> But would be better having the TODOs addressed, I guess.

Currently there is a mechanism in the kernel for changing
/proc/self/exe.  Would that be reasonable to use in this case?

It came from the checkpoint/restart work, but given that it is already
implemented, having your binfmt_misc interpreter, which wants to look
like binfmt_elf, use that mechanism seems like the path of least
resistance.

Eric
David Hildenbrand Nov. 13, 2023, 7:16 p.m. UTC | #7
On 13.11.23 19:29, Eric W. Biederman wrote:
> [...]
>
> Currently there is a mechanism in the kernel for changing
> /proc/self/exe.  Would that be reasonable to use in this case?
> 
> It came from the checkpoint/restart work, but given that it is already
> implemented it seems like the path of least resistance to get your
> binfmt_misc that wants to look like binfmt_elf to use that mechanism.

I had that in mind as well, but
prctl_set_mm_exe_file()->replace_mm_exe_file() fails if the executable
is still mmapped (due to denywrite handling); I strongly assume that is
the case for the emulator.
Guilherme G. Piccoli Nov. 13, 2023, 7:17 p.m. UTC | #8
On 13/11/2023 15:29, Eric W. Biederman wrote:
> [...]
> Currently there is a mechanism in the kernel for changing
> /proc/self/exe.  Would that be reasonable to use in this case?
> 
> It came from the checkpoint/restart work, but given that it is already
> implemented it seems like the path of least resistance to get your
> binfmt_misc that wants to look like binfmt_elf to use that mechanism.
> 
> Eric
> 

Thanks Eric! I'm curious how that would work: we'd change the symlink
of the emulator? So the *emulated* software, when reading it, would
see the correct symlink?

Also, just to fully clarify: are you suggesting we hook the new
binfmt_misc flag proposed here to the internal kernel way of changing
the /proc/self/exe symlink, or that the emulator uses prctl(), i.e.
userspace changing its own symlink?

One of my biggest concerns with this kind of approach is that
changing the symlink actually...changes it: the binary mapping itself,
I mean.
Whereas my way was a "fake" change that just exposes one thing to the
emulated app and changes nothing else...

Cheers,


Guilherme
Eric W. Biederman Nov. 14, 2023, 4:11 p.m. UTC | #9
David Hildenbrand <david@redhat.com> writes:

> On 13.11.23 19:29, Eric W. Biederman wrote:
>> [...]
>> Currently there is a mechanism in the kernel for changing
>> /proc/self/exe.  Would that be reasonable to use in this case?
>> It came from the checkpoint/restart work, but given that it is
>> already
>> implemented it seems like the path of least resistance to get your
>> binfmt_misc that wants to look like binfmt_elf to use that mechanism.
>
> I had that in mind as well, but
> prctl_set_mm_exe_file()->replace_mm_exe_file() fails if the executable
> is still mmaped (due to denywrite handling); that should be the case
> for the emulator I strongly assume.

Bah yes.  The sanity check that the old executable is no longer
mapped does make it so that we can't trivially change /proc/self/exe
using prctl(PR_SET_MM, PR_SET_MM_EXE_FILE, ...).

Eric
David Hildenbrand Nov. 14, 2023, 4:14 p.m. UTC | #10
On 14.11.23 17:11, Eric W. Biederman wrote:
> David Hildenbrand <david@redhat.com> writes:
>> [...]
>> I had that in mind as well, but
>> prctl_set_mm_exe_file()->replace_mm_exe_file() fails if the executable
>> is still mmaped (due to denywrite handling); that should be the case
>> for the emulator I strongly assume.
> 
> Bah yes.  The sanity check that that the old executable is no longer
> mapped does make it so that we can't trivially change the /proc/self/exe
> using prctl(PR_SET_MM_EXE_FILE).

I was wondering if we should have a new file (I have yet to come up
with a fitting name) that defaults to /proc/self/exe as long as that
new file doesn't explicitly get set via a prctl.

So /proc/self/exe would indeed always show the emulator (executable), 
but the new file could be adjusted to something that is being executed 
by the emulator.

Just a thought ... I'd rather leave /proc/self/exe alone.
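A toy userspace model of the semantics David suggests, included only to illustrate the fallback rule. The struct, its fields and the "new file" are all hypothetical; no such procfs entry or prctl exists.

```c
#include <stddef.h>

/*
 * HYPOTHETICAL model, not a kernel or prctl API.
 * exe:      what /proc/self/exe reports (the emulator).
 * override: what an emulator would set through the imagined new prctl;
 *           NULL means "not set".
 */
struct exe_view {
    const char *exe;
    const char *override;
};

/* /proc/self/exe is left alone: always the real executable. */
const char *view_exe(const struct exe_view *v)
{
    return v->exe;
}

/* The new file: the override if one was set, else the same as exe. */
const char *view_new_file(const struct exe_view *v)
{
    return v->override != NULL ? v->override : v->exe;
}
```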