
binfmt: Don't consider riscv{32,64} part of the same family

Message ID 20241203094702.124748-1-abologna@redhat.com
State New, archived
Series binfmt: Don't consider riscv{32,64} part of the same family

Commit Message

Andrea Bolognani Dec. 3, 2024, 9:47 a.m. UTC
Currently the script won't generate a configuration file that
sets up qemu-user-riscv32 on riscv64, likely under the
assumption that 64-bit RISC-V machines can natively run 32-bit
RISC-V code.

However this functionality, while theoretically possible, in
practice is missing from most commonly available RISC-V hardware
and not enabled at the distro level. So qemu-user-riscv32 really
is the only option to run riscv32 binaries on riscv64.

Make riscv32 and riscv64 each its own family, so that the
configuration file we need to make 32-on-64 userspace emulation
work gets generated.
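To make the family logic concrete, here is a simplified, hypothetical model of the check described above: no rule is generated for a target whose family equals the host's family. `family_of` and `decide` are illustrative names, not functions from qemu-binfmt-conf.sh, and only the riscv entries are modelled.

```sh
#!/bin/sh
# Hypothetical model of the family check: a target is skipped when its
# family equals the host's family. Not the actual qemu-binfmt-conf.sh code.

# family_of NAME MAPPING: MAPPING is "old" (before the patch) or "new".
family_of() {
    case "$1" in
        riscv*)
            if [ "$2" = old ]; then
                echo riscv        # old: riscv32 and riscv64 share a family
            else
                echo "$1"         # new: each is its own family
            fi
            ;;
        *)
            echo "$1"             # other targets: not modelled here
            ;;
    esac
}

# decide HOST TARGET MAPPING: print "skip" or "register".
decide() {
    if [ "$(family_of "$1" "$3")" = "$(family_of "$2" "$3")" ]; then
        echo skip
    else
        echo register
    fi
}

for mapping in old new; do
    echo "$mapping: riscv32 on riscv64 -> $(decide riscv64 riscv32 "$mapping")"
done
```

Running it prints one line per mapping, showing riscv32 on a riscv64 host going from "skip" with the old shared family to "register" with separate families.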

Link: https://src.fedoraproject.org/rpms/qemu/pull-request/72
Thanks: David Abdurachmanov <davidlt@rivosinc.com>
Thanks: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
---
 scripts/qemu-binfmt-conf.sh | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

Comments

Philippe Mathieu-Daudé Dec. 3, 2024, 9:59 a.m. UTC | #1
Hi Andrea,

On 3/12/24 10:47, Andrea Bolognani wrote:
> Currently the script won't generate a configuration file that
> sets up qemu-user-riscv32 on riscv64, likely under the
> assumption that 64-bit RISC-V machines can natively run 32-bit

I'm confused by the "machines" description used for user emulation.

> RISC-V code.
> 
> However this functionality, while theoretically possible, in
> practice is missing from most commonly available RISC-V hardware
> and not enabled at the distro level. So qemu-user-riscv32 really
> is the only option to run riscv32 binaries on riscv64.

We have definitions such as ELF_ARCH/ELF_PLATFORM/ELF_MACHINE to
parse ELF header and select the best CPU / flags. Maybe RISC-V
lacks them?

BTW we should expose that for linux-user as target_arch_elf.h,
like bsd-user does, that would reduce all these #ifdef'ry in
linux-user/elfload.c...

> 
> Make riscv32 and riscv64 each its own family, so that the
> configuration file we need to make 32-on-64 userspace emulation
> work gets generated.

Does this patch aim for 9.2? Otherwise FYI  I'm working on unifying
32/64-bit targets, maybe for 10.0...

> 
> Link: https://src.fedoraproject.org/rpms/qemu/pull-request/72
> Thanks: David Abdurachmanov <davidlt@rivosinc.com>
> Thanks: Daniel P. Berrangé <berrange@redhat.com>
> Signed-off-by: Andrea Bolognani <abologna@redhat.com>
> ---
>   scripts/qemu-binfmt-conf.sh | 7 ++-----
>   1 file changed, 2 insertions(+), 5 deletions(-)
> 
> diff --git a/scripts/qemu-binfmt-conf.sh b/scripts/qemu-binfmt-conf.sh
> index 6ef9f118d9..e38b767c24 100755
> --- a/scripts/qemu-binfmt-conf.sh
> +++ b/scripts/qemu-binfmt-conf.sh
> @@ -110,11 +110,11 @@ hppa_family=hppa
>   
>   riscv32_magic='\x7fELF\x01\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\xf3\x00'
>   riscv32_mask='\xff\xff\xff\xff\xff\xff\xff\x00\xff\xff\xff\xff\xff\xff\xff\xff\xfe\xff\xff\xff'
> -riscv32_family=riscv
> +riscv32_family=riscv32
>   
>   riscv64_magic='\x7fELF\x02\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\xf3\x00'
>   riscv64_mask='\xff\xff\xff\xff\xff\xff\xff\x00\xff\xff\xff\xff\xff\xff\xff\xff\xfe\xff\xff\xff'
> -riscv64_family=riscv
> +riscv64_family=riscv64
>   
>   xtensa_magic='\x7fELF\x01\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x5e\x00'
>   xtensa_mask='\xff\xff\xff\xff\xff\xff\xff\x00\xff\xff\xff\xff\xff\xff\xff\xff\xfe\xff\xff\xff'
> @@ -168,9 +168,6 @@ qemu_get_family() {
>       sparc*)
>           echo "sparc"
>           ;;
> -    riscv*)
> -        echo "riscv"
> -        ;;
>       loongarch*)
>           echo "loongarch"
>           ;;
Andrea Bolognani Dec. 3, 2024, 10:12 a.m. UTC | #2
On Tue, Dec 03, 2024 at 10:59:24AM +0100, Philippe Mathieu-Daudé wrote:
> On 3/12/24 10:47, Andrea Bolognani wrote:
> > Currently the script won't generate a configuration file that
> > sets up qemu-user-riscv32 on riscv64, likely under the
> > assumption that 64-bit RISC-V machines can natively run 32-bit
>
> I'm confused by the "machines" description used for user emulation.

I meant it in the sense of physical machines. I can use the word
"hosts" instead if you think that's less ambiguous.

> > However this functionality, while theoretically possible, in
> > practice is missing from most commonly available RISC-V hardware
> > and not enabled at the distro level. So qemu-user-riscv32 really
> > is the only option to run riscv32 binaries on riscv64.
>
> We have definitions such as ELF_ARCH/ELF_PLATFORM/ELF_MACHINE to
> parse ELF header and select the best CPU / flags. Maybe RISC-V
> lacks them?
>
> BTW we should expose that for linux-user as target_arch_elf.h,
> like bsd-user does, that would reduce all these #ifdef'ry in
> linux-user/elfload.c...

All of this is flying way over my head, sorry :)

qemu-user-riscv32 already works great on riscv64 as far as I can
tell. I tested it by chrooting into a riscv32 Gentoo rootfs from a
riscv64 Fedora installation. We just need the configuration file to
be generated.

> > Make riscv32 and riscv64 each its own family, so that the
> > configuration file we need to make 32-on-64 userspace emulation
> > work gets generated.
>
> Does this patch aim for 9.2? Otherwise FYI  I'm working on unifying
> 32/64-bit targets, maybe for 10.0...

Having this in 9.2 would be great.
Daniel P. Berrangé Dec. 3, 2024, 10:18 a.m. UTC | #3
On Tue, Dec 03, 2024 at 10:59:24AM +0100, Philippe Mathieu-Daudé wrote:
> Hi Andrea,
> 
> On 3/12/24 10:47, Andrea Bolognani wrote:
> > Currently the script won't generate a configuration file that
> > sets up qemu-user-riscv32 on riscv64, likely under the
> > assumption that 64-bit RISC-V machines can natively run 32-bit
> 
> I'm confused by the "machines" description used for user emulation.

It is referring to the host machines, being able (or not) to
run 32-bit usermode code on a 64-bit host kernel.

> 
> > RISC-V code.
> > 
> > However this functionality, while theoretically possible, in
> > practice is missing from most commonly available RISC-V hardware
> > and not enabled at the distro level. So qemu-user-riscv32 really
> > is the only option to run riscv32 binaries on riscv64.
> 
> We have definitions such as ELF_ARCH/ELF_PLATFORM/ELF_MACHINE to
> parse ELF header and select the best CPU / flags. Maybe RISC-V
> lacks them?

Is that relevant, as we're not running QEMU code at all in
the problematic scenario?

Currently the script below will skip generating a binfmt
rule for riscv32, when on a riscv64 host. Thus qemu-riscv32
will never get called, and the kernel will try & fail to
run riscv32 binaries natively.

This change would make us generate riscv32 binfmt rules
and thus use qemu-riscv32 on riscv64 hosts for linux-user.
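For reference, a generated rule follows the kernel's binfmt_misc `:name:type:offset:magic:mask:interpreter:flags` format. The sketch below builds such a line for riscv32 from the magic and mask in the patch; the helper name `binfmt_rule`, the interpreter path /usr/bin/qemu-riscv32 and the `F` flag are example values, not necessarily what the script or a given distro emits.

```sh
#!/bin/sh
# Hypothetical helper: build a binfmt_misc registration line in the
# kernel's ":name:type:offset:magic:mask:interpreter:flags" format.
binfmt_rule() {
    # $1 cpu, $2 magic, $3 mask, $4 interpreter, $5 flags
    # Type M matches on magic bytes; the empty offset field means offset 0.
    # printf '%s' keeps the \xNN escapes literal; the kernel decodes them.
    printf ':qemu-%s:M::%s:%s:%s:%s\n' "$1" "$2" "$3" "$4" "$5"
}

riscv32_magic='\x7fELF\x01\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\xf3\x00'
riscv32_mask='\xff\xff\xff\xff\xff\xff\xff\x00\xff\xff\xff\xff\xff\xff\xff\xff\xfe\xff\xff\xff'

# Example interpreter path and flags; adjust for the actual install.
binfmt_rule riscv32 "$riscv32_magic" "$riscv32_mask" /usr/bin/qemu-riscv32 F
```

Before the patch, no such line was produced for riscv32 on a riscv64 host, so the kernel had nothing to hand riscv32 binaries to.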


Separately from this patch, we should also consider whether
it is time to do the same for aarch64/arm7.

If I look at this page:

  https://gpages.juszkiewicz.com.pl/arm-socs-table/arm-socs.html

and sort by 'announced' to see most recent CPUs first, then
almost all of them have "NO" in the "aarch32 support" column.

IOW, on modern aarch64 CPUs, qemu-arm is the only viable way
to run 32-bit usermode binaries AFAICT, and suggests we ought
to be creating a binfmt rule for that on aarch64 hosts.

> BTW we should expose that for linux-user as target_arch_elf.h,
> like bsd-user does, that would reduce all these #ifdef'ry in
> linux-user/elfload.c...
> 
> > 
> > Make riscv32 and riscv64 each its own family, so that the
> > configuration file we need to make 32-on-64 userspace emulation
> > work gets generated.
> 
> Does this patch aim for 9.2? Otherwise FYI  I'm working on unifying
> 32/64-bit targets, maybe for 10.0...

Well in Fedora we'll backport it to 9.2 at least, and from that
POV I'd consider it stable-9.2 material if accepted here.

> > Link: https://src.fedoraproject.org/rpms/qemu/pull-request/72
> > Thanks: David Abdurachmanov <davidlt@rivosinc.com>
> > Thanks: Daniel P. Berrangé <berrange@redhat.com>
> > Signed-off-by: Andrea Bolognani <abologna@redhat.com>
> > ---
> >   scripts/qemu-binfmt-conf.sh | 7 ++-----
> >   1 file changed, 2 insertions(+), 5 deletions(-)

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>

> > 
> > diff --git a/scripts/qemu-binfmt-conf.sh b/scripts/qemu-binfmt-conf.sh
> > index 6ef9f118d9..e38b767c24 100755
> > --- a/scripts/qemu-binfmt-conf.sh
> > +++ b/scripts/qemu-binfmt-conf.sh
> > @@ -110,11 +110,11 @@ hppa_family=hppa
> >   riscv32_magic='\x7fELF\x01\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\xf3\x00'
> >   riscv32_mask='\xff\xff\xff\xff\xff\xff\xff\x00\xff\xff\xff\xff\xff\xff\xff\xff\xfe\xff\xff\xff'
> > -riscv32_family=riscv
> > +riscv32_family=riscv32
> >   riscv64_magic='\x7fELF\x02\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\xf3\x00'
> >   riscv64_mask='\xff\xff\xff\xff\xff\xff\xff\x00\xff\xff\xff\xff\xff\xff\xff\xff\xfe\xff\xff\xff'
> > -riscv64_family=riscv
> > +riscv64_family=riscv64
> >   xtensa_magic='\x7fELF\x01\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x5e\x00'
> >   xtensa_mask='\xff\xff\xff\xff\xff\xff\xff\x00\xff\xff\xff\xff\xff\xff\xff\xff\xfe\xff\xff\xff'
> > @@ -168,9 +168,6 @@ qemu_get_family() {
> >       sparc*)
> >           echo "sparc"
> >           ;;
> > -    riscv*)
> > -        echo "riscv"
> > -        ;;
> >       loongarch*)
> >           echo "loongarch"
> >           ;;
> 

With regards,
Daniel
Peter Maydell Dec. 3, 2024, 10:35 a.m. UTC | #4
On Tue, 3 Dec 2024 at 10:19, Daniel P. Berrangé <berrange@redhat.com> wrote:
> Separately from this patch, we should also consider whether
> it is time to do the same for aarch64/arm7.
>
> If I look at this page:
>
>   https://gpages.juszkiewicz.com.pl/arm-socs-table/arm-socs.html
>
> and sort by 'announced' to see most recent CPUs first, then
> almost all of them have "NO" in the "aarch32 support" column.
>
> IOW, on modern aarch64 CPUs, qemu-arm is the only viable way
> to run 32-bit usermode binaries AFAICT, and suggests we ought
> to be creating a binfmt rule for that on aarch64 hosts.

What happens if you have a host CPU that *does* support 32-bit
natively and you also register the binfmt rule? Does the
host kernel prefer to execute natively or does it invoke
QEMU? I don't think we want to roll out something that
silently downgrades native execution to emulation...

thanks
-- PMM
Richard Henderson Dec. 3, 2024, 1:57 p.m. UTC | #5
On 12/3/24 04:35, Peter Maydell wrote:
> On Tue, 3 Dec 2024 at 10:19, Daniel P. Berrangé <berrange@redhat.com> wrote:
>> Separately from this patch, we should also consider whether
>> it is time to do the same for aarch64/arm7.
>>
>> If I look at this page:
>>
>>    https://gpages.juszkiewicz.com.pl/arm-socs-table/arm-socs.html
>>
>> and sort by 'announced' to see most recent CPUs first, then
>> almost all of them have "NO" in the "aarch32 support" column.
>>
>> IOW, on modern aarch64 CPUs, qemu-arm is the only viable way
>> to run 32-bit usermode binaries AFAICT, and suggests we ought
>> to be creating a binfmt rule for that on aarch64 hosts.
> 
> What happens if you have a host CPU that *does* support 32-bit
> natively and you also register the binfmt rule? Does the
> host kernel prefer to execute natively or does it invoke
> QEMU? I don't think we want to roll out something that
> silently downgrades native execution to emulation...

The registered rule applies and the kernel invokes qemu.
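Since a registered rule takes priority over native execution, a host that can run the 32-bit code natively would need the entry disabled. A minimal sketch using the binfmt_misc control files: reading an entry shows `enabled` or `disabled` on its first line, and writing `0` to it disables it. The rule name `qemu-riscv32` and the `BINFMT_DIR` override are assumptions for illustration.

```sh
#!/bin/sh
# Sketch: inspect or disable an already-registered binfmt_misc entry.
# BINFMT_DIR defaults to the usual mount point; pointing it elsewhere
# lets the functions be tried on scratch files.
BINFMT_DIR=${BINFMT_DIR:-/proc/sys/fs/binfmt_misc}

# rule_state NAME: print the entry's first line ("enabled"/"disabled"),
# or "missing" if no such entry is registered.
rule_state() {
    if [ -f "$BINFMT_DIR/$1" ]; then
        head -n 1 "$BINFMT_DIR/$1"
    else
        echo missing
    fi
}

# disable_rule NAME: writing 0 disables the entry (1 re-enables it,
# -1 removes it). Needs root on a real system.
disable_rule() {
    [ -f "$BINFMT_DIR/$1" ] || return 1
    echo 0 > "$BINFMT_DIR/$1"
}
```

For example, `rule_state qemu-riscv32` followed by `disable_rule qemu-riscv32` as root. The change only lasts until the rules are registered again, e.g. at the next boot.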


r~
Laurent Vivier Dec. 4, 2024, 10:03 a.m. UTC | #6
Le 03/12/2024 à 10:47, Andrea Bolognani a écrit :
> Currently the script won't generate a configuration file that
> sets up qemu-user-riscv32 on riscv64, likely under the
> assumption that 64-bit RISC-V machines can natively run 32-bit
> RISC-V code.
> 
> However this functionality, while theoretically possible, in
> practice is missing from most commonly available RISC-V hardware
> and not enabled at the distro level. So qemu-user-riscv32 really
> is the only option to run riscv32 binaries on riscv64.
> 
> Make riscv32 and riscv64 each its own family, so that the
> configuration file we need to make 32-on-64 userspace emulation
> work gets generated.
> 
> Link: https://src.fedoraproject.org/rpms/qemu/pull-request/72
> Thanks: David Abdurachmanov <davidlt@rivosinc.com>
> Thanks: Daniel P. Berrangé <berrange@redhat.com>
> Signed-off-by: Andrea Bolognani <abologna@redhat.com>
> ---
>   scripts/qemu-binfmt-conf.sh | 7 ++-----
>   1 file changed, 2 insertions(+), 5 deletions(-)
> 
> diff --git a/scripts/qemu-binfmt-conf.sh b/scripts/qemu-binfmt-conf.sh
> index 6ef9f118d9..e38b767c24 100755
> --- a/scripts/qemu-binfmt-conf.sh
> +++ b/scripts/qemu-binfmt-conf.sh
> @@ -110,11 +110,11 @@ hppa_family=hppa
>   
>   riscv32_magic='\x7fELF\x01\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\xf3\x00'
>   riscv32_mask='\xff\xff\xff\xff\xff\xff\xff\x00\xff\xff\xff\xff\xff\xff\xff\xff\xfe\xff\xff\xff'
> -riscv32_family=riscv
> +riscv32_family=riscv32
>   
>   riscv64_magic='\x7fELF\x02\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\xf3\x00'
>   riscv64_mask='\xff\xff\xff\xff\xff\xff\xff\x00\xff\xff\xff\xff\xff\xff\xff\xff\xfe\xff\xff\xff'
> -riscv64_family=riscv
> +riscv64_family=riscv64
>   
>   xtensa_magic='\x7fELF\x01\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x5e\x00'
>   xtensa_mask='\xff\xff\xff\xff\xff\xff\xff\x00\xff\xff\xff\xff\xff\xff\xff\xff\xfe\xff\xff\xff'
> @@ -168,9 +168,6 @@ qemu_get_family() {
>       sparc*)
>           echo "sparc"
>           ;;
> -    riscv*)
> -        echo "riscv"
> -        ;;
>       loongarch*)
>           echo "loongarch"
>           ;;

Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Daniel P. Berrangé Dec. 4, 2024, 10:17 a.m. UTC | #7
On Tue, Dec 03, 2024 at 07:57:14AM -0600, Richard Henderson wrote:
> On 12/3/24 04:35, Peter Maydell wrote:
> > On Tue, 3 Dec 2024 at 10:19, Daniel P. Berrangé <berrange@redhat.com> wrote:
> > > Separately from this patch, we should also consider whether
> > > it is time to do the same for aarch64/arm7.
> > > 
> > > If I look at this page:
> > > 
> > >    https://gpages.juszkiewicz.com.pl/arm-socs-table/arm-socs.html
> > > 
> > > and sort by 'announced' to see most recent CPUs first, then
> > > almost all of them have "NO" in the "aarch32 support" column.
> > > 
> > > IOW, on modern aarch64 CPUs, qemu-arm is the only viable way
> > > to run 32-bit usermode binaries AFAICT, and suggests we ought
> > > to be creating a binfmt rule for that on aarch64 hosts.
> > 
> > What happens if you have a host CPU that *does* support 32-bit
> > natively and you also register the binfmt rule? Does the
> > host kernel prefer to execute natively or does it invoke
> > QEMU? I don't think we want to roll out something that
> > silently downgrades native execution to emulation...
> 
> The registered rule applies and the kernel invokes qemu.

This is all quite difficult from a distro POV, but not QEMU's fault.

We want to install the binfmt rules in a way that we "do the right thing"
regardless of hardware out of the box.

The systemd logic for loading binfmt rules is unconditional, loading
everything from /usr/lib/binfmt.d, but we need a way to make things
conditional on the lack of support for aarch32 on the currently running
platform.
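One per-host workaround available with the existing unconditional loading is for the administrator to mask a vendor rule. systemd-binfmt gives a file in /etc/binfmt.d precedence over one with the same name in /usr/lib/binfmt.d, and a symlink to /dev/null there disables the vendor file. The file name qemu-arm.conf and the `ROOT` prefix (used so the sketch can be tried in a scratch directory) are assumptions.

```sh
#!/bin/sh
# Sketch: mask a vendor binfmt.d file on one host. systemd-binfmt prefers
# /etc/binfmt.d/NAME over /usr/lib/binfmt.d/NAME, and a symlink to
# /dev/null there disables the vendor file. ROOT (empty by default) lets
# the sketch run against a scratch directory instead of the real /etc.
ROOT=${ROOT:-}

# mask_binfmt_conf NAME: NAME is the vendor file name (an assumption,
# e.g. qemu-arm.conf; check what your distro actually ships).
mask_binfmt_conf() {
    mkdir -p "$ROOT/etc/binfmt.d"
    ln -sf /dev/null "$ROOT/etc/binfmt.d/$1"
}

# unmask_binfmt_conf NAME: remove the mask, but only if it is the
# /dev/null symlink (never delete a real local override).
unmask_binfmt_conf() {
    f=$ROOT/etc/binfmt.d/$1
    if [ -L "$f" ] && [ "$(readlink "$f")" = /dev/null ]; then
        rm -f "$f"
    fi
}
```

For example, `mask_binfmt_conf qemu-arm.conf` on an aarch64 host with native AArch32 support; the mask takes effect the next time systemd-binfmt loads the rules, for example at boot. This still relies on the user knowing their hardware, which is exactly the manual step Daniel would like to avoid.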

With regards,
Daniel
Laurent Vivier Dec. 5, 2024, 5:15 p.m. UTC | #8
Le 04/12/2024 à 11:17, Daniel P. Berrangé a écrit :
> On Tue, Dec 03, 2024 at 07:57:14AM -0600, Richard Henderson wrote:
>> On 12/3/24 04:35, Peter Maydell wrote:
>>> On Tue, 3 Dec 2024 at 10:19, Daniel P. Berrangé <berrange@redhat.com> wrote:
>>>> Separately from this patch, we should also consider whether
>>>> it is time to do the same for aarch64/arm7.
>>>>
>>>> If I look at this page:
>>>>
>>>>     https://gpages.juszkiewicz.com.pl/arm-socs-table/arm-socs.html
>>>>
>>>> and sort by 'announced' to see most recent CPUs first, then
>>>> almost all of them have "NO" in the "aarch32 support" column.
>>>>
>>>> IOW, on modern aarch64 CPUs, qemu-arm is the only viable way
>>>> to run 32-bit usermode binaries AFAICT, and suggests we ought
>>>> to be creating a binfmt rule for that on aarch64 hosts.
>>>
>>> What happens if you have a host CPU that *does* support 32-bit
>>> natively and you also register the binfmt rule? Does the
>>> host kernel prefer to execute natively or does it invoke
>>> QEMU? I don't think we want to roll out something that
>>> silently downgrades native execution to emulation...
>>
>> The registered rule applies and the kernel invokes qemu.
> 
> This is all quite difficult from a distro POV, but not QEMU's fault.
> 
> We want to install the binfmt rules in a way that we "do the right thing"
> regardless of hardware out of the box.
> 
> The systemd logic for loading binfmt rules is unconditional, loading
> everything from /usr/lib/binfmt.d, but we need a way to make things
> conditional on the lack of support for aarch32 on the currently running
> platform.

Now, there is another alternative: binfmt_misc can now be instantiated per user namespace, so you can define a
binfmt rule for a given namespace:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=21ca59b365c091d583f36ac753eaa8baf947be6f

I have added a new parameter to unshare to help to run a new namespace with binfmt_misc:

https://github.com/util-linux/util-linux/commit/9d55de0d0d5c6298b38b58c3f4dc876c56213f85

Thanks,
Laurent
Andrea Bolognani Jan. 2, 2025, 4:02 p.m. UTC | #9
On Tue, Dec 03, 2024 at 10:47:02AM +0100, Andrea Bolognani wrote:
> Currently the script won't generate a configuration file that
> sets up qemu-user-riscv32 on riscv64, likely under the
> assumption that 64-bit RISC-V machines can natively run 32-bit
> RISC-V code.
>
> However this functionality, while theoretically possible, in
> practice is missing from most commonly available RISC-V hardware
> and not enabled at the distro level. So qemu-user-riscv32 really
> is the only option to run riscv32 binaries on riscv64.
>
> Make riscv32 and riscv64 each its own family, so that the
> configuration file we need to make 32-on-64 userspace emulation
> work gets generated.
>
> Link: https://src.fedoraproject.org/rpms/qemu/pull-request/72
> Thanks: David Abdurachmanov <davidlt@rivosinc.com>
> Thanks: Daniel P. Berrangé <berrange@redhat.com>
> Signed-off-by: Andrea Bolognani <abologna@redhat.com>
> ---
>  scripts/qemu-binfmt-conf.sh | 7 ++-----
>  1 file changed, 2 insertions(+), 5 deletions(-)

ping

There are already two ACKs so I think we just need a maintainer to
pick this up.
Alistair Francis Jan. 6, 2025, 1:27 a.m. UTC | #10
On Fri, Jan 3, 2025 at 2:04 AM Andrea Bolognani <abologna@redhat.com> wrote:
>
> On Tue, Dec 03, 2024 at 10:47:02AM +0100, Andrea Bolognani wrote:
> > Currently the script won't generate a configuration file that
> > sets up qemu-user-riscv32 on riscv64, likely under the
> > assumption that 64-bit RISC-V machines can natively run 32-bit
> > RISC-V code.
> >
> > However this functionality, while theoretically possible, in
> > practice is missing from most commonly available RISC-V hardware
> > and not enabled at the distro level. So qemu-user-riscv32 really
> > is the only option to run riscv32 binaries on riscv64.
> >
> > Make riscv32 and riscv64 each its own family, so that the
> > configuration file we need to make 32-on-64 userspace emulation
> > work gets generated.
> >
> > Link: https://src.fedoraproject.org/rpms/qemu/pull-request/72
> > Thanks: David Abdurachmanov <davidlt@rivosinc.com>
> > Thanks: Daniel P. Berrangé <berrange@redhat.com>
> > Signed-off-by: Andrea Bolognani <abologna@redhat.com>
> > ---
> >  scripts/qemu-binfmt-conf.sh | 7 ++-----
> >  1 file changed, 2 insertions(+), 5 deletions(-)
>
> ping
>
> There are already two ACKs so I think we just need a maintainer to
> pick this up.

We didn't get an answer to the issue of a CPU supporting RV32 and yet
the kernel still calls QEMU.

I understand this allows things to work out of the box, but seems like
a disservice to any hardware that does support RV32

Alistair

>
> --
> Andrea Bolognani / Red Hat / Virtualization
>
>
Peter Maydell Jan. 6, 2025, 11:47 a.m. UTC | #11
On Mon, 6 Jan 2025 at 01:29, Alistair Francis <alistair23@gmail.com> wrote:
>
> On Fri, Jan 3, 2025 at 2:04 AM Andrea Bolognani <abologna@redhat.com> wrote:
> >
> > On Tue, Dec 03, 2024 at 10:47:02AM +0100, Andrea Bolognani wrote:
> > > Currently the script won't generate a configuration file that
> > > sets up qemu-user-riscv32 on riscv64, likely under the
> > > assumption that 64-bit RISC-V machines can natively run 32-bit
> > > RISC-V code.
> > >
> > > However this functionality, while theoretically possible, in
> > > practice is missing from most commonly available RISC-V hardware
> > > and not enabled at the distro level. So qemu-user-riscv32 really
> > > is the only option to run riscv32 binaries on riscv64.
> > >
> > > Make riscv32 and riscv64 each its own family, so that the
> > > configuration file we need to make 32-on-64 userspace emulation
> > > work gets generated.
> > >
> > > Link: https://src.fedoraproject.org/rpms/qemu/pull-request/72
> > > Thanks: David Abdurachmanov <davidlt@rivosinc.com>
> > > Thanks: Daniel P. Berrangé <berrange@redhat.com>
> > > Signed-off-by: Andrea Bolognani <abologna@redhat.com>
> > > ---
> > >  scripts/qemu-binfmt-conf.sh | 7 ++-----
> > >  1 file changed, 2 insertions(+), 5 deletions(-)
> >
> > ping
> >
> > There are already two ACKs so I think we just need a maintainer to
> > pick this up.
>
> We didn't get an answer to the issue of a CPU supporting RV32 and yet
> the kernel still calls QEMU.
>
> I understand this allows things to work out of the box, but seems like
> a disservice to any hardware that does support RV32

There's the same thing on Arm too -- we don't set up qemu-user
aarch32 binfmt-misc on an aarch64 system because the host might
be able to natively execute the aarch32 binary. This is becoming
less true, but we still don't want to silently downgrade
native execution to emulation on the hosts where native execution
used to work.

I'm not sure the best approach here -- ideally we would want to
be able to register a binfmt-misc to the host kernel with "only use
this if you could not already natively execute it", but AFAIK that's
not possible.

-- PMM
Daniel P. Berrangé Jan. 6, 2025, 11:57 a.m. UTC | #12
On Mon, Jan 06, 2025 at 11:47:00AM +0000, Peter Maydell wrote:
> On Mon, 6 Jan 2025 at 01:29, Alistair Francis <alistair23@gmail.com> wrote:
> >
> > On Fri, Jan 3, 2025 at 2:04 AM Andrea Bolognani <abologna@redhat.com> wrote:
> > >
> > > On Tue, Dec 03, 2024 at 10:47:02AM +0100, Andrea Bolognani wrote:
> > > > Currently the script won't generate a configuration file that
> > > > sets up qemu-user-riscv32 on riscv64, likely under the
> > > > assumption that 64-bit RISC-V machines can natively run 32-bit
> > > > RISC-V code.
> > > >
> > > > However this functionality, while theoretically possible, in
> > > > practice is missing from most commonly available RISC-V hardware
> > > > and not enabled at the distro level. So qemu-user-riscv32 really
> > > > is the only option to run riscv32 binaries on riscv64.
> > > >
> > > > Make riscv32 and riscv64 each its own family, so that the
> > > > configuration file we need to make 32-on-64 userspace emulation
> > > > work gets generated.
> > > >
> > > > Link: https://src.fedoraproject.org/rpms/qemu/pull-request/72
> > > > Thanks: David Abdurachmanov <davidlt@rivosinc.com>
> > > > Thanks: Daniel P. Berrangé <berrange@redhat.com>
> > > > Signed-off-by: Andrea Bolognani <abologna@redhat.com>
> > > > ---
> > > >  scripts/qemu-binfmt-conf.sh | 7 ++-----
> > > >  1 file changed, 2 insertions(+), 5 deletions(-)
> > >
> > > ping
> > >
> > > There are already two ACKs so I think we just need a maintainer to
> > > pick this up.
> >
> > We didn't get an answer to the issue of a CPU supporting RV32 and yet
> > the kernel still calls QEMU.
> >
> > I understand this allows things to work out of the box, but seems like
> > a disservice to any hardware that does support RV32
> 
> There's the same thing on Arm too -- we don't set up qemu-user
> aarch32 binfmt-misc on an aarch64 system because the host might
> be able to natively execute the aarch32 binary. This is becoming
> less true, but we still don't want to silently downgrade
> native execution to emulation on the hosts where native execution
> used to work.

Arm is a bigger problem as historically there genuinely was a
non-trivial set of CPUs with 32-on-64 support in HW.

IIUC, the riscv situation is much less likely to be a real problem

> I'm not sure the best approach here -- ideally we would want to
> be able to register a binfmt-misc to the host kernel with "only use
> this if you could not already natively execute it", but AFAIK that's
> not possible.

The other thing is that qemu-binfmt-conf.sh is not really the right
place to decide this, as we can't assume it is being run on the machine
that QEMU will be deployed on. E.g. in the distro case, qemu-binfmt-conf.sh
may be run in a build farm to statically generate files.

Any conditional loading of binfmt rules would require extra magic to be
implemented by systemd, or would have to be done by the user selectively
installing different packages to omit the binfmt rules they don't want.

As an immediate band-aid, I'd suggest that qemu-binfmt-conf.sh could keep
its current logic as the default, and have a switch "--32-on-64" [1] to
tell it to generate the binfmt for 32-bit arch, even if 64-bit arch
could have 32-bit support.

Distros/users could then choose whether to pass --32-on-64 when statically
generating the binfmt files.
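A hypothetical sketch of how the proposed switch could feed into the skip decision, assuming the existing shared riscv family (i.e. without this patch). Only the option name comes from the proposal; `parse_args`, `should_skip` and their arguments are illustrative and do not exist in qemu-binfmt-conf.sh.

```sh
#!/bin/sh
# Hypothetical sketch of the proposed "--32-on-64" switch. Only the option
# name comes from the proposal; the functions are illustrative.
want_32_on_64=no

# parse_args ARGS...: record whether --32-on-64 was given.
parse_args() {
    for arg in "$@"; do
        case "$arg" in
            --32-on-64) want_32_on_64=yes ;;
        esac
    done
}

# should_skip HOST TARGET SAME_FAMILY: print "yes" if no rule should be
# generated for TARGET. By default (current behaviour) every target in the
# host's family is skipped; with --32-on-64 only the host's own
# architecture is, so e.g. riscv32 on a riscv64 host gets a rule.
should_skip() {
    if [ "$3" != yes ]; then
        echo no
    elif [ "$want_32_on_64" = yes ] && [ "$1" != "$2" ]; then
        echo no
    else
        echo yes
    fi
}
```

Without the switch the default is unchanged; with it, riscv32 on a riscv64 host is no longer skipped, while the host's own architecture still is.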

With regards,
Daniel

[1] better names welcome
Peter Maydell Jan. 6, 2025, 12:01 p.m. UTC | #13
On Mon, 6 Jan 2025 at 11:58, Daniel P. Berrangé <berrange@redhat.com> wrote:
>
> On Mon, Jan 06, 2025 at 11:47:00AM +0000, Peter Maydell wrote:
> > On Mon, 6 Jan 2025 at 01:29, Alistair Francis <alistair23@gmail.com> wrote:
> > >
> > > On Fri, Jan 3, 2025 at 2:04 AM Andrea Bolognani <abologna@redhat.com> wrote:
> > > >
> > > > On Tue, Dec 03, 2024 at 10:47:02AM +0100, Andrea Bolognani wrote:
> > > > > Currently the script won't generate a configuration file that
> > > > > sets up qemu-user-riscv32 on riscv64, likely under the
> > > > > assumption that 64-bit RISC-V machines can natively run 32-bit
> > > > > RISC-V code.
> > > > >
> > > > > However this functionality, while theoretically possible, in
> > > > > practice is missing from most commonly available RISC-V hardware
> > > > > and not enabled at the distro level. So qemu-user-riscv32 really
> > > > > is the only option to run riscv32 binaries on riscv64.
> > > > >
> > > > > Make riscv32 and riscv64 each its own family, so that the
> > > > > configuration file we need to make 32-on-64 userspace emulation
> > > > > work gets generated.
> > > > >
> > > > > Link: https://src.fedoraproject.org/rpms/qemu/pull-request/72
> > > > > Thanks: David Abdurachmanov <davidlt@rivosinc.com>
> > > > > Thanks: Daniel P. Berrangé <berrange@redhat.com>
> > > > > Signed-off-by: Andrea Bolognani <abologna@redhat.com>
> > > > > ---
> > > > >  scripts/qemu-binfmt-conf.sh | 7 ++-----
> > > > >  1 file changed, 2 insertions(+), 5 deletions(-)
> > > >
> > > > ping
> > > >
> > > > There are already two ACKs so I think we just need a maintainer to
> > > > pick this up.
> > >
> > > We didn't get an answer to the issue of a CPU supporting RV32 and yet
> > > the kernel still calls QEMU.
> > >
> > > I understand this allows things to work out of the box, but seems like
> > > a disservice to any hardware that does support RV32
> >
> > There's the same thing on Arm too -- we don't set up qemu-user
> > aarch32 binfmt-misc on an aarch64 system because the host might
> > be able to natively execute the aarch32 binary. This is becoming
> > less true, but we still don't want to silently downgrade
> > native execution to emulation on the hosts where native execution
> > used to work.
>
> Arm is a bigger problem as historically there genuinely was a
> non-trivial set of CPUs with 32-on-64 support in HW.
>
> IIUC, the riscv situation is much less likely to be a real problem
>
> > I'm not sure the best approach here -- ideally we would want to
> > be able to register a binfmt-misc to the host kernel with "only use
> > this if you could not already natively execute it", but AFAIK that's
> > not possible.
>
> The other thing is that qemu-binfmt-conf.sh is not really the right
> place to decide this, as we can't assume it is being run on the machine
> that QEMU will be deployed on. E.g. in the distro case, qemu-binfmt-conf.sh
> may be run in a build farm to statically generate files.
>
> Any conditional loading of binfmt rules would require extra magic to be
> implemented by systemd, or would have to be done by the user selectively
> installing different packages to omit the binfmt rules they don't want.

If the kernel supported this via a binfmt flag, it wouldn't need
systemd-specific magic, user intervention, or for qemu-binfmt-conf.sh
to be running on the target machine. (But of course there's the "does
anybody care enough to implement that" problem plus the long delay of
actually deploying kernels that know about the flag...)

thanks
-- PMM
Andrea Bolognani Jan. 6, 2025, 5:54 p.m. UTC | #14
On Mon, Jan 06, 2025 at 11:57:58AM +0000, Daniel P. Berrangé wrote:
> On Mon, Jan 06, 2025 at 11:47:00AM +0000, Peter Maydell wrote:
> > On Mon, 6 Jan 2025 at 01:29, Alistair Francis <alistair23@gmail.com> wrote:
> > > We didn't get an answer to the issue of a CPU supporting RV32 and yet
> > > the kernel still calls QEMU.
> > >
> > > I understand this allows things to work out of the box, but seems like
> > > a disservice to any hardware that does support RV32
> >
> > There's the same thing on Arm too -- we don't set up qemu-user
> > aarch32 binfmt-misc on an aarch64 system because the host might
> > be able to natively execute the aarch32 binary. This is becoming
> > less true, but we still don't want to silently downgrade
> > native execution to emulation on the hosts where native execution
> > used to work.
>
> Arm is a bigger problem as historically there genuinely was a
> non-trivial set of CPUs with 32-on-64 support in HW.
>
> IIUC, the riscv situation is much less likely to be a real problem

Exactly.

My understanding is that, while 64-bit RISC-V CPUs that can natively
run 32-bit applications are theoretically possible, no such CPU
actually exists right now.

Even if it did exist, distros would have to set up things to support
this scenario, which they don't.

So in the current situation we're effectively making it impossible to
run riscv32 binaries on riscv64 for the benefit of a hypothetical
scenario.

> As an immediate band-aid, I'd suggest that qemu-binfmt-conf.sh could keep
> its current logic as the default, and have a switch "--32-on-64" [1] to
> tell it to generate the binfmt for 32-bit arch, even if 64-bit arch
> could have 32-bit support.
>
> Distros/users could then choose whether to pass --32-on-64 when statically
> generating the binfmt files.

While I'm still convinced that this patch could be safely applied
as-is, I'd be happy to go with your proposed approach if doing so
would help move things forward.
Alistair Francis Jan. 7, 2025, 1:29 a.m. UTC | #15
On Tue, Jan 7, 2025 at 3:54 AM Andrea Bolognani <abologna@redhat.com> wrote:
>
> On Mon, Jan 06, 2025 at 11:57:58AM +0000, Daniel P. Berrangé wrote:
> > On Mon, Jan 06, 2025 at 11:47:00AM +0000, Peter Maydell wrote:
> > > On Mon, 6 Jan 2025 at 01:29, Alistair Francis <alistair23@gmail.com> wrote:
> > > > We didn't get an answer to the issue of a CPU supporting RV32 and yet
> > > > the kernel still calls QEMU.
> > > >
> > > > I understand this allows things to work out of the box, but seems like
> > > > a disservice to any hardware that does support RV32
> > >
> > > There's the same thing on Arm too -- we don't set up qemu-user
> > > aarch32 binfmt-misc on an aarch64 system because the host might
> > > be able to natively execute the aarch32 binary. This is becoming
> > > less true, but we still don't want to silently downgrade
> > > native execution to emulation on the hosts where native execution
> > > used to work.
> >
> > Arm is a bigger problem as historically there genuinely was a
> > non-trivial set of CPUs with 32-on-64 support in HW.
> >
> > IIUC, the riscv situation is much less likely to be a real problem
>
> Exactly.
>
> My understanding is that, while 64-bit RISC-V CPUs that can natively
> run 32-bit applications are theoretically possible, no such CPU
> actually exists right now.

I do think T-HEAD are working on CPUs that do that though

>
> Even if it did exist, distros would have to set up things to support
> this scenario, which they don't.

Fair point

>
> So in the current situation we're effectively making it impossible to
> run riscv32 binaries on riscv64 for the benefit of a hypothetical
> scenario.

My worry is that in the future there is hardware that can do this and
we are stuck with this decision.

It does seem unlikely that lots of hardware will start supporting RV32

>
> > As an immediate band-aid, I'd suggest that qemu-binfmt-conf.sh could keep
> > its current logic as the default, and have a switch "--32-on-64" [1] to
> > tell it to generate the binfmt for 32-bit arch, even if 64-bit arch
> > could have 32-bit support.
> >
> > Distros/users could then choose whether to pass --32-on-64 when statically
> > generating the binfmt files.
>
> While I'm still convinced that this patch could be safely applied
> as-is, I'd be happy to go with your proposed approach if doing so
> would help move things forward.

That might be the best step, that way we allow distros to decide

Alistair

>
> --
> Andrea Bolognani / Red Hat / Virtualization
>

Patch

diff --git a/scripts/qemu-binfmt-conf.sh b/scripts/qemu-binfmt-conf.sh
index 6ef9f118d9..e38b767c24 100755
--- a/scripts/qemu-binfmt-conf.sh
+++ b/scripts/qemu-binfmt-conf.sh
@@ -110,11 +110,11 @@ hppa_family=hppa
 
 riscv32_magic='\x7fELF\x01\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\xf3\x00'
 riscv32_mask='\xff\xff\xff\xff\xff\xff\xff\x00\xff\xff\xff\xff\xff\xff\xff\xff\xfe\xff\xff\xff'
-riscv32_family=riscv
+riscv32_family=riscv32
 
 riscv64_magic='\x7fELF\x02\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\xf3\x00'
 riscv64_mask='\xff\xff\xff\xff\xff\xff\xff\x00\xff\xff\xff\xff\xff\xff\xff\xff\xfe\xff\xff\xff'
-riscv64_family=riscv
+riscv64_family=riscv64
 
 xtensa_magic='\x7fELF\x01\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x5e\x00'
 xtensa_mask='\xff\xff\xff\xff\xff\xff\xff\x00\xff\xff\xff\xff\xff\xff\xff\xff\xfe\xff\xff\xff'
@@ -168,9 +168,6 @@ qemu_get_family() {
     sparc*)
         echo "sparc"
         ;;
-    riscv*)
-        echo "riscv"
-        ;;
     loongarch*)
         echo "loongarch"
         ;;
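The magic and mask strings above encode a byte-wise test over the first 20 bytes of the file: each file byte is XORed with the magic byte, and the result must be zero wherever the mask has bits set (this mirrors the kernel's binfmt_misc matching as I understand it). The sketch below reproduces that test with od; the hex lists are hand-converted from riscv32_magic, riscv64_magic and the shared mask, and `matches`/`classify` are illustrative names. Byte 4 (EI_CLASS: 01 for 32-bit, 02 for 64-bit) is what separates the two rules, bytes 18-19 (f3 00) are e_machine for RISC-V, and the 0xfe mask on byte 16 accepts both ET_EXEC (2) and ET_DYN (3).

```sh
#!/bin/sh
# Sketch of the binfmt_misc magic/mask test applied to an ELF file.
# Hex lists are hand-converted from riscv{32,64}_magic and the shared mask.
riscv_mask='ff ff ff ff ff ff ff 00 ff ff ff ff ff ff ff ff fe ff ff ff'
riscv32_magic='7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 02 00 f3 00'
riscv64_magic='7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 02 00 f3 00'

# matches FILE MAGIC MASK: succeed if, for each of the first 20 bytes,
# (file_byte XOR magic_byte) AND mask_byte is zero.
matches() {
    file=$1 magic=$2 mask=$3
    # Unquoted on purpose: split od's output into one argument per byte.
    set -- $(od -An -tx1 -N20 "$file")
    [ "$#" -eq 20 ] || return 1   # shorter than the pattern: no match
    for byte in "$@"; do
        want=${magic%% *}; magic=${magic#* }
        m=${mask%% *};     mask=${mask#* }
        [ $(( (0x$byte ^ 0x$want) & 0x$m )) -eq 0 ] || return 1
    done
}

# classify FILE: print riscv32, riscv64 or other.
classify() {
    if matches "$1" "$riscv32_magic" "$riscv_mask"; then
        echo riscv32
    elif matches "$1" "$riscv64_magic" "$riscv_mask"; then
        echo riscv64
    else
        echo other
    fi
}
```

For example, `classify /path/to/rootfs/bin/sh` on a binary from a riscv32 rootfs should print riscv32, which is the case the riscv32 rule now covers on riscv64 hosts.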