diff mbox series

[v3,4/4] ARM: Implement PAN for LPAE by TTBR0 page table walks disablement

Message ID 20240312-arm32-lpae-pan-v3-4-532647afcd38@linaro.org (mailing list archive)
State New, archived
Headers show
Series PAN for ARM32 using LPAE | expand

Commit Message

Linus Walleij March 12, 2024, 12:52 p.m. UTC
From: Catalin Marinas <catalin.marinas@arm.com>

With LPAE enabled, privileged no-access cannot be enforced using CPU
domains as such feature is not available. This patch implements PAN
by disabling TTBR0 page table walks while in kernel mode.

The ARM architecture allows page table walks to be split between TTBR0
and TTBR1. With LPAE enabled, the split is defined by a combination of
TTBCR T0SZ and T1SZ bits. Currently, an LPAE-enabled kernel uses TTBR0
for user addresses and TTBR1 for kernel addresses with the VMSPLIT_2G
and VMSPLIT_3G configurations. The main advantage for the 3:1 split is
that TTBR1 is reduced to 2 levels, so potentially faster TLB refill
(though usually the first level entries are already cached in the TLB).

The PAN support on LPAE-enabled kernels uses TTBR0 when running in user
space or in kernel space during user access routines (TTBCR T0SZ and
T1SZ are both 0). When running user accesses are disabled in kernel
mode, TTBR0 page table walks are disabled by setting TTBCR.EPD0. TTBR1
is used for kernel accesses (including loadable modules; anything
covered by swapper_pg_dir) by reducing the TTBCR.T0SZ to the minimum
(2^(32-7) = 32MB). To avoid user accesses potentially hitting stale TLB
entries, the ASID is switched to 0 (reserved) by setting TTBCR.A1 and
using the ASID value in TTBR1. The difference from a non-PAN kernel is
that with the 3:1 memory split, TTBR1 always uses 3 levels of page
tables.

As part of the change we are using preprocessor elif definied() clauses
so balance these clauses by converting relevant precedingt ifdef
clauses to if defined() clauses.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
---
ChangeLog v2->v3:
- Drop leftover uaccess_disabled() stub.
- Consistently change ifdef to if defined() in this patch instead
  of the previous patch.
- Convert a missing ifdef over to if defined() for consistency.
ChangeLog v1->v2:
- Make the SVC mode TTBCR a separate field in struct svc_pt_regs
  as requested by Russell.
- Push the MM page fault permission check into a local function
  and avoid the too generic uaccess_disabled() as requested by Ard.
---
 arch/arm/Kconfig                            | 22 +++++++++++++--
 arch/arm/include/asm/assembler.h            |  1 +
 arch/arm/include/asm/pgtable-3level-hwdef.h |  9 ++++++
 arch/arm/include/asm/ptrace.h               |  1 +
 arch/arm/include/asm/uaccess-asm.h          | 44 ++++++++++++++++++++++++++++-
 arch/arm/include/asm/uaccess.h              | 26 ++++++++++++++++-
 arch/arm/kernel/asm-offsets.c               |  1 +
 arch/arm/kernel/suspend.c                   |  8 ++++++
 arch/arm/lib/csumpartialcopyuser.S          | 20 ++++++++++++-
 arch/arm/mm/fault.c                         | 29 +++++++++++++++++++
 10 files changed, 155 insertions(+), 6 deletions(-)

Comments

Geert Uytterhoeven May 7, 2024, 1:10 p.m. UTC | #1
Hi Linus,

On Tue, Mar 12, 2024 at 1:52 PM Linus Walleij <linus.walleij@linaro.org> wrote:
> From: Catalin Marinas <catalin.marinas@arm.com>
>
> With LPAE enabled, privileged no-access cannot be enforced using CPU
> domains as such feature is not available. This patch implements PAN
> by disabling TTBR0 page table walks while in kernel mode.
>
> The ARM architecture allows page table walks to be split between TTBR0
> and TTBR1. With LPAE enabled, the split is defined by a combination of
> TTBCR T0SZ and T1SZ bits. Currently, an LPAE-enabled kernel uses TTBR0
> for user addresses and TTBR1 for kernel addresses with the VMSPLIT_2G
> and VMSPLIT_3G configurations. The main advantage for the 3:1 split is
> that TTBR1 is reduced to 2 levels, so potentially faster TLB refill
> (though usually the first level entries are already cached in the TLB).
>
> The PAN support on LPAE-enabled kernels uses TTBR0 when running in user
> space or in kernel space during user access routines (TTBCR T0SZ and
> T1SZ are both 0). When running user accesses are disabled in kernel
> mode, TTBR0 page table walks are disabled by setting TTBCR.EPD0. TTBR1
> is used for kernel accesses (including loadable modules; anything
> covered by swapper_pg_dir) by reducing the TTBCR.T0SZ to the minimum
> (2^(32-7) = 32MB). To avoid user accesses potentially hitting stale TLB
> entries, the ASID is switched to 0 (reserved) by setting TTBCR.A1 and
> using the ASID value in TTBR1. The difference from a non-PAN kernel is
> that with the 3:1 memory split, TTBR1 always uses 3 levels of page
> tables.
>
> As part of the change we are using preprocessor elif definied() clauses
> so balance these clauses by converting relevant precedingt ifdef
> clauses to if defined() clauses.
>
> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> Reviewed-by: Kees Cook <keescook@chromium.org>
> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>

Thanks for your patch, which is now commit 7af5b901e84743c6 ("ARM:
9358/2: Implement PAN for LPAE by TTBR0 page table walks disablement")
in arm/for-next (next-20240502 and later).

On Koelsch (R-Car M2-W with dual Cortex A15) with LPAE enabled:

    Run /sbin/init as init process
      with arguments:
        /sbin/init
      with environment:
        HOME=/
        TERM=linux
    Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000004
    CPU: 1 PID: 1 Comm: init Tainted: G        W        N
6.9.0-rc1-koelsch-00004-g7af5b901e847 #1930
    Hardware name: Generic R-Car Gen2 (Flattened Device Tree)
    Call trace:
     unwind_backtrace from show_stack+0x10/0x14
     show_stack from dump_stack_lvl+0x78/0xa8
     dump_stack_lvl from panic+0x118/0x398
     panic from do_exit+0x1ec/0x938
     do_exit from sys_exit_group+0x0/0x10
    ---[ end Kernel panic - not syncing: Attempted to kill init!
exitcode=0x00000004 ]---

Disabling LPAE fixes the issue.

Note that shmobile_defconfig has CONFIG_LPAE=n, and thus works fine.

Gr{oetje,eeting}s,

                        Geert
Linus Walleij May 13, 2024, 7:23 p.m. UTC | #2
Hi Geert,

On Tue, May 7, 2024 at 3:10 PM Geert Uytterhoeven <geert@linux-m68k.org> wrote:

> Thanks for your patch, which is now commit 7af5b901e84743c6 ("ARM:
> 9358/2: Implement PAN for LPAE by TTBR0 page table walks disablement")
> in arm/for-next (next-20240502 and later).
>
> On Koelsch (R-Car M2-W with dual Cortex A15) with LPAE enabled:
>
>     Run /sbin/init as init process
>       with arguments:
>         /sbin/init
>       with environment:
>         HOME=/
>         TERM=linux
>     Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000004
>     CPU: 1 PID: 1 Comm: init Tainted: G        W        N
> 6.9.0-rc1-koelsch-00004-g7af5b901e847 #1930
>     Hardware name: Generic R-Car Gen2 (Flattened Device Tree)
>     Call trace:
>      unwind_backtrace from show_stack+0x10/0x14
>      show_stack from dump_stack_lvl+0x78/0xa8
>      dump_stack_lvl from panic+0x118/0x398
>      panic from do_exit+0x1ec/0x938
>      do_exit from sys_exit_group+0x0/0x10
>     ---[ end Kernel panic - not syncing: Attempted to kill init!
> exitcode=0x00000004 ]---
>
> Disabling LPAE fixes the issue.

How annoying. I guess it doesn't help you that it works like a charm on
Versatile Express in QEMU, and also actually tested on the real
hardware. (Dual Cortex-A9).

So init is not executing, which userspace is this? I was  just testing
with busybox so far, maybe I need to test on something
bigger?

Do you have your ARMv7 file system available in an image or so
I can test it on Versatile Express?

Yours,
Linus Walleij
Geert Uytterhoeven May 13, 2024, 7:58 p.m. UTC | #3
Hi Linus,

On Mon, May 13, 2024 at 9:24 PM Linus Walleij <linus.walleij@linaro.org> wrote:
> On Tue, May 7, 2024 at 3:10 PM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> > Thanks for your patch, which is now commit 7af5b901e84743c6 ("ARM:
> > 9358/2: Implement PAN for LPAE by TTBR0 page table walks disablement")
> > in arm/for-next (next-20240502 and later).
> >
> > On Koelsch (R-Car M2-W with dual Cortex A15) with LPAE enabled:
> >
> >     Run /sbin/init as init process
> >       with arguments:
> >         /sbin/init
> >       with environment:
> >         HOME=/
> >         TERM=linux
> >     Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000004
> >     CPU: 1 PID: 1 Comm: init Tainted: G        W        N
> > 6.9.0-rc1-koelsch-00004-g7af5b901e847 #1930
> >     Hardware name: Generic R-Car Gen2 (Flattened Device Tree)
> >     Call trace:
> >      unwind_backtrace from show_stack+0x10/0x14
> >      show_stack from dump_stack_lvl+0x78/0xa8
> >      dump_stack_lvl from panic+0x118/0x398
> >      panic from do_exit+0x1ec/0x938
> >      do_exit from sys_exit_group+0x0/0x10
> >     ---[ end Kernel panic - not syncing: Attempted to kill init!
> > exitcode=0x00000004 ]---
> >
> > Disabling LPAE fixes the issue.
>
> How annoying. I guess it doesn't help you that it works like a charm on
> Versatile Express in QEMU, and also actually tested on the real
> hardware. (Dual Cortex-A9).

Interesting. AFAIK Cortex-A9 does not support LPAE?

> So init is not executing, which userspace is this? I was  just testing
> with busybox so far, maybe I need to test on something
> bigger?
>
> Do you have your ARMv7 file system available in an image or so
> I can test it on Versatile Express?

It's just a Debian nfsroot.

Gr{oetje,eeting}s,

                        Geert
Linus Walleij May 13, 2024, 8:29 p.m. UTC | #4
On Mon, May 13, 2024 at 9:58 PM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> On Mon, May 13, 2024 at 9:24 PM Linus Walleij <linus.walleij@linaro.org> wrote:
> > On Tue, May 7, 2024 at 3:10 PM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> > > Thanks for your patch, which is now commit 7af5b901e84743c6 ("ARM:
> > > 9358/2: Implement PAN for LPAE by TTBR0 page table walks disablement")
> > > in arm/for-next (next-20240502 and later).
> > >
> > > On Koelsch (R-Car M2-W with dual Cortex A15) with LPAE enabled:
> > >
> > >     Run /sbin/init as init process
> > >       with arguments:
> > >         /sbin/init
> > >       with environment:
> > >         HOME=/
> > >         TERM=linux
> > >     Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000004
> > >     CPU: 1 PID: 1 Comm: init Tainted: G        W        N
> > > 6.9.0-rc1-koelsch-00004-g7af5b901e847 #1930
> > >     Hardware name: Generic R-Car Gen2 (Flattened Device Tree)
> > >     Call trace:
> > >      unwind_backtrace from show_stack+0x10/0x14
> > >      show_stack from dump_stack_lvl+0x78/0xa8
> > >      dump_stack_lvl from panic+0x118/0x398
> > >      panic from do_exit+0x1ec/0x938
> > >      do_exit from sys_exit_group+0x0/0x10
> > >     ---[ end Kernel panic - not syncing: Attempted to kill init!
> > > exitcode=0x00000004 ]---
> > >
> > > Disabling LPAE fixes the issue.
> >
> > How annoying. I guess it doesn't help you that it works like a charm on
> > Versatile Express in QEMU, and also actually tested on the real
> > hardware. (Dual Cortex-A9).
>
> Interesting. AFAIK Cortex-A9 does not support LPAE?

Allright I was rambling, what I used (albeit early on) was
STMicro STM32MP157 which has Cortex-A7.

> > So init is not executing, which userspace is this? I was  just testing
> > with busybox so far, maybe I need to test on something
> > bigger?
> >
> > Do you have your ARMv7 file system available in an image or so
> > I can test it on Versatile Express?
>
> It's just a Debian nfsroot.

OK I tried with a vanilla ArchLinuxARM rootfs which uses systemd
and all that hoopla and it boots just fine.

So I'm a bit lost here.

I guess I should bring out the STM32MP157 board again and retest
to verify that that one hardware works with this. Any other ideas?

Yours,
Linus Walleij
Florian Fainelli May 14, 2024, 3:56 a.m. UTC | #5
Hi Geert, Linus,

On 5/13/24 13:29, Linus Walleij wrote:
> On Mon, May 13, 2024 at 9:58 PM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
>> On Mon, May 13, 2024 at 9:24 PM Linus Walleij <linus.walleij@linaro.org> wrote:
>>> On Tue, May 7, 2024 at 3:10 PM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
>>>> Thanks for your patch, which is now commit 7af5b901e84743c6 ("ARM:
>>>> 9358/2: Implement PAN for LPAE by TTBR0 page table walks disablement")
>>>> in arm/for-next (next-20240502 and later).
>>>>
>>>> On Koelsch (R-Car M2-W with dual Cortex A15) with LPAE enabled:
>>>>
>>>>      Run /sbin/init as init process
>>>>        with arguments:
>>>>          /sbin/init
>>>>        with environment:
>>>>          HOME=/
>>>>          TERM=linux
>>>>      Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000004
>>>>      CPU: 1 PID: 1 Comm: init Tainted: G        W        N
>>>> 6.9.0-rc1-koelsch-00004-g7af5b901e847 #1930
>>>>      Hardware name: Generic R-Car Gen2 (Flattened Device Tree)
>>>>      Call trace:
>>>>       unwind_backtrace from show_stack+0x10/0x14
>>>>       show_stack from dump_stack_lvl+0x78/0xa8
>>>>       dump_stack_lvl from panic+0x118/0x398
>>>>       panic from do_exit+0x1ec/0x938
>>>>       do_exit from sys_exit_group+0x0/0x10
>>>>      ---[ end Kernel panic - not syncing: Attempted to kill init!
>>>> exitcode=0x00000004 ]---
>>>>
>>>> Disabling LPAE fixes the issue.
>>>
>>> How annoying. I guess it doesn't help you that it works like a charm on
>>> Versatile Express in QEMU, and also actually tested on the real
>>> hardware. (Dual Cortex-A9).
>>
>> Interesting. AFAIK Cortex-A9 does not support LPAE?
> 
> Allright I was rambling, what I used (albeit early on) was
> STMicro STM32MP157 which has Cortex-A7.
> 
>>> So init is not executing, which userspace is this? I was  just testing
>>> with busybox so far, maybe I need to test on something
>>> bigger?
>>>
>>> Do you have your ARMv7 file system available in an image or so
>>> I can test it on Versatile Express?
>>
>> It's just a Debian nfsroot.
> 
> OK I tried with a vanilla ArchLinuxARM rootfs which uses systemd
> and all that hoopla and it boots just fine.
> 
> So I'm a bit lost here.
> 
> I guess I should bring out the STM32MP157 board again and retest
> to verify that that one hardware works with this. Any other ideas?

FWIW, my ARCH_BRCMSTB systems using Brahma-B15, Brahma-B53 and 
Cortex-A72 CPUs are all happily booting the kernel to users-space with 
linux-next/master and CONFIG_CPU_TTBR0_PAN=y.

However I am not able to get my Raspberry Pi 4B to boot 
linux-next/master with CONFIG_CPU_TTBR0_PAN=y, too.

Error is similar to Geert, see below.

[   11.299106] Freeing unused kernel image (initmem) memory: 79872K
[   11.305720] Run /init as init process
[   11.314070] Kernel panic - not syncing: Attempted to kill init! 
exitcode=0x00000004
[   11.321888] CPU: 0 PID: 1 Comm: init Not tainted 6.9.0-next-20240513 #32
[   11.328709] Hardware name: BCM2711
[   11.332169] Call trace:
[   11.332179]  unwind_backtrace from show_stack+0x10/0x14
[   11.340087]  show_stack from panic+0x20c/0x55c
[   11.344615]  panic from do_exit+0x6b0/0x1e74
[   11.348972]  do_exit from do_group_exit+0xcc/0x280
[   11.353857]  do_group_exit from get_signal+0xfb4/0x1340
[   11.359182]  get_signal from do_work_pending+0x2c0/0x7bc
[   11.364590]  do_work_pending from slow_work_pending+0xc/0x24
[   11.370351] Exception stack(0xf082bfb0 to 0xf082bff8)
[   11.375492] bfa0:                                     b6bca568 
00000000 003fa0d6 aedf3d20
[   11.383811] bfc0: aedf4a28 b6bca6f8 b6bca73c b6bca710 b6bca748 
b6bca6f8 aedf4a28 b6bca6f8
[   11.392127] bfe0: b6bca590 b6bca548 aeddae15 aedeb660 200f0030 ffffffff
[   11.398954] ---[ end Kernel panic - not syncing: Attempted to kill 
init! exitcode=0x00000004 ]---

I am in the process of running a diff between configurations because 
nothing obvious jumps out as being problematic so far.
--
Florian
Geert Uytterhoeven May 14, 2024, 6:41 a.m. UTC | #6
Hi Linus,

On Mon, May 13, 2024 at 10:29 PM Linus Walleij <linus.walleij@linaro.org> wrote:
> On Mon, May 13, 2024 at 9:58 PM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> > On Mon, May 13, 2024 at 9:24 PM Linus Walleij <linus.walleij@linaro.org> wrote:
> > > On Tue, May 7, 2024 at 3:10 PM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> > > > Thanks for your patch, which is now commit 7af5b901e84743c6 ("ARM:
> > > > 9358/2: Implement PAN for LPAE by TTBR0 page table walks disablement")
> > > > in arm/for-next (next-20240502 and later).
> > > >
> > > > On Koelsch (R-Car M2-W with dual Cortex A15) with LPAE enabled:
> > > >
> > > >     Run /sbin/init as init process
> > > >       with arguments:
> > > >         /sbin/init
> > > >       with environment:
> > > >         HOME=/
> > > >         TERM=linux
> > > >     Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000004
> > > >     CPU: 1 PID: 1 Comm: init Tainted: G        W        N
> > > > 6.9.0-rc1-koelsch-00004-g7af5b901e847 #1930
> > > >     Hardware name: Generic R-Car Gen2 (Flattened Device Tree)
> > > >     Call trace:
> > > >      unwind_backtrace from show_stack+0x10/0x14
> > > >      show_stack from dump_stack_lvl+0x78/0xa8
> > > >      dump_stack_lvl from panic+0x118/0x398
> > > >      panic from do_exit+0x1ec/0x938
> > > >      do_exit from sys_exit_group+0x0/0x10
> > > >     ---[ end Kernel panic - not syncing: Attempted to kill init!
> > > > exitcode=0x00000004 ]---
> > > >
> > > > Disabling LPAE fixes the issue.
> > >
> > > How annoying. I guess it doesn't help you that it works like a charm on
> > > Versatile Express in QEMU, and also actually tested on the real
> > > hardware. (Dual Cortex-A9).
> >
> > Interesting. AFAIK Cortex-A9 does not support LPAE?
>
> Allright I was rambling, what I used (albeit early on) was
> STMicro STM32MP157 which has Cortex-A7.

I can reproduce the issue on Cortex-A7 (R-Car E2), too.

CONFIG_ARM_LPAE=y
CONFIG_CPU_TTBR0_PAN=y

> > > So init is not executing, which userspace is this? I was  just testing
> > > with busybox so far, maybe I need to test on something
> > > bigger?
> > >
> > > Do you have your ARMv7 file system available in an image or so
> > > I can test it on Versatile Express?
> >
> > It's just a Debian nfsroot.
>
> OK I tried with a vanilla ArchLinuxARM rootfs which uses systemd
> and all that hoopla and it boots just fine.
>
> So I'm a bit lost here.

I sent you a small initramfs by PM.

Gr{oetje,eeting}s,

                        Geert
Linus Walleij May 14, 2024, 7:37 a.m. UTC | #7
On Mon, May 13, 2024 at 10:29 PM Linus Walleij <linus.walleij@linaro.org> wrote:

> I guess I should bring out the STM32MP157 board again and retest
> to verify that that one hardware works with this. Any other ideas?

I got this Cortex-A7 board out and booted with LPAE and PAN
using TTBR0 and it boots fine (some kind of OpenEmbedded/YOCTO
system is in it with systemd and all.

Tested with LKDTM provoked crashes and it behaves as expected.

Given Florians report it clearly works on some LPAE:s and not on some
others for some reason! I suppose we need to figure it out.

Catalin do you have some ideas?

Yours,
Linus Walleij
Linus Walleij May 14, 2024, 7:46 a.m. UTC | #8
On Tue, May 14, 2024 at 8:41 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote:

> I sent you a small initramfs by PM.

Booted this just fine on Vexpress QEMU:

Run /init as init process
sysctl: error: 'kernel.hotplug' is an unknown key


boot (Linux 6.9.0-rc1+, BusyBox v1.16.0.git, kexec-tools 2.0.1-git)
/ # mount -t debugfs none /sys/kernel/debug
/ # echo "ACCESS_USERSPACE" | cat >/sys/kernel/debug/provoke-crash/DIRECT
lkdtm: Performing direct entry ACCESS_USERSPACE
lkdtm: attempting bad read at 76fea000
8<--- cut here ---
Unable to handle kernel paging request at virtual address 76fea000 when read
[76fea000] *pgd=82c93003, *pmd=82c94003, *pte=a00000811e2f5f
Internal error: Oops: 206 [#1] SMP ARM
CPU: 1 PID: 86 Comm: cat Not tainted 6.9.0-rc1+ #46
Hardware name: ARM-Versatile Express
PC is at lkdtm_ACCESS_USERSPACE+0xc0/0x138
LR is at lkdtm_ACCESS_USERSPACE+0xc0/0x138

I'm starting to think it is something about different LPAE implementations here.

Yours,
Linus Walleij
Ard Biesheuvel May 14, 2024, 7:59 a.m. UTC | #9
On Tue, 14 May 2024 at 09:46, Linus Walleij <linus.walleij@linaro.org> wrote:
>
> On Tue, May 14, 2024 at 8:41 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
>
> > I sent you a small initramfs by PM.
>
> Booted this just fine on Vexpress QEMU:
>
> Run /init as init process
> sysctl: error: 'kernel.hotplug' is an unknown key
>
>
> boot (Linux 6.9.0-rc1+, BusyBox v1.16.0.git, kexec-tools 2.0.1-git)
> / # mount -t debugfs none /sys/kernel/debug
> / # echo "ACCESS_USERSPACE" | cat >/sys/kernel/debug/provoke-crash/DIRECT
> lkdtm: Performing direct entry ACCESS_USERSPACE
> lkdtm: attempting bad read at 76fea000
> 8<--- cut here ---
> Unable to handle kernel paging request at virtual address 76fea000 when read
> [76fea000] *pgd=82c93003, *pmd=82c94003, *pte=a00000811e2f5f
> Internal error: Oops: 206 [#1] SMP ARM
> CPU: 1 PID: 86 Comm: cat Not tainted 6.9.0-rc1+ #46
> Hardware name: ARM-Versatile Express
> PC is at lkdtm_ACCESS_USERSPACE+0xc0/0x138
> LR is at lkdtm_ACCESS_USERSPACE+0xc0/0x138
>
> I'm starting to think it is something about different LPAE implementations here.
>

I have built multi_v7_defconfig with the following enabled

CONFIG_ARM_LPAE=y
CONFIG_CPU_TTBR0_PAN=y
CONFIG_LKDTM=y

and the resulting kernel boots happily as a 32-bit VM running under a
Rpi4 KVM host.

Could someone post an actual .config that reproduces this? Rpi4 is
A72, which both works and doesn't work in Florian's testing, so I'd be
highly surprised if this is not a config issue.
Geert Uytterhoeven May 14, 2024, 8:04 a.m. UTC | #10
Hi Ard,

On Tue, May 14, 2024 at 9:59 AM Ard Biesheuvel <ardb@kernel.org> wrote:
> On Tue, 14 May 2024 at 09:46, Linus Walleij <linus.walleij@linaro.org> wrote:
> > On Tue, May 14, 2024 at 8:41 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> >
> > > I sent you a small initramfs by PM.
> >
> > Booted this just fine on Vexpress QEMU:
> >
> > Run /init as init process
> > sysctl: error: 'kernel.hotplug' is an unknown key
> >
> >
> > boot (Linux 6.9.0-rc1+, BusyBox v1.16.0.git, kexec-tools 2.0.1-git)
> > / # mount -t debugfs none /sys/kernel/debug
> > / # echo "ACCESS_USERSPACE" | cat >/sys/kernel/debug/provoke-crash/DIRECT
> > lkdtm: Performing direct entry ACCESS_USERSPACE
> > lkdtm: attempting bad read at 76fea000
> > 8<--- cut here ---
> > Unable to handle kernel paging request at virtual address 76fea000 when read
> > [76fea000] *pgd=82c93003, *pmd=82c94003, *pte=a00000811e2f5f
> > Internal error: Oops: 206 [#1] SMP ARM
> > CPU: 1 PID: 86 Comm: cat Not tainted 6.9.0-rc1+ #46
> > Hardware name: ARM-Versatile Express
> > PC is at lkdtm_ACCESS_USERSPACE+0xc0/0x138
> > LR is at lkdtm_ACCESS_USERSPACE+0xc0/0x138
> >
> > I'm starting to think it is something about different LPAE implementations here.
> >
>
> I have built multi_v7_defconfig with the following enabled
>
> CONFIG_ARM_LPAE=y
> CONFIG_CPU_TTBR0_PAN=y
> CONFIG_LKDTM=y
>
> and the resulting kernel boots happily as a 32-bit VM running under a
> Rpi4 KVM host.
>
> Could someone post an actual .config that reproduces this? Rpi4 is
> A72, which both works and doesn't work in Florian's testing, so I'd be
> highly surprised if this is not a config issue.

shmobile_defconfig with CONFIG_LPAE=y added failed for me before.

Building multi_v7_defconfig with the above enabled...

Gr{oetje,eeting}s,

                        Geert
Russell King (Oracle) May 14, 2024, 8:14 a.m. UTC | #11
On Mon, May 13, 2024 at 08:56:20PM -0700, Florian Fainelli wrote:
> Hi Geert, Linus,
> 
> Error is similar to Geert, see below.
> 
> [   11.299106] Freeing unused kernel image (initmem) memory: 79872K
> [   11.305720] Run /init as init process
> [   11.314070] Kernel panic - not syncing: Attempted to kill init!
> exitcode=0x00000004
> [   11.321888] CPU: 0 PID: 1 Comm: init Not tainted 6.9.0-next-20240513 #32
> [   11.328709] Hardware name: BCM2711
> [   11.332169] Call trace:
> [   11.332179]  unwind_backtrace from show_stack+0x10/0x14
> [   11.340087]  show_stack from panic+0x20c/0x55c
> [   11.344615]  panic from do_exit+0x6b0/0x1e74
> [   11.348972]  do_exit from do_group_exit+0xcc/0x280
> [   11.353857]  do_group_exit from get_signal+0xfb4/0x1340
> [   11.359182]  get_signal from do_work_pending+0x2c0/0x7bc
> [   11.364590]  do_work_pending from slow_work_pending+0xc/0x24
> [   11.370351] Exception stack(0xf082bfb0 to 0xf082bff8)
> [   11.375492] bfa0:                                     b6bca568 00000000
> 003fa0d6 aedf3d20
> [   11.383811] bfc0: aedf4a28 b6bca6f8 b6bca73c b6bca710 b6bca748 b6bca6f8
> aedf4a28 b6bca6f8
> [   11.392127] bfe0: b6bca590 b6bca548 aeddae15 aedeb660 200f0030 ffffffff
> [   11.398954] ---[ end Kernel panic - not syncing: Attempted to kill init!
> exitcode=0x00000004 ]---

You could enable CONFiG_DEBUG_USER, and then pass "user_debug=24" to
the kernel to get a report for the conditions that lead to SEGV/BUS
signals being delivered to a userspace processd.
Ard Biesheuvel May 14, 2024, 8:25 a.m. UTC | #12
On Tue, 14 May 2024 at 10:04, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
>
> Hi Ard,
>
> On Tue, May 14, 2024 at 9:59 AM Ard Biesheuvel <ardb@kernel.org> wrote:
> > On Tue, 14 May 2024 at 09:46, Linus Walleij <linus.walleij@linaro.org> wrote:
> > > On Tue, May 14, 2024 at 8:41 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> > >
> > > > I sent you a small initramfs by PM.
> > >
> > > Booted this just fine on Vexpress QEMU:
> > >
> > > Run /init as init process
> > > sysctl: error: 'kernel.hotplug' is an unknown key
> > >
> > >
> > > boot (Linux 6.9.0-rc1+, BusyBox v1.16.0.git, kexec-tools 2.0.1-git)
> > > / # mount -t debugfs none /sys/kernel/debug
> > > / # echo "ACCESS_USERSPACE" | cat >/sys/kernel/debug/provoke-crash/DIRECT
> > > lkdtm: Performing direct entry ACCESS_USERSPACE
> > > lkdtm: attempting bad read at 76fea000
> > > 8<--- cut here ---
> > > Unable to handle kernel paging request at virtual address 76fea000 when read
> > > [76fea000] *pgd=82c93003, *pmd=82c94003, *pte=a00000811e2f5f
> > > Internal error: Oops: 206 [#1] SMP ARM
> > > CPU: 1 PID: 86 Comm: cat Not tainted 6.9.0-rc1+ #46
> > > Hardware name: ARM-Versatile Express
> > > PC is at lkdtm_ACCESS_USERSPACE+0xc0/0x138
> > > LR is at lkdtm_ACCESS_USERSPACE+0xc0/0x138
> > >
> > > I'm starting to think it is something about different LPAE implementations here.
> > >
> >
> > I have built multi_v7_defconfig with the following enabled
> >
> > CONFIG_ARM_LPAE=y
> > CONFIG_CPU_TTBR0_PAN=y
> > CONFIG_LKDTM=y
> >
> > and the resulting kernel boots happily as a 32-bit VM running under a
> > Rpi4 KVM host.
> >
> > Could someone post an actual .config that reproduces this? Rpi4 is
> > A72, which both works and doesn't work in Florian's testing, so I'd be
> > highly surprised if this is not a config issue.
>
> shmobile_defconfig with CONFIG_LPAE=y added failed for me before.
>
> Building multi_v7_defconfig with the above enabled...
>

What are you passing as the command line?
Russell King (Oracle) May 14, 2024, 9:22 a.m. UTC | #13
On Tue, May 14, 2024 at 10:25:28AM +0200, Ard Biesheuvel wrote:
> On Tue, 14 May 2024 at 10:04, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> >
> > Hi Ard,
> >
> > On Tue, May 14, 2024 at 9:59 AM Ard Biesheuvel <ardb@kernel.org> wrote:
> > > On Tue, 14 May 2024 at 09:46, Linus Walleij <linus.walleij@linaro.org> wrote:
> > > > On Tue, May 14, 2024 at 8:41 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> > > >
> > > > > I sent you a small initramfs by PM.
> > > >
> > > > Booted this just fine on Vexpress QEMU:
> > > >
> > > > Run /init as init process
> > > > sysctl: error: 'kernel.hotplug' is an unknown key
> > > >
> > > >
> > > > boot (Linux 6.9.0-rc1+, BusyBox v1.16.0.git, kexec-tools 2.0.1-git)
> > > > / # mount -t debugfs none /sys/kernel/debug
> > > > / # echo "ACCESS_USERSPACE" | cat >/sys/kernel/debug/provoke-crash/DIRECT
> > > > lkdtm: Performing direct entry ACCESS_USERSPACE
> > > > lkdtm: attempting bad read at 76fea000
> > > > 8<--- cut here ---
> > > > Unable to handle kernel paging request at virtual address 76fea000 when read
> > > > [76fea000] *pgd=82c93003, *pmd=82c94003, *pte=a00000811e2f5f
> > > > Internal error: Oops: 206 [#1] SMP ARM
> > > > CPU: 1 PID: 86 Comm: cat Not tainted 6.9.0-rc1+ #46
> > > > Hardware name: ARM-Versatile Express
> > > > PC is at lkdtm_ACCESS_USERSPACE+0xc0/0x138
> > > > LR is at lkdtm_ACCESS_USERSPACE+0xc0/0x138
> > > >
> > > > I'm starting to think it is something about different LPAE implementations here.
> > > >
> > >
> > > I have built multi_v7_defconfig with the following enabled
> > >
> > > CONFIG_ARM_LPAE=y
> > > CONFIG_CPU_TTBR0_PAN=y
> > > CONFIG_LKDTM=y
> > >
> > > and the resulting kernel boots happily as a 32-bit VM running under a
> > > Rpi4 KVM host.
> > >
> > > Could someone post an actual .config that reproduces this? Rpi4 is
> > > A72, which both works and doesn't work in Florian's testing, so I'd be
> > > highly surprised if this is not a config issue.
> >
> > shmobile_defconfig with CONFIG_LPAE=y added failed for me before.
> >
> > Building multi_v7_defconfig with the above enabled...
> >
> 
> What are you passing as the command line?

A more relevant question at this point in time - what do people want me
to do with these patches given that we're now in the merge window?
Geert Uytterhoeven May 14, 2024, 11:22 a.m. UTC | #14
Hi Russell,

On Tue, May 14, 2024 at 10:15 AM Russell King (Oracle)
<linux@armlinux.org.uk> wrote:
> On Mon, May 13, 2024 at 08:56:20PM -0700, Florian Fainelli wrote:
> > [   11.299106] Freeing unused kernel image (initmem) memory: 79872K
> > [   11.305720] Run /init as init process
> > [   11.314070] Kernel panic - not syncing: Attempted to kill init!
> > exitcode=0x00000004
> > [   11.321888] CPU: 0 PID: 1 Comm: init Not tainted 6.9.0-next-20240513 #32
> > [   11.328709] Hardware name: BCM2711
> > [   11.332169] Call trace:
> > [   11.332179]  unwind_backtrace from show_stack+0x10/0x14
> > [   11.340087]  show_stack from panic+0x20c/0x55c
> > [   11.344615]  panic from do_exit+0x6b0/0x1e74
> > [   11.348972]  do_exit from do_group_exit+0xcc/0x280
> > [   11.353857]  do_group_exit from get_signal+0xfb4/0x1340
> > [   11.359182]  get_signal from do_work_pending+0x2c0/0x7bc
> > [   11.364590]  do_work_pending from slow_work_pending+0xc/0x24
> > [   11.370351] Exception stack(0xf082bfb0 to 0xf082bff8)
> > [   11.375492] bfa0:                                     b6bca568 00000000
> > 003fa0d6 aedf3d20
> > [   11.383811] bfc0: aedf4a28 b6bca6f8 b6bca73c b6bca710 b6bca748 b6bca6f8
> > aedf4a28 b6bca6f8
> > [   11.392127] bfe0: b6bca590 b6bca548 aeddae15 aedeb660 200f0030 ffffffff
> > [   11.398954] ---[ end Kernel panic - not syncing: Attempted to kill init!
> > exitcode=0x00000004 ]---
>
> You could enable CONFiG_DEBUG_USER, and then pass "user_debug=24" to
> the kernel to get a report for the conditions that lead to SEGV/BUS
> signals being delivered to a userspace processd.

That does not seem to make any difference for me, i.e. no report?

Gr{oetje,eeting}s,

                        Geert
Geert Uytterhoeven May 14, 2024, 11:28 a.m. UTC | #15
Hi Ard,

On Tue, May 14, 2024 at 10:25 AM Ard Biesheuvel <ardb@kernel.org> wrote:
> On Tue, 14 May 2024 at 10:04, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> > On Tue, May 14, 2024 at 9:59 AM Ard Biesheuvel <ardb@kernel.org> wrote:
> > > On Tue, 14 May 2024 at 09:46, Linus Walleij <linus.walleij@linaro.org> wrote:
> > > > On Tue, May 14, 2024 at 8:41 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> > > >
> > > > > I sent you a small initramfs by PM.
> > > >
> > > > Booted this just fine on Vexpress QEMU:
> > > >
> > > > Run /init as init process
> > > > sysctl: error: 'kernel.hotplug' is an unknown key
> > > >
> > > >
> > > > boot (Linux 6.9.0-rc1+, BusyBox v1.16.0.git, kexec-tools 2.0.1-git)
> > > > / # mount -t debugfs none /sys/kernel/debug
> > > > / # echo "ACCESS_USERSPACE" | cat >/sys/kernel/debug/provoke-crash/DIRECT
> > > > lkdtm: Performing direct entry ACCESS_USERSPACE
> > > > lkdtm: attempting bad read at 76fea000
> > > > 8<--- cut here ---
> > > > Unable to handle kernel paging request at virtual address 76fea000 when read
> > > > [76fea000] *pgd=82c93003, *pmd=82c94003, *pte=a00000811e2f5f
> > > > Internal error: Oops: 206 [#1] SMP ARM
> > > > CPU: 1 PID: 86 Comm: cat Not tainted 6.9.0-rc1+ #46
> > > > Hardware name: ARM-Versatile Express
> > > > PC is at lkdtm_ACCESS_USERSPACE+0xc0/0x138
> > > > LR is at lkdtm_ACCESS_USERSPACE+0xc0/0x138
> > > >
> > > > I'm starting to think it is something about different LPAE implementations here.
> > >
> > > I have built multi_v7_defconfig with the following enabled
> > >
> > > CONFIG_ARM_LPAE=y
> > > CONFIG_CPU_TTBR0_PAN=y
> > > CONFIG_LKDTM=y
> > >
> > > and the resulting kernel boots happily as a 32-bit VM running under a
> > > Rpi4 KVM host.
> > >
> > > Could someone post an actual .config that reproduces this? Rpi4 is
> > > A72, which both works and doesn't work in Florian's testing, so I'd be
> > > highly surprised if this is not a config issue.
> >
> > shmobile_defconfig with CONFIG_LPAE=y added failed for me before.
> >
> > Building multi_v7_defconfig with the above enabled...

And that works, while shmobile_defconfig with the above does not!
So it is config-related.

FTR, plain multi_v7_defconfig also works.

> What are you passing as the command line?

ignore_loglevel ip=dhcp root=/dev/nfs rw nfsroot=<ip>:<path>/debian-armhf,tcp,v3

Gr{oetje,eeting}s,

                        Geert
Russell King (Oracle) May 14, 2024, 11:33 a.m. UTC | #16
On Tue, May 14, 2024 at 01:22:36PM +0200, Geert Uytterhoeven wrote:
> Hi Russell,
> 
> On Tue, May 14, 2024 at 10:15 AM Russell King (Oracle)
> <linux@armlinux.org.uk> wrote:
> > On Mon, May 13, 2024 at 08:56:20PM -0700, Florian Fainelli wrote:
> > > [   11.299106] Freeing unused kernel image (initmem) memory: 79872K
> > > [   11.305720] Run /init as init process
> > > [   11.314070] Kernel panic - not syncing: Attempted to kill init!
> > > exitcode=0x00000004
> > > [   11.321888] CPU: 0 PID: 1 Comm: init Not tainted 6.9.0-next-20240513 #32
> > > [   11.328709] Hardware name: BCM2711
> > > [   11.332169] Call trace:
> > > [   11.332179]  unwind_backtrace from show_stack+0x10/0x14
> > > [   11.340087]  show_stack from panic+0x20c/0x55c
> > > [   11.344615]  panic from do_exit+0x6b0/0x1e74
> > > [   11.348972]  do_exit from do_group_exit+0xcc/0x280
> > > [   11.353857]  do_group_exit from get_signal+0xfb4/0x1340
> > > [   11.359182]  get_signal from do_work_pending+0x2c0/0x7bc
> > > [   11.364590]  do_work_pending from slow_work_pending+0xc/0x24
> > > [   11.370351] Exception stack(0xf082bfb0 to 0xf082bff8)
> > > [   11.375492] bfa0:                                     b6bca568 00000000
> > > 003fa0d6 aedf3d20
> > > [   11.383811] bfc0: aedf4a28 b6bca6f8 b6bca73c b6bca710 b6bca748 b6bca6f8
> > > aedf4a28 b6bca6f8
> > > [   11.392127] bfe0: b6bca590 b6bca548 aeddae15 aedeb660 200f0030 ffffffff
> > > [   11.398954] ---[ end Kernel panic - not syncing: Attempted to kill init!
> > > exitcode=0x00000004 ]---
> >
> > You could enable CONFiG_DEBUG_USER, and then pass "user_debug=24" to
> > the kernel to get a report for the conditions that lead to SEGV/BUS
> > signals being delivered to a userspace processd.
> 
> That does not seem to make any difference for me, i.e. no report?

Then it's not a SEGV/BUS (iow page fault.) Please try user_debug=31
in that case. Thanks.
Linus Walleij May 14, 2024, 11:40 a.m. UTC | #17
On Tue, May 14, 2024 at 11:22 AM Russell King (Oracle)
<linux@armlinux.org.uk> wrote:

> A more relevant question at this point in time - what do people want me
> to do with these patches given that we're now in the merge window?

Let's see if it is as simple as it seems (such as a missing Kconfig select
or depends on).

If we can't figure it out let's revert the last patch in <timespan> where
it's best of you decide on the <timespan> I'd say, you will know when
the pull request needs to go off.

Yours,
Linus Walleij
Geert Uytterhoeven May 14, 2024, 12:32 p.m. UTC | #18
Hi Russell,

On Tue, May 14, 2024 at 1:33 PM Russell King (Oracle)
<linux@armlinux.org.uk> wrote:
> On Tue, May 14, 2024 at 01:22:36PM +0200, Geert Uytterhoeven wrote:
> > On Tue, May 14, 2024 at 10:15 AM Russell King (Oracle)
> > <linux@armlinux.org.uk> wrote:
> > > On Mon, May 13, 2024 at 08:56:20PM -0700, Florian Fainelli wrote:
> > > > [   11.299106] Freeing unused kernel image (initmem) memory: 79872K
> > > > [   11.305720] Run /init as init process
> > > > [   11.314070] Kernel panic - not syncing: Attempted to kill init!
> > > > exitcode=0x00000004
> > > > [   11.321888] CPU: 0 PID: 1 Comm: init Not tainted 6.9.0-next-20240513 #32
> > > > [   11.328709] Hardware name: BCM2711
> > > > [   11.332169] Call trace:
> > > > [   11.332179]  unwind_backtrace from show_stack+0x10/0x14
> > > > [   11.340087]  show_stack from panic+0x20c/0x55c
> > > > [   11.344615]  panic from do_exit+0x6b0/0x1e74
> > > > [   11.348972]  do_exit from do_group_exit+0xcc/0x280
> > > > [   11.353857]  do_group_exit from get_signal+0xfb4/0x1340
> > > > [   11.359182]  get_signal from do_work_pending+0x2c0/0x7bc
> > > > [   11.364590]  do_work_pending from slow_work_pending+0xc/0x24
> > > > [   11.370351] Exception stack(0xf082bfb0 to 0xf082bff8)
> > > > [   11.375492] bfa0:                                     b6bca568 00000000
> > > > 003fa0d6 aedf3d20
> > > > [   11.383811] bfc0: aedf4a28 b6bca6f8 b6bca73c b6bca710 b6bca748 b6bca6f8
> > > > aedf4a28 b6bca6f8
> > > > [   11.392127] bfe0: b6bca590 b6bca548 aeddae15 aedeb660 200f0030 ffffffff
> > > > [   11.398954] ---[ end Kernel panic - not syncing: Attempted to kill init!
> > > > exitcode=0x00000004 ]---
> > >
> > > You could enable CONFiG_DEBUG_USER, and then pass "user_debug=24" to
> > > the kernel to get a report for the conditions that lead to SEGV/BUS
> > > signals being delivered to a userspace processd.
> >
> > That does not seem to make any difference for me, i.e. no report?
>
> Then it's not a SEGV/BUS (iow page fault.) Please try user_debug=31
> in that case. Thanks.

Thanks, much better:

    init (1): undefined instruction: pc=b6f4feda
    CPU: 1 PID: 1 Comm: init Not tainted
6.9.0-shmobile-09158-g1218ffc3659e #1820
    Hardware name: Generic R-Car Gen2 (Flattened Device Tree)
    PC is at 0xb6f4feda
    LR is at 0xb6f4ed31
    pc : [<b6f4feda>]    lr : [<b6f4ed31>]    psr: 60000030
    sp : be970630  ip : be970678  fp : b6f67978
    r10: 00000000  r9 : 004d48ff  r8 : be970844
    r7 : be9707f8  r6 : b6f67978  r5 : be970850  r4 : be970844
    r3 : b6f669b0  r2 : 003fb0d6  r1 : 00000000  r0 : be970650
    Flags: nZCv  IRQs on  FIQs on  Mode USER_32  ISA Thumb  Segment user
    Control: 30c5387d  Table: 41f6cac0  DAC: 55555555
    Code: bad PC value

Gr{oetje,eeting}s,

                        Geert
Russell King (Oracle) May 14, 2024, 12:38 p.m. UTC | #19
On Tue, May 14, 2024 at 02:32:23PM +0200, Geert Uytterhoeven wrote:
> Hi Russell,
> 
> On Tue, May 14, 2024 at 1:33 PM Russell King (Oracle)
> <linux@armlinux.org.uk> wrote:
> > On Tue, May 14, 2024 at 01:22:36PM +0200, Geert Uytterhoeven wrote:
> > > On Tue, May 14, 2024 at 10:15 AM Russell King (Oracle)
> > > <linux@armlinux.org.uk> wrote:
> > > > On Mon, May 13, 2024 at 08:56:20PM -0700, Florian Fainelli wrote:
> > > > > [   11.299106] Freeing unused kernel image (initmem) memory: 79872K
> > > > > [   11.305720] Run /init as init process
> > > > > [   11.314070] Kernel panic - not syncing: Attempted to kill init!
> > > > > exitcode=0x00000004
> > > > > [   11.321888] CPU: 0 PID: 1 Comm: init Not tainted 6.9.0-next-20240513 #32
> > > > > [   11.328709] Hardware name: BCM2711
> > > > > [   11.332169] Call trace:
> > > > > [   11.332179]  unwind_backtrace from show_stack+0x10/0x14
> > > > > [   11.340087]  show_stack from panic+0x20c/0x55c
> > > > > [   11.344615]  panic from do_exit+0x6b0/0x1e74
> > > > > [   11.348972]  do_exit from do_group_exit+0xcc/0x280
> > > > > [   11.353857]  do_group_exit from get_signal+0xfb4/0x1340
> > > > > [   11.359182]  get_signal from do_work_pending+0x2c0/0x7bc
> > > > > [   11.364590]  do_work_pending from slow_work_pending+0xc/0x24
> > > > > [   11.370351] Exception stack(0xf082bfb0 to 0xf082bff8)
> > > > > [   11.375492] bfa0:                                     b6bca568 00000000
> > > > > 003fa0d6 aedf3d20
> > > > > [   11.383811] bfc0: aedf4a28 b6bca6f8 b6bca73c b6bca710 b6bca748 b6bca6f8
> > > > > aedf4a28 b6bca6f8
> > > > > [   11.392127] bfe0: b6bca590 b6bca548 aeddae15 aedeb660 200f0030 ffffffff
> > > > > [   11.398954] ---[ end Kernel panic - not syncing: Attempted to kill init!
> > > > > exitcode=0x00000004 ]---
> > > >
> > > > You could enable CONFiG_DEBUG_USER, and then pass "user_debug=24" to
> > > > the kernel to get a report for the conditions that lead to SEGV/BUS
> > > > signals being delivered to a userspace processd.
> > >
> > > That does not seem to make any difference for me, i.e. no report?
> >
> > Then it's not a SEGV/BUS (iow page fault.) Please try user_debug=31
> > in that case. Thanks.
> 
> Thanks, much better:
> 
>     init (1): undefined instruction: pc=b6f4feda
>     CPU: 1 PID: 1 Comm: init Not tainted
> 6.9.0-shmobile-09158-g1218ffc3659e #1820
>     Hardware name: Generic R-Car Gen2 (Flattened Device Tree)
>     PC is at 0xb6f4feda
>     LR is at 0xb6f4ed31
>     pc : [<b6f4feda>]    lr : [<b6f4ed31>]    psr: 60000030
>     sp : be970630  ip : be970678  fp : b6f67978
>     r10: 00000000  r9 : 004d48ff  r8 : be970844
>     r7 : be9707f8  r6 : b6f67978  r5 : be970850  r4 : be970844
>     r3 : b6f669b0  r2 : 003fb0d6  r1 : 00000000  r0 : be970650
>     Flags: nZCv  IRQs on  FIQs on  Mode USER_32  ISA Thumb  Segment user
>     Control: 30c5387d  Table: 41f6cac0  DAC: 55555555
>     Code: bad PC value

Well, that points to another issue... get_user() appears to be unable
to access userspace. Userspace can, however, as we wouldn't get an
undefined instruction abort unless it can successfully access the
address.

This points to something being very wrong with this implementation.
Catalin Marinas May 14, 2024, 2:39 p.m. UTC | #20
On Tue, May 14, 2024 at 09:37:21AM +0200, Linus Walleij wrote:
> On Mon, May 13, 2024 at 10:29 PM Linus Walleij <linus.walleij@linaro.org> wrote:
> > I guess I should bring out the STM32MP157 board again and retest
> > to verify that that one hardware works with this. Any other ideas?
> 
> I got this Cortex-A7 board out and booted with LPAE and PAN
> using TTBR0 and it boots fine (some kind of OpenEmbedded/YOCTO
> system is in it with systemd and all.
> 
> Tested with LKDTM provoked crashes and it behaves as expected.
> 
> Given Florians report it clearly works on some LPAE:s and not on some
> others for some reason! I suppose we need to figure it out.
> 
> Catalin do you have some ideas?

Since EPD0 is allowed to be cached in the TLB, we need to change the
ASID as well (assuming that it's at least tagged by ASID). But, to keep
the uaccess enable/disable code simple, patch 4 short-circuits this by
toggling the TTBCR.A1 bit. At the time (~2015) this was fine on the
32-bit CPUs. Since the A1 bit is theoretically allowed to be cached in
the TLB, toggling it may not have any effect without a TLB invalidation.
I recall some old documentation stating that the EPD0 value is only
cached when 0, not 1 (IOW, toggling it to 1 would not prevent existing
TLB entries from being hit). This wouldn't have been a significant
issue, most likely the PAN protection not always working but here it's
the other way around, it looks like a value of 1 persisting even after
being toggled.

I'd say the weird behaviour is caused by different microarchitectures,
especially if you run this on ARMv8 hardware (the original patch was
never intended to).

It would be worth trying to do a full TLBI invalidation after uaccess
enable/disable just to check this theory.
Catalin Marinas May 14, 2024, 3:03 p.m. UTC | #21
On Tue, May 14, 2024 at 01:38:07PM +0100, Russell King wrote:
> On Tue, May 14, 2024 at 02:32:23PM +0200, Geert Uytterhoeven wrote:
> > On Tue, May 14, 2024 at 1:33 PM Russell King (Oracle)
> > <linux@armlinux.org.uk> wrote:
> > > On Tue, May 14, 2024 at 01:22:36PM +0200, Geert Uytterhoeven wrote:
> > > > On Tue, May 14, 2024 at 10:15 AM Russell King (Oracle)
> > > > <linux@armlinux.org.uk> wrote:
> > > > > On Mon, May 13, 2024 at 08:56:20PM -0700, Florian Fainelli wrote:
> > > > > > [   11.299106] Freeing unused kernel image (initmem) memory: 79872K
> > > > > > [   11.305720] Run /init as init process
> > > > > > [   11.314070] Kernel panic - not syncing: Attempted to kill init!
> > > > > > exitcode=0x00000004
> > > > > > [   11.321888] CPU: 0 PID: 1 Comm: init Not tainted 6.9.0-next-20240513 #32
> > > > > > [   11.328709] Hardware name: BCM2711
> > > > > > [   11.332169] Call trace:
> > > > > > [   11.332179]  unwind_backtrace from show_stack+0x10/0x14
> > > > > > [   11.340087]  show_stack from panic+0x20c/0x55c
> > > > > > [   11.344615]  panic from do_exit+0x6b0/0x1e74
> > > > > > [   11.348972]  do_exit from do_group_exit+0xcc/0x280
> > > > > > [   11.353857]  do_group_exit from get_signal+0xfb4/0x1340
> > > > > > [   11.359182]  get_signal from do_work_pending+0x2c0/0x7bc
> > > > > > [   11.364590]  do_work_pending from slow_work_pending+0xc/0x24
> > > > > > [   11.370351] Exception stack(0xf082bfb0 to 0xf082bff8)
> > > > > > [   11.375492] bfa0:                                     b6bca568 00000000
> > > > > > 003fa0d6 aedf3d20
> > > > > > [   11.383811] bfc0: aedf4a28 b6bca6f8 b6bca73c b6bca710 b6bca748 b6bca6f8
> > > > > > aedf4a28 b6bca6f8
> > > > > > [   11.392127] bfe0: b6bca590 b6bca548 aeddae15 aedeb660 200f0030 ffffffff
> > > > > > [   11.398954] ---[ end Kernel panic - not syncing: Attempted to kill init!
> > > > > > exitcode=0x00000004 ]---
> > > > >
> > > > > You could enable CONFiG_DEBUG_USER, and then pass "user_debug=24" to
> > > > > the kernel to get a report for the conditions that lead to SEGV/BUS
> > > > > signals being delivered to a userspace processd.
> > > >
> > > > That does not seem to make any difference for me, i.e. no report?
> > >
> > > Then it's not a SEGV/BUS (iow page fault.) Please try user_debug=31
> > > in that case. Thanks.
> > 
> > Thanks, much better:
> > 
> >     init (1): undefined instruction: pc=b6f4feda
> >     CPU: 1 PID: 1 Comm: init Not tainted
> > 6.9.0-shmobile-09158-g1218ffc3659e #1820
> >     Hardware name: Generic R-Car Gen2 (Flattened Device Tree)
> >     PC is at 0xb6f4feda
> >     LR is at 0xb6f4ed31
> >     pc : [<b6f4feda>]    lr : [<b6f4ed31>]    psr: 60000030
> >     sp : be970630  ip : be970678  fp : b6f67978
> >     r10: 00000000  r9 : 004d48ff  r8 : be970844
> >     r7 : be9707f8  r6 : b6f67978  r5 : be970850  r4 : be970844
> >     r3 : b6f669b0  r2 : 003fb0d6  r1 : 00000000  r0 : be970650
> >     Flags: nZCv  IRQs on  FIQs on  Mode USER_32  ISA Thumb  Segment user
> >     Control: 30c5387d  Table: 41f6cac0  DAC: 55555555
> >     Code: bad PC value
> 
> Well, that points to another issue... get_user() appears to be unable
> to access userspace. Userspace can, however, as we wouldn't get an
> undefined instruction abort unless it can successfully access the
> address.
> 
> This points to something being very wrong with this implementation.

Yeah, it doesn't look great. Let's see if TLBIALLIS solves anything,
though not as an upstream solution as it's expensive, just to understand
the problem a bit better. So maybe revert the last patch from the
series, the first three seem inoffensive.

For the flush_tlb_all(), I think the mcr incantation is:

	mov	r0, #0
	mcr	p15, 0, r0, c8, c7, 0

Linus, if you attempt this in the uaccess_enable/disable macros, also
force the ISB to be always on just in case the TTBRC update does not
take place before the MCR.
Geert Uytterhoeven May 14, 2024, 4:06 p.m. UTC | #22
On Tue, May 14, 2024 at 1:28 PM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> On Tue, May 14, 2024 at 10:25 AM Ard Biesheuvel <ardb@kernel.org> wrote:
> > On Tue, 14 May 2024 at 10:04, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> > > On Tue, May 14, 2024 at 9:59 AM Ard Biesheuvel <ardb@kernel.org> wrote:
> > > > On Tue, 14 May 2024 at 09:46, Linus Walleij <linus.walleij@linaro.org> wrote:
> > > > > On Tue, May 14, 2024 at 8:41 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> > > > >
> > > > > > I sent you a small initramfs by PM.
> > > > >
> > > > > Booted this just fine on Vexpress QEMU:
> > > > >
> > > > > Run /init as init process
> > > > > sysctl: error: 'kernel.hotplug' is an unknown key
> > > > >
> > > > >
> > > > > boot (Linux 6.9.0-rc1+, BusyBox v1.16.0.git, kexec-tools 2.0.1-git)
> > > > > / # mount -t debugfs none /sys/kernel/debug
> > > > > / # echo "ACCESS_USERSPACE" | cat >/sys/kernel/debug/provoke-crash/DIRECT
> > > > > lkdtm: Performing direct entry ACCESS_USERSPACE
> > > > > lkdtm: attempting bad read at 76fea000
> > > > > 8<--- cut here ---
> > > > > Unable to handle kernel paging request at virtual address 76fea000 when read
> > > > > [76fea000] *pgd=82c93003, *pmd=82c94003, *pte=a00000811e2f5f
> > > > > Internal error: Oops: 206 [#1] SMP ARM
> > > > > CPU: 1 PID: 86 Comm: cat Not tainted 6.9.0-rc1+ #46
> > > > > Hardware name: ARM-Versatile Express
> > > > > PC is at lkdtm_ACCESS_USERSPACE+0xc0/0x138
> > > > > LR is at lkdtm_ACCESS_USERSPACE+0xc0/0x138
> > > > >
> > > > > I'm starting to think it is something about different LPAE implementations here.
> > > >
> > > > I have built multi_v7_defconfig with the following enabled
> > > >
> > > > CONFIG_ARM_LPAE=y
> > > > CONFIG_CPU_TTBR0_PAN=y
> > > > CONFIG_LKDTM=y
> > > >
> > > > and the resulting kernel boots happily as a 32-bit VM running under a
> > > > Rpi4 KVM host.
> > > >
> > > > Could someone post an actual .config that reproduces this? Rpi4 is
> > > > A72, which both works and doesn't work in Florian's testing, so I'd be
> > > > highly surprised if this is not a config issue.
> > >
> > > shmobile_defconfig with CONFIG_LPAE=y added failed for me before.
> > >
> > > Building multi_v7_defconfig with the above enabled...
>
> And that works, while shmobile_defconfig with the above does not!
> So it is config-related.

I ran tools/testing/ktest/config-bisect.pl and found the "offending"
config option: CONFIG_CC_OPTIMIZE_FOR_SIZE=y gives a broken
kernel, CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE=y works.
(verified by just enabling CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE in my
 broken config)....

Gr{oetje,eeting}s,

                        Geert
Florian Fainelli May 14, 2024, 4:54 p.m. UTC | #23
On 5/14/24 09:06, Geert Uytterhoeven wrote:
> On Tue, May 14, 2024 at 1:28 PM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
>> On Tue, May 14, 2024 at 10:25 AM Ard Biesheuvel <ardb@kernel.org> wrote:
>>> On Tue, 14 May 2024 at 10:04, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
>>>> On Tue, May 14, 2024 at 9:59 AM Ard Biesheuvel <ardb@kernel.org> wrote:
>>>>> On Tue, 14 May 2024 at 09:46, Linus Walleij <linus.walleij@linaro.org> wrote:
>>>>>> On Tue, May 14, 2024 at 8:41 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
>>>>>>
>>>>>>> I sent you a small initramfs by PM.
>>>>>>
>>>>>> Booted this just fine on Vexpress QEMU:
>>>>>>
>>>>>> Run /init as init process
>>>>>> sysctl: error: 'kernel.hotplug' is an unknown key
>>>>>>
>>>>>>
>>>>>> boot (Linux 6.9.0-rc1+, BusyBox v1.16.0.git, kexec-tools 2.0.1-git)
>>>>>> / # mount -t debugfs none /sys/kernel/debug
>>>>>> / # echo "ACCESS_USERSPACE" | cat >/sys/kernel/debug/provoke-crash/DIRECT
>>>>>> lkdtm: Performing direct entry ACCESS_USERSPACE
>>>>>> lkdtm: attempting bad read at 76fea000
>>>>>> 8<--- cut here ---
>>>>>> Unable to handle kernel paging request at virtual address 76fea000 when read
>>>>>> [76fea000] *pgd=82c93003, *pmd=82c94003, *pte=a00000811e2f5f
>>>>>> Internal error: Oops: 206 [#1] SMP ARM
>>>>>> CPU: 1 PID: 86 Comm: cat Not tainted 6.9.0-rc1+ #46
>>>>>> Hardware name: ARM-Versatile Express
>>>>>> PC is at lkdtm_ACCESS_USERSPACE+0xc0/0x138
>>>>>> LR is at lkdtm_ACCESS_USERSPACE+0xc0/0x138
>>>>>>
>>>>>> I'm starting to think it is something about different LPAE implementations here.
>>>>>
>>>>> I have built multi_v7_defconfig with the following enabled
>>>>>
>>>>> CONFIG_ARM_LPAE=y
>>>>> CONFIG_CPU_TTBR0_PAN=y
>>>>> CONFIG_LKDTM=y
>>>>>
>>>>> and the resulting kernel boots happily as a 32-bit VM running under a
>>>>> Rpi4 KVM host.
>>>>>
>>>>> Could someone post an actual .config that reproduces this? Rpi4 is
>>>>> A72, which both works and doesn't work in Florian's testing, so I'd be
>>>>> highly surprised if this is not a config issue.
>>>>
>>>> shmobile_defconfig with CONFIG_LPAE=y added failed for me before.
>>>>
>>>> Building multi_v7_defconfig with the above enabled...
>>
>> And that works, while shmobile_defconfig with the above does not!
>> So it is config-related.
> 
> I ran tools/testing/ktest/config-bisect.pl and found the "offending"
> config option: CONFIG_CC_OPTIMIZE_FOR_SIZE=y gives a broken
> kernel, CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE=y works.
> (verified by just enabling CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE in my
>   broken config)....

I second that, my working configuration has 
CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE=y whereas my
broken one has CONFIG_CC_OPTIMIZE_FOR_SIZE=y, sure enough, changing the 
broken configuration to have CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE=y turns 
it into a "working" configuration on the Raspberry Pi 4B in AArch32 mode.

That makes more sense because BCM7211 and BCM2711 are exactly the same 
CPU, thanks for restoring sanity!
Russell King (Oracle) May 14, 2024, 5:03 p.m. UTC | #24
On Tue, May 14, 2024 at 09:54:07AM -0700, Florian Fainelli wrote:
> On 5/14/24 09:06, Geert Uytterhoeven wrote:
> > On Tue, May 14, 2024 at 1:28 PM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> > > On Tue, May 14, 2024 at 10:25 AM Ard Biesheuvel <ardb@kernel.org> wrote:
> > > > On Tue, 14 May 2024 at 10:04, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> > > > > On Tue, May 14, 2024 at 9:59 AM Ard Biesheuvel <ardb@kernel.org> wrote:
> > > > > > On Tue, 14 May 2024 at 09:46, Linus Walleij <linus.walleij@linaro.org> wrote:
> > > > > > > On Tue, May 14, 2024 at 8:41 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> > > > > > > 
> > > > > > > > I sent you a small initramfs by PM.
> > > > > > > 
> > > > > > > Booted this just fine on Vexpress QEMU:
> > > > > > > 
> > > > > > > Run /init as init process
> > > > > > > sysctl: error: 'kernel.hotplug' is an unknown key
> > > > > > > 
> > > > > > > 
> > > > > > > boot (Linux 6.9.0-rc1+, BusyBox v1.16.0.git, kexec-tools 2.0.1-git)
> > > > > > > / # mount -t debugfs none /sys/kernel/debug
> > > > > > > / # echo "ACCESS_USERSPACE" | cat >/sys/kernel/debug/provoke-crash/DIRECT
> > > > > > > lkdtm: Performing direct entry ACCESS_USERSPACE
> > > > > > > lkdtm: attempting bad read at 76fea000
> > > > > > > 8<--- cut here ---
> > > > > > > Unable to handle kernel paging request at virtual address 76fea000 when read
> > > > > > > [76fea000] *pgd=82c93003, *pmd=82c94003, *pte=a00000811e2f5f
> > > > > > > Internal error: Oops: 206 [#1] SMP ARM
> > > > > > > CPU: 1 PID: 86 Comm: cat Not tainted 6.9.0-rc1+ #46
> > > > > > > Hardware name: ARM-Versatile Express
> > > > > > > PC is at lkdtm_ACCESS_USERSPACE+0xc0/0x138
> > > > > > > LR is at lkdtm_ACCESS_USERSPACE+0xc0/0x138
> > > > > > > 
> > > > > > > I'm starting to think it is something about different LPAE implementations here.
> > > > > > 
> > > > > > I have built multi_v7_defconfig with the following enabled
> > > > > > 
> > > > > > CONFIG_ARM_LPAE=y
> > > > > > CONFIG_CPU_TTBR0_PAN=y
> > > > > > CONFIG_LKDTM=y
> > > > > > 
> > > > > > and the resulting kernel boots happily as a 32-bit VM running under a
> > > > > > Rpi4 KVM host.
> > > > > > 
> > > > > > Could someone post an actual .config that reproduces this? Rpi4 is
> > > > > > A72, which both works and doesn't work in Florian's testing, so I'd be
> > > > > > highly surprised if this is not a config issue.
> > > > > 
> > > > > shmobile_defconfig with CONFIG_LPAE=y added failed for me before.
> > > > > 
> > > > > Building multi_v7_defconfig with the above enabled...
> > > 
> > > And that works, while shmobile_defconfig with the above does not!
> > > So it is config-related.
> > 
> > I ran tools/testing/ktest/config-bisect.pl and found the "offending"
> > config option: CONFIG_CC_OPTIMIZE_FOR_SIZE=y gives a broken
> > kernel, CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE=y works.
> > (verified by just enabling CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE in my
> >   broken config)....
> 
> I second that, my working configuration has
> CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE=y whereas my
> broken one has CONFIG_CC_OPTIMIZE_FOR_SIZE=y, sure enough, changing the
> broken configuration to have CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE=y turns it
> into a "working" configuration on the Raspberry Pi 4B in AArch32 mode.
> 
> That makes more sense because BCM7211 and BCM2711 are exactly the same CPU,
> thanks for restoring sanity!

I would imagine that the problem is cpu_set_ttbcr(). Please try adding
a "memory" clobber to the asm() instruction in there.
Florian Fainelli May 14, 2024, 6:26 p.m. UTC | #25
On 5/14/24 10:03, Russell King (Oracle) wrote:
> On Tue, May 14, 2024 at 09:54:07AM -0700, Florian Fainelli wrote:
>> On 5/14/24 09:06, Geert Uytterhoeven wrote:
>>> On Tue, May 14, 2024 at 1:28 PM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
>>>> On Tue, May 14, 2024 at 10:25 AM Ard Biesheuvel <ardb@kernel.org> wrote:
>>>>> On Tue, 14 May 2024 at 10:04, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
>>>>>> On Tue, May 14, 2024 at 9:59 AM Ard Biesheuvel <ardb@kernel.org> wrote:
>>>>>>> On Tue, 14 May 2024 at 09:46, Linus Walleij <linus.walleij@linaro.org> wrote:
>>>>>>>> On Tue, May 14, 2024 at 8:41 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
>>>>>>>>
>>>>>>>>> I sent you a small initramfs by PM.
>>>>>>>>
>>>>>>>> Booted this just fine on Vexpress QEMU:
>>>>>>>>
>>>>>>>> Run /init as init process
>>>>>>>> sysctl: error: 'kernel.hotplug' is an unknown key
>>>>>>>>
>>>>>>>>
>>>>>>>> boot (Linux 6.9.0-rc1+, BusyBox v1.16.0.git, kexec-tools 2.0.1-git)
>>>>>>>> / # mount -t debugfs none /sys/kernel/debug
>>>>>>>> / # echo "ACCESS_USERSPACE" | cat >/sys/kernel/debug/provoke-crash/DIRECT
>>>>>>>> lkdtm: Performing direct entry ACCESS_USERSPACE
>>>>>>>> lkdtm: attempting bad read at 76fea000
>>>>>>>> 8<--- cut here ---
>>>>>>>> Unable to handle kernel paging request at virtual address 76fea000 when read
>>>>>>>> [76fea000] *pgd=82c93003, *pmd=82c94003, *pte=a00000811e2f5f
>>>>>>>> Internal error: Oops: 206 [#1] SMP ARM
>>>>>>>> CPU: 1 PID: 86 Comm: cat Not tainted 6.9.0-rc1+ #46
>>>>>>>> Hardware name: ARM-Versatile Express
>>>>>>>> PC is at lkdtm_ACCESS_USERSPACE+0xc0/0x138
>>>>>>>> LR is at lkdtm_ACCESS_USERSPACE+0xc0/0x138
>>>>>>>>
>>>>>>>> I'm starting to think it is something about different LPAE implementations here.
>>>>>>>
>>>>>>> I have built multi_v7_defconfig with the following enabled
>>>>>>>
>>>>>>> CONFIG_ARM_LPAE=y
>>>>>>> CONFIG_CPU_TTBR0_PAN=y
>>>>>>> CONFIG_LKDTM=y
>>>>>>>
>>>>>>> and the resulting kernel boots happily as a 32-bit VM running under a
>>>>>>> Rpi4 KVM host.
>>>>>>>
>>>>>>> Could someone post an actual .config that reproduces this? Rpi4 is
>>>>>>> A72, which both works and doesn't work in Florian's testing, so I'd be
>>>>>>> highly surprised if this is not a config issue.
>>>>>>
>>>>>> shmobile_defconfig with CONFIG_LPAE=y added failed for me before.
>>>>>>
>>>>>> Building multi_v7_defconfig with the above enabled...
>>>>
>>>> And that works, while shmobile_defconfig with the above does not!
>>>> So it is config-related.
>>>
>>> I ran tools/testing/ktest/config-bisect.pl and found the "offending"
>>> config option: CONFIG_CC_OPTIMIZE_FOR_SIZE=y gives a broken
>>> kernel, CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE=y works.
>>> (verified by just enabling CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE in my
>>>    broken config)....
>>
>> I second that, my working configuration has
>> CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE=y whereas my
>> broken one has CONFIG_CC_OPTIMIZE_FOR_SIZE=y, sure enough, changing the
>> broken configuration to have CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE=y turns it
>> into a "working" configuration on the Raspberry Pi 4B in AArch32 mode.
>>
>> That makes more sense because BCM7211 and BCM2711 are exactly the same CPU,
>> thanks for restoring sanity!
> 
> I would imagine that the problem is cpu_set_ttbcr(). Please try adding
> a "memory" clobber to the asm() instruction in there.
> 

I can confirm that with CONFIG_CC_OPTIMIZE_FOR_SIZE=y and the hunk below:

diff --git a/arch/arm/include/asm/proc-fns.h 
b/arch/arm/include/asm/proc-fns.h
index 9b3105a2a5e0..1087bd2af433 100644
--- a/arch/arm/include/asm/proc-fns.h
+++ b/arch/arm/include/asm/proc-fns.h
@@ -187,7 +187,7 @@ static inline unsigned int cpu_get_ttbcr(void)

  static inline void cpu_set_ttbcr(unsigned int ttbcr)
  {
-       asm("mcr p15, 0, %0, c2, c0, 2" : : "r" (ttbcr));
+       asm("mcr p15, 0, %0, c2, c0, 2" : : "r" (ttbcr) : "memory");
  }

  #else  /*!CONFIG_MMU */

my Raspberry Pi 4B in AArch32 mode boots and runs user-space properly.

Thanks a lot Russell!
Linus Walleij May 14, 2024, 8:33 p.m. UTC | #26
On Tue, May 14, 2024 at 8:26 PM Florian Fainelli <f.fainelli@gmail.com> wrote:
> On 5/14/24 10:03, Russell King (Oracle) wrote:

> > I would imagine that the problem is cpu_set_ttbcr(). Please try adding
> > a "memory" clobber to the asm() instruction in there.
> >
>
> I can confirm that with CONFIG_CC_OPTIMIZE_FOR_SIZE=y and the hunk below:
>
> diff --git a/arch/arm/include/asm/proc-fns.h
> b/arch/arm/include/asm/proc-fns.h
> index 9b3105a2a5e0..1087bd2af433 100644
> --- a/arch/arm/include/asm/proc-fns.h
> +++ b/arch/arm/include/asm/proc-fns.h
> @@ -187,7 +187,7 @@ static inline unsigned int cpu_get_ttbcr(void)
>
>   static inline void cpu_set_ttbcr(unsigned int ttbcr)
>   {
> -       asm("mcr p15, 0, %0, c2, c0, 2" : : "r" (ttbcr));
> +       asm("mcr p15, 0, %0, c2, c0, 2" : : "r" (ttbcr) : "memory");
>   }
>
>   #else  /*!CONFIG_MMU */
>
> my Raspberry Pi 4B in AArch32 mode boots and runs user-space properly.
>
> Thanks a lot Russell!

Second that, very nicely pinpointed Russell!

Florian, do you want to send a patch or should I?

Yours,
Linus Walleij
Florian Fainelli May 14, 2024, 8:34 p.m. UTC | #27
On 5/14/24 13:33, Linus Walleij wrote:
> On Tue, May 14, 2024 at 8:26 PM Florian Fainelli <f.fainelli@gmail.com> wrote:
>> On 5/14/24 10:03, Russell King (Oracle) wrote:
> 
>>> I would imagine that the problem is cpu_set_ttbcr(). Please try adding
>>> a "memory" clobber to the asm() instruction in there.
>>>
>>
>> I can confirm that with CONFIG_CC_OPTIMIZE_FOR_SIZE=y and the hunk below:
>>
>> diff --git a/arch/arm/include/asm/proc-fns.h
>> b/arch/arm/include/asm/proc-fns.h
>> index 9b3105a2a5e0..1087bd2af433 100644
>> --- a/arch/arm/include/asm/proc-fns.h
>> +++ b/arch/arm/include/asm/proc-fns.h
>> @@ -187,7 +187,7 @@ static inline unsigned int cpu_get_ttbcr(void)
>>
>>    static inline void cpu_set_ttbcr(unsigned int ttbcr)
>>    {
>> -       asm("mcr p15, 0, %0, c2, c0, 2" : : "r" (ttbcr));
>> +       asm("mcr p15, 0, %0, c2, c0, 2" : : "r" (ttbcr) : "memory");
>>    }
>>
>>    #else  /*!CONFIG_MMU */
>>
>> my Raspberry Pi 4B in AArch32 mode boots and runs user-space properly.
>>
>> Thanks a lot Russell!
> 
> Second that, very nicely pinpointed Russell!
> 
> Florian, do you want to send a patch or should I?

I was wondering if Russell was able to fold this directly into patch #2 
where cpu_set_ttbr() is added, so as to not break functionality across 
bisection.
Geert Uytterhoeven May 15, 2024, 8:10 a.m. UTC | #28
On Tue, May 14, 2024 at 8:26 PM Florian Fainelli <f.fainelli@gmail.com> wrote:
> On 5/14/24 10:03, Russell King (Oracle) wrote:
> > On Tue, May 14, 2024 at 09:54:07AM -0700, Florian Fainelli wrote:
> >> On 5/14/24 09:06, Geert Uytterhoeven wrote:
> >>> On Tue, May 14, 2024 at 1:28 PM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> >>> I ran tools/testing/ktest/config-bisect.pl and found the "offending"
> >>> config option: CONFIG_CC_OPTIMIZE_FOR_SIZE=y gives a broken
> >>> kernel, CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE=y works.
> >>> (verified by just enabling CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE in my
> >>>    broken config)....
> >>
> >> I second that, my working configuration has
> >> CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE=y whereas my
> >> broken one has CONFIG_CC_OPTIMIZE_FOR_SIZE=y, sure enough, changing the
> >> broken configuration to have CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE=y turns it
> >> into a "working" configuration on the Raspberry Pi 4B in AArch32 mode.
> >>
> >> That makes more sense because BCM7211 and BCM2711 are exactly the same CPU,
> >> thanks for restoring sanity!
> >
> > I would imagine that the problem is cpu_set_ttbcr(). Please try adding
> > a "memory" clobber to the asm() instruction in there.
> >
>
> I can confirm that with CONFIG_CC_OPTIMIZE_FOR_SIZE=y and the hunk below:
>
> diff --git a/arch/arm/include/asm/proc-fns.h
> b/arch/arm/include/asm/proc-fns.h
> index 9b3105a2a5e0..1087bd2af433 100644
> --- a/arch/arm/include/asm/proc-fns.h
> +++ b/arch/arm/include/asm/proc-fns.h
> @@ -187,7 +187,7 @@ static inline unsigned int cpu_get_ttbcr(void)
>
>   static inline void cpu_set_ttbcr(unsigned int ttbcr)
>   {
> -       asm("mcr p15, 0, %0, c2, c0, 2" : : "r" (ttbcr));
> +       asm("mcr p15, 0, %0, c2, c0, 2" : : "r" (ttbcr) : "memory");
>   }
>
>   #else  /*!CONFIG_MMU */
>
> my Raspberry Pi 4B in AArch32 mode boots and runs user-space properly.
>
> Thanks a lot Russell!

Unfortunately this does not fix the issue for me.

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
Ard Biesheuvel May 15, 2024, 8:36 a.m. UTC | #29
On Tue, 14 May 2024 at 22:34, Florian Fainelli <f.fainelli@gmail.com> wrote:
>
> On 5/14/24 13:33, Linus Walleij wrote:
> > On Tue, May 14, 2024 at 8:26 PM Florian Fainelli <f.fainelli@gmail.com> wrote:
> >> On 5/14/24 10:03, Russell King (Oracle) wrote:
> >
> >>> I would imagine that the problem is cpu_set_ttbcr(). Please try adding
> >>> a "memory" clobber to the asm() instruction in there.
> >>>
> >>
> >> I can confirm that with CONFIG_CC_OPTIMIZE_FOR_SIZE=y and the hunk below:
> >>
> >> diff --git a/arch/arm/include/asm/proc-fns.h
> >> b/arch/arm/include/asm/proc-fns.h
> >> index 9b3105a2a5e0..1087bd2af433 100644
> >> --- a/arch/arm/include/asm/proc-fns.h
> >> +++ b/arch/arm/include/asm/proc-fns.h
> >> @@ -187,7 +187,7 @@ static inline unsigned int cpu_get_ttbcr(void)
> >>
> >>    static inline void cpu_set_ttbcr(unsigned int ttbcr)
> >>    {
> >> -       asm("mcr p15, 0, %0, c2, c0, 2" : : "r" (ttbcr));
> >> +       asm("mcr p15, 0, %0, c2, c0, 2" : : "r" (ttbcr) : "memory");
> >>    }
> >>
> >>    #else  /*!CONFIG_MMU */
> >>
> >> my Raspberry Pi 4B in AArch32 mode boots and runs user-space properly.
> >>
> >> Thanks a lot Russell!
> >
> > Second that, very nicely pinpointed Russell!
> >
> > Florian, do you want to send a patch or should I?
>
> I was wondering if Russell was able to fold this directly into patch #2
> where cpu_set_ttbr() is added, so as to not break functionality across
> bisection.

Sadly, I can still reproduce this with the above fix.

I included TTBCR in the DEBUG_USER output, and (as expected), it has
A1, EPD0 and T0SZ set to the 'uaccess disabled' values.

Using __always_inline on uaccess_save_and_enable() and
uaccess_restore() (as the CONFIG_CPU_SW_DOMAIN_PAN does) seems to work
around it. The "memory" clobber seems unnecessary in my case, but it
is needed for correctness in any case.

It is unclear to me why placing these helpers out of line should make
any difference, and I am not convinced it is a problem in the code
(IIRC we've had other issues with -Os in the past)






Run /init as init process
init (1): undefined instruction: pc=0015f450 ttbcr=b5403587
CPU: 0 PID: 1 Comm: init Not tainted 6.9.0-rc1-00037-gb7e329ac0464-dirty #8
Hardware name: Generic DT based system
PC is at 0x15f450
LR is at 0x1548b0
pc : [<0015f450>]    lr : [<001548b0>]    psr: 20000010
sp : bec97da0  ip : bec97ee8  fp : 00000000
r10: 00000000  r9 : 00000000  r8 : 00000000
r7 : 00000000  r6 : bec97f0c  r5 : 00000000  r4 : 00000040
r3 : 00000000  r2 : 00000040  r1 : 00000040  r0 : 00000000
Flags: nzCv  IRQs on  FIQs on  Mode USER_32  ISA ARM  Segment user
Control: 30c5383d  Table: 427122c0  DAC: fffffffd
Code: bad PC value
Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000004
CPU: 0 PID: 1 Comm: init Not tainted 6.9.0-rc1-00037-gb7e329ac0464-dirty #8
Hardware name: Generic DT based system
Call trace:
 unwind_backtrace from show_stack+0x10/0x14
 show_stack from dump_stack_lvl+0x54/0x68
 dump_stack_lvl from panic+0xf8/0x34c
 panic from do_exit+0x1dc/0x880
 do_exit from sys_exit_group+0x0/0x10
Geert Uytterhoeven May 15, 2024, 8:45 a.m. UTC | #30
Hi Ard,

On Wed, May 15, 2024 at 10:37 AM Ard Biesheuvel <ardb@kernel.org> wrote:
> On Tue, 14 May 2024 at 22:34, Florian Fainelli <f.fainelli@gmail.com> wrote:
> > On 5/14/24 13:33, Linus Walleij wrote:
> > > On Tue, May 14, 2024 at 8:26 PM Florian Fainelli <f.fainelli@gmail.com> wrote:
> > >> On 5/14/24 10:03, Russell King (Oracle) wrote:
> > >
> > >>> I would imagine that the problem is cpu_set_ttbcr(). Please try adding
> > >>> a "memory" clobber to the asm() instruction in there.
> > >>>
> > >>
> > >> I can confirm that with CONFIG_CC_OPTIMIZE_FOR_SIZE=y and the hunk below:
> > >>
> > >> diff --git a/arch/arm/include/asm/proc-fns.h
> > >> b/arch/arm/include/asm/proc-fns.h
> > >> index 9b3105a2a5e0..1087bd2af433 100644
> > >> --- a/arch/arm/include/asm/proc-fns.h
> > >> +++ b/arch/arm/include/asm/proc-fns.h
> > >> @@ -187,7 +187,7 @@ static inline unsigned int cpu_get_ttbcr(void)
> > >>
> > >>    static inline void cpu_set_ttbcr(unsigned int ttbcr)
> > >>    {
> > >> -       asm("mcr p15, 0, %0, c2, c0, 2" : : "r" (ttbcr));
> > >> +       asm("mcr p15, 0, %0, c2, c0, 2" : : "r" (ttbcr) : "memory");
> > >>    }
> > >>
> > >>    #else  /*!CONFIG_MMU */
> > >>
> > >> my Raspberry Pi 4B in AArch32 mode boots and runs user-space properly.
> > >>
> > >> Thanks a lot Russell!
> > >
> > > Second that, very nicely pinpointed Russell!
> > >
> > > Florian, do you want to send a patch or should I?
> >
> > I was wondering if Russell was able to fold this directly into patch #2
> > where cpu_set_ttbr() is added, so as to not break functionality across
> > bisection.
>
> Sadly, I can still reproduce this with the above fix.
>
> I included TTBCR in the DEBUG_USER output, and (as expected), it has
> A1, EPD0 and T0SZ set to the 'uaccess disabled' values.
>
> Using __always_inline on uaccess_save_and_enable() and
> uaccess_restore() (as the CONFIG_CPU_SW_DOMAIN_PAN does) seems to work
> around it. The "memory" clobber seems unnecessary in my case, but it
> is needed for correctness in any case.
>
> It is unclear to me why placing these helpers out of line should make
> any difference, and I am not convinced it is a problem in the code
> (IIRC we've had other issues with -Os in the past)

Commit 66abdd3b5d4e53bc ("ARM: 9356/2: Move asm statements accessing
TTBCR into C functions") also removed the "volatile" from the mcr
inline asm statement.  I tried adding it back, but that didn't help.

Gr{oetje,eeting}s,

                        Geert
Russell King (Oracle) May 15, 2024, 8:48 a.m. UTC | #31
On Wed, May 15, 2024 at 10:36:48AM +0200, Ard Biesheuvel wrote:
> On Tue, 14 May 2024 at 22:34, Florian Fainelli <f.fainelli@gmail.com> wrote:
> >
> > On 5/14/24 13:33, Linus Walleij wrote:
> > > On Tue, May 14, 2024 at 8:26 PM Florian Fainelli <f.fainelli@gmail.com> wrote:
> > >> On 5/14/24 10:03, Russell King (Oracle) wrote:
> > >
> > >>> I would imagine that the problem is cpu_set_ttbcr(). Please try adding
> > >>> a "memory" clobber to the asm() instruction in there.
> > >>>
> > >>
> > >> I can confirm that with CONFIG_CC_OPTIMIZE_FOR_SIZE=y and the hunk below:
> > >>
> > >> diff --git a/arch/arm/include/asm/proc-fns.h
> > >> b/arch/arm/include/asm/proc-fns.h
> > >> index 9b3105a2a5e0..1087bd2af433 100644
> > >> --- a/arch/arm/include/asm/proc-fns.h
> > >> +++ b/arch/arm/include/asm/proc-fns.h
> > >> @@ -187,7 +187,7 @@ static inline unsigned int cpu_get_ttbcr(void)
> > >>
> > >>    static inline void cpu_set_ttbcr(unsigned int ttbcr)
> > >>    {
> > >> -       asm("mcr p15, 0, %0, c2, c0, 2" : : "r" (ttbcr));
> > >> +       asm("mcr p15, 0, %0, c2, c0, 2" : : "r" (ttbcr) : "memory");
> > >>    }
> > >>
> > >>    #else  /*!CONFIG_MMU */
> > >>
> > >> my Raspberry Pi 4B in AArch32 mode boots and runs user-space properly.
> > >>
> > >> Thanks a lot Russell!
> > >
> > > Second that, very nicely pinpointed Russell!
> > >
> > > Florian, do you want to send a patch or should I?
> >
> > I was wondering if Russell was able to fold this directly into patch #2
> > where cpu_set_ttbr() is added, so as to not break functionality across
> > bisection.
> 
> Sadly, I can still reproduce this with the above fix.
> 
> I included TTBCR in the DEBUG_USER output, and (as expected), it has
> A1, EPD0 and T0SZ set to the 'uaccess disabled' values.
> 
> Using __always_inline on uaccess_save_and_enable() and
> uaccess_restore() (as the CONFIG_CPU_SW_DOMAIN_PAN does) seems to work
> around it. The "memory" clobber seems unnecessary in my case, but it
> is needed for correctness in any case.
> 
> It is unclear to me why placing these helpers out of line should make
> any difference, and I am not convinced it is a problem in the code
> (IIRC we've had other issues with -Os in the past)

Time to start comparing compilers / compiler versions?
Ard Biesheuvel May 15, 2024, 8:49 a.m. UTC | #32
On Wed, 15 May 2024 at 10:45, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
>
> Hi Ard,
>
> On Wed, May 15, 2024 at 10:37 AM Ard Biesheuvel <ardb@kernel.org> wrote:
> > On Tue, 14 May 2024 at 22:34, Florian Fainelli <f.fainelli@gmail.com> wrote:
> > > On 5/14/24 13:33, Linus Walleij wrote:
> > > > On Tue, May 14, 2024 at 8:26 PM Florian Fainelli <f.fainelli@gmail.com> wrote:
> > > >> On 5/14/24 10:03, Russell King (Oracle) wrote:
> > > >
> > > >>> I would imagine that the problem is cpu_set_ttbcr(). Please try adding
> > > >>> a "memory" clobber to the asm() instruction in there.
> > > >>>
> > > >>
> > > >> I can confirm that with CONFIG_CC_OPTIMIZE_FOR_SIZE=y and the hunk below:
> > > >>
> > > >> diff --git a/arch/arm/include/asm/proc-fns.h
> > > >> b/arch/arm/include/asm/proc-fns.h
> > > >> index 9b3105a2a5e0..1087bd2af433 100644
> > > >> --- a/arch/arm/include/asm/proc-fns.h
> > > >> +++ b/arch/arm/include/asm/proc-fns.h
> > > >> @@ -187,7 +187,7 @@ static inline unsigned int cpu_get_ttbcr(void)
> > > >>
> > > >>    static inline void cpu_set_ttbcr(unsigned int ttbcr)
> > > >>    {
> > > >> -       asm("mcr p15, 0, %0, c2, c0, 2" : : "r" (ttbcr));
> > > >> +       asm("mcr p15, 0, %0, c2, c0, 2" : : "r" (ttbcr) : "memory");
> > > >>    }
> > > >>
> > > >>    #else  /*!CONFIG_MMU */
> > > >>
> > > >> my Raspberry Pi 4B in AArch32 mode boots and runs user-space properly.
> > > >>
> > > >> Thanks a lot Russell!
> > > >
> > > > Second that, very nicely pinpointed Russell!
> > > >
> > > > Florian, do you want to send a patch or should I?
> > >
> > > I was wondering if Russell was able to fold this directly into patch #2
> > > where cpu_set_ttbr() is added, so as to not break functionality across
> > > bisection.
> >
> > Sadly, I can still reproduce this with the above fix.
> >
> > I included TTBCR in the DEBUG_USER output, and (as expected), it has
> > A1, EPD0 and T0SZ set to the 'uaccess disabled' values.
> >
> > Using __always_inline on uaccess_save_and_enable() and
> > uaccess_restore() (as the CONFIG_CPU_SW_DOMAIN_PAN does) seems to work
> > around it. The "memory" clobber seems unnecessary in my case, but it
> > is needed for correctness in any case.
> >
> > It is unclear to me why placing these helpers out of line should make
> > any difference, and I am not convinced it is a problem in the code
> > (IIRC we've had other issues with -Os in the past)
>
> Commit 66abdd3b5d4e53bc ("ARM: 9356/2: Move asm statements accessing
> TTBCR into C functions") also removed the "volatile" from the mcr
> inline asm statement.  I tried adding it back, but that didn't help.
>

Does using __always_inline make any difference in your case?
Ard Biesheuvel May 15, 2024, 8:53 a.m. UTC | #33
On Wed, 15 May 2024 at 10:48, Russell King (Oracle)
<linux@armlinux.org.uk> wrote:
>
> On Wed, May 15, 2024 at 10:36:48AM +0200, Ard Biesheuvel wrote:
> > On Tue, 14 May 2024 at 22:34, Florian Fainelli <f.fainelli@gmail.com> wrote:
> > >
> > > On 5/14/24 13:33, Linus Walleij wrote:
> > > > On Tue, May 14, 2024 at 8:26 PM Florian Fainelli <f.fainelli@gmail.com> wrote:
> > > >> On 5/14/24 10:03, Russell King (Oracle) wrote:
> > > >
> > > >>> I would imagine that the problem is cpu_set_ttbcr(). Please try adding
> > > >>> a "memory" clobber to the asm() instruction in there.
> > > >>>
> > > >>
> > > >> I can confirm that with CONFIG_CC_OPTIMIZE_FOR_SIZE=y and the hunk below:
> > > >>
> > > >> diff --git a/arch/arm/include/asm/proc-fns.h
> > > >> b/arch/arm/include/asm/proc-fns.h
> > > >> index 9b3105a2a5e0..1087bd2af433 100644
> > > >> --- a/arch/arm/include/asm/proc-fns.h
> > > >> +++ b/arch/arm/include/asm/proc-fns.h
> > > >> @@ -187,7 +187,7 @@ static inline unsigned int cpu_get_ttbcr(void)
> > > >>
> > > >>    static inline void cpu_set_ttbcr(unsigned int ttbcr)
> > > >>    {
> > > >> -       asm("mcr p15, 0, %0, c2, c0, 2" : : "r" (ttbcr));
> > > >> +       asm("mcr p15, 0, %0, c2, c0, 2" : : "r" (ttbcr) : "memory");
> > > >>    }
> > > >>
> > > >>    #else  /*!CONFIG_MMU */
> > > >>
> > > >> my Raspberry Pi 4B in AArch32 mode boots and runs user-space properly.
> > > >>
> > > >> Thanks a lot Russell!
> > > >
> > > > Second that, very nicely pinpointed Russell!
> > > >
> > > > Florian, do you want to send a patch or should I?
> > >
> > > I was wondering if Russell was able to fold this directly into patch #2
> > > where cpu_set_ttbr() is added, so as to not break functionality across
> > > bisection.
> >
> > Sadly, I can still reproduce this with the above fix.
> >
> > I included TTBCR in the DEBUG_USER output, and (as expected), it has
> > A1, EPD0 and T0SZ set to the 'uaccess disabled' values.
> >
> > Using __always_inline on uaccess_save_and_enable() and
> > uaccess_restore() (as the CONFIG_CPU_SW_DOMAIN_PAN does) seems to work
> > around it. The "memory" clobber seems unnecessary in my case, but it
> > is needed for correctness in any case.
> >
> > It is unclear to me why placing these helpers out of line should make
> > any difference, and I am not convinced it is a problem in the code
> > (IIRC we've had other issues with -Os in the past)
>
> Time to start comparing compilers / compiler versions?
>

I guess.

I'll try to dig a bit deeper this afternoon - AIUI, there is never a
valid reason to return to user space with uaccess disabled, right? So
we should be able to catch that case using a BUG() or similar, and
crash in the kernel rather than in user space when this occurs. That
should make it a bit easier to deal with in the debugger etc.
Geert Uytterhoeven May 15, 2024, 9:21 a.m. UTC | #34
Hi Ard,

On Wed, May 15, 2024 at 10:49 AM Ard Biesheuvel <ardb@kernel.org> wrote:
> On Wed, 15 May 2024 at 10:45, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> > On Wed, May 15, 2024 at 10:37 AM Ard Biesheuvel <ardb@kernel.org> wrote:
> > > On Tue, 14 May 2024 at 22:34, Florian Fainelli <f.fainelli@gmail.com> wrote:
> > > > On 5/14/24 13:33, Linus Walleij wrote:
> > > > > On Tue, May 14, 2024 at 8:26 PM Florian Fainelli <f.fainelli@gmail.com> wrote:
> > > > >> On 5/14/24 10:03, Russell King (Oracle) wrote:
> > > > >
> > > > >>> I would imagine that the problem is cpu_set_ttbcr(). Please try adding
> > > > >>> a "memory" clobber to the asm() instruction in there.
> > > > >>>
> > > > >>
> > > > >> I can confirm that with CONFIG_CC_OPTIMIZE_FOR_SIZE=y and the hunk below:
> > > > >>
> > > > >> diff --git a/arch/arm/include/asm/proc-fns.h
> > > > >> b/arch/arm/include/asm/proc-fns.h
> > > > >> index 9b3105a2a5e0..1087bd2af433 100644
> > > > >> --- a/arch/arm/include/asm/proc-fns.h
> > > > >> +++ b/arch/arm/include/asm/proc-fns.h
> > > > >> @@ -187,7 +187,7 @@ static inline unsigned int cpu_get_ttbcr(void)
> > > > >>
> > > > >>    static inline void cpu_set_ttbcr(unsigned int ttbcr)
> > > > >>    {
> > > > >> -       asm("mcr p15, 0, %0, c2, c0, 2" : : "r" (ttbcr));
> > > > >> +       asm("mcr p15, 0, %0, c2, c0, 2" : : "r" (ttbcr) : "memory");
> > > > >>    }
> > > > >>
> > > > >>    #else  /*!CONFIG_MMU */
> > > > >>
> > > > >> my Raspberry Pi 4B in AArch32 mode boots and runs user-space properly.
> > > > >>
> > > > >> Thanks a lot Russell!
> > > > >
> > > > > Second that, very nicely pinpointed Russell!
> > > > >
> > > > > Florian, do you want to send a patch or should I?
> > > >
> > > > I was wondering if Russell was able to fold this directly into patch #2
> > > > where cpu_set_ttbr() is added, so as to not break functionality across
> > > > bisection.
> > >
> > > Sadly, I can still reproduce this with the above fix.
> > >
> > > I included TTBCR in the DEBUG_USER output, and (as expected), it has
> > > A1, EPD0 and T0SZ set to the 'uaccess disabled' values.
> > >
> > > Using __always_inline on uaccess_save_and_enable() and
> > > uaccess_restore() (as the CONFIG_CPU_SW_DOMAIN_PAN does) seems to work
> > > around it. The "memory" clobber seems unnecessary in my case, but it
> > > is needed for correctness in any case.
> > >
> > > It is unclear to me why placing these helpers out of line should make
> > > any difference, and I am not convinced it is a problem in the code
> > > (IIRC we've had other issues with -Os in the past)
> >
> > Commit 66abdd3b5d4e53bc ("ARM: 9356/2: Move asm statements accessing
> > TTBCR into C functions") also removed the "volatile" from the mcr
> > inline asm statement.  I tried adding it back, but that didn't help.
>
> Does using __always_inline make any difference in your case?

Marking  uaccess_save_and_enable() __always_inline is sufficient
to fix the issue.

gcc version 11.4.0 (Ubuntu 11.4.0-1ubuntu1~22.04)

Gr{oetje,eeting}s,

                        Geert
Ard Biesheuvel May 15, 2024, 9:39 a.m. UTC | #35
On Wed, 15 May 2024 at 11:21, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
>
> Hi Ard,
>
> On Wed, May 15, 2024 at 10:49 AM Ard Biesheuvel <ardb@kernel.org> wrote:
> > On Wed, 15 May 2024 at 10:45, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> > > On Wed, May 15, 2024 at 10:37 AM Ard Biesheuvel <ardb@kernel.org> wrote:
> > > > On Tue, 14 May 2024 at 22:34, Florian Fainelli <f.fainelli@gmail.com> wrote:
> > > > > On 5/14/24 13:33, Linus Walleij wrote:
> > > > > > On Tue, May 14, 2024 at 8:26 PM Florian Fainelli <f.fainelli@gmail.com> wrote:
> > > > > >> On 5/14/24 10:03, Russell King (Oracle) wrote:
> > > > > >
> > > > > >>> I would imagine that the problem is cpu_set_ttbcr(). Please try adding
> > > > > >>> a "memory" clobber to the asm() instruction in there.
> > > > > >>>
> > > > > >>
> > > > > >> I can confirm that with CONFIG_CC_OPTIMIZE_FOR_SIZE=y and the hunk below:
> > > > > >>
> > > > > >> diff --git a/arch/arm/include/asm/proc-fns.h
> > > > > >> b/arch/arm/include/asm/proc-fns.h
> > > > > >> index 9b3105a2a5e0..1087bd2af433 100644
> > > > > >> --- a/arch/arm/include/asm/proc-fns.h
> > > > > >> +++ b/arch/arm/include/asm/proc-fns.h
> > > > > >> @@ -187,7 +187,7 @@ static inline unsigned int cpu_get_ttbcr(void)
> > > > > >>
> > > > > >>    static inline void cpu_set_ttbcr(unsigned int ttbcr)
> > > > > >>    {
> > > > > >> -       asm("mcr p15, 0, %0, c2, c0, 2" : : "r" (ttbcr));
> > > > > >> +       asm("mcr p15, 0, %0, c2, c0, 2" : : "r" (ttbcr) : "memory");
> > > > > >>    }
> > > > > >>
> > > > > >>    #else  /*!CONFIG_MMU */
> > > > > >>
> > > > > >> my Raspberry Pi 4B in AArch32 mode boots and runs user-space properly.
> > > > > >>
> > > > > >> Thanks a lot Russell!
> > > > > >
> > > > > > Second that, very nicely pinpointed Russell!
> > > > > >
> > > > > > Florian, do you want to send a patch or should I?
> > > > >
> > > > > I was wondering if Russell was able to fold this directly into patch #2
> > > > > where cpu_set_ttbr() is added, so as to not break functionality across
> > > > > bisection.
> > > >
> > > > Sadly, I can still reproduce this with the above fix.
> > > >
> > > > I included TTBCR in the DEBUG_USER output, and (as expected), it has
> > > > A1, EPD0 and T0SZ set to the 'uaccess disabled' values.
> > > >
> > > > Using __always_inline on uaccess_save_and_enable() and
> > > > uaccess_restore() (as the CONFIG_CPU_SW_DOMAIN_PAN does) seems to work
> > > > around it. The "memory" clobber seems unnecessary in my case, but it
> > > > is needed for correctness in any case.
> > > >
> > > > It is unclear to me why placing these helpers out of line should make
> > > > any difference, and I am not convinced it is a problem in the code
> > > > (IIRC we've had other issues with -Os in the past)
> > >
> > > Commit 66abdd3b5d4e53bc ("ARM: 9356/2: Move asm statements accessing
> > > TTBCR into C functions") also removed the "volatile" from the mcr
> > > inline asm statement.  I tried adding it back, but that didn't help.
> >
> > Does using __always_inline make any difference in your case?
>
> Marking  uaccess_save_and_enable() __always_inline is sufficient
> to fix the issue.
>
> gcc version 11.4.0 (Ubuntu 11.4.0-1ubuntu1~22.04)
>

Thanks for confirming.

For the record, I am using

gcc version 13.2.0 (Debian 13.2.0-12)
Linus Walleij May 15, 2024, 11:58 a.m. UTC | #36
On Wed, May 15, 2024 at 11:21 AM Geert Uytterhoeven
<geert@linux-m68k.org> wrote:
> Ard:

> > Does using __always_inline make any difference in your case?
>
> Marking  uaccess_save_and_enable() __always_inline is sufficient
> to fix the issue.
>
> gcc version 11.4.0 (Ubuntu 11.4.0-1ubuntu1~22.04)

Can you make a combined patch with __always_inline and the
"memory" clobber both and send so Russell has something he
can fold in to stabilize his tree? Like a "this makes my machine
work" patch.

I'm using
$ arm-none-eabi-gcc --version
arm-none-eabi-gcc (Arm GNU Toolchain 12.2.MPACBTI-Bet1 (Build
arm-12-mpacbti.16)) 12.2.0

I'm almost always using LLVM in parallel as well, also for this work,
because I work on CFI and forget to switch compiler between different
projects all of the time.

Yours,
Linus Walleij
Russell King (Oracle) May 15, 2024, 12:27 p.m. UTC | #37
On Wed, May 15, 2024 at 10:53:47AM +0200, Ard Biesheuvel wrote:
> On Wed, 15 May 2024 at 10:48, Russell King (Oracle)
> <linux@armlinux.org.uk> wrote:
> >
> > On Wed, May 15, 2024 at 10:36:48AM +0200, Ard Biesheuvel wrote:
> > > On Tue, 14 May 2024 at 22:34, Florian Fainelli <f.fainelli@gmail.com> wrote:
> > > >
> > > > On 5/14/24 13:33, Linus Walleij wrote:
> > > > > On Tue, May 14, 2024 at 8:26 PM Florian Fainelli <f.fainelli@gmail.com> wrote:
> > > > >> On 5/14/24 10:03, Russell King (Oracle) wrote:
> > > > >
> > > > >>> I would imagine that the problem is cpu_set_ttbcr(). Please try adding
> > > > >>> a "memory" clobber to the asm() instruction in there.
> > > > >>>
> > > > >>
> > > > >> I can confirm that with CONFIG_CC_OPTIMIZE_FOR_SIZE=y and the hunk below:
> > > > >>
> > > > >> diff --git a/arch/arm/include/asm/proc-fns.h
> > > > >> b/arch/arm/include/asm/proc-fns.h
> > > > >> index 9b3105a2a5e0..1087bd2af433 100644
> > > > >> --- a/arch/arm/include/asm/proc-fns.h
> > > > >> +++ b/arch/arm/include/asm/proc-fns.h
> > > > >> @@ -187,7 +187,7 @@ static inline unsigned int cpu_get_ttbcr(void)
> > > > >>
> > > > >>    static inline void cpu_set_ttbcr(unsigned int ttbcr)
> > > > >>    {
> > > > >> -       asm("mcr p15, 0, %0, c2, c0, 2" : : "r" (ttbcr));
> > > > >> +       asm("mcr p15, 0, %0, c2, c0, 2" : : "r" (ttbcr) : "memory");
> > > > >>    }
> > > > >>
> > > > >>    #else  /*!CONFIG_MMU */
> > > > >>
> > > > >> my Raspberry Pi 4B in AArch32 mode boots and runs user-space properly.
> > > > >>
> > > > >> Thanks a lot Russell!
> > > > >
> > > > > Second that, very nicely pinpointed Russell!
> > > > >
> > > > > Florian, do you want to send a patch or should I?
> > > >
> > > > I was wondering if Russell was able to fold this directly into patch #2
> > > > where cpu_set_ttbr() is added, so as to not break functionality across
> > > > bisection.
> > >
> > > Sadly, I can still reproduce this with the above fix.
> > >
> > > I included TTBCR in the DEBUG_USER output, and (as expected), it has
> > > A1, EPD0 and T0SZ set to the 'uaccess disabled' values.
> > >
> > > Using __always_inline on uaccess_save_and_enable() and
> > > uaccess_restore() (as the CONFIG_CPU_SW_DOMAIN_PAN does) seems to work
> > > around it. The "memory" clobber seems unnecessary in my case, but it
> > > is needed for correctness in any case.
> > >
> > > It is unclear to me why placing these helpers out of line should make
> > > any difference, and I am not convinced it is a problem in the code
> > > (IIRC we've had other issues with -Os in the past)
> >
> > Time to start comparing compilers / compiler versions?
> >
> 
> I guess.
> 
> I'll try to dig a bit deeper this afternoon - AIUI, there is never a
> valid reason to return to user space with uaccess disabled, right? So
> we should be able to catch that case using a BUG() or similar, and
> crash in the kernel rather than in user space when this occurs. That
> should make it a bit easier to deal with in the debugger etc.

That _should_ be handled on the kernel exit to userspace path. The
kernel itself should be running with user access disabled except for
the explicit accesses.
Geert Uytterhoeven May 15, 2024, 2:05 p.m. UTC | #38
Hi Linus,

On Wed, May 15, 2024 at 1:58 PM Linus Walleij <linus.walleij@linaro.org> wrote:
> On Wed, May 15, 2024 at 11:21 AM Geert Uytterhoeven
> <geert@linux-m68k.org> wrote:
> > Ard:
>
> > > Does using __always_inline make any difference in your case?
> >
> > Marking  uaccess_save_and_enable() __always_inline is sufficient
> > to fix the issue.
> >
> > gcc version 11.4.0 (Ubuntu 11.4.0-1ubuntu1~22.04)
>
> Can you make a combined patch with __always_inline and the
> "memory" clobber both and send so Russell has something he
> can fold in to stabilize his tree? Like a "this makes my machine
> work" patch.

Done, also submitted to rmk's patch tracker:
https://lore.kernel.org/all/200d273a83906a68a1c4a9298c415980737be811.1715781469.git.geert+renesas@glider.be/
https://www.armlinux.org.uk/developer/patches/viewpatch.php?id=9398/1

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
Ard Biesheuvel May 15, 2024, 3:41 p.m. UTC | #39
On Wed, 15 May 2024 at 14:27, Russell King (Oracle)
<linux@armlinux.org.uk> wrote:
>
> On Wed, May 15, 2024 at 10:53:47AM +0200, Ard Biesheuvel wrote:
> > On Wed, 15 May 2024 at 10:48, Russell King (Oracle)
> > <linux@armlinux.org.uk> wrote:
> > >
> > > On Wed, May 15, 2024 at 10:36:48AM +0200, Ard Biesheuvel wrote:
> > > > On Tue, 14 May 2024 at 22:34, Florian Fainelli <f.fainelli@gmail.com> wrote:
> > > > >
> > > > > On 5/14/24 13:33, Linus Walleij wrote:
> > > > > > On Tue, May 14, 2024 at 8:26 PM Florian Fainelli <f.fainelli@gmail.com> wrote:
> > > > > >> On 5/14/24 10:03, Russell King (Oracle) wrote:
> > > > > >
> > > > > >>> I would imagine that the problem is cpu_set_ttbcr(). Please try adding
> > > > > >>> a "memory" clobber to the asm() instruction in there.
> > > > > >>>
> > > > > >>
> > > > > >> I can confirm that with CONFIG_CC_OPTIMIZE_FOR_SIZE=y and the hunk below:
> > > > > >>
> > > > > >> diff --git a/arch/arm/include/asm/proc-fns.h
> > > > > >> b/arch/arm/include/asm/proc-fns.h
> > > > > >> index 9b3105a2a5e0..1087bd2af433 100644
> > > > > >> --- a/arch/arm/include/asm/proc-fns.h
> > > > > >> +++ b/arch/arm/include/asm/proc-fns.h
> > > > > >> @@ -187,7 +187,7 @@ static inline unsigned int cpu_get_ttbcr(void)
> > > > > >>
> > > > > >>    static inline void cpu_set_ttbcr(unsigned int ttbcr)
> > > > > >>    {
> > > > > >> -       asm("mcr p15, 0, %0, c2, c0, 2" : : "r" (ttbcr));
> > > > > >> +       asm("mcr p15, 0, %0, c2, c0, 2" : : "r" (ttbcr) : "memory");
> > > > > >>    }
> > > > > >>
> > > > > >>    #else  /*!CONFIG_MMU */
> > > > > >>
> > > > > >> my Raspberry Pi 4B in AArch32 mode boots and runs user-space properly.
> > > > > >>
> > > > > >> Thanks a lot Russell!
> > > > > >
> > > > > > Second that, very nicely pinpointed Russell!
> > > > > >
> > > > > > Florian, do you want to send a patch or should I?
> > > > >
> > > > > I was wondering if Russell was able to fold this directly into patch #2
> > > > > where cpu_set_ttbr() is added, so as to not break functionality across
> > > > > bisection.
> > > >
> > > > Sadly, I can still reproduce this with the above fix.
> > > >
> > > > I included TTBCR in the DEBUG_USER output, and (as expected), it has
> > > > A1, EPD0 and T0SZ set to the 'uaccess disabled' values.
> > > >
> > > > Using __always_inline on uaccess_save_and_enable() and
> > > > uaccess_restore() (as the CONFIG_CPU_SW_DOMAIN_PAN does) seems to work
> > > > around it. The "memory" clobber seems unnecessary in my case, but it
> > > > is needed for correctness in any case.
> > > >
> > > > It is unclear to me why placing these helpers out of line should make
> > > > any difference, and I am not convinced it is a problem in the code
> > > > (IIRC we've had other issues with -Os in the past)
> > >
> > > Time to start comparing compilers / compiler versions?
> > >
> >
> > I guess.
> >
> > I'll try to dig a bit deeper this afternoon - AIUI, there is never a
> > valid reason to return to user space with uaccess disabled, right? So
> > we should be able to catch that case using a BUG() or similar, and
> > crash in the kernel rather than in user space when this occurs. That
> > should make it a bit easier to deal with in the debugger etc.
>
> That _should_ be handled on the kernel exit to userspace path. The
> kernel itself should be running with user access disabled except for
> the explicit accesses.
>

This definitely looks like some pathological compiler behavior:

The faulting instruction in question is in busybox

   15f450:       eef11a10        vmrs    r1, fpscr

which [as expected] triggers an UNDEF exception as FP is disabled
after a context switch.

The get_user() call in do_undefinstr() [arch/arm/kernel/traps.c:482]

    if (get_user(instr, (u32 __user *)pc))

gets compiled to the below, where the thing to note is that the
out-of-line version of uaccess_save_and_enable() returns the old ttbcr
value in R0. This value gets recorded in R3, but it also gets happily
passed on to __get_user_4(), which expects the user address in R0, not
the old value of TTBCR.

│    0xc040a5e4 <do_undefinstr+228>  bl      0xc0409fe4
<uaccess_save_and_enable>
│    0xc040a5e8 <do_undefinstr+232>  mov     r3, r0
│  > 0xc040a5ec <do_undefinstr+236>  bl      0xc10443a8 <__get_user_4>
│    0xc040a5f0 <do_undefinstr+240>  mcr     15, 0, r3, cr2, cr0, {2}
│    0xc040a5f4 <do_undefinstr+244>  isb     sy

With __always_inline, this part is emitted as

 5fc:   ee124f50        mrc     15, 0, r4, cr2, cr0, {2}
 600:   e0033004        and     r3, r3, r4
 604:   ee023f50        mcr     15, 0, r3, cr2, cr0, {2}
 608:   f57ff06f        isb     sy
 60c:   ebfffffe        bl      0 <__get_user_4>
                        60c: R_ARM_CALL __get_user_4
 610:   ee024f50        mcr     15, 0, r4, cr2, cr0, {2}
 614:   f57ff06f        isb     sy

and so R0 is preserved, and the issue does not happen.

Not sure how to reduce this to a reproducer that can be used to report
the issue to the GCC folks, but it is most definitely a compiler
problem, as far as I can tell.
Russell King (Oracle) May 15, 2024, 4:18 p.m. UTC | #40
On Wed, May 15, 2024 at 05:41:45PM +0200, Ard Biesheuvel wrote:
> This definitely looks like some pathological compiler behavior:
> 
> The faulting instruction in question is in busybox
> 
>    15f450:       eef11a10        vmrs    r1, fpscr
> 
> which [as expected] triggers an UNDEF exception as FP is disabled
> after a context switch.
> 
> The get_user() call in do_undefinstr() [arch/arm/kernel/traps.c:482]
> 
>     if (get_user(instr, (u32 __user *)pc))
> 
> gets compiled to the below, where the thing to note is that the
> out-of-line version of uaccess_save_and_enable() returns the old ttbcr
> value in R0. This value gets recorded in R3, but it also gets happily
> passed on to __get_user_4(), which expects the user address in R0, not
> the old value of TTBCR.
> 
> │    0xc040a5e4 <do_undefinstr+228>  bl      0xc0409fe4
> <uaccess_save_and_enable>
> │    0xc040a5e8 <do_undefinstr+232>  mov     r3, r0
> │  > 0xc040a5ec <do_undefinstr+236>  bl      0xc10443a8 <__get_user_4>
> │    0xc040a5f0 <do_undefinstr+240>  mcr     15, 0, r3, cr2, cr0, {2}
> │    0xc040a5f4 <do_undefinstr+244>  isb     sy
> 
> With __always_inline, this part is emitted as
> 
>  5fc:   ee124f50        mrc     15, 0, r4, cr2, cr0, {2}
>  600:   e0033004        and     r3, r3, r4
>  604:   ee023f50        mcr     15, 0, r3, cr2, cr0, {2}
>  608:   f57ff06f        isb     sy
>  60c:   ebfffffe        bl      0 <__get_user_4>
>                         60c: R_ARM_CALL __get_user_4
>  610:   ee024f50        mcr     15, 0, r4, cr2, cr0, {2}
>  614:   f57ff06f        isb     sy
> 
> and so R0 is preserved, and the issue does not happen.
> 
> Not sure how to reduce this to a reproducer that can be used to report
> the issue to the GCC folks, but it is most definitely a compiler
> problem, as far as I can tell.

Well this has come up before...

commit 851140ab0d083c78e5723a8b1cbd258f567a7aff
Author: Masahiro Yamada <yamada.masahiro@socionext.com>
Date:   Wed Oct 2 11:28:02 2019 +0100

    ARM: 8908/1: add __always_inline to functions called from __get_user_check()

I assume it's a wontfix on the GCC side.
Ard Biesheuvel May 15, 2024, 4:36 p.m. UTC | #41
On Wed, 15 May 2024 at 18:18, Russell King (Oracle)
<linux@armlinux.org.uk> wrote:
>
> On Wed, May 15, 2024 at 05:41:45PM +0200, Ard Biesheuvel wrote:
> > This definitely looks like some pathological compiler behavior:
> >
> > The faulting instruction in question is in busybox
> >
> >    15f450:       eef11a10        vmrs    r1, fpscr
> >
> > which [as expected] triggers an UNDEF exception as FP is disabled
> > after a context switch.
> >
> > The get_user() call in do_undefinstr() [arch/arm/kernel/traps.c:482]
> >
> >     if (get_user(instr, (u32 __user *)pc))
> >
> > gets compiled to the below, where the thing to note is that the
> > out-of-line version of uaccess_save_and_enable() returns the old ttbcr
> > value in R0. This value gets recorded in R3, but it also gets happily
> > passed on to __get_user_4(), which expects the user address in R0, not
> > the old value of TTBCR.
> >
> > │    0xc040a5e4 <do_undefinstr+228>  bl      0xc0409fe4
> > <uaccess_save_and_enable>
> > │    0xc040a5e8 <do_undefinstr+232>  mov     r3, r0
> > │  > 0xc040a5ec <do_undefinstr+236>  bl      0xc10443a8 <__get_user_4>
> > │    0xc040a5f0 <do_undefinstr+240>  mcr     15, 0, r3, cr2, cr0, {2}
> > │    0xc040a5f4 <do_undefinstr+244>  isb     sy
> >
> > With __always_inline, this part is emitted as
> >
> >  5fc:   ee124f50        mrc     15, 0, r4, cr2, cr0, {2}
> >  600:   e0033004        and     r3, r3, r4
> >  604:   ee023f50        mcr     15, 0, r3, cr2, cr0, {2}
> >  608:   f57ff06f        isb     sy
> >  60c:   ebfffffe        bl      0 <__get_user_4>
> >                         60c: R_ARM_CALL __get_user_4
> >  610:   ee024f50        mcr     15, 0, r4, cr2, cr0, {2}
> >  614:   f57ff06f        isb     sy
> >
> > and so R0 is preserved, and the issue does not happen.
> >
> > Not sure how to reduce this to a reproducer that can be used to report
> > the issue to the GCC folks, but it is most definitely a compiler
> > problem, as far as I can tell.
>
> Well this has come up before...
>
> commit 851140ab0d083c78e5723a8b1cbd258f567a7aff
> Author: Masahiro Yamada <yamada.masahiro@socionext.com>
> Date:   Wed Oct 2 11:28:02 2019 +0100
>
>     ARM: 8908/1: add __always_inline to functions called from __get_user_check()
>
> I assume it's a wontfix on the GCC side.
>

Yes, that is the exact same issue.

Not sure whether it has been reported - the GCC side might not even be aware.

I managed to reproduce this in godbolt - it happens even with -O2 not
just with -Os

https://godbolt.org/z/do56voKsa

As far as I can tell, functions that use asm("r#") cannot safely call
other functions at all unless those are __always_inline. The fact that
it triggers with -Os first is just because it inlines much less
aggressively than other optimization levels.
Arnd Bergmann May 15, 2024, 9:51 p.m. UTC | #42
On Wed, May 15, 2024, at 16:36, Ard Biesheuvel wrote:
> On Wed, 15 May 2024 at 18:18, Russell King (Oracle) <linux@armlinux.org.uk> wrote:
>> > and so R0 is preserved, and the issue does not happen.
>> >
>> > Not sure how to reduce this to a reproducer that can be used to report
>> > the issue to the GCC folks, but it is most definitely a compiler
>> > problem, as far as I can tell.
>>
>> Well this has come up before...
>>
>> commit 851140ab0d083c78e5723a8b1cbd258f567a7aff
>> Author: Masahiro Yamada <yamada.masahiro@socionext.com>
>> Date:   Wed Oct 2 11:28:02 2019 +0100
>>
>>     ARM: 8908/1: add __always_inline to functions called from __get_user_check()
>>
>> I assume it's a wontfix on the GCC side.
>>
>
> Yes, that is the exact same issue.
>
> Not sure whether it has been reported - the GCC side might not even be aware.
>
> I managed to reproduce this in godbolt - it happens even with -O2 not
> just with -Os
>
> https://godbolt.org/z/do56voKsa
>
> As far as I can tell, functions that use asm("r#") cannot safely call
> other functions at all unless those are __always_inline. The fact that
> it triggers with -Os first is just because it inlines much less
> aggressively than other optimization levels.

I remember that one now, and I also think this was a wontfix,
since the compiler has no way of doing the right thing here when
a variable is forced into an argument register and then used
after a function call. Spilling the contents on the stack or another
register would break the 'asm("%r0") annotation.

It would be nice to get a compiler warning for code like this, but
I don't know what legitimate use cases that would warn for, e.g. it
may be indistinguishable from the global 'register unsigned long
current_stack_pointer asm("r1");' construct we have on a couple
of architectures.

     Arnd
diff mbox series

Patch

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 0af6709570d1..3d97a15a3e2d 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -1231,9 +1231,9 @@  config HIGHPTE
 	  consumed by page tables.  Setting this option will allow
 	  user-space 2nd level page tables to reside in high memory.
 
-config CPU_SW_DOMAIN_PAN
-	bool "Enable use of CPU domains to implement privileged no-access"
-	depends on MMU && !ARM_LPAE
+config ARM_PAN
+	bool "Enable privileged no-access"
+	depends on MMU
 	default y
 	help
 	  Increase kernel security by ensuring that normal kernel accesses
@@ -1242,10 +1242,26 @@  config CPU_SW_DOMAIN_PAN
 	  by ensuring that magic values (such as LIST_POISON) will always
 	  fault when dereferenced.
 
+	  The implementation uses CPU domains when !CONFIG_ARM_LPAE and
+	  disabling of TTBR0 page table walks with CONFIG_ARM_LPAE.
+
+config CPU_SW_DOMAIN_PAN
+	def_bool y
+	depends on ARM_PAN && !ARM_LPAE
+	help
+	  Enable use of CPU domains to implement privileged no-access.
+
 	  CPUs with low-vector mappings use a best-efforts implementation.
 	  Their lower 1MB needs to remain accessible for the vectors, but
 	  the remainder of userspace will become appropriately inaccessible.
 
+config CPU_TTBR0_PAN
+	def_bool y
+	depends on ARM_PAN && ARM_LPAE
+	help
+	  Enable privileged no-access by disabling TTBR0 page table walks when
+	  running in kernel mode.
+
 config HW_PERF_EVENTS
 	def_bool y
 	depends on ARM_PMU
diff --git a/arch/arm/include/asm/assembler.h b/arch/arm/include/asm/assembler.h
index aebe2c8f6a68..d33c1e24e00b 100644
--- a/arch/arm/include/asm/assembler.h
+++ b/arch/arm/include/asm/assembler.h
@@ -21,6 +21,7 @@ 
 #include <asm/opcodes-virt.h>
 #include <asm/asm-offsets.h>
 #include <asm/page.h>
+#include <asm/pgtable.h>
 #include <asm/thread_info.h>
 #include <asm/uaccess-asm.h>
 
diff --git a/arch/arm/include/asm/pgtable-3level-hwdef.h b/arch/arm/include/asm/pgtable-3level-hwdef.h
index 19da7753a0b8..323ad811732e 100644
--- a/arch/arm/include/asm/pgtable-3level-hwdef.h
+++ b/arch/arm/include/asm/pgtable-3level-hwdef.h
@@ -74,6 +74,7 @@ 
 #define PHYS_MASK_SHIFT		(40)
 #define PHYS_MASK		((1ULL << PHYS_MASK_SHIFT) - 1)
 
+#ifndef CONFIG_CPU_TTBR0_PAN
 /*
  * TTBR0/TTBR1 split (PAGE_OFFSET):
  *   0x40000000: T0SZ = 2, T1SZ = 0 (not used)
@@ -93,6 +94,14 @@ 
 #endif
 
 #define TTBR1_SIZE	(((PAGE_OFFSET >> 30) - 1) << 16)
+#else
+/*
+ * With CONFIG_CPU_TTBR0_PAN enabled, TTBR1 is only used during uaccess
+ * disabled regions when TTBR0 is disabled.
+ */
+#define TTBR1_OFFSET	0			/* pointing to swapper_pg_dir */
+#define TTBR1_SIZE	0			/* TTBR1 size controlled via TTBCR.T0SZ */
+#endif
 
 /*
  * TTBCR register bits.
diff --git a/arch/arm/include/asm/ptrace.h b/arch/arm/include/asm/ptrace.h
index 7f44e88d1f25..f064252498c7 100644
--- a/arch/arm/include/asm/ptrace.h
+++ b/arch/arm/include/asm/ptrace.h
@@ -19,6 +19,7 @@  struct pt_regs {
 struct svc_pt_regs {
 	struct pt_regs regs;
 	u32 dacr;
+	u32 ttbcr;
 };
 
 #define to_svc_pt_regs(r) container_of(r, struct svc_pt_regs, regs)
diff --git a/arch/arm/include/asm/uaccess-asm.h b/arch/arm/include/asm/uaccess-asm.h
index ea42ba25920f..4bccd895d954 100644
--- a/arch/arm/include/asm/uaccess-asm.h
+++ b/arch/arm/include/asm/uaccess-asm.h
@@ -39,7 +39,7 @@ 
 #endif
 	.endm
 
-#ifdef CONFIG_CPU_SW_DOMAIN_PAN
+#if defined(CONFIG_CPU_SW_DOMAIN_PAN)
 
 	.macro	uaccess_disable, tmp, isb=1
 	/*
@@ -65,6 +65,37 @@ 
 	.endif
 	.endm
 
+#elif defined(CONFIG_CPU_TTBR0_PAN)
+
+	.macro	uaccess_disable, tmp, isb=1
+	/*
+	 * Disable TTBR0 page table walks (EDP0 = 1), use the reserved ASID
+	 * from TTBR1 (A1 = 1) and enable TTBR1 page table walks for kernel
+	 * addresses by reducing TTBR0 range to 32MB (T0SZ = 7).
+	 */
+	mrc	p15, 0, \tmp, c2, c0, 2		@ read TTBCR
+	orr	\tmp, \tmp, #TTBCR_EPD0 | TTBCR_T0SZ_MASK
+	orr	\tmp, \tmp, #TTBCR_A1
+	mcr	p15, 0, \tmp, c2, c0, 2		@ write TTBCR
+	.if	\isb
+	instr_sync
+	.endif
+	.endm
+
+	.macro	uaccess_enable, tmp, isb=1
+	/*
+	 * Enable TTBR0 page table walks (T0SZ = 0, EDP0 = 0) and ASID from
+	 * TTBR0 (A1 = 0).
+	 */
+	mrc	p15, 0, \tmp, c2, c0, 2		@ read TTBCR
+	bic	\tmp, \tmp, #TTBCR_EPD0 | TTBCR_T0SZ_MASK
+	bic	\tmp, \tmp, #TTBCR_A1
+	mcr	p15, 0, \tmp, c2, c0, 2		@ write TTBCR
+	.if	\isb
+	instr_sync
+	.endif
+	.endm
+
 #else
 
 	.macro	uaccess_disable, tmp, isb=1
@@ -79,6 +110,12 @@ 
 #define DACR(x...)	x
 #else
 #define DACR(x...)
+#endif
+
+#ifdef CONFIG_CPU_TTBR0_PAN
+#define PAN(x...)	x
+#else
+#define PAN(x...)
 #endif
 
 	/*
@@ -94,6 +131,8 @@ 
 	.macro	uaccess_entry, tsk, tmp0, tmp1, tmp2, disable
  DACR(	mrc	p15, 0, \tmp0, c3, c0, 0)
  DACR(	str	\tmp0, [sp, #SVC_DACR])
+ PAN(	mrc	p15, 0, \tmp0, c2, c0, 2)
+ PAN(	str	\tmp0, [sp, #SVC_TTBCR])
 	.if \disable && IS_ENABLED(CONFIG_CPU_SW_DOMAIN_PAN)
 	/* kernel=client, user=no access */
 	mov	\tmp2, #DACR_UACCESS_DISABLE
@@ -112,8 +151,11 @@ 
 	.macro	uaccess_exit, tsk, tmp0, tmp1
  DACR(	ldr	\tmp0, [sp, #SVC_DACR])
  DACR(	mcr	p15, 0, \tmp0, c3, c0, 0)
+ PAN(	ldr	\tmp0, [sp, #SVC_TTBCR])
+ PAN(	mcr	p15, 0, \tmp0, c2, c0, 2)
 	.endm
 
 #undef DACR
+#undef PAN
 
 #endif /* __ASM_UACCESS_ASM_H__ */
diff --git a/arch/arm/include/asm/uaccess.h b/arch/arm/include/asm/uaccess.h
index 2278769f1156..25d21d7d6e3e 100644
--- a/arch/arm/include/asm/uaccess.h
+++ b/arch/arm/include/asm/uaccess.h
@@ -14,6 +14,8 @@ 
 #include <asm/domain.h>
 #include <asm/unaligned.h>
 #include <asm/unified.h>
+#include <asm/pgtable.h>
+#include <asm/proc-fns.h>
 #include <asm/compiler.h>
 
 #include <asm/extable.h>
@@ -24,7 +26,7 @@ 
  * perform such accesses (eg, via list poison values) which could then
  * be exploited for priviledge escalation.
  */
-#ifdef CONFIG_CPU_SW_DOMAIN_PAN
+#if defined(CONFIG_CPU_SW_DOMAIN_PAN)
 
 static __always_inline unsigned int uaccess_save_and_enable(void)
 {
@@ -43,6 +45,28 @@  static __always_inline void uaccess_restore(unsigned int flags)
 	set_domain(flags);
 }
 
+#elif defined(CONFIG_CPU_TTBR0_PAN)
+
+static inline unsigned int uaccess_save_and_enable(void)
+{
+	unsigned int old_ttbcr = cpu_get_ttbcr();
+
+	/*
+	 * Enable TTBR0 page table walks (T0SZ = 0, EDP0 = 0) and ASID from
+	 * TTBR0 (A1 = 0).
+	 */
+	cpu_set_ttbcr(old_ttbcr & ~(TTBCR_A1 | TTBCR_EPD0 | TTBCR_T0SZ_MASK));
+	isb();
+
+	return old_ttbcr;
+}
+
+static inline void uaccess_restore(unsigned int flags)
+{
+	cpu_set_ttbcr(flags);
+	isb();
+}
+
 #else
 
 static inline unsigned int uaccess_save_and_enable(void)
diff --git a/arch/arm/kernel/asm-offsets.c b/arch/arm/kernel/asm-offsets.c
index 219cbc7e5d13..dd2567ba987f 100644
--- a/arch/arm/kernel/asm-offsets.c
+++ b/arch/arm/kernel/asm-offsets.c
@@ -83,6 +83,7 @@  int main(void)
   DEFINE(S_OLD_R0,		offsetof(struct pt_regs, ARM_ORIG_r0));
   DEFINE(PT_REGS_SIZE,		sizeof(struct pt_regs));
   DEFINE(SVC_DACR,		offsetof(struct svc_pt_regs, dacr));
+  DEFINE(SVC_TTBCR,		offsetof(struct svc_pt_regs, ttbcr));
   DEFINE(SVC_REGS_SIZE,		sizeof(struct svc_pt_regs));
   BLANK();
   DEFINE(SIGFRAME_RC3_OFFSET,	offsetof(struct sigframe, retcode[3]));
diff --git a/arch/arm/kernel/suspend.c b/arch/arm/kernel/suspend.c
index c3ec3861dd07..58a6441b58c4 100644
--- a/arch/arm/kernel/suspend.c
+++ b/arch/arm/kernel/suspend.c
@@ -12,6 +12,7 @@ 
 #include <asm/smp_plat.h>
 #include <asm/suspend.h>
 #include <asm/tlbflush.h>
+#include <asm/uaccess.h>
 
 extern int __cpu_suspend(unsigned long, int (*)(unsigned long), u32 cpuid);
 extern void cpu_resume_mmu(void);
@@ -26,6 +27,13 @@  int cpu_suspend(unsigned long arg, int (*fn)(unsigned long))
 	if (!idmap_pgd)
 		return -EINVAL;
 
+	/*
+	 * Needed for the MMU disabling/enabing code to be able to run from
+	 * TTBR0 addresses.
+	 */
+	if (IS_ENABLED(CONFIG_CPU_TTBR0_PAN))
+		uaccess_save_and_enable();
+
 	/*
 	 * Function graph tracer state gets incosistent when the kernel
 	 * calls functions that never return (aka suspend finishers) hence
diff --git a/arch/arm/lib/csumpartialcopyuser.S b/arch/arm/lib/csumpartialcopyuser.S
index 6928781e6bee..c289bde04743 100644
--- a/arch/arm/lib/csumpartialcopyuser.S
+++ b/arch/arm/lib/csumpartialcopyuser.S
@@ -13,7 +13,8 @@ 
 
 		.text
 
-#ifdef CONFIG_CPU_SW_DOMAIN_PAN
+#if defined(CONFIG_CPU_SW_DOMAIN_PAN)
+
 		.macro	save_regs
 		mrc	p15, 0, ip, c3, c0, 0
 		stmfd	sp!, {r1, r2, r4 - r8, ip, lr}
@@ -25,7 +26,23 @@ 
 		mcr	p15, 0, ip, c3, c0, 0
 		ret	lr
 		.endm
+
+#elif defined(CONFIG_CPU_TTBR0_PAN)
+
+		.macro	save_regs
+		mrc	p15, 0, ip, c2, c0, 2		@ read TTBCR
+		stmfd	sp!, {r1, r2, r4 - r8, ip, lr}
+		uaccess_enable ip
+		.endm
+
+		.macro	load_regs
+		ldmfd	sp!, {r1, r2, r4 - r8, ip, lr}
+		mcr	p15, 0, ip, c2, c0, 2		@ restore TTBCR
+		ret	lr
+		.endm
+
 #else
+
 		.macro	save_regs
 		stmfd	sp!, {r1, r2, r4 - r8, lr}
 		.endm
@@ -33,6 +50,7 @@ 
 		.macro	load_regs
 		ldmfd	sp!, {r1, r2, r4 - r8, pc}
 		.endm
+
 #endif
 
 		.macro	load1b,	reg1
diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c
index e96fb40b9cc3..7d262a819ad1 100644
--- a/arch/arm/mm/fault.c
+++ b/arch/arm/mm/fault.c
@@ -235,6 +235,27 @@  static inline bool is_permission_fault(unsigned int fsr)
 	return false;
 }
 
+#ifdef CONFIG_CPU_TTBR0_PAN
+static inline bool ttbr0_usermode_access_allowed(struct pt_regs *regs)
+{
+	struct svc_pt_regs *svcregs;
+
+	/* If we are in user mode: permission granted */
+	if (user_mode(regs))
+		return true;
+
+	/* uaccess state saved above pt_regs on SVC exception entry */
+	svcregs = to_svc_pt_regs(regs);
+
+	return !(svcregs->ttbcr & TTBCR_EPD0);
+}
+#else
+static inline bool ttbr0_usermode_access_allowed(struct pt_regs *regs)
+{
+	return true;
+}
+#endif
+
 static int __kprobes
 do_page_fault(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
 {
@@ -278,6 +299,14 @@  do_page_fault(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
 
 	perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, addr);
 
+	/*
+	 * Privileged access aborts with CONFIG_CPU_TTBR0_PAN enabled are
+	 * routed via the translation fault mechanism. Check whether uaccess
+	 * is disabled while in kernel mode.
+	 */
+	if (!ttbr0_usermode_access_allowed(regs))
+		goto no_context;
+
 	if (!(flags & FAULT_FLAG_USER))
 		goto lock_mmap;