Message ID | 20240322195418.2160164-1-jcmvbkbc@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | binfmt_elf_fdpic: fix /proc/<pid>/auxv | expand |
On Fri, 22 Mar 2024 12:54:18 -0700, Max Filippov wrote: > Althought FDPIC linux kernel provides /proc/<pid>/auxv files they are > empty because there's no code that initializes mm->saved_auxv in the > FDPIC ELF loader. > > Synchronize FDPIC ELF aux vector setup with ELF. Replace entry-by-entry > aux vector copying to userspace with initialization of mm->saved_auxv > first and then copying it to userspace as a whole. > > [...] Applied to for-next/execve, thanks! [1/1] binfmt_elf_fdpic: fix /proc/<pid>/auxv https://git.kernel.org/kees/c/10e29251be0e Take care,
Hi Max, On 23/3/24 05:54, Max Filippov wrote: > Althought FDPIC linux kernel provides /proc/<pid>/auxv files they are > empty because there's no code that initializes mm->saved_auxv in the > FDPIC ELF loader. > > Synchronize FDPIC ELF aux vector setup with ELF. Replace entry-by-entry > aux vector copying to userspace with initialization of mm->saved_auxv > first and then copying it to userspace as a whole. > > Signed-off-by: Max Filippov <jcmvbkbc@gmail.com> This is breaking ARM nommu builds supporting fdpic and elf binaries for me. Tests I have for m68k and riscv nommu setups running elf binaries don't show any problems - I am only seeing this on ARM. ... Freeing unused kernel image (initmem) memory: 472K This architecture does not have kernel memory protection. Run /init as init process Internal error: Oops - undefined instruction: 0 [#1] ARM Modules linked in: CPU: 0 PID: 1 Comm: init Not tainted 6.10.0 #1 Hardware name: ARM-Versatile (Device Tree Support) PC is at load_elf_fdpic_binary+0xb34/0xb80 LR is at 0x0 pc : [<00109ce8>] lr : [<00000000>] psr: 80000153 sp : 00823e40 ip : 00000000 fp : 00b8fee4 r10: 009c9b80 r9 : 00b8ff80 r8 : 009ee800 r7 : 00000000 r6 : 009f7e80 r5 : 00b8fedc r4 : 00b87000 r3 : 00b8fed8 r2 : 00b8fee0 r1 : 00b87128 r0 : 00b8fef0 Flags: Nzcv IRQs on FIQs off Mode SVC_32 ISA ARM Segment none Control: 00091176 Table: 00000000 DAC: 00000000 Register r0 information: non-slab/vmalloc memory Register r1 information: slab/vmalloc mm_struct start 00b87000 pointer offset 296 size 428 Register r2 information: non-slab/vmalloc memory Register r3 information: non-slab/vmalloc memory Register r4 information: slab/vmalloc mm_struct start 00b87000 pointer offset 0 size 428 Register r5 information: non-slab/vmalloc memory Register r6 information: slab/vmalloc kmalloc-32 start 009f7e80 pointer offset 0 size 32 Register r7 information: non-slab/vmalloc memory Register r8 information: slab/vmalloc kmalloc-512 start 009ee800 pointer offset 0 size 512 Register r9 information: non-slab/vmalloc memory Register r10 information: slab/vmalloc kmalloc-128 start 009c9b80 pointer offset 0 size 128 Register r11 information: non-slab/vmalloc memory Register r12 information: non-slab/vmalloc memory Process init (pid: 1, stack limit = 0x(ptrval)) Stack: (0x00823e40 to 0x00824000) 3e40: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ???????? 3e60: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ???????? 3e80: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ???????? 3ea0: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ???????? 3ec0: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ???????? 3ee0: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ???????? 3f00: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ???????? 3f20: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ???????? 3f40: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ???????? 3f60: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ???????? 3f80: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ???????? 3fa0: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ???????? 3fc0: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ???????? 3fe0: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ???????? Call trace: load_elf_fdpic_binary from bprm_execve+0x1b4/0x488 bprm_execve from kernel_execve+0x154/0x1e4 kernel_execve from kernel_init+0x4c/0x108 kernel_init from ret_from_fork+0x14/0x38 Exception stack(0x00823fb0 to 0x00823ff8) 3fa0: ???????? ???????? ???????? ???????? 3fc0: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ???????? 3fe0: ???????? ???????? ???????? ???????? ???????? ???????? Code: bad PC value ---[ end trace 0000000000000000 ]--- note: init[1] exited with irqs disabled Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]--- The code around that PC is: 109cd0: e2833ff1 add r3, r3, #964 @ 0x3c4 109cd4: e5933000 ldr r3, [r3] 109cd8: e5933328 ldr r3, [r3, #808] @ 0x328 109cdc: e5933084 ldr r3, [r3, #132] @ 0x84 109ce0: e5843034 str r3, [r4, #52] @ 0x34 109ce4: eafffdbc b 1093dc <load_elf_fdpic_binary+0x228> 109ce8: e7f001f2 .word 0xe7f001f2 109cec: eb09471d bl 35b968 <__stack_chk_fail> 109cf0: e59f0038 ldr r0, [pc, #56] @ 109d30 <load_elf_fdpic_binary+0xb7c> 109cf4: eb092f03 bl 355908 <_printk> 109cf8: eafffdb7 b 1093dc <load_elf_fdpic_binary+0x228> Reverting just this change gets it working again. Any idea what might be going on? Regards Greg > --- > fs/binfmt_elf_fdpic.c | 88 +++++++++++++++++++++---------------------- > 1 file changed, 42 insertions(+), 46 deletions(-) > > diff --git a/fs/binfmt_elf_fdpic.c b/fs/binfmt_elf_fdpic.c > index fefc642541cb..7b4542a0cbe3 100644 > --- a/fs/binfmt_elf_fdpic.c > +++ b/fs/binfmt_elf_fdpic.c > @@ -505,8 +505,9 @@ static int create_elf_fdpic_tables(struct linux_binprm *bprm, > char *k_platform, *k_base_platform; > char __user *u_platform, *u_base_platform, *p; > int loop; > - int nr; /* reset for each csp adjustment */ > unsigned long flags = 0; > + int ei_index; > + elf_addr_t *elf_info; > > #ifdef CONFIG_MMU > /* In some cases (e.g. Hyper-Threading), we want to avoid L1 evictions > @@ -601,44 +602,24 @@ static int create_elf_fdpic_tables(struct linux_binprm *bprm, > csp -= sp & 15UL; > sp -= sp & 15UL; > > - /* put the ELF interpreter info on the stack */ > -#define NEW_AUX_ENT(id, val) \ > - do { \ > - struct { unsigned long _id, _val; } __user *ent, v; \ > - \ > - ent = (void __user *) csp; \ > - v._id = (id); \ > - v._val = (val); \ > - if (copy_to_user(ent + nr, &v, sizeof(v))) \ > - return -EFAULT; \ > - nr++; \ > + /* Create the ELF interpreter info */ > + elf_info = (elf_addr_t *)mm->saved_auxv; > + /* update AT_VECTOR_SIZE_BASE if the number of NEW_AUX_ENT() changes */ > +#define NEW_AUX_ENT(id, val) \ > + do { \ > + *elf_info++ = id; \ > + *elf_info++ = val; \ > } while (0) > > - nr = 0; > - csp -= 2 * sizeof(unsigned long); > - NEW_AUX_ENT(AT_NULL, 0); > - if (k_platform) { > - nr = 0; > - csp -= 2 * sizeof(unsigned long); > - NEW_AUX_ENT(AT_PLATFORM, > - (elf_addr_t) (unsigned long) u_platform); > - } > - > - if (k_base_platform) { > - nr = 0; > - csp -= 2 * sizeof(unsigned long); > - NEW_AUX_ENT(AT_BASE_PLATFORM, > - (elf_addr_t) (unsigned long) u_base_platform); > - } > - > - if (bprm->have_execfd) { > - nr = 0; > - csp -= 2 * sizeof(unsigned long); > - NEW_AUX_ENT(AT_EXECFD, bprm->execfd); > - } > - > - nr = 0; > - csp -= DLINFO_ITEMS * 2 * sizeof(unsigned long); > +#ifdef ARCH_DLINFO > + /* > + * ARCH_DLINFO must come first so PPC can do its special alignment of > + * AUXV. > + * update AT_VECTOR_SIZE_ARCH if the number of NEW_AUX_ENT() in > + * ARCH_DLINFO changes > + */ > + ARCH_DLINFO; > +#endif > NEW_AUX_ENT(AT_HWCAP, ELF_HWCAP); > #ifdef ELF_HWCAP2 > NEW_AUX_ENT(AT_HWCAP2, ELF_HWCAP2); > @@ -659,17 +640,32 @@ static int create_elf_fdpic_tables(struct linux_binprm *bprm, > NEW_AUX_ENT(AT_EGID, (elf_addr_t) from_kgid_munged(cred->user_ns, cred->egid)); > NEW_AUX_ENT(AT_SECURE, bprm->secureexec); > NEW_AUX_ENT(AT_EXECFN, bprm->exec); > + if (k_platform) { > + NEW_AUX_ENT(AT_PLATFORM, > + (elf_addr_t)(unsigned long)u_platform); > + } > + if (k_base_platform) { > + NEW_AUX_ENT(AT_BASE_PLATFORM, > + (elf_addr_t)(unsigned long)u_base_platform); > + } > + if (bprm->have_execfd) { > + NEW_AUX_ENT(AT_EXECFD, bprm->execfd); > + } > +#undef NEW_AUX_ENT > + /* AT_NULL is zero; clear the rest too */ > + memset(elf_info, 0, (char *)mm->saved_auxv + > + sizeof(mm->saved_auxv) - (char *)elf_info); > > -#ifdef ARCH_DLINFO > - nr = 0; > - csp -= AT_VECTOR_SIZE_ARCH * 2 * sizeof(unsigned long); > + /* And advance past the AT_NULL entry. */ > + elf_info += 2; > > - /* ARCH_DLINFO must come last so platform specific code can enforce > - * special alignment requirements on the AUXV if necessary (eg. PPC). > - */ > - ARCH_DLINFO; > -#endif > -#undef NEW_AUX_ENT > + ei_index = elf_info - (elf_addr_t *)mm->saved_auxv; > + csp -= ei_index * sizeof(elf_addr_t); > + > + /* Put the elf_info on the stack in the right place. */ > + if (copy_to_user((void __user *)csp, mm->saved_auxv, > + ei_index * sizeof(elf_addr_t))) > + return -EFAULT; > > /* allocate room for argv[] and envv[] */ > csp -= (bprm->envc + 1) * sizeof(elf_caddr_t);
Hi Greg, On Sun, Aug 11, 2024 at 7:26 PM Greg Ungerer <gregungerer@westnet.com.au> wrote: > On 23/3/24 05:54, Max Filippov wrote: > > Althought FDPIC linux kernel provides /proc/<pid>/auxv files they are > > empty because there's no code that initializes mm->saved_auxv in the > > FDPIC ELF loader. > > > > Synchronize FDPIC ELF aux vector setup with ELF. Replace entry-by-entry > > aux vector copying to userspace with initialization of mm->saved_auxv > > first and then copying it to userspace as a whole. > > > > Signed-off-by: Max Filippov <jcmvbkbc@gmail.com> > > This is breaking ARM nommu builds supporting fdpic and elf binaries for me. > > Tests I have for m68k and riscv nommu setups running elf binaries > don't show any problems - I am only seeing this on ARM. > > > ... > Freeing unused kernel image (initmem) memory: 472K > This architecture does not have kernel memory protection. > Run /init as init process > Internal error: Oops - undefined instruction: 0 [#1] ARM > Modules linked in: > CPU: 0 PID: 1 Comm: init Not tainted 6.10.0 #1 > Hardware name: ARM-Versatile (Device Tree Support) > PC is at load_elf_fdpic_binary+0xb34/0xb80 > LR is at 0x0 > pc : [<00109ce8>] lr : [<00000000>] psr: 80000153 > sp : 00823e40 ip : 00000000 fp : 00b8fee4 > r10: 009c9b80 r9 : 00b8ff80 r8 : 009ee800 > r7 : 00000000 r6 : 009f7e80 r5 : 00b8fedc r4 : 00b87000 > r3 : 00b8fed8 r2 : 00b8fee0 r1 : 00b87128 r0 : 00b8fef0 > Flags: Nzcv IRQs on FIQs off Mode SVC_32 ISA ARM Segment none > Control: 00091176 Table: 00000000 DAC: 00000000 > Register r0 information: non-slab/vmalloc memory > Register r1 information: slab/vmalloc mm_struct start 00b87000 pointer offset 296 size 428 > Register r2 information: non-slab/vmalloc memory > Register r3 information: non-slab/vmalloc memory > Register r4 information: slab/vmalloc mm_struct start 00b87000 pointer offset 0 size 428 > Register r5 information: non-slab/vmalloc memory > Register r6 information: slab/vmalloc kmalloc-32 start 009f7e80 pointer offset 0 size 32 > Register r7 information: non-slab/vmalloc memory > Register r8 information: slab/vmalloc kmalloc-512 start 009ee800 pointer offset 0 size 512 > Register r9 information: non-slab/vmalloc memory > Register r10 information: slab/vmalloc kmalloc-128 start 009c9b80 pointer offset 0 size 128 > Register r11 information: non-slab/vmalloc memory > Register r12 information: non-slab/vmalloc memory > Process init (pid: 1, stack limit = 0x(ptrval)) > Stack: (0x00823e40 to 0x00824000) > 3e40: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ???????? > 3e60: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ???????? > 3e80: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ???????? > 3ea0: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ???????? > 3ec0: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ???????? > 3ee0: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ???????? > 3f00: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ???????? > 3f20: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ???????? > 3f40: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ???????? > 3f60: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ???????? > 3f80: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ???????? > 3fa0: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ???????? > 3fc0: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ???????? > 3fe0: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ???????? > Call trace: > load_elf_fdpic_binary from bprm_execve+0x1b4/0x488 > bprm_execve from kernel_execve+0x154/0x1e4 > kernel_execve from kernel_init+0x4c/0x108 > kernel_init from ret_from_fork+0x14/0x38 > Exception stack(0x00823fb0 to 0x00823ff8) > 3fa0: ???????? ???????? ???????? ???????? > 3fc0: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ???????? > 3fe0: ???????? ???????? ???????? ???????? ???????? ???????? > Code: bad PC value > ---[ end trace 0000000000000000 ]--- > note: init[1] exited with irqs disabled > Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b > ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]--- > > > The code around that PC is: > > 109cd0: e2833ff1 add r3, r3, #964 @ 0x3c4 > 109cd4: e5933000 ldr r3, [r3] > 109cd8: e5933328 ldr r3, [r3, #808] @ 0x328 > 109cdc: e5933084 ldr r3, [r3, #132] @ 0x84 > 109ce0: e5843034 str r3, [r4, #52] @ 0x34 > 109ce4: eafffdbc b 1093dc <load_elf_fdpic_binary+0x228> > 109ce8: e7f001f2 .word 0xe7f001f2 > 109cec: eb09471d bl 35b968 <__stack_chk_fail> > 109cf0: e59f0038 ldr r0, [pc, #56] @ 109d30 <load_elf_fdpic_binary+0xb7c> > 109cf4: eb092f03 bl 355908 <_printk> > 109cf8: eafffdb7 b 1093dc <load_elf_fdpic_binary+0x228> > > > Reverting just this change gets it working again. > > Any idea what might be going on? Other than that the layout of the aux vector has slightly changed I don't see anything. I can take a look at what's going on if you can share the reproducer.
Hi Max, On 13/8/24 04:02, Max Filippov wrote: > Hi Greg, > > On Sun, Aug 11, 2024 at 7:26 PM Greg Ungerer <gregungerer@westnet.com.au> wrote: >> On 23/3/24 05:54, Max Filippov wrote: >>> Althought FDPIC linux kernel provides /proc/<pid>/auxv files they are >>> empty because there's no code that initializes mm->saved_auxv in the >>> FDPIC ELF loader. >>> >>> Synchronize FDPIC ELF aux vector setup with ELF. Replace entry-by-entry >>> aux vector copying to userspace with initialization of mm->saved_auxv >>> first and then copying it to userspace as a whole. >>> >>> Signed-off-by: Max Filippov <jcmvbkbc@gmail.com> >> >> This is breaking ARM nommu builds supporting fdpic and elf binaries for me. >> >> Tests I have for m68k and riscv nommu setups running elf binaries >> don't show any problems - I am only seeing this on ARM. >> >> >> ... >> Freeing unused kernel image (initmem) memory: 472K >> This architecture does not have kernel memory protection. >> Run /init as init process >> Internal error: Oops - undefined instruction: 0 [#1] ARM >> Modules linked in: >> CPU: 0 PID: 1 Comm: init Not tainted 6.10.0 #1 >> Hardware name: ARM-Versatile (Device Tree Support) >> PC is at load_elf_fdpic_binary+0xb34/0xb80 >> LR is at 0x0 >> pc : [<00109ce8>] lr : [<00000000>] psr: 80000153 >> sp : 00823e40 ip : 00000000 fp : 00b8fee4 >> r10: 009c9b80 r9 : 00b8ff80 r8 : 009ee800 >> r7 : 00000000 r6 : 009f7e80 r5 : 00b8fedc r4 : 00b87000 >> r3 : 00b8fed8 r2 : 00b8fee0 r1 : 00b87128 r0 : 00b8fef0 >> Flags: Nzcv IRQs on FIQs off Mode SVC_32 ISA ARM Segment none >> Control: 00091176 Table: 00000000 DAC: 00000000 >> Register r0 information: non-slab/vmalloc memory >> Register r1 information: slab/vmalloc mm_struct start 00b87000 pointer offset 296 size 428 >> Register r2 information: non-slab/vmalloc memory >> Register r3 information: non-slab/vmalloc memory >> Register r4 information: slab/vmalloc mm_struct start 00b87000 pointer offset 0 size 428 >> Register r5 information: non-slab/vmalloc memory >> Register r6 information: slab/vmalloc kmalloc-32 start 009f7e80 pointer offset 0 size 32 >> Register r7 information: non-slab/vmalloc memory >> Register r8 information: slab/vmalloc kmalloc-512 start 009ee800 pointer offset 0 size 512 >> Register r9 information: non-slab/vmalloc memory >> Register r10 information: slab/vmalloc kmalloc-128 start 009c9b80 pointer offset 0 size 128 >> Register r11 information: non-slab/vmalloc memory >> Register r12 information: non-slab/vmalloc memory >> Process init (pid: 1, stack limit = 0x(ptrval)) >> Stack: (0x00823e40 to 0x00824000) >> 3e40: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ???????? >> 3e60: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ???????? >> 3e80: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ???????? >> 3ea0: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ???????? >> 3ec0: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ???????? >> 3ee0: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ???????? >> 3f00: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ???????? >> 3f20: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ???????? >> 3f40: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ???????? >> 3f60: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ???????? >> 3f80: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ???????? >> 3fa0: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ???????? >> 3fc0: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ???????? >> 3fe0: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ???????? >> Call trace: >> load_elf_fdpic_binary from bprm_execve+0x1b4/0x488 >> bprm_execve from kernel_execve+0x154/0x1e4 >> kernel_execve from kernel_init+0x4c/0x108 >> kernel_init from ret_from_fork+0x14/0x38 >> Exception stack(0x00823fb0 to 0x00823ff8) >> 3fa0: ???????? ???????? ???????? ???????? >> 3fc0: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ???????? >> 3fe0: ???????? ???????? ???????? ???????? ???????? ???????? >> Code: bad PC value >> ---[ end trace 0000000000000000 ]--- >> note: init[1] exited with irqs disabled >> Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b >> ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]--- >> >> >> The code around that PC is: >> >> 109cd0: e2833ff1 add r3, r3, #964 @ 0x3c4 >> 109cd4: e5933000 ldr r3, [r3] >> 109cd8: e5933328 ldr r3, [r3, #808] @ 0x328 >> 109cdc: e5933084 ldr r3, [r3, #132] @ 0x84 >> 109ce0: e5843034 str r3, [r4, #52] @ 0x34 >> 109ce4: eafffdbc b 1093dc <load_elf_fdpic_binary+0x228> >> 109ce8: e7f001f2 .word 0xe7f001f2 >> 109cec: eb09471d bl 35b968 <__stack_chk_fail> >> 109cf0: e59f0038 ldr r0, [pc, #56] @ 109d30 <load_elf_fdpic_binary+0xb7c> >> 109cf4: eb092f03 bl 355908 <_printk> >> 109cf8: eafffdb7 b 1093dc <load_elf_fdpic_binary+0x228> >> >> >> Reverting just this change gets it working again. >> >> Any idea what might be going on? > > Other than that the layout of the aux vector has slightly changed I don't > see anything. I can take a look at what's going on if you can share the > reproducer. I build the test binary using a simple script: https://github.com/gregungerer/simple-linux/blob/master/build-armnommu-linux-uclibc-fdpic.sh Run the resulting vmlinux and devicetree under qemu as per comments at top of that script. Note that that script has a revert of this change (at line 191), so you would need to take that out to reproduce the problem. This script will look for a couple of configs and a versatile patch: https://github.com/gregungerer/simple-linux/blob/master/configs/linux-6.10-armnommu-versatile.config https://github.com/gregungerer/simple-linux/blob/master/configs/uClibc-ng-1.0.49-armnommu-fdpic.config https://github.com/gregungerer/simple-linux/blob/master/configs/busybox-1.36.1.config https://github.com/gregungerer/simple-linux/blob/master/configs/rootfs.dev https://github.com/gregungerer/simple-linux/blob/master/patches/linux-6.10-armnommu-versatile.patch https://github.com/gregungerer/simple-linux/blob/master/patches/linux-6.10-binfmt_elf_fdpic-fix-proc-pid-auxv.patch Or you could just clone the whole small set in one step from: https://github.com/gregungerer/simple-linux.git If you need to me to run something I can do that easily too. Regards Greg
On Mon, Aug 12, 2024 at 9:53 PM Greg Ungerer <gregungerer@westnet.com.au> wrote: > On 13/8/24 04:02, Max Filippov wrote: > > On Sun, Aug 11, 2024 at 7:26 PM Greg Ungerer <gregungerer@westnet.com.au> wrote: > >> On 23/3/24 05:54, Max Filippov wrote: > >>> Althought FDPIC linux kernel provides /proc/<pid>/auxv files they are > >>> empty because there's no code that initializes mm->saved_auxv in the > >>> FDPIC ELF loader. > >>> > >>> Synchronize FDPIC ELF aux vector setup with ELF. Replace entry-by-entry > >>> aux vector copying to userspace with initialization of mm->saved_auxv > >>> first and then copying it to userspace as a whole. > >>> > >>> Signed-off-by: Max Filippov <jcmvbkbc@gmail.com> > >> > >> This is breaking ARM nommu builds supporting fdpic and elf binaries for me. > >> > >> Tests I have for m68k and riscv nommu setups running elf binaries > >> don't show any problems - I am only seeing this on ARM. > >> > >> > >> ... > >> Freeing unused kernel image (initmem) memory: 472K > >> This architecture does not have kernel memory protection. > >> Run /init as init process > >> Internal error: Oops - undefined instruction: 0 [#1] ARM > >> Modules linked in: > >> CPU: 0 PID: 1 Comm: init Not tainted 6.10.0 #1 > >> Hardware name: ARM-Versatile (Device Tree Support) > >> PC is at load_elf_fdpic_binary+0xb34/0xb80 > >> LR is at 0x0 > >> pc : [<00109ce8>] lr : [<00000000>] psr: 80000153 > >> sp : 00823e40 ip : 00000000 fp : 00b8fee4 > >> r10: 009c9b80 r9 : 00b8ff80 r8 : 009ee800 > >> r7 : 00000000 r6 : 009f7e80 r5 : 00b8fedc r4 : 00b87000 > >> r3 : 00b8fed8 r2 : 00b8fee0 r1 : 00b87128 r0 : 00b8fef0 > >> Flags: Nzcv IRQs on FIQs off Mode SVC_32 ISA ARM Segment none > >> Control: 00091176 Table: 00000000 DAC: 00000000 > >> Register r0 information: non-slab/vmalloc memory > >> Register r1 information: slab/vmalloc mm_struct start 00b87000 pointer offset 296 size 428 > >> Register r2 information: non-slab/vmalloc memory > >> Register r3 information: non-slab/vmalloc memory > >> Register r4 information: slab/vmalloc mm_struct start 00b87000 pointer offset 0 size 428 > >> Register r5 information: non-slab/vmalloc memory > >> Register r6 information: slab/vmalloc kmalloc-32 start 009f7e80 pointer offset 0 size 32 > >> Register r7 information: non-slab/vmalloc memory > >> Register r8 information: slab/vmalloc kmalloc-512 start 009ee800 pointer offset 0 size 512 > >> Register r9 information: non-slab/vmalloc memory > >> Register r10 information: slab/vmalloc kmalloc-128 start 009c9b80 pointer offset 0 size 128 > >> Register r11 information: non-slab/vmalloc memory > >> Register r12 information: non-slab/vmalloc memory > >> Process init (pid: 1, stack limit = 0x(ptrval)) > >> Stack: (0x00823e40 to 0x00824000) > >> 3e40: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ???????? > >> 3e60: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ???????? > >> 3e80: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ???????? > >> 3ea0: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ???????? > >> 3ec0: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ???????? > >> 3ee0: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ???????? > >> 3f00: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ???????? > >> 3f20: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ???????? > >> 3f40: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ???????? > >> 3f60: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ???????? > >> 3f80: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ???????? > >> 3fa0: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ???????? > >> 3fc0: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ???????? > >> 3fe0: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ???????? > >> Call trace: > >> load_elf_fdpic_binary from bprm_execve+0x1b4/0x488 > >> bprm_execve from kernel_execve+0x154/0x1e4 > >> kernel_execve from kernel_init+0x4c/0x108 > >> kernel_init from ret_from_fork+0x14/0x38 > >> Exception stack(0x00823fb0 to 0x00823ff8) > >> 3fa0: ???????? ???????? ???????? ???????? > >> 3fc0: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ???????? > >> 3fe0: ???????? ???????? ???????? ???????? ???????? ???????? > >> Code: bad PC value > >> ---[ end trace 0000000000000000 ]--- > >> note: init[1] exited with irqs disabled > >> Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b > >> ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]--- > >> > >> > >> The code around that PC is: > >> > >> 109cd0: e2833ff1 add r3, r3, #964 @ 0x3c4 > >> 109cd4: e5933000 ldr r3, [r3] > >> 109cd8: e5933328 ldr r3, [r3, #808] @ 0x328 > >> 109cdc: e5933084 ldr r3, [r3, #132] @ 0x84 > >> 109ce0: e5843034 str r3, [r4, #52] @ 0x34 > >> 109ce4: eafffdbc b 1093dc <load_elf_fdpic_binary+0x228> > >> 109ce8: e7f001f2 .word 0xe7f001f2 > >> 109cec: eb09471d bl 35b968 <__stack_chk_fail> > >> 109cf0: e59f0038 ldr r0, [pc, #56] @ 109d30 <load_elf_fdpic_binary+0xb7c> > >> 109cf4: eb092f03 bl 355908 <_printk> > >> 109cf8: eafffdb7 b 1093dc <load_elf_fdpic_binary+0x228> > >> > >> > >> Reverting just this change gets it working again. > >> > >> Any idea what might be going on? > > > > Other than that the layout of the aux vector has slightly changed I don't > > see anything. I can take a look at what's going on if you can share the > > reproducer. > > I build the test binary using a simple script: > > https://github.com/gregungerer/simple-linux/blob/master/build-armnommu-linux-uclibc-fdpic.sh > > Run the resulting vmlinux and devicetree under qemu as per comments at top of that script. Cool scripts, thanks. I was able to reproduce the issue, it's the BUG_ON(csp != sp); from the create_elf_fdpic_tables() that causes it. I'm still trying to figure out how that happens.
Hi Greg, On Tue, Aug 13, 2024 at 11:28 AM Max Filippov <jcmvbkbc@gmail.com> wrote: > On Mon, Aug 12, 2024 at 9:53 PM Greg Ungerer <gregungerer@westnet.com.au> wrote: > > On 13/8/24 04:02, Max Filippov wrote: > > > On Sun, Aug 11, 2024 at 7:26 PM Greg Ungerer <gregungerer@westnet.com.au> wrote: > > >> On 23/3/24 05:54, Max Filippov wrote: > > >>> Althought FDPIC linux kernel provides /proc/<pid>/auxv files they are > > >>> empty because there's no code that initializes mm->saved_auxv in the > > >>> FDPIC ELF loader. > > >>> > > >>> Synchronize FDPIC ELF aux vector setup with ELF. Replace entry-by-entry > > >>> aux vector copying to userspace with initialization of mm->saved_auxv > > >>> first and then copying it to userspace as a whole. > > >>> > > >>> Signed-off-by: Max Filippov <jcmvbkbc@gmail.com> > > >> > > >> This is breaking ARM nommu builds supporting fdpic and elf binaries for me. > > >> > > >> Tests I have for m68k and riscv nommu setups running elf binaries > > >> don't show any problems - I am only seeing this on ARM. I see the following: - the issue with the change is caused by unaccouncounted AUX vector entry AT_HWCAP2 that is defined for ARM, but not for any other architecture that you tested. - in the original code this off-by-one error resulted in the last entry of the AUX vector being set to zero. Below are the stack dumps from the ARM kernels built by your script, one with my change (left) and the other where this change is reverted (right): argc: 00000001 00000001 argv: 00b8ffde 00b8ffde 00000000 00000000 envp: 00b8ffe4 00b8ffe4 00b8ffeb 00b8ffeb 00000000 00000000 auxv entries: 00000010 00000010 000001d7 000001d7 0000001a 0000001a 00000000 00000000 00000006 00000006 00001000 00001000 00000011 00000011 00000064 00000064 00000003 00000003 00980034 00a00034 00000004 00000004 00000020 00000020 00000005 00000005 00000007 00000007 00000007 00000007 00a40000 00a40000 00000008 00000008 00000000 00000000 00000009 00000009 00984040 00a04040 0000000b 0000000b 00000000 00000000 0000000c 0000000c 00000000 00000000 0000000d 0000000d 00000000 00000000 0000000e 0000000e 00000000 00000000 00000017 00000017 00000000 00000000 0000001f 0000001f 00b8fff6 00b8fff6 0000000f 00000000 00b8ffcc 00000000 00000000 00000000 00000000 00000000 The fix is in correct accounting of space for the AT_HWCAP2 entry.
diff --git a/fs/binfmt_elf_fdpic.c b/fs/binfmt_elf_fdpic.c index fefc642541cb..7b4542a0cbe3 100644 --- a/fs/binfmt_elf_fdpic.c +++ b/fs/binfmt_elf_fdpic.c @@ -505,8 +505,9 @@ static int create_elf_fdpic_tables(struct linux_binprm *bprm, char *k_platform, *k_base_platform; char __user *u_platform, *u_base_platform, *p; int loop; - int nr; /* reset for each csp adjustment */ unsigned long flags = 0; + int ei_index; + elf_addr_t *elf_info; #ifdef CONFIG_MMU /* In some cases (e.g. Hyper-Threading), we want to avoid L1 evictions @@ -601,44 +602,24 @@ static int create_elf_fdpic_tables(struct linux_binprm *bprm, csp -= sp & 15UL; sp -= sp & 15UL; - /* put the ELF interpreter info on the stack */ -#define NEW_AUX_ENT(id, val) \ - do { \ - struct { unsigned long _id, _val; } __user *ent, v; \ - \ - ent = (void __user *) csp; \ - v._id = (id); \ - v._val = (val); \ - if (copy_to_user(ent + nr, &v, sizeof(v))) \ - return -EFAULT; \ - nr++; \ + /* Create the ELF interpreter info */ + elf_info = (elf_addr_t *)mm->saved_auxv; + /* update AT_VECTOR_SIZE_BASE if the number of NEW_AUX_ENT() changes */ +#define NEW_AUX_ENT(id, val) \ + do { \ + *elf_info++ = id; \ + *elf_info++ = val; \ } while (0) - nr = 0; - csp -= 2 * sizeof(unsigned long); - NEW_AUX_ENT(AT_NULL, 0); - if (k_platform) { - nr = 0; - csp -= 2 * sizeof(unsigned long); - NEW_AUX_ENT(AT_PLATFORM, - (elf_addr_t) (unsigned long) u_platform); - } - - if (k_base_platform) { - nr = 0; - csp -= 2 * sizeof(unsigned long); - NEW_AUX_ENT(AT_BASE_PLATFORM, - (elf_addr_t) (unsigned long) u_base_platform); - } - - if (bprm->have_execfd) { - nr = 0; - csp -= 2 * sizeof(unsigned long); - NEW_AUX_ENT(AT_EXECFD, bprm->execfd); - } - - nr = 0; - csp -= DLINFO_ITEMS * 2 * sizeof(unsigned long); +#ifdef ARCH_DLINFO + /* + * ARCH_DLINFO must come first so PPC can do its special alignment of + * AUXV. + * update AT_VECTOR_SIZE_ARCH if the number of NEW_AUX_ENT() in + * ARCH_DLINFO changes + */ + ARCH_DLINFO; +#endif NEW_AUX_ENT(AT_HWCAP, ELF_HWCAP); #ifdef ELF_HWCAP2 NEW_AUX_ENT(AT_HWCAP2, ELF_HWCAP2); @@ -659,17 +640,32 @@ static int create_elf_fdpic_tables(struct linux_binprm *bprm, NEW_AUX_ENT(AT_EGID, (elf_addr_t) from_kgid_munged(cred->user_ns, cred->egid)); NEW_AUX_ENT(AT_SECURE, bprm->secureexec); NEW_AUX_ENT(AT_EXECFN, bprm->exec); + if (k_platform) { + NEW_AUX_ENT(AT_PLATFORM, + (elf_addr_t)(unsigned long)u_platform); + } + if (k_base_platform) { + NEW_AUX_ENT(AT_BASE_PLATFORM, + (elf_addr_t)(unsigned long)u_base_platform); + } + if (bprm->have_execfd) { + NEW_AUX_ENT(AT_EXECFD, bprm->execfd); + } +#undef NEW_AUX_ENT + /* AT_NULL is zero; clear the rest too */ + memset(elf_info, 0, (char *)mm->saved_auxv + + sizeof(mm->saved_auxv) - (char *)elf_info); -#ifdef ARCH_DLINFO - nr = 0; - csp -= AT_VECTOR_SIZE_ARCH * 2 * sizeof(unsigned long); + /* And advance past the AT_NULL entry. */ + elf_info += 2; - /* ARCH_DLINFO must come last so platform specific code can enforce - * special alignment requirements on the AUXV if necessary (eg. PPC). - */ - ARCH_DLINFO; -#endif -#undef NEW_AUX_ENT + ei_index = elf_info - (elf_addr_t *)mm->saved_auxv; + csp -= ei_index * sizeof(elf_addr_t); + + /* Put the elf_info on the stack in the right place. */ + if (copy_to_user((void __user *)csp, mm->saved_auxv, + ei_index * sizeof(elf_addr_t))) + return -EFAULT; /* allocate room for argv[] and envv[] */ csp -= (bprm->envc + 1) * sizeof(elf_caddr_t);
Althought FDPIC linux kernel provides /proc/<pid>/auxv files they are empty because there's no code that initializes mm->saved_auxv in the FDPIC ELF loader. Synchronize FDPIC ELF aux vector setup with ELF. Replace entry-by-entry aux vector copying to userspace with initialization of mm->saved_auxv first and then copying it to userspace as a whole. Signed-off-by: Max Filippov <jcmvbkbc@gmail.com> --- fs/binfmt_elf_fdpic.c | 88 +++++++++++++++++++++---------------------- 1 file changed, 42 insertions(+), 46 deletions(-)