Message ID | 20200429214954.44866-2-jannh@google.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Fix ELF / FDPIC ELF core dumping, and use mmap_sem properly in there | expand |
On Wed, Apr 29, 2020 at 11:49:50PM +0200, Jann Horn wrote: > dump_emit() is for kernel pointers, and VMAs describe userspace memory. > Let's be tidy here and avoid accessing userspace pointers under KERNEL_DS, > even if it probably doesn't matter much on !MMU systems - especially given > that it looks like we can just use the same get_dump_page() as on MMU if > we move it out of the CONFIG_MMU block. Looks sensible. Did you get a chance to test this with a nommu setup?
On Tue, May 5, 2020 at 12:48 PM Christoph Hellwig <hch@lst.de> wrote: > On Wed, Apr 29, 2020 at 11:49:50PM +0200, Jann Horn wrote: > > dump_emit() is for kernel pointers, and VMAs describe userspace memory. > > Let's be tidy here and avoid accessing userspace pointers under KERNEL_DS, > > even if it probably doesn't matter much on !MMU systems - especially given > > that it looks like we can just use the same get_dump_page() as on MMU if > > we move it out of the CONFIG_MMU block. > > Looks sensible. Did you get a chance to test this with a nommu setup? Nope. Do you happen to have a recommendation for a convenient environment I can use with QEMU, or something like that? I'm guessing that just running a standard armel Debian userspace with a !mmu ARM kernel wouldn't work so well?
On Tue, May 05, 2020 at 01:42:12PM +0200, Jann Horn wrote: > On Tue, May 5, 2020 at 12:48 PM Christoph Hellwig <hch@lst.de> wrote: > > On Wed, Apr 29, 2020 at 11:49:50PM +0200, Jann Horn wrote: > > > dump_emit() is for kernel pointers, and VMAs describe userspace memory. > > > Let's be tidy here and avoid accessing userspace pointers under KERNEL_DS, > > > even if it probably doesn't matter much on !MMU systems - especially given > > > that it looks like we can just use the same get_dump_page() as on MMU if > > > we move it out of the CONFIG_MMU block. > > > > Looks sensible. Did you get a chance to test this with a nommu setup? > > Nope. Do you happen to have a recommendation for a convenient > environment I can use with QEMU, or something like that? I'm guessing > that just running a standard armel Debian userspace with a !mmu ARM > kernel wouldn't work so well? Nommu generally needs special userspace either using uclibc-ng or musl. When I did the RISC-V nommu work I used buildroot for my root file systems. We haven't gotten elffdpic to work on RISC-V yet, so I can't use that setup for testing, but it should support ARM as well.
On Tue, May 5, 2020 at 2:15 PM Christoph Hellwig <hch@lst.de> wrote: > On Tue, May 05, 2020 at 01:42:12PM +0200, Jann Horn wrote: > > On Tue, May 5, 2020 at 12:48 PM Christoph Hellwig <hch@lst.de> wrote: > > > On Wed, Apr 29, 2020 at 11:49:50PM +0200, Jann Horn wrote: > > > > dump_emit() is for kernel pointers, and VMAs describe userspace memory. > > > > Let's be tidy here and avoid accessing userspace pointers under KERNEL_DS, > > > > even if it probably doesn't matter much on !MMU systems - especially given > > > > that it looks like we can just use the same get_dump_page() as on MMU if > > > > we move it out of the CONFIG_MMU block. > > > > > > Looks sensible. Did you get a chance to test this with a nommu setup? > > > > Nope. Do you happen to have a recommendation for a convenient > > environment I can use with QEMU, or something like that? I'm guessing > > that just running a standard armel Debian userspace with a !mmu ARM > > kernel wouldn't work so well? > > Nommu generally needs special userspace either using uclibc-ng or musl. > When I did the RISC-V nommu work I used buildroot for my root file > systems. We haven't gotten elffdpic to work on RISC-V yet, so I can't > use that setup for testing, but it should support ARM as well. I've finally gotten around to testing this, and discovered that I actually had to change something in the patch - thanks for asking me to test this. Some notes on running ARM nommu testing: I ended up running QEMU with "-machine versatilepb". To make that work, I applied this patch: <https://github.com/buildroot/buildroot/blob/master/board/qemu/arm-versatile/patches/linux/versatile-nommu.patch> A couple of directories up, there are also a README and a kernel config for that. Note that the emulated harddrive of this board doesn't seem to work, because it's connected via PCI, and nommu generally can't use PCI; but you can boot from initramfs, and you can copy files from/to the host with netcat, since the emulated network card does work. (To avoid having to bring up the interface from userspace, you can use "ip=10.0.2.1::10.0.2.2:255.255.255.0" on the kernel cmdline if the corresponding feature is enabled in the kernel config.) The first trouble I ran into with trying to run FDPIC userspace (based on musl) was that Linux has support for running ARM userspace in "26-bit mode", which is some ARM feature from the dark ages, with no support in QEMU; and while normally Linux only tries to enable that thing when the binary explicitly requires it, the FDPIC path isn't wired up to the appropriate personality logic properly, and so you get a spectacular explosion, where eventually the kernel oopses with a message about how it's trying to load an invalid value into CPSR because first the kernel tries to return to 26-bit mode, and then, through some mysterious spooky action at a distance, the kernel (AFAICS) ends up trying to do a syscall return with the stack pointer pointing somewhere in the middle of the kernel stack (and not where the entry register frame is). Anyway, my hacky workaround for that is: diff --git a/arch/arm/include/asm/processor.h b/arch/arm/include/asm/processor.h index b9241051e5cb..d5aa409e366c 100644 --- a/arch/arm/include/asm/processor.h +++ b/arch/arm/include/asm/processor.h @@ -70,7 +70,7 @@ static inline void arch_thread_struct_whitelist(unsigned long *offset, if (current->personality & ADDR_LIMIT_32BIT) \ regs->ARM_cpsr = USR_MODE; \ else \ - regs->ARM_cpsr = USR26_MODE; \ + { WARN(1, "setting USR26_MODE"); regs->ARM_cpsr = USR_MODE; } \ if (elf_hwcap & HWCAP_THUMB && pc & 1) \ regs->ARM_cpsr |= PSR_T_BIT; \ regs->ARM_cpsr |= PSR_ENDSTATE; \ Next up: Early on in the libc startup code, musl aborts execution by intentionally executing an undefined instruction in __set_thread_area(), because it can't figure out any working implementation of atomic cmpxchg. For the MMU case, there is a kuser helper (what x86 would call vsyscall); but for NOMMU ARM, no working implementation exists. So I gave up on musl and went with uclibc-ng (built via buildroot) instead, since uclibc-ng has support for compiling out thread support. Annoyingly, buildroot doesn't support FDPIC (at least not for nommu ARM). So I ended up telling it to build a small FLAT userspace, and used a standard ARM toolchain to build a tiny static PIE ELF binary with no reliance on libc (the FDPIC loader can actually load normal ELF mostly fine as long as it's PIE, at the cost of having to duplicate the text section for every instance) - luckily I didn't need the ELF binary to actually do anything complicated, and so working without any libc was tolerable: arm-linux-gnueabi-gcc-10 -fPIC -c -o test_crash.o test_crash.c arm-linux-gnueabi-ld -pie --no-dynamic-linker -o test_crash test_crash.o Next fun part: gdb-multiarch doesn't seem to be able to open FDPIC core dumps properly - none of the register status is available. I took apart the core dump before and after the patch in a hex editor, though, and it seems to have all the expected stuff in it. I'm guessing that maybe GDB got thrown off by struct elf_prstatus having a different layout if the core dump was generated on nommu? GDB's elf32_arm_nabi_grok_prstatus() seems to only handle the struct size for the non-FDPIC struct.
diff --git a/fs/binfmt_elf_fdpic.c b/fs/binfmt_elf_fdpic.c index c62c17a5c34a9..f5b47076fa762 100644 --- a/fs/binfmt_elf_fdpic.c +++ b/fs/binfmt_elf_fdpic.c @@ -1495,14 +1495,11 @@ static bool elf_fdpic_dump_segments(struct coredump_params *cprm) struct vm_area_struct *vma; for (vma = current->mm->mmap; vma; vma = vma->vm_next) { -#ifdef CONFIG_MMU unsigned long addr; -#endif if (!maydump(vma, cprm->mm_flags)) continue; -#ifdef CONFIG_MMU for (addr = vma->vm_start; addr < vma->vm_end; addr += PAGE_SIZE) { bool res; @@ -1518,11 +1515,6 @@ static bool elf_fdpic_dump_segments(struct coredump_params *cprm) if (!res) return false; } -#else - if (!dump_emit(cprm, (void *) vma->vm_start, - vma->vm_end - vma->vm_start)) - return false; -#endif } return true; } diff --git a/mm/gup.c b/mm/gup.c index 50681f0286ded..76080c4dbff05 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -1490,35 +1490,6 @@ int __mm_populate(unsigned long start, unsigned long len, int ignore_errors) up_read(&mm->mmap_sem); return ret; /* 0 or negative error code */ } - -/** - * get_dump_page() - pin user page in memory while writing it to core dump - * @addr: user address - * - * Returns struct page pointer of user page pinned for dump, - * to be freed afterwards by put_page(). - * - * Returns NULL on any kind of failure - a hole must then be inserted into - * the corefile, to preserve alignment with its headers; and also returns - * NULL wherever the ZERO_PAGE, or an anonymous pte_none, has been found - - * allowing a hole to be left in the corefile to save diskspace. - * - * Called without mmap_sem, but after all other threads have been killed. - */ -#ifdef CONFIG_ELF_CORE -struct page *get_dump_page(unsigned long addr) -{ - struct vm_area_struct *vma; - struct page *page; - - if (__get_user_pages(current, current->mm, addr, 1, - FOLL_FORCE | FOLL_DUMP | FOLL_GET, &page, &vma, - NULL) < 1) - return NULL; - flush_cache_page(vma, addr, page_to_pfn(page)); - return page; -} -#endif /* CONFIG_ELF_CORE */ #else /* CONFIG_MMU */ static long __get_user_pages_locked(struct task_struct *tsk, struct mm_struct *mm, unsigned long start, @@ -1565,6 +1536,35 @@ static long __get_user_pages_locked(struct task_struct *tsk, } #endif /* !CONFIG_MMU */ +/** + * get_dump_page() - pin user page in memory while writing it to core dump + * @addr: user address + * + * Returns struct page pointer of user page pinned for dump, + * to be freed afterwards by put_page(). + * + * Returns NULL on any kind of failure - a hole must then be inserted into + * the corefile, to preserve alignment with its headers; and also returns + * NULL wherever the ZERO_PAGE, or an anonymous pte_none, has been found - + * allowing a hole to be left in the corefile to save diskspace. + * + * Called without mmap_sem, but after all other threads have been killed. + */ +#ifdef CONFIG_ELF_CORE +struct page *get_dump_page(unsigned long addr) +{ + struct vm_area_struct *vma; + struct page *page; + + if (__get_user_pages(current, current->mm, addr, 1, + FOLL_FORCE | FOLL_DUMP | FOLL_GET, &page, &vma, + NULL) < 1) + return NULL; + flush_cache_page(vma, addr, page_to_pfn(page)); + return page; +} +#endif /* CONFIG_ELF_CORE */ + #if defined(CONFIG_FS_DAX) || defined (CONFIG_CMA) static bool check_dax_vmas(struct vm_area_struct **vmas, long nr_pages) {
dump_emit() is for kernel pointers, and VMAs describe userspace memory. Let's be tidy here and avoid accessing userspace pointers under KERNEL_DS, even if it probably doesn't matter much on !MMU systems - especially given that it looks like we can just use the same get_dump_page() as on MMU if we move it out of the CONFIG_MMU block. Signed-off-by: Jann Horn <jannh@google.com> --- fs/binfmt_elf_fdpic.c | 8 ------ mm/gup.c | 58 +++++++++++++++++++++---------------------- 2 files changed, 29 insertions(+), 37 deletions(-)