Message ID | 20201002171921.3053-1-toiwoton@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Series | mm: optionally disable brk() |
On 02.10.20 19:19, Topi Miettinen wrote: > The brk() system call allows to change data segment size (heap). This > is mainly used by glibc for memory allocation, but it can use mmap() > and that results in more randomized memory mappings since the heap is > always located at fixed offset to program while mmap()ed memory is > randomized. Want to take more Unix out of Linux? Honestly, why care about disabling? User space can happily use mmap() if it prefers.
From: David Hildenbrand > Sent: 02 October 2020 18:52 > > On 02.10.20 19:19, Topi Miettinen wrote: > > The brk() system call allows to change data segment size (heap). This > > is mainly used by glibc for memory allocation, but it can use mmap() > > and that results in more randomized memory mappings since the heap is > > always located at fixed offset to program while mmap()ed memory is > > randomized. > > Want to take more Unix out of Linux? > > Honestly, why care about disabling? User space can happily use mmap() if > it prefers. I bet some obscure applications rely on it. Although hopefully nothing still does heap allocation by just increasing the VA and calling brk() in response to SIGSEGV. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
On 2.10.2020 20.52, David Hildenbrand wrote: > On 02.10.20 19:19, Topi Miettinen wrote: >> The brk() system call allows to change data segment size (heap). This >> is mainly used by glibc for memory allocation, but it can use mmap() >> and that results in more randomized memory mappings since the heap is >> always located at fixed offset to program while mmap()ed memory is >> randomized. > > Want to take more Unix out of Linux? > > Honestly, why care about disabling? User space can happily use mmap() if > it prefers. brk() interface doesn't seem to be used much and glibc is happy to switch to mmap() if brk() fails, so why not allow disabling it optionally? If you don't care to disable, don't do it and this is even the default. -Topi
On Sat 03-10-20 00:44:09, Topi Miettinen wrote: > On 2.10.2020 20.52, David Hildenbrand wrote: > > On 02.10.20 19:19, Topi Miettinen wrote: > > > The brk() system call allows to change data segment size (heap). This > > > is mainly used by glibc for memory allocation, but it can use mmap() > > > and that results in more randomized memory mappings since the heap is > > > always located at fixed offset to program while mmap()ed memory is > > > randomized. > > > > Want to take more Unix out of Linux? > > > > Honestly, why care about disabling? User space can happily use mmap() if > > it prefers. > > brk() interface doesn't seem to be used much and glibc is happy to switch to > mmap() if brk() fails, so why not allow disabling it optionally? If you > don't care to disable, don't do it and this is even the default. I do not think we want to have config per syscall, do we? There are many other syscalls which are rarely used. Your changelog is actually missing the most important part. Why do we care so much to increase the config space and make the kernel even more tricky for users to configure? How do I know that something won't break? brk() is one of those syscalls that has been here for ever and a lot of userspace might depend on it. I haven't checked but the code size is very unlikely to be shrunk much as this is mostly a tiny wrapper around mmap code. We are not going to get rid of any complexity. So what is the point?
On 5.10.2020 9.12, Michal Hocko wrote:
> On Sat 03-10-20 00:44:09, Topi Miettinen wrote:
>> On 2.10.2020 20.52, David Hildenbrand wrote:
>>> On 02.10.20 19:19, Topi Miettinen wrote:
>>>> The brk() system call allows to change data segment size (heap). This
>>>> is mainly used by glibc for memory allocation, but it can use mmap()
>>>> and that results in more randomized memory mappings since the heap is
>>>> always located at fixed offset to program while mmap()ed memory is
>>>> randomized.
>>>
>>> Want to take more Unix out of Linux?
>>>
>>> Honestly, why care about disabling? User space can happily use mmap() if
>>> it prefers.
>>
>> brk() interface doesn't seem to be used much and glibc is happy to switch to
>> mmap() if brk() fails, so why not allow disabling it optionally? If you
>> don't care to disable, don't do it and this is even the default.
>
> I do not think we want to have config per syscall, do we? There are many
> other syscalls which are rarely used. Your changelog is actually missing
> the most important part. Why do we care so much to increase the config
> space and make the kernel even more tricky for users to configure?

Maybe, I didn't know this was an important priority since there are
other similar config options. Can you suggest some other config option
which could trigger this? This option is already buried under
CONFIG_EXPERT.

> How
> do I know that something won't break? brk() is one of those syscalls
> that has been here for ever and a lot of userspace might depend on it.

1. brk() is used by glibc for malloc() as the primary choice, with
mmap(NULL, ...) as the secondary choice. But malloc() switches to using
only mmap() as soon as brk() fails the first time, without breakage.

2. brk() is also used for initializing glibc's internal thread
structures. The only program I saw having problems was ldconfig, which
indeed segfaults due to an unsafe assumption that sbrk() will never
fail. This is easily fixable by switching to an internal version of
mmap().

3. The dynamic loader uses brk(), but this is only done to help malloc()
and nothing breaks there if brk() returns ENOSYS.

I've sent RFC patches to the glibc list which switch to mmap()
completely. This improves the randomization for malloc()ated memory and
the location of the thread structures.

> I haven't checked but the code size is very unlikely to be shrunk much
> as this is mostly a tiny wrapper around mmap code. We are not going to
> get rid of any complexity.
>
> So what is the point?

The point is not to shrink the kernel (it will shrink by one small
function) or get rid of complexity. The point is to disable an inferior
interface. Memory returned by mmap() is at a random location but with
brk() it is located near the data segment, so the address is more easily
predictable.

I think hardened, security oriented systems should disable brk()
completely because it will increase the randomization of the process
address space (ASLR). This wouldn't be a good option to enable for
systems where maximum compatibility with legacy software is more
important than any hardening.

-Topi
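
The fallback behaviour described in points 1 and 2 above can be sketched in user space as follows. This is only an illustration of the pattern (check the sbrk() result, then fall back to an anonymous mmap()), not glibc's actual allocator code. The names grow_heap and enum grow_source are hypothetical and do not appear in the thread.

```c
#define _DEFAULT_SOURCE         /* for sbrk() and MAP_ANONYMOUS */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper type: which interface satisfied the request. */
enum grow_source { GROW_FAILED, GROW_BRK, GROW_MMAP };

/*
 * Try to obtain len bytes by moving the program break. If sbrk() fails
 * (for example with ENOSYS on a kernel without brk(), or with ENOMEM),
 * fall back to an anonymous private mapping instead of assuming success.
 */
void *grow_heap(size_t len, enum grow_source *src)
{
        assert(src != NULL);

        void *p = sbrk((intptr_t)len);
        if (p != (void *)-1) {          /* sbrk() reports failure as (void *)-1 */
                *src = GROW_BRK;
                return p;
        }

        p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) {
                *src = GROW_FAILED;
                return NULL;
        }
        *src = GROW_MMAP;
        return p;
}
```

A caller that ignores the (void *)-1 return value, as ldconfig reportedly did, would dereference an invalid pointer as soon as brk() became unavailable; the explicit check is what makes the fallback safe.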
On Mon 05-10-20 11:11:35, Topi Miettinen wrote: [...] > I think hardened, security oriented systems should disable brk() completely > because it will increase the randomization of the process address space > (ASLR). This wouldn't be a good option to enable for systems where maximum > compatibility with legacy software is more important than any hardening. I believe we already do have means to filter syscalls from userspace for security hardened environments. Or is there any reason to duplicate that and control during the configuration time?
On 5.10.2020 11.22, Michal Hocko wrote: > On Mon 05-10-20 11:11:35, Topi Miettinen wrote: > [...] >> I think hardened, security oriented systems should disable brk() completely >> because it will increase the randomization of the process address space >> (ASLR). This wouldn't be a good option to enable for systems where maximum >> compatibility with legacy software is more important than any hardening. > > I believe we already do have means to filter syscalls from userspace for > security hardened environments. Or is there any reason to duplicate > that and control during the configuration time? This is true, but seccomp can't be used for cases where NoNewPrivileges can't be enabled (setuid/setgid binaries present which sadly is still often the case even in otherwise hardened systems), so it's typically not possible to install a filter for the whole system. -Topi
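
To make the NoNewPrivileges constraint concrete, here is a minimal sketch of how an unprivileged process would install a seccomp filter that makes one syscall fail with ENOSYS. The function name deny_syscall_enosys is hypothetical. The sketch assumes a Linux system with seccomp filter support, and it deliberately omits the check of seccomp_data.arch that a real filter should perform.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper: make syscall number nr fail with ENOSYS for the
 * calling thread and everything it later forks or execs.
 * Returns 0 on success, -1 with errno set on failure.
 *
 * Simplification: the filter only looks at the syscall number. A
 * production filter should also verify seccomp_data.arch.
 */
int deny_syscall_enosys(long nr)
{
        struct sock_filter filter[] = {
                /* Load the syscall number. */
                BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
                         offsetof(struct seccomp_data, nr)),
                /* If it matches, fall through; otherwise skip one insn. */
                BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, (unsigned int)nr, 0, 1),
                BPF_STMT(BPF_RET | BPF_K,
                         SECCOMP_RET_ERRNO | (ENOSYS & SECCOMP_RET_DATA)),
                BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
        };
        struct sock_fprog prog = {
                .len = sizeof(filter) / sizeof(filter[0]),
                .filter = filter,
        };

        /*
         * Without CAP_SYS_ADMIN the kernel rejects the filter unless
         * no_new_privs is set, and that flag neutralizes setuid/setgid
         * binaries started afterwards.
         */
        if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
                return -1;

        return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}
```

Because no_new_privs is inherited and cannot be cleared, a filter installed this way cannot coexist with setuid/setgid binaries that are still expected to gain privileges, which is the limitation raised in the reply above.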
On 05.10.20 08:12, Michal Hocko wrote: > On Sat 03-10-20 00:44:09, Topi Miettinen wrote: >> On 2.10.2020 20.52, David Hildenbrand wrote: >>> On 02.10.20 19:19, Topi Miettinen wrote: >>>> The brk() system call allows to change data segment size (heap). This >>>> is mainly used by glibc for memory allocation, but it can use mmap() >>>> and that results in more randomized memory mappings since the heap is >>>> always located at fixed offset to program while mmap()ed memory is >>>> randomized. >>> >>> Want to take more Unix out of Linux? >>> >>> Honestly, why care about disabling? User space can happily use mmap() if >>> it prefers. >> >> brk() interface doesn't seem to be used much and glibc is happy to switch to >> mmap() if brk() fails, so why not allow disabling it optionally? If you >> don't care to disable, don't do it and this is even the default. > > I do not think we want to have config per syscall, do we? I do wonder if grouping would be a better option then (finding a proper level of abstraction ...).
On Mon 05-10-20 11:13:48, David Hildenbrand wrote: > On 05.10.20 08:12, Michal Hocko wrote: > > On Sat 03-10-20 00:44:09, Topi Miettinen wrote: > >> On 2.10.2020 20.52, David Hildenbrand wrote: > >>> On 02.10.20 19:19, Topi Miettinen wrote: > >>>> The brk() system call allows to change data segment size (heap). This > >>>> is mainly used by glibc for memory allocation, but it can use mmap() > >>>> and that results in more randomized memory mappings since the heap is > >>>> always located at fixed offset to program while mmap()ed memory is > >>>> randomized. > >>> > >>> Want to take more Unix out of Linux? > >>> > >>> Honestly, why care about disabling? User space can happily use mmap() if > >>> it prefers. > >> > >> brk() interface doesn't seem to be used much and glibc is happy to switch to > >> mmap() if brk() fails, so why not allow disabling it optionally? If you > >> don't care to disable, don't do it and this is even the default. > > > > I do not think we want to have config per syscall, do we? > > I do wonder if grouping would be a better option then (finding a proper > level of abstraction ...). I have a vague recollection that project for the kernel tinification was aiming that direction. No idea what is the current state or whether somebody is pursuing it.
On 5.10.2020 12.13, David Hildenbrand wrote: > On 05.10.20 08:12, Michal Hocko wrote: >> On Sat 03-10-20 00:44:09, Topi Miettinen wrote: >>> On 2.10.2020 20.52, David Hildenbrand wrote: >>>> On 02.10.20 19:19, Topi Miettinen wrote: >>>>> The brk() system call allows to change data segment size (heap). This >>>>> is mainly used by glibc for memory allocation, but it can use mmap() >>>>> and that results in more randomized memory mappings since the heap is >>>>> always located at fixed offset to program while mmap()ed memory is >>>>> randomized. >>>> >>>> Want to take more Unix out of Linux? >>>> >>>> Honestly, why care about disabling? User space can happily use mmap() if >>>> it prefers. >>> >>> brk() interface doesn't seem to be used much and glibc is happy to switch to >>> mmap() if brk() fails, so why not allow disabling it optionally? If you >>> don't care to disable, don't do it and this is even the default. >> >> I do not think we want to have config per syscall, do we? > > I do wonder if grouping would be a better option then (finding a proper > level of abstraction ...). If hardening and compatibility are seen as tradeoffs, perhaps there could be a top level config choice (CONFIG_HARDENING_TRADEOFF) for this. It would have options - "compatibility" (default) to gear questions for maximum compatibility, deselecting any hardening options which reduce compatibility - "hardening" to gear questions for maximum hardening, deselecting any compatibility options which reduce hardening - "none/manual": ask all questions like before -Topi
On 05.10.20 11:47, Topi Miettinen wrote: > On 5.10.2020 12.13, David Hildenbrand wrote: >> On 05.10.20 08:12, Michal Hocko wrote: >>> On Sat 03-10-20 00:44:09, Topi Miettinen wrote: >>>> On 2.10.2020 20.52, David Hildenbrand wrote: >>>>> On 02.10.20 19:19, Topi Miettinen wrote: >>>>>> The brk() system call allows to change data segment size (heap). This >>>>>> is mainly used by glibc for memory allocation, but it can use mmap() >>>>>> and that results in more randomized memory mappings since the heap is >>>>>> always located at fixed offset to program while mmap()ed memory is >>>>>> randomized. >>>>> >>>>> Want to take more Unix out of Linux? >>>>> >>>>> Honestly, why care about disabling? User space can happily use mmap() if >>>>> it prefers. >>>> >>>> brk() interface doesn't seem to be used much and glibc is happy to switch to >>>> mmap() if brk() fails, so why not allow disabling it optionally? If you >>>> don't care to disable, don't do it and this is even the default. >>> >>> I do not think we want to have config per syscall, do we? >> >> I do wonder if grouping would be a better option then (finding a proper >> level of abstraction ...). > > If hardening and compatibility are seen as tradeoffs, perhaps there > could be a top level config choice (CONFIG_HARDENING_TRADEOFF) for this. > It would have options > - "compatibility" (default) to gear questions for maximum compatibility, > deselecting any hardening options which reduce compatibility > - "hardening" to gear questions for maximum hardening, deselecting any > compatibility options which reduce hardening > - "none/manual": ask all questions like before I think the general direction is to avoid an exploding set of config options. So if there isn't a *real* demand, I guess gluing this to a single option ("CONFIG_SECURITY_HARDENING") might be good enough.
From: David Hildenbrand > Sent: 05 October 2020 10:55 ... > > If hardening and compatibility are seen as tradeoffs, perhaps there > > could be a top level config choice (CONFIG_HARDENING_TRADEOFF) for this. > > It would have options > > - "compatibility" (default) to gear questions for maximum compatibility, > > deselecting any hardening options which reduce compatibility > > - "hardening" to gear questions for maximum hardening, deselecting any > > compatibility options which reduce hardening > > - "none/manual": ask all questions like before > > I think the general direction is to avoid an exploding set of config > options. So if there isn't a *real* demand, I guess gluing this to a > single option ("CONFIG_SECURITY_HARDENING") might be good enough. Wouldn't that be better achieved by run-time clobbering of the syscall vectors? David
On 05.10.20 13:21, David Laight wrote: > From: David Hildenbrand >> Sent: 05 October 2020 10:55 > ... >>> If hardening and compatibility are seen as tradeoffs, perhaps there >>> could be a top level config choice (CONFIG_HARDENING_TRADEOFF) for this. >>> It would have options >>> - "compatibility" (default) to gear questions for maximum compatibility, >>> deselecting any hardening options which reduce compatibility >>> - "hardening" to gear questions for maximum hardening, deselecting any >>> compatibility options which reduce hardening >>> - "none/manual": ask all questions like before >> >> I think the general direction is to avoid an exploding set of config >> options. So if there isn't a *real* demand, I guess gluing this to a >> single option ("CONFIG_SECURITY_HARDENING") might be good enough. > > Wouldn't that be better achieved by run-time clobbering > of the syscall vectors? You mean via something like a boot parameter? Possibly yes.
From: David Hildenbrand > Sent: 05 October 2020 13:19 > > On 05.10.20 13:21, David Laight wrote: > > From: David Hildenbrand > >> Sent: 05 October 2020 10:55 > > ... > >>> If hardening and compatibility are seen as tradeoffs, perhaps there > >>> could be a top level config choice (CONFIG_HARDENING_TRADEOFF) for this. > >>> It would have options > >>> - "compatibility" (default) to gear questions for maximum compatibility, > >>> deselecting any hardening options which reduce compatibility > >>> - "hardening" to gear questions for maximum hardening, deselecting any > >>> compatibility options which reduce hardening > >>> - "none/manual": ask all questions like before > >> > >> I think the general direction is to avoid an exploding set of config > >> options. So if there isn't a *real* demand, I guess gluing this to a > >> single option ("CONFIG_SECURITY_HARDENING") might be good enough. > > > > Wouldn't that be better achieved by run-time clobbering > > of the syscall vectors? > > You mean via something like a boot parameter? Possibly yes. I was thinking of later. Some kind of restricted system might want the 'clobber' mount() after everything is running. David
On Mon, 5 Oct 2020 11:11:35 +0300 Topi Miettinen <toiwoton@gmail.com> wrote: > The point is not to shrink the kernel (it will shrink by one small > function) or get rid of complexity. The point is to disable an inferior > interface. Memory returned by mmap() is at a random location but with > brk() it is located near the data segment, so the address is more easily > predictable. So if your true objective is to get glibc to allocate memory differently, perhaps the right thing to do is to patch glibc? Thanks, jon
On 5.10.2020 17.12, Jonathan Corbet wrote: > On Mon, 5 Oct 2020 11:11:35 +0300 > Topi Miettinen <toiwoton@gmail.com> wrote: > >> The point is not to shrink the kernel (it will shrink by one small >> function) or get rid of complexity. The point is to disable an inferior >> interface. Memory returned by mmap() is at a random location but with >> brk() it is located near the data segment, so the address is more easily >> predictable. > > So if your true objective is to get glibc to allocate memory differently, > perhaps the right thing to do is to patch glibc? Of course: https://sourceware.org/pipermail/libc-alpha/2020-October/118319.html But since glibc is pretty much the only user of brk(), it also makes sense to disable it in the kernel if nothing uses it anymore. -Topi
On 5.10.2020 15.25, David Laight wrote: > From: David Hildenbrand >> Sent: 05 October 2020 13:19 >> >> On 05.10.20 13:21, David Laight wrote: >>> From: David Hildenbrand >>>> Sent: 05 October 2020 10:55 >>> ... >>>>> If hardening and compatibility are seen as tradeoffs, perhaps there >>>>> could be a top level config choice (CONFIG_HARDENING_TRADEOFF) for this. >>>>> It would have options >>>>> - "compatibility" (default) to gear questions for maximum compatibility, >>>>> deselecting any hardening options which reduce compatibility >>>>> - "hardening" to gear questions for maximum hardening, deselecting any >>>>> compatibility options which reduce hardening >>>>> - "none/manual": ask all questions like before >>>> >>>> I think the general direction is to avoid an exploding set of config >>>> options. So if there isn't a *real* demand, I guess gluing this to a >>>> single option ("CONFIG_SECURITY_HARDENING") might be good enough. >>> >>> Wouldn't that be better achieved by run-time clobbering >>> of the syscall vectors? >> >> You mean via something like a boot parameter? Possibly yes. > > I was thinking of later. > Some kind of restricted system might want the 'clobber' > mount() after everything is running. Perhaps suitably privileged tasks should be able to install global seccomp filters which would disregard any NoNewPrivileges requirements and would apply immediately to all tasks. The boot parameter would be also nice so that initrd and PID1 would be also restricted. Seccomp would also allow more specific filtering than messing with the syscall tables. -Topi
On 5.10.2020 15.18, David Hildenbrand wrote:
> On 05.10.20 13:21, David Laight wrote:
>> From: David Hildenbrand
>>> Sent: 05 October 2020 10:55
>> ...
>>>> If hardening and compatibility are seen as tradeoffs, perhaps there
>>>> could be a top level config choice (CONFIG_HARDENING_TRADEOFF) for this.
>>>> It would have options
>>>> - "compatibility" (default) to gear questions for maximum compatibility,
>>>> deselecting any hardening options which reduce compatibility
>>>> - "hardening" to gear questions for maximum hardening, deselecting any
>>>> compatibility options which reduce hardening
>>>> - "none/manual": ask all questions like before
>>>
>>> I think the general direction is to avoid an exploding set of config
>>> options. So if there isn't a *real* demand, I guess gluing this to a
>>> single option ("CONFIG_SECURITY_HARDENING") might be good enough.
>>
>> Wouldn't that be better achieved by run-time clobbering
>> of the syscall vectors?
>
> You mean via something like a boot parameter? Possibly yes.
>

This may be obvious, but a global seccomp filter which doesn't affect
NNP can be installed in initrd with a simple program with no changes to
kernel:

#include <errno.h>
#include <seccomp.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char **argv) {
        if (argc < 3) {
                fprintf(stderr, "Usage: %s syscall [syscall]... program\n",
                        argv[0]);
                return EXIT_FAILURE;
        }

        scmp_filter_ctx ctx = seccomp_init(SCMP_ACT_ALLOW);
        if (ctx == NULL) {
                fprintf(stderr, "failed to init filter\n");
                return EXIT_FAILURE;
        }

        int r;
        r = seccomp_attr_set(ctx, SCMP_FLTATR_CTL_NNP, 0);
        if (r != 0) {
                fprintf(stderr, "failed to disable NNP\n");
                return EXIT_FAILURE;
        }

        fprintf(stderr, "filtering");
        for (int i = 1; i < argc - 1; i++) {
                const char *syscall = argv[i];
                int syscall_nr = seccomp_syscall_resolve_name(syscall);
                if (syscall_nr == __NR_SCMP_ERROR) {
                        //fprintf(stderr, "unknown syscall %s, ignoring\n", syscall);
                        continue;
                }

                r = seccomp_rule_add_exact(ctx, SCMP_ACT_ERRNO(ENOSYS),
                                           syscall_nr, 0);
                if (r != 0) {
                        //fprintf(stderr, "failed to filter syscall %s, ignoring\n", syscall);
                        continue;
                }
                fprintf(stderr, " %s", syscall);
        }
        fprintf(stderr, "\n");

        r = seccomp_load(ctx);
        if (r != 0) {
                fprintf(stderr, "failed to apply filter\n");
                return EXIT_FAILURE;
        }
        seccomp_release(ctx);

        char *program = argv[argc - 1];
        char *new_argv[] = { program, NULL };
        execv(program, new_argv);
        fprintf(stderr, "failed to exec %s\n", program);
        return EXIT_FAILURE;
}

This can be inserted in initrd to disable some obsolete and old system
calls like this:

#!/bin/sh
exec /usr/local/sbin/seccomp-exec _sysctl afs_syscall bdflush break \
        create_module ftime get_kernel_syms getpmsg gtty idle lock mpx \
        prof profil putpmsg query_module security sgetmask ssetmask stty \
        sysfs tuxcall ulimit uselib ustat vserver epoll_ctl_old \
        epoll_wait_old old_adjtimex old_getpagesize oldfstat oldlstat \
        oldolduname oldstat oldumount olduname osf_old_creat \
        osf_old_fstat osf_old_getpgrp osf_old_killpg osf_old_lstat \
        osf_old_open osf_old_sigaction osf_old_sigblock \
        osf_old_sigreturn osf_old_sigsetmask osf_old_sigvec osf_old_stat \
        osf_old_vadvise osf_old_vtrace osf_old_wait osf_oldquota vm86old \
        brk /init

-Topi
diff --git a/init/Kconfig b/init/Kconfig
index c5ea2e694f6a..53735ac305d8 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1851,6 +1851,20 @@ config SLUB_MEMCG_SYSFS_ON
 	  controlled by slub_memcg_sysfs boot parameter and this
 	  config option determines the parameter's default value.
 
+config BRK_SYSCALL
+	bool "Enable brk() system call" if EXPERT
+	default y
+	help
+	  Enable the brk() system call that allows to change data
+	  segment size (heap). This is mainly used by glibc for memory
+	  allocation, but it can use mmap() and that results in more
+	  randomized memory mappings since the heap is always located
+	  at fixed offset to program while mmap()ed memory is
+	  randomized.
+
+	  If unsure, say Y for maximum compatibility.
+
+if BRK_SYSCALL
 config COMPAT_BRK
 	bool "Disable heap randomization"
 	default y
@@ -1862,6 +1876,7 @@ config COMPAT_BRK
 	  /proc/sys/kernel/randomize_va_space to 2 or 3.
 
 	  On non-ancient distros (post-2000 ones) N is usually a safe choice.
+endif # BRK_SYSCALL
 
 choice
 	prompt "Choose SLAB allocator"
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index 4d59775ea79c..3ffa5c4002e1 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -299,6 +299,8 @@ COND_SYSCALL(recvmmsg_time32);
 COND_SYSCALL_COMPAT(recvmmsg_time32);
 COND_SYSCALL_COMPAT(recvmmsg_time64);
 
+COND_SYSCALL(brk);
+
 /*
  * Architecture specific syscalls: see further below
  */
diff --git a/mm/mmap.c b/mm/mmap.c
index 489368f43af1..653be2c8982a 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -188,6 +188,7 @@ static struct vm_area_struct *remove_vma(struct vm_area_struct *vma)
 
 static int do_brk_flags(unsigned long addr, unsigned long request, unsigned long flags,
 		struct list_head *uf);
+#ifdef CONFIG_BRK_SYSCALL
 SYSCALL_DEFINE1(brk, unsigned long, brk)
 {
 	unsigned long retval;
@@ -286,6 +287,7 @@ SYSCALL_DEFINE1(brk, unsigned long, brk)
 	mmap_write_unlock(mm);
 	return retval;
 }
+#endif
 
 static inline unsigned long vma_compute_gap(struct vm_area_struct *vma)
 {
The brk() system call allows to change data segment size (heap). This
is mainly used by glibc for memory allocation, but it can use mmap()
and that results in more randomized memory mappings since the heap is
always located at fixed offset to program while mmap()ed memory is
randomized.

Signed-off-by: Topi Miettinen <toiwoton@gmail.com>
---
 init/Kconfig    | 15 +++++++++++++++
 kernel/sys_ni.c |  2 ++
 mm/mmap.c       |  2 ++
 3 files changed, 19 insertions(+)
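
With the patch applied and CONFIG_BRK_SYSCALL disabled, the COND_SYSCALL(brk) entry would route the syscall to the kernel's not-implemented stub, so user space would see ENOSYS. A program could probe for that at run time roughly as sketched below. The function name brk_syscall_available is hypothetical, and the sketch assumes a Linux target where SYS_brk is defined.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical probe: returns 1 if the kernel implements brk(), 0 if the
 * raw syscall fails with ENOSYS.
 *
 * The raw brk syscall with an argument of 0 normally just returns the
 * current program break, so it has no side effects. The syscall()
 * wrapper turns a kernel error code into -1 with errno set.
 */
int brk_syscall_available(void)
{
        errno = 0;
        long ret = syscall(SYS_brk, 0UL);

        if (ret == -1 && errno == ENOSYS)
                return 0;
        return 1;
}
```

The same ENOSYS result is what a seccomp filter with SCMP_ACT_ERRNO(ENOSYS) on brk produces, so the probe cannot tell a compile-time removal from a filter.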