Message ID | 20200717170008.5949-1-kristen@linux.intel.com |
---|---|
Series | Function Granular KASLR |
Let me CC live-patching ML, because from a quick glance this is something which could impact live patching code. At least it invalidates assumptions which "sympos" is based on. Miroslav On Fri, 17 Jul 2020, Kristen Carlson Accardi wrote: > Function Granular Kernel Address Space Layout Randomization (fgkaslr) > --------------------------------------------------------------------- > > This patch set is an implementation of finer grained kernel address space > randomization. It rearranges your kernel code at load time > on a per-function level granularity, with only around a second added to > boot time. [...] > Background > ---------- > KASLR was merged into the kernel with the objective of increasing the > difficulty of code reuse attacks. Code reuse attacks reused existing code > snippets to get around existing memory protections. They exploit software bugs > which expose addresses of useful code snippets to control the flow of > execution for their own nefarious purposes. KASLR moves the entire kernel > code text as a unit at boot time in order to make addresses less predictable. > The order of the code within the segment is unchanged - only the base address > is shifted. There are a few shortcomings to this algorithm. > > 1. Low Entropy - there are only so many locations the kernel can fit in. This > means an attacker could guess without too much trouble. > 2. Knowledge of a single address can reveal the offset of the base address, > exposing all other locations for a published/known kernel image. > 3. Info leaks abound. > > Finer grained ASLR has been proposed as a way to make ASLR more resistant > to info leaks. It is not a new concept at all, and there are many variations > possible. Function reordering is an implementation of finer grained ASLR > which randomizes the layout of an address space on a function level > granularity. 
We use the term "fgkaslr" in this document to refer to the > technique of function reordering when used with KASLR, as well as finer grained > KASLR in general. > > Proposed Improvement > -------------------- > This patch set proposes adding function reordering on top of the existing > KASLR base address randomization. The over-arching objective is incremental > improvement over what we already have. It is designed to work in combination > with the existing solution. The implementation is really pretty simple, and > there are 2 main areas where changes occur: > > * Build time > > GCC has had an option to place functions into individual .text sections for > many years now. This option can be used to implement function reordering at > load time. The final compiled vmlinux retains all the section headers, which > can be used to help find the address ranges of each function. Using this > information and an expanded table of relocation addresses, individual text > sections can be shuffled immediately after decompression. Some data tables > inside the kernel that have assumptions about order require re-sorting > after being updated when applying relocations. In order to modify these tables, > a few key symbols are excluded from the objcopy symbol stripping process for > use after shuffling the text segments. > > Some highlights from the build time changes to look for: > > The top level kernel Makefile was modified to add the gcc flag if it > is supported. Currently, I am applying this flag to everything it is > possible to randomize. Anything that is written in C and not present in a > special input section is randomized. The final binary segment 0 retains a > consolidated .text section, as well as all the individual .text.* sections. > Future work could turn off this flag for selected files or even entire > subsystems, although obviously at the cost of security. > > The relocs tool is updated to add relative relocations.
This information > previously wasn't included because it wasn't necessary when moving the > entire .text segment as a unit. > > A new file was created to contain a list of symbols that objcopy should > keep. We use those symbols at load time as described below. > > * Load time > > The boot kernel was modified to parse the vmlinux elf file after > decompression to check for our interesting symbols that we kept, and to > look for any .text.* sections to randomize. The consolidated .text section > is skipped and not moved. The sections are shuffled randomly, and copied > into memory following the .text section in a new random order. The existing > code which updated relocation addresses was modified to account for > not just a fixed delta from the load address, but the offset that the function > section was moved to. This requires inspection of each address to see if > it was impacted by a randomization. We use a bsearch to make this less > horrible on performance. Any tables that need to be modified with new > addresses or resorted are updated using the symbol addresses parsed from the > elf symbol table. > > In order to hide our new layout, symbols reported through /proc/kallsyms > will be displayed in a random order. > > Security Considerations > ----------------------- > The objective of this patch set is to improve a technology that is already > merged into the kernel (KASLR). This code will not prevent all attacks, > but should instead be considered as one of several tools that can be used. > In particular, this code is meant to make KASLR more effective in the presence > of info leaks. > > How much entropy we are adding to the existing entropy of standard KASLR will > depend on a few variables. Firstly and most obviously, the number of functions > that are randomized matters. This implementation keeps the existing .text > section for code that cannot be randomized - for example, because it was > assembly code. The fewer sections to randomize, the less entropy.
In addition, > due to alignment (16 bytes for x86_64), the number of bits in an address that > the attacker needs to guess is reduced, as the lower bits are identical. [...] > Modules > ------- > Modules are randomized similarly to the rest of the kernel by shuffling > the sections at load time prior to moving them into memory. The module must > also have been built with the -ffunction-sections compiler option. > > Although fgkaslr for the kernel is only supported for the X86_64 architecture, > it is possible to use fgkaslr with modules on other architectures. To enable > this feature, select > > CONFIG_MODULE_FG_KASLR=y > > This option is selected automatically for X86_64 when CONFIG_FG_KASLR is set. > > Disabling > --------- > Disabling normal KASLR using the nokaslr command line option also disables > fgkaslr. It is also possible to disable fgkaslr separately by booting with > fgkaslr=off on the command line.
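The load-time steps described in the cover letter (shuffle the .text.* sections, lay them out after .text, then fix up every relocated address according to the section it falls in, located with a binary search) can be sketched in user-space C. This is a minimal sketch under stated assumptions, not the patch set's code: `struct text_sec`, `shuffle_indices()`, `assign_new_addrs()` and `adjust_addr()` are hypothetical names, the xorshift generator merely stands in for whatever entropy source the boot code really uses, and addresses outside every moved section are simply given the plain KASLR delta.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical descriptor for one .text.* section. */
struct text_sec {
	uintptr_t old_addr;	/* link-time start address */
	uintptr_t new_addr;	/* start address after shuffling */
	size_t size;		/* size in bytes */
};

/* Toy PRNG (xorshift64); *state must be non-zero. Stand-in only. */
static uint64_t xorshift64(uint64_t *state)
{
	uint64_t x = *state;

	x ^= x << 13;
	x ^= x >> 7;
	x ^= x << 17;
	return *state = x;
}

/* Fisher-Yates shuffle of idx[0..n-1] (modulo bias ignored for brevity). */
static void shuffle_indices(size_t *idx, size_t n, uint64_t *state)
{
	for (size_t i = n; i > 1; i--) {
		size_t j = (size_t)(xorshift64(state) % i);
		size_t tmp = idx[i - 1];

		idx[i - 1] = idx[j];
		idx[j] = tmp;
	}
}

/*
 * Place the sections back to back from 'base' in the shuffled 'order',
 * rounding each start up to 'align' (a power of two, e.g. 16 on x86_64).
 * secs[] itself stays sorted by old_addr so it can be binary-searched.
 */
static void assign_new_addrs(struct text_sec *secs, const size_t *order,
			     size_t n, uintptr_t base, uintptr_t align)
{
	uintptr_t cur = base;

	for (size_t k = 0; k < n; k++) {
		struct text_sec *s = &secs[order[k]];

		cur = (cur + align - 1) & ~(align - 1);
		s->new_addr = cur;
		cur += s->size;
	}
}

/* bsearch comparator: is the address below, inside, or above a section? */
static int cmp_addr_sec(const void *key, const void *elem)
{
	uintptr_t addr = *(const uintptr_t *)key;
	const struct text_sec *s = elem;

	if (addr < s->old_addr)
		return -1;
	if (addr >= s->old_addr + s->size)
		return 1;
	return 0;
}

/*
 * Map a link-time address to its final address: an address inside a moved
 * section follows that section; anything else gets the plain KASLR delta.
 */
static uintptr_t adjust_addr(uintptr_t addr, const struct text_sec *secs,
			     size_t n, uintptr_t kaslr_delta)
{
	const struct text_sec *s = bsearch(&addr, secs, n, sizeof(*secs),
					   cmp_addr_sec);

	if (s)
		return addr - s->old_addr + s->new_addr;
	return addr + kaslr_delta;
}
```

Keeping `secs[]` sorted by link-time address is what makes the bsearch valid; the shuffled `order[]` only decides where each section lands.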
On Wed, Jul 22, 2020 at 11:27:30AM +0200, Miroslav Benes wrote: > Let me CC live-patching ML, because from a quick glance this is something > which could impact live patching code. At least it invalidates assumptions > which "sympos" is based on. In a quick skim, it looks like the symbol resolution is using kallsyms_on_each_symbol(), so I think this is safe? What's a good selftest for live-patching?
On 7/22/20 10:39 AM, Kees Cook wrote: > On Wed, Jul 22, 2020 at 11:27:30AM +0200, Miroslav Benes wrote: >> Let me CC live-patching ML, because from a quick glance this is something >> which could impact live patching code. At least it invalidates assumptions >> which "sympos" is based on. > > In a quick skim, it looks like the symbol resolution is using > kallsyms_on_each_symbol(), so I think this is safe? What's a good > selftest for live-patching? > Hi Kees, I don't think any of the in-tree tests currently exercise the kallsyms/sympos end of livepatching. I do have a local branch that does facilitate creating klp-relocations that do rely upon this feature -- I'll try to see if I can get those working with this patchset and report back later this week. -- Joe
On 7/22/20 10:51 AM, Joe Lawrence wrote: > On 7/22/20 10:39 AM, Kees Cook wrote: >> On Wed, Jul 22, 2020 at 11:27:30AM +0200, Miroslav Benes wrote: >>> Let me CC live-patching ML, because from a quick glance this is something >>> which could impact live patching code. At least it invalidates assumptions >>> which "sympos" is based on. >> >> In a quick skim, it looks like the symbol resolution is using >> kallsyms_on_each_symbol(), so I think this is safe? What's a good >> selftest for live-patching? >> > > Hi Kees, > > I don't think any of the in-tree tests currently exercise the > kallsyms/sympos end of livepatching. > On second thought, I misspoke. The general livepatch code does use it:

  klp_init_object
    klp_init_object_loaded
      klp_find_object_symbol

in which case any of the current kselftests should exercise that.

  % make -C tools/testing/selftests/livepatch run_tests

-- Joe
On Wed, Jul 22, 2020 at 07:39:55AM -0700, Kees Cook wrote: > On Wed, Jul 22, 2020 at 11:27:30AM +0200, Miroslav Benes wrote: > > Let me CC live-patching ML, because from a quick glance this is something > > which could impact live patching code. At least it invalidates assumptions > > which "sympos" is based on. > > In a quick skim, it looks like the symbol resolution is using > kallsyms_on_each_symbol(), so I think this is safe? What's a good > selftest for live-patching? The problem is duplicate symbols. If there are two static functions named 'foo' then livepatch needs a way to distinguish them. Our current approach to that problem is "sympos". We rely on the fact that the second foo() always comes after the first one in the symbol list and kallsyms. So they're referred to as foo,1 and foo,2.
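Josh's point can be made concrete with a small sketch. The `struct ksym` list and `resolve_sympos()` below are hypothetical, loosely modeled on the counting livepatch does while walking kallsyms rather than copied from it: "foo,2" just means "the second foo in list order", so if the order of the two foo() functions changes, the same sympos silently names the other function.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical (name, address) pair, listed in kallsyms order. */
struct ksym {
	const char *name;
	unsigned long addr;
};

/*
 * Resolve "name,sympos": sympos == 0 requires the name to be unique,
 * otherwise return the sympos-th (1-based) match in list order.
 * Returns 0 if no suitable symbol exists.
 */
static unsigned long resolve_sympos(const struct ksym *syms, size_t n,
				    const char *name, unsigned long sympos)
{
	unsigned long count = 0, last = 0;

	for (size_t i = 0; i < n; i++) {
		if (strcmp(syms[i].name, name) != 0)
			continue;
		count++;
		last = syms[i].addr;
		if (sympos != 0 && count == sympos)
			return last;
	}
	return (sympos == 0 && count == 1) ? last : 0;
}
```

With the list `foo@0x100, bar@0x200, foo@0x300`, "foo,2" resolves to 0x300; swap the two foo entries and the same "foo,2" resolves to 0x100.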
On Wed, 2020-07-22 at 10:56 -0400, Joe Lawrence wrote: > On 7/22/20 10:51 AM, Joe Lawrence wrote: > > On 7/22/20 10:39 AM, Kees Cook wrote: > > > On Wed, Jul 22, 2020 at 11:27:30AM +0200, Miroslav Benes wrote: > > > > Let me CC live-patching ML, because from a quick glance this is > > > > something > > > > which could impact live patching code. At least it invalidates > > > > assumptions > > > > which "sympos" is based on. > > > > > > In a quick skim, it looks like the symbol resolution is using > > > kallsyms_on_each_symbol(), so I think this is safe? What's a good > > > selftest for live-patching? > > > > > > > Hi Kees, > > > > I don't think any of the in-tree tests currently exercise the > > kallsyms/sympos end of livepatching. > > > > On second thought, I mispoke.. The general livepatch code does use > it: > > klp_init_object > klp_init_object_loaded > klp_find_object_symbol > > in which case any of the current kselftests should exercise that. > > % make -C tools/testing/selftests/livepatch run_tests > > -- Joe > Thanks, it looks like this should work for helping me exercise the live patch code paths. I will take a look and get back to you all.
On Wed, Jul 22, 2020 at 11:07:30AM -0500, Josh Poimboeuf wrote: > On Wed, Jul 22, 2020 at 07:39:55AM -0700, Kees Cook wrote: > > On Wed, Jul 22, 2020 at 11:27:30AM +0200, Miroslav Benes wrote: > > > Let me CC live-patching ML, because from a quick glance this is something > > > which could impact live patching code. At least it invalidates assumptions > > > which "sympos" is based on. > > > > In a quick skim, it looks like the symbol resolution is using > > kallsyms_on_each_symbol(), so I think this is safe? What's a good > > selftest for live-patching? > > The problem is duplicate symbols. If there are two static functions > named 'foo' then livepatch needs a way to distinguish them. > > Our current approach to that problem is "sympos". We rely on the fact > that the second foo() always comes after the first one in the symbol > list and kallsyms. So they're referred to as foo,1 and foo,2. Ah. Fun. In that case, perhaps the LTO series has some solutions. I think builds with LTO end up renaming duplicate symbols like that, so it'll be back to being unique.
On Wed, 2020-07-22 at 12:42 -0700, Kees Cook wrote: > On Wed, Jul 22, 2020 at 11:07:30AM -0500, Josh Poimboeuf wrote: > > On Wed, Jul 22, 2020 at 07:39:55AM -0700, Kees Cook wrote: > > > On Wed, Jul 22, 2020 at 11:27:30AM +0200, Miroslav Benes wrote: > > > > Let me CC live-patching ML, because from a quick glance this is > > > > something > > > > which could impact live patching code. At least it invalidates > > > > assumptions > > > > which "sympos" is based on. > > > > > > In a quick skim, it looks like the symbol resolution is using > > > kallsyms_on_each_symbol(), so I think this is safe? What's a good > > > selftest for live-patching? > > > > The problem is duplicate symbols. If there are two static > > functions > > named 'foo' then livepatch needs a way to distinguish them. > > > > Our current approach to that problem is "sympos". We rely on the > > fact > > that the second foo() always comes after the first one in the > > symbol > > list and kallsyms. So they're referred to as foo,1 and foo,2. > > Ah. Fun. In that case, perhaps the LTO series has some solutions. I > think builds with LTO end up renaming duplicate symbols like that, so > it'll be back to being unique. > Well, glad to hear there might be some precedent for how to solve this, as I wasn't able to think of something reasonable off the top of my head. Are you speaking of the Clang LTO series? https://lore.kernel.org/lkml/20200624203200.78870-1-samitolvanen@google.com/
On Wed, Jul 22, 2020 at 12:56:10PM -0700, Kristen Carlson Accardi wrote: > On Wed, 2020-07-22 at 12:42 -0700, Kees Cook wrote: > > On Wed, Jul 22, 2020 at 11:07:30AM -0500, Josh Poimboeuf wrote: > > > On Wed, Jul 22, 2020 at 07:39:55AM -0700, Kees Cook wrote: > > > > On Wed, Jul 22, 2020 at 11:27:30AM +0200, Miroslav Benes wrote: > > > > > Let me CC live-patching ML, because from a quick glance this is > > > > > something > > > > > which could impact live patching code. At least it invalidates > > > > > assumptions > > > > > which "sympos" is based on. > > > > > > > > In a quick skim, it looks like the symbol resolution is using > > > > kallsyms_on_each_symbol(), so I think this is safe? What's a good > > > > selftest for live-patching? > > > > > > The problem is duplicate symbols. If there are two static > > > functions > > > named 'foo' then livepatch needs a way to distinguish them. > > > > > > Our current approach to that problem is "sympos". We rely on the > > > fact > > > that the second foo() always comes after the first one in the > > > symbol > > > list and kallsyms. So they're referred to as foo,1 and foo,2. > > > > Ah. Fun. In that case, perhaps the LTO series has some solutions. I > > think builds with LTO end up renaming duplicate symbols like that, so > > it'll be back to being unique. > > > > Well, glad to hear there might be some precendence for how to solve > this, as I wasn't able to think of something reasonable off the top of > my head. Are you speaking of the Clang LTO series? > https://lore.kernel.org/lkml/20200624203200.78870-1-samitolvanen@google.com/ I'm not sure how LTO does it, but a few more (half-brained) ideas that could work: 1) Add a field in kallsyms to keep track of a symbol's original offset before randomization/re-sorting. Livepatch could use that field to determine the original sympos. 2) In fgkaslr code, go through all the sections and mark the ones which have duplicates (i.e. same name). 
Then when shuffling the sections, skip a shuffle if it involves a duplicate section. That way all the duplicates would retain their original sympos. 3) Livepatch could uniquely identify symbols by some feature other than sympos. For example: Symbol/function size - obviously this would only work if duplicately named symbols have different sizes. Checksum - as part of a separate feature we're also looking at giving each function its own checksum, calculated based on its instruction opcodes. Though calculating checksums at runtime could be complicated by IP-relative addressing. I'm thinking #1 or #2 wouldn't be too bad. #3 might be harder.
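Idea #2 is simple enough to sketch. The code below is hypothetical (it is not from the fgkaslr patches; `shuffle_skip_dups()` and the xorshift generator are invented for illustration): sections whose names appear more than once are pinned to their original slots, and only the remaining slots take part in a Fisher-Yates shuffle. Because the pinned slots never move relative to each other, foo,1 still precedes foo,2.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy PRNG (xorshift64); *state must be non-zero. Stand-in only. */
static uint64_t xorshift64(uint64_t *state)
{
	uint64_t x = *state;

	x ^= x << 13;
	x ^= x >> 7;
	x ^= x << 17;
	return *state = x;
}

/* True if names[i] occurs anywhere else in names[0..n-1]. O(n^2) for clarity. */
static int has_duplicate(const char **names, size_t n, size_t i)
{
	for (size_t j = 0; j < n; j++)
		if (j != i && strcmp(names[i], names[j]) == 0)
			return 1;
	return 0;
}

/*
 * Shuffle the section order, but leave every section whose name is
 * duplicated in its original slot, so duplicates keep their relative
 * order (and thus their sympos). 'movable' is scratch space for n indices.
 */
static void shuffle_skip_dups(const char **names, size_t n, size_t *movable,
			      uint64_t *state)
{
	size_t m = 0;

	for (size_t i = 0; i < n; i++)
		if (!has_duplicate(names, n, i))
			movable[m++] = i;

	/* Fisher-Yates over the movable slots only. */
	for (size_t k = m; k > 1; k--) {
		size_t r = (size_t)(xorshift64(state) % k);
		const char *tmp = names[movable[k - 1]];

		names[movable[k - 1]] = names[movable[r]];
		names[movable[r]] = tmp;
	}
}
```

The quadratic duplicate check keeps the sketch short; a real implementation would more likely sort or hash the names first. Pinning sections also removes some permutations from the shuffle, so it trades a little entropy for livepatch compatibility.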
Hi, > On Fri, 17 Jul 2020, Kristen Carlson Accardi wrote: > >> Function Granular Kernel Address Space Layout Randomization (fgkaslr) >> --------------------------------------------------------------------- >> >> This patch set is an implementation of finer grained kernel address space >> randomization. It rearranges your kernel code at load time >> on a per-function level granularity, with only around a second added to >> boot time. > > [...] >> Modules >> ------- >> Modules are randomized similarly to the rest of the kernel by shuffling >> the sections at load time prior to moving them into memory. The module must >> also have been built with the -ffunction-sections compiler option.

It seems a couple more adjustments are needed in the module loader code. With function granular KASLR, modules will have lots of ELF sections due to -ffunction-sections. On my x86_64 system with kernel 5.8-rc7 with FG KASLR patches, for example, xfs.ko has 4849 ELF sections total, of which 2428 are loaded and shown in /sys/module/xfs/sections/.

There are at least 2 places where high-order memory allocations might happen during module loading. Such allocations may fail if memory is fragmented, while physically contiguous memory areas are not really needed there. I suggest switching to kvmalloc/kvfree there.

1. kernel/module.c, randomize_text():

	Elf_Shdr **text_list;
	...
	int max_sections = info->hdr->e_shnum;
	...
	text_list = kmalloc_array(max_sections, sizeof(*text_list), GFP_KERNEL);

The size of the allocated memory area is (8 * total_number_of_sections), if I understand it right, which is 38792 for xfs.ko, a 4th order allocation.

2. kernel/module.c, mod_sysfs_setup() => add_sect_attrs().

This allocation can be larger than the first one. We found this issue with livepatch modules some time ago (these modules are already built with -ffunction-sections) [1], but, with FG KASLR, it affects all kernel modules. Large ones like xfs.ko, btrfs.ko, etc., could suffer the most from it.
When a module is loaded, sysfs attributes are created for its ELF sections (visible as /sys/module/<module_name>/sections/*); they contain the start addresses of these ELF sections. A single memory chunk is allocated for all of them:

	size[0] = ALIGN(sizeof(*sect_attrs)
			+ nloaded * sizeof(sect_attrs->attrs[0]),
			sizeof(sect_attrs->grp.attrs[0]));
	size[1] = (nloaded + 1) * sizeof(sect_attrs->grp.attrs[0]);
	sect_attrs = kzalloc(size[0] + size[1], GFP_KERNEL);

'nloaded' is the number of loaded ELF sections in the module.

For the kernel 5.8-rc7 on my system, the total size is 56 + 72 * nloaded, which is 174872 for xfs.ko: 43 pages, a 6th order allocation.

I enabled the 'mm_page_alloc' tracepoint with the filter 'order > 3' to confirm the issue and, indeed, got these two allocations when modprobe'ing xfs:

----------------------------
/sys/kernel/debug/tracing/trace:
   modprobe-1509  <...>: mm_page_alloc: <...> order=4 migratetype=0 gfp_flags=GFP_KERNEL|__GFP_COMP
   modprobe-1509  <stack trace>
 => __alloc_pages_nodemask
 => alloc_pages_current
 => kmalloc_order
 => kmalloc_order_trace
 => __kmalloc
 => load_module

   modprobe-1509  <...>: mm_page_alloc: <...> order=6 migratetype=0 gfp_flags=GFP_KERNEL|__GFP_COMP|__GFP_ZERO
   modprobe-1509  <stack trace>
 => __alloc_pages_nodemask
 => alloc_pages_current
 => kmalloc_order
 => kmalloc_order_trace
 => __kmalloc
 => mod_sysfs_setup
 => load_module
----------------------------

I suppose something like this can be used as a workaround:

* for randomize_text():
-----------
diff --git a/kernel/module.c b/kernel/module.c
index 0f4f4e567a42..a2473db1d0a3 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -2433,7 +2433,7 @@ static void randomize_text(struct module *mod, struct load_info *info)
 	if (sec == 0)
 		return;
 
-	text_list = kmalloc_array(max_sections, sizeof(*text_list), GFP_KERNEL);
+	text_list = kvmalloc_array(max_sections, sizeof(*text_list), GFP_KERNEL);
 	if (!text_list)
 		return;
 
@@ -2466,7 +2466,7 @@ static void randomize_text(struct module *mod, struct load_info *info)
 		shdr->sh_entsize = get_offset(mod, &size, shdr, 0);
 	}
 
-	kfree(text_list);
+	kvfree(text_list);
 }
 
 /* Lay out the SHF_ALLOC sections in a way not dissimilar to how ld
-----------

* for add_sect_attrs():
-----------
diff --git a/kernel/module.c b/kernel/module.c
index 0f4f4e567a42..a2473db1d0a3 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -1541,7 +1541,7 @@ static void free_sect_attrs(struct module_sect_attrs *sect_attrs)
 
 	for (section = 0; section < sect_attrs->nsections; section++)
 		kfree(sect_attrs->attrs[section].battr.attr.name);
-	kfree(sect_attrs);
+	kvfree(sect_attrs);
 }
 
 static void add_sect_attrs(struct module *mod, const struct load_info *info)
@@ -1558,7 +1558,7 @@ static void add_sect_attrs(struct module *mod, const struct load_info *info)
 	size[0] = ALIGN(struct_size(sect_attrs, attrs, nloaded),
 			sizeof(sect_attrs->grp.bin_attrs[0]));
 	size[1] = (nloaded + 1) * sizeof(sect_attrs->grp.bin_attrs[0]);
-	sect_attrs = kzalloc(size[0] + size[1], GFP_KERNEL);
+	sect_attrs = kvzalloc(size[0] + size[1], GFP_KERNEL);
 	if (sect_attrs == NULL)
 		return;
 
-----------

[1] https://github.com/dynup/kpatch/pull/1131

Regards,
Evgenii
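The allocation orders quoted in the message above can be rechecked with a few lines of C. This is only a sketch: `alloc_order()` is a hypothetical user-space stand-in for the kernel's get_order() that assumes 4 KiB pages, and the two size formulas are the ones given in the message for that particular x86_64 build.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumption: 4 KiB pages, as on x86_64 */

/*
 * Smallest order with (SKETCH_PAGE_SIZE << order) >= size; for size > 0
 * this is the same rounding the kernel's get_order() performs.
 */
static unsigned int alloc_order(size_t size)
{
	unsigned int order = 0;

	while ((SKETCH_PAGE_SIZE << order) < size)
		order++;
	return order;
}

/* randomize_text(): one 8-byte pointer per section header. */
static size_t text_list_bytes(size_t e_shnum)
{
	return 8 * e_shnum;
}

/* add_sect_attrs(): 56 + 72 * nloaded bytes on the 5.8-rc7 build above. */
static size_t sect_attrs_bytes(size_t nloaded)
{
	return 56 + 72 * nloaded;
}
```

With the xfs.ko numbers (4849 section headers, 2428 loaded sections) this gives orders 4 and 6, matching the order=4 and order=6 entries in the trace.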
On Mon, Aug 03, 2020 at 02:39:32PM +0300, Evgenii Shatokhin wrote: > There are at least 2 places where high-order memory allocations might happen > during module loading. Such allocations may fail if memory is fragmented, > while physically contiguous memory areas are not really needed there. I > suggest to switch to kvmalloc/kvfree there. While this does seem to be the right solution for the extant problem, I do want to take a moment and ask if the function sections need to be exposed at all? What tools use this information, and do they just want to see the bounds of the code region? (i.e. the start/end of all the .text* sections) Perhaps .text.* could be excluded from the sysfs section list?
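Kees's suggestion amounts to a name filter applied when the sysfs entries are built. A minimal sketch of such a predicate follows; it is hypothetical (no such helper exists in the kernel/module.c code quoted in this thread) and assumes per-function sections are exactly those named ".text.<something>".

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical policy: hide per-function ".text.<name>" sections from
 * /sys/module/<mod>/sections/, but keep ".text" and all other sections.
 */
static bool sect_visible_in_sysfs(const char *name)
{
	return strncmp(name, ".text.", 6) != 0;
}

/* How many sysfs section entries the policy would leave. */
static size_t count_visible(const char **names, size_t n)
{
	size_t visible = 0;

	for (size_t i = 0; i < n; i++)
		if (sect_visible_in_sysfs(names[i]))
			visible++;
	return visible;
}
```

A prefix match like this would also hide sections such as .text.unlikely, and any tool that relocates against individual function sections would lose their bases, which is exactly the question the replies go on to explore.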
On 8/3/20 1:45 PM, Kees Cook wrote: > On Mon, Aug 03, 2020 at 02:39:32PM +0300, Evgenii Shatokhin wrote: >> There are at least 2 places where high-order memory allocations might happen >> during module loading. Such allocations may fail if memory is fragmented, >> while physically contiguous memory areas are not really needed there. I >> suggest to switch to kvmalloc/kvfree there. > > While this does seem to be the right solution for the extant problem, I > do want to take a moment and ask if the function sections need to be > exposed at all? What tools use this information, and do they just want > to see the bounds of the code region? (i.e. the start/end of all the > .text* sections) Perhaps .text.* could be excluded from the sysfs > section list? > [[cc += FChE, see [0] for Evgenii's full mail ]] It looks like debugging tools like systemtap [1], gdb [2] and its add-symbol-file cmd, etc. peek at the /sys/module/<MOD>/sections/ info. But yeah, it would be preferable if we didn't export a long sysfs representation if nobody actually needs it. [0] https://lore.kernel.org/lkml/e9c4d88b-86db-47e9-4299-3fac45a7e3fd@virtuozzo.com/ [1] https://fossies.org/linux/systemtap/staprun/staprun.c [2] https://www.oreilly.com/library/view/linux-device-drivers/0596005903/ch04.html#linuxdrive3-CHP-4-SECT-6.1 -- Joe
Hi - > > While this does seem to be the right solution for the extant problem, I > > do want to take a moment and ask if the function sections need to be > > exposed at all? What tools use this information, and do they just want > > to see the bounds of the code region? (i.e. the start/end of all the > > .text* sections) Perhaps .text.* could be excluded from the sysfs > > section list? > [[cc += FChE, see [0] for Evgenii's full mail ]] Thanks! > It looks like debugging tools like systemtap [1], gdb [2] and its > add-symbol-file cmd, etc. peek at the /sys/module/<MOD>/section/ info. > But yeah, it would be preferable if we didn't export a long sysfs > representation if nobody actually needs it. Systemtap needs to know base addresses of loaded text & data sections, in order to perform relocation of probe point PCs and context data addresses. It uses /sys/module/...., kind of under protest, because there seems to exist no MODULE_EXPORT'd API to get at that information some other way. - FChE
On Mon, Aug 03, 2020 at 03:38:37PM -0400, Frank Ch. Eigler wrote: > Hi - > > > > While this does seem to be the right solution for the extant problem, I > > > do want to take a moment and ask if the function sections need to be > > > exposed at all? What tools use this information, and do they just want > > > to see the bounds of the code region? (i.e. the start/end of all the > > > .text* sections) Perhaps .text.* could be excluded from the sysfs > > > section list? > > > [[cc += FChE, see [0] for Evgenii's full mail ]] > > Thanks! > > > It looks like debugging tools like systemtap [1], gdb [2] and its > > add-symbol-file cmd, etc. peek at the /sys/module/<MOD>/section/ info. > > But yeah, it would be preferable if we didn't export a long sysfs > > representation if nobody actually needs it. > > Systemtap needs to know base addresses of loaded text & data sections, > in order to perform relocation of probe point PCs and context data > addresses. It uses /sys/module/...., kind of under protest, because > there seems to exist no MODULE_EXPORT'd API to get at that information > some other way. Wouldn't /proc/kallsyms entries cover this? I must be missing something...
Hi - On Mon, Aug 03, 2020 at 01:11:27PM -0700, Kees Cook wrote: > [...] > > Systemtap needs to know base addresses of loaded text & data sections, > > in order to perform relocation of probe point PCs and context data > > addresses. It uses /sys/module/...., kind of under protest, because > > there seems to exist no MODULE_EXPORT'd API to get at that information > > some other way. > > Wouldn't /proc/kallsysms entries cover this? I must be missing > something... We have relocated based on sections, not some subset of function symbols accessible that way, partly because DWARF line- and DIE- based probes can map to addresses some way away from function symbols, into function interiors, or cloned/moved bits of optimized code. It would take some work to prove that function-symbol based heuristic arithmetic would have just as much reach. - FChE
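Section-based relocation as Frank describes it boils down to "runtime address = section base + offset within that section". A minimal sketch, assuming the tool already has the probe point as an offset into a named section (as it would from a relocatable .ko's debuginfo) and reads the base from the corresponding sysfs file; `relocate_in_section()` is a hypothetical helper, not systemtap's code, and the exact file format shown is an assumption.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * 'contents' is assumed to be what a /sys/module/<mod>/sections/<name>
 * file holds, e.g. "0xffffffffc0a01000\n"; 'offset' is a probe point's
 * offset within that section. Returns the runtime address, or 0 if the
 * text cannot be parsed. Opening and reading the file is left out.
 */
static uint64_t relocate_in_section(const char *contents, uint64_t offset)
{
	char *end;
	unsigned long long base;

	errno = 0;
	base = strtoull(contents, &end, 16);	/* accepts an optional 0x prefix */
	if (errno != 0 || end == contents)
		return 0;
	return (uint64_t)base + offset;
}
```

Under fgkaslr every .text.<fn> section has its own base, so a tool working this way needs the per-function entries to remain in sysfs.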
On Mon, Aug 03, 2020 at 05:12:28PM -0400, Frank Ch. Eigler wrote: > Hi - > > On Mon, Aug 03, 2020 at 01:11:27PM -0700, Kees Cook wrote: > > [...] > > > Systemtap needs to know base addresses of loaded text & data sections, > > > in order to perform relocation of probe point PCs and context data > > > addresses. It uses /sys/module/...., kind of under protest, because > > > there seems to exist no MODULE_EXPORT'd API to get at that information > > > some other way. > > > > Wouldn't /proc/kallsysms entries cover this? I must be missing > > something... > > We have relocated based on sections, not some subset of function > symbols accessible that way, partly because DWARF line- and DIE- based > probes can map to addresses some way away from function symbols, into > function interiors, or cloned/moved bits of optimized code. It would > take some work to prove that function-symbol based heuristic > arithmetic would have just as much reach. Interesting. Do you have an example handy? It seems like something like that would reference the enclosing section, which means we can't just leave them out of the sysfs list... (but if such things never happen in the function-sections, then we *can* remove them...)
Hi - > > We have relocated based on sections, not some subset of function > > symbols accessible that way, partly because DWARF line- and DIE- based > > probes can map to addresses some way away from function symbols, into > > function interiors, or cloned/moved bits of optimized code. It would > > take some work to prove that function-symbol based heuristic > > arithmetic would have just as much reach. > > Interesting. Do you have an example handy? No, I'm afraid I don't have one that I know cannot possibly be expressed by reference to a function symbol only. I'd look at systemtap (4.3) probe point lists like:

% stap -vL 'kernel.statement("*@kernel/*verif*.c:*")'
% stap -vL 'module("amdgpu").statement("*@*execution*.c:*")'

which give an impression of computed PC addresses. > It seems like something like that would reference the enclosing > section, which means we can't just leave them out of the sysfs > list... (but if such things never happen in the function-sections, > then we *can* remove them...) I'm not sure we can easily prove they can never happen there. - FChE
+++ Joe Lawrence [03/08/20 14:17 -0400]: >On 8/3/20 1:45 PM, Kees Cook wrote: >>On Mon, Aug 03, 2020 at 02:39:32PM +0300, Evgenii Shatokhin wrote: >>>There are at least 2 places where high-order memory allocations might happen >>>during module loading. Such allocations may fail if memory is fragmented, >>>while physically contiguous memory areas are not really needed there. I >>>suggest to switch to kvmalloc/kvfree there. Thanks Evgenii for pointing out the potential memory allocation issues that may arise with very large modules when memory is fragmented. I was curious as to which modules on my machine would be considered large, and there seems to be quite a handful (x86_64 with v5.8-rc6 with a relatively standard distro config and FG KASLR patches on top):

./amdgpu/sections      7277
./i915/sections        4267
./nouveau/sections     3772
./xfs/sections         2395
./btrfs/sections       1966
./mac80211/sections    1588
./kvm/sections         1468
./cfg80211/sections    1194
./drm/sections         1012
./bluetooth/sections    843
./iwlmvm/sections       664
./usbcore/sections      524
./videodev/sections     436

So, I agree with the suggestion that we could switch to kvmalloc() to try to mitigate potential allocation problems when memory is fragmented. >>While this does seem to be the right solution for the extant problem, I >>do want to take a moment and ask if the function sections need to be >>exposed at all? What tools use this information, and do they just want >>to see the bounds of the code region? (i.e. the start/end of all the >>.text* sections) Perhaps .text.* could be excluded from the sysfs >>section list? > >[[cc += FChE, see [0] for Evgenii's full mail ]] > >It looks like debugging tools like systemtap [1], gdb [2] and its >add-symbol-file cmd, etc. peek at the /sys/module/<MOD>/section/ info. > >But yeah, it would be preferable if we didn't export a long sysfs >representation if nobody actually needs it. Thanks Joe for looking into this.
Hmm, AFAICT for gdb it's not a hard dependency per se - for add-symbol-file I was under the impression that we are responsible for obtaining the relevant section addresses ourselves through /sys/module/ (the most oft cited method) and then feeding those to add-symbol-file. It would definitely be more difficult to find out the section addresses without the /sys/module/ section entries. In any case, it sounds like systemtap has a hard dependency on /sys/module/*/sections anyway. Regarding /proc/kallsyms, I think it is probably possible to expose section symbols and their addresses via /proc/kallsyms rather than through sysfs (it would then live in the module's vmalloc'ed memory) but I'm not sure how helpful that would actually be, especially since existing tools depend on the sysfs interface being there. >[0] https://lore.kernel.org/lkml/e9c4d88b-86db-47e9-4299-3fac45a7e3fd@virtuozzo.com/ >[1] https://fossies.org/linux/systemtap/staprun/staprun.c >[2] https://www.oreilly.com/library/view/linux-device-drivers/0596005903/ch04.html#linuxdrive3-CHP-4-SECT-6.1 > >-- Joe
On Fri, Jul 17, 2020 at 09:59:57AM -0700, Kristen Carlson Accardi wrote: > Function Granular Kernel Address Space Layout Randomization (fgkaslr) > --------------------------------------------------------------------- > > This patch set is an implementation of finer grained kernel address space > randomization. It rearranges your kernel code at load time > on a per-function level granularity, with only around a second added to > boot time. > > Changes in v4: > ------------- > * dropped the patch to split out change to STATIC definition in > x86/boot/compressed/misc.c and replaced with a patch authored > by Kees Cook to avoid the duplicate malloc definitions > * Added a section to Documentation/admin-guide/kernel-parameters.txt > to document the fgkaslr boot option. > * redesigned the patch to hide the new layout when reading > /proc/kallsyms. The previous implementation utilized a dynamically > allocated linked list to display the kernel and module symbols > in alphabetical order. The new implementation uses a randomly > shuffled index array to display the kernel and module symbols > in a random order. > > Changes in v3: > ------------- > * Makefile changes to accommodate CONFIG_LD_DEAD_CODE_DATA_ELIMINATION > * removal of extraneous ALIGN_PAGE from _etext changes > * changed variable names in x86/tools/relocs to be less confusing > * split out change to STATIC definition in x86/boot/compressed/misc.c > * Updates to Documentation to make it more clear what is preserved in .text > * much more detailed commit message for function granular KASLR patch > * minor tweaks and changes that make for more readable code > * this cover letter updated slightly to add additional details > > Changes in v2: > -------------- > * Fix to address i386 build failure > * Allow module reordering patch to be configured separately so that > arm (or other non-x86_64 arches) can take advantage of module function > reordering. 
This support has not been tested by me, but smoke tested by > Ard Biesheuvel <ardb@kernel.org> on arm. > * Fix build issue when building on arm as reported by > Ard Biesheuvel <ardb@kernel.org> > > Patches to objtool are included because they are dependencies for this > patchset; however, they have been submitted by their maintainer separately. > > Background > ---------- > KASLR was merged into the kernel with the objective of increasing the > difficulty of code reuse attacks. Code reuse attacks reuse existing code > snippets to get around existing memory protections. They exploit software bugs > which expose addresses of useful code snippets to control the flow of > execution for their own nefarious purposes. KASLR moves the entire kernel > code text as a unit at boot time in order to make addresses less predictable. > The order of the code within the segment is unchanged - only the base address > is shifted. There are a few shortcomings to this algorithm. > > 1. Low Entropy - there are only so many locations the kernel can fit in. This > means an attacker could guess without too much trouble. > 2. Knowledge of a single address can reveal the offset of the base address, > exposing all other locations for a published/known kernel image. > 3. Info leaks abound. > > Finer grained ASLR has been proposed as a way to make ASLR more resistant > to info leaks. It is not a new concept at all, and there are many variations > possible. Function reordering is an implementation of finer grained ASLR > which randomizes the layout of an address space on a function level > granularity. We use the term "fgkaslr" in this document to refer to the > technique of function reordering when used with KASLR, as well as finer grained > KASLR in general. > > Proposed Improvement > -------------------- > This patch set proposes adding function reordering on top of the existing > KASLR base address randomization.
The over-arching objective is incremental > improvement over what we already have. It is designed to work in combination > with the existing solution. The implementation is really pretty simple, and > there are 2 main areas where changes occur: > > * Build time > > GCC has had an option to place functions into individual .text sections for > many years now. This option can be used to implement function reordering at > load time. The final compiled vmlinux retains all the section headers, which > can be used to help find the address ranges of each function. Using this > information and an expanded table of relocation addresses, individual text > sections can be shuffled immediately after decompression. Some data tables > inside the kernel that have assumptions about order require re-sorting > after being updated when applying relocations. In order to modify these tables, > a few key symbols are excluded from the objcopy symbol stripping process for > use after shuffling the text segments. > > Some highlights from the build time changes to look for: > > The top level kernel Makefile was modified to add the gcc flag if it > is supported. Currently, I am applying this flag to everything it is > possible to randomize. Anything that is written in C and not present in a > special input section is randomized. The final binary segment 0 retains a > consolidated .text section, as well as all the individual .text.* sections. > Future work could turn off this flag for selected files or even entire > subsystems, although obviously at the cost of security. > > The relocs tool is updated to add relative relocations. This information > previously wasn't included because it wasn't necessary when moving the > entire .text segment as a unit. > > A new file was created to contain a list of symbols that objcopy should > keep. We use those symbols at load time as described below.
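As a rough illustration of the build-time point that vmlinux keeps one section header per function, the sketch below walks the section headers of a mapped 64-bit ELF file and counts the .text.* sections. It is not the fgkaslr.c code, and the helper names map_file() and count_function_sections() are hypothetical. It does honour the ELF extended-numbering rule that the series' ">64K section headers" patch has to deal with. Once a file has SHN_LORESERVE (0xff00) or more sections, e_shnum is 0 and e_shstrndx is SHN_XINDEX, and the real values are stored in section header 0.

```c
#include <elf.h>
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map a whole file read-only, NULL on failure. */
static const unsigned char *map_file(const char *path, size_t *len)
{
	struct stat st;
	void *p;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return NULL;
	if (fstat(fd, &st) < 0 || st.st_size <= 0) {
		close(fd);
		return NULL;
	}
	p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
	close(fd);
	if (p == MAP_FAILED)
		return NULL;
	*len = (size_t)st.st_size;
	return p;
}

/*
 * Hypothetical helper: count ".text.<name>" sections (one per function with
 * -ffunction-sections) and report whether a plain ".text" is present.
 * Returns (size_t)-1 if img is not a usable ELF64 image.
 */
static size_t count_function_sections(const unsigned char *img, size_t len,
				      int *has_text)
{
	const Elf64_Ehdr *eh = (const Elf64_Ehdr *)img;
	const Elf64_Shdr *sh;
	size_t shnum, shstrndx, count = 0;
	Elf64_Off strtab;

	*has_text = 0;
	if (len < sizeof(*eh) || memcmp(eh->e_ident, ELFMAG, SELFMAG) != 0 ||
	    eh->e_ident[EI_CLASS] != ELFCLASS64 || eh->e_shoff == 0 ||
	    eh->e_shoff + sizeof(Elf64_Shdr) > len)
		return (size_t)-1;

	sh = (const Elf64_Shdr *)(img + eh->e_shoff);
	/* Extended numbering: at >= SHN_LORESERVE sections, the real values
	 * live in section header 0. */
	shnum = eh->e_shnum ? eh->e_shnum : (size_t)sh[0].sh_size;
	shstrndx = eh->e_shstrndx == SHN_XINDEX ? sh[0].sh_link : eh->e_shstrndx;
	if (eh->e_shoff + shnum * sizeof(Elf64_Shdr) > len || shstrndx >= shnum)
		return (size_t)-1;
	strtab = sh[shstrndx].sh_offset;

	for (size_t i = 1; i < shnum; i++) {
		const char *name;

		if (strtab + sh[i].sh_name >= len)
			continue;	/* name offset out of bounds */
		name = (const char *)img + strtab + sh[i].sh_name;
		if (strcmp(name, ".text") == 0)
			*has_text = 1;
		else if (strncmp(name, ".text.", 6) == 0)
			count++;
	}
	return count;
}
```

Pointed at a vmlinux built with CONFIG_FG_KASLR=y, a tool like this would be expected to find the consolidated .text plus one .text.* section per randomizable C function.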
> > * Load time > > The boot kernel was modified to parse the vmlinux elf file after > decompression to check for our interesting symbols that we kept, and to > look for any .text.* sections to randomize. The consolidated .text section > is skipped and not moved. The sections are shuffled randomly, and copied > into memory following the .text section in a new random order. The existing > code which updated relocation addresses was modified to account for > not just a fixed delta from the load address, but the offset that the function > section was moved to. This requires inspection of each address to see if > it was impacted by a randomization. We use a bsearch to make this less > horrible on performance. Any tables that need to be modified with new > addresses or resorted are updated using the symbol addresses parsed from the > elf symbol table. > > In order to hide our new layout, symbols reported through /proc/kallsyms > will be displayed in a random order. > > Security Considerations > ----------------------- > The objective of this patch set is to improve a technology that is already > merged into the kernel (KASLR). This code will not prevent all attacks, > but should instead be considered as one of several tools that can be used. > In particular, this code is meant to make KASLR more effective in the presence > of info leaks. > > How much entropy we are adding to the existing entropy of standard KASLR will > depend on a few variables. Firstly and most obviously, the number of functions > that are randomized matters. This implementation keeps the existing .text > section for code that cannot be randomized - for example, because it was > assembly code. The fewer sections to randomize, the less entropy. In addition, > due to alignment (16 bytes for x86_64), the number of bits in an address that > the attacker needs to guess is reduced, as the lower bits are identical.
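Here is a minimal sketch of the load-time shuffle described above, under stated assumptions. The struct fsection descriptor and the rand_fn callback are hypothetical; rand_fn stands in for the boot-time KASLR random source. demo_xorshift64() is a toy generator for testing only and is not suitable for security use. The real code also copies section contents and does much more besides. The sketch shuffles an index array with Fisher-Yates, then packs the sections one after another while respecting each section's alignment.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor for one randomizable .text.* section. */
struct fsection {
	uint64_t old_addr;	/* link-time address */
	uint64_t new_addr;	/* address after shuffling */
	uint64_t size;
	uint64_t align;		/* power of two, e.g. 16 on x86_64 */
};

/* Hypothetical random source; boot code would draw on its KASLR entropy. */
typedef uint64_t (*rand_fn)(void *ctx);

/* Toy xorshift64 generator for testing only; *ctx must be non-zero. */
static uint64_t demo_xorshift64(void *ctx)
{
	uint64_t *state = ctx;
	uint64_t x = *state;

	x ^= x << 13;
	x ^= x >> 7;
	x ^= x << 17;
	*state = x;
	return x;
}

/* Fisher-Yates shuffle (modulo bias ignored for brevity). */
static void shuffle_indices(size_t *idx, size_t n, rand_fn rnd, void *ctx)
{
	for (size_t i = n; i > 1; i--) {
		size_t j = (size_t)(rnd(ctx) % i);
		size_t tmp = idx[i - 1];

		idx[i - 1] = idx[j];
		idx[j] = tmp;
	}
}

/*
 * Place the sections back to back in shuffled order starting at 'start'
 * (e.g. just past the fixed .text).  idx receives the chosen order.
 * Returns the first address past the last placed section.
 */
static uint64_t layout_sections(struct fsection *s, size_t *idx, size_t n,
				uint64_t start, rand_fn rnd, void *ctx)
{
	uint64_t cur = start;

	for (size_t i = 0; i < n; i++)
		idx[i] = i;
	shuffle_indices(idx, n, rnd, ctx);

	for (size_t i = 0; i < n; i++) {
		struct fsection *sec = &s[idx[i]];

		cur = (cur + sec->align - 1) & ~(sec->align - 1);
		sec->new_addr = cur;
		cur += sec->size;
	}
	return cur;
}
```

Shuffling an index array rather than the descriptors themselves keeps the original link-time order intact, which the later address lookup relies on.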
> > Performance Impact > ------------------ > There are two areas where function reordering can impact performance: boot > time latency, and run time performance. > > * Boot time latency > This implementation of finer grained KASLR impacts the boot time of the kernel > in several places. It requires additional parsing of the kernel ELF file to > obtain the section headers of the sections to be randomized. It calls the > random number generator for each section to be randomized to determine that > section's new memory location. It copies the decompressed kernel into a new > area of memory to avoid corruption when laying out the newly randomized > sections. It increases the number of relocations the kernel has to perform at > boot time vs. standard KASLR, and it also requires a lookup on each address > that needs to be relocated to see if it was in a randomized section and needs > to be adjusted by a new offset. Finally, it re-sorts a few data tables that > are required to be sorted by address. > > Booting a test VM on a modern, well appointed system showed an increase in > latency of approximately 1 second. > > * Run time > The performance impact at run-time of function reordering varies by workload. > Using kcbench, a kernel compilation benchmark, the performance of a kernel > build with finer grained KASLR was about 1% slower than a kernel with standard > KASLR. Analysis with perf showed a slightly higher percentage of > L1-icache-load-misses. Other workloads were examined as well, with varied > results. Some workloads performed significantly worse under FGKASLR, while > others stayed the same or were mysteriously better. In general, it will > depend on the code flow whether or not finer grained KASLR will impact > your workload, and how the underlying code was designed. Because the layout > changes per boot, each time a system is rebooted the performance of a workload > may change. 
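The per-address lookup and table re-sorting mentioned under boot time latency can be sketched as follows. The sketch makes three assumptions. Sections are kept in an array sorted by link-time address, with non-overlapping ranges. struct addr_entry is a hypothetical stand-in for an address-sorted kernel table such as an exception table. The base KASLR delta, which the real relocation code also applies, is left out. Addresses that fall outside every randomized section, for example in the fixed .text, are returned unchanged.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical descriptor for one shuffled .text.* section. */
struct fsection {
	uint64_t old_addr;	/* link-time address */
	uint64_t new_addr;	/* address after shuffling */
	uint64_t size;
	uint64_t align;
};

/* bsearch comparator: is *key below, inside or above this section? */
static int cmp_addr_section(const void *key, const void *elem)
{
	uint64_t a = *(const uint64_t *)key;
	const struct fsection *s = elem;

	if (a < s->old_addr)
		return -1;
	if (a >= s->old_addr + s->size)
		return 1;
	return 0;
}

/* secs must be sorted by old_addr with non-overlapping ranges. */
static uint64_t adjust_address(uint64_t addr, const struct fsection *secs,
			       size_t n)
{
	const struct fsection *s = bsearch(&addr, secs, n, sizeof(*secs),
					   cmp_addr_section);

	/* Not in a randomized section: leave it alone. */
	return s ? addr - s->old_addr + s->new_addr : addr;
}

/* Hypothetical stand-in for a table that must stay sorted by address. */
struct addr_entry {
	uint64_t addr;
	uint64_t data;
};

static int cmp_entry(const void *a, const void *b)
{
	uint64_t x = ((const struct addr_entry *)a)->addr;
	uint64_t y = ((const struct addr_entry *)b)->addr;

	return (x > y) - (x < y);
}

/* Relocate every entry, then restore address order. */
static void relocate_and_resort(struct addr_entry *tab, size_t n,
				const struct fsection *secs, size_t nsecs)
{
	for (size_t i = 0; i < n; i++)
		tab[i].addr = adjust_address(tab[i].addr, secs, nsecs);
	qsort(tab, n, sizeof(*tab), cmp_entry);
}
```

The binary search keeps each lookup at O(log n) in the number of randomized sections. The re-sort is needed because shuffling breaks the original address order of the table entries.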
> > Future work could identify hot areas that may not be randomized and either > leave them in the .text section or group them together into a single section > that may be randomized. If grouping things together helps, one other thing to > consider is that if we could identify text blobs that should be grouped together > to benefit a particular code flow, it could be interesting to explore > whether this security feature could also be used as a performance > feature if you are interested in optimizing your kernel layout for a > particular workload at boot time. Optimizing function layout for a particular > workload has been researched and proven effective - for more information > read the Facebook paper "Optimizing Function Placement for Large-Scale > Data-Center Applications" (see references section below). > > Image Size > ---------- > Adding additional section headers as a result of compiling with > -ffunction-sections will increase the size of the vmlinux ELF file. > With a standard distro config, the resulting vmlinux was increased by > about 3%. The compressed image is also increased due to the additional section headers, > as well as the extra relocations that must be added. You can expect fgkaslr > to increase the size of the compressed image by about 15%. > > Memory Usage > ------------ > fgkaslr increases the amount of heap that is required at boot time, > although this extra memory is released when the kernel has finished > decompression. As a result, it may not be appropriate to use this feature on > systems without much memory. > > Building > -------- > To enable fine grained KASLR, you need to have the following config options > set (including all the ones you would use to build normal KASLR) > > CONFIG_FG_KASLR=y > > In addition, fgkaslr is only supported for the X86_64 architecture. > > Modules > ------- > Modules are randomized similarly to the rest of the kernel by shuffling > the sections at load time prior to moving them into memory.
The module must > also have been built with the -ffunction-sections compiler option. > > Although fgkaslr for the kernel is only supported for the X86_64 architecture, > it is possible to use fgkaslr with modules on other architectures. To enable > this feature, select > > CONFIG_MODULE_FG_KASLR=y > > This option is selected automatically for X86_64 when CONFIG_FG_KASLR is set. > > Disabling > --------- > Disabling normal KASLR using the nokaslr command line option also disables > fgkaslr. It is also possible to disable fgkaslr separately by booting with > fgkaslr=off on the commandline. > > References > ---------- > There are a lot of academic papers which explore finer grained ASLR. > This paper in particular contributed the most to my implementation design > as well as my overall understanding of the problem space: > > Selfrando: Securing the Tor Browser against De-anonymization Exploits, > M. Conti, S. Crane, T. Frassetto, et al. > > For more information on how function layout impacts performance, see: > > Optimizing Function Placement for Large-Scale Data-Center Applications, > G. Ottoni, B.
Maher > > Kees Cook (2): > x86/boot: Allow a "silent" kaslr random byte fetch > x86/boot/compressed: Avoid duplicate malloc() implementations > > Kristen Carlson Accardi (8): > objtool: Do not assume order of parent/child functions > x86: tools/relocs: Support >64K section headers > x86: Makefile: Add build and config option for CONFIG_FG_KASLR > x86: Make sure _etext includes function sections > x86/tools: Add relative relocs for randomized functions > x86: Add support for function granular KASLR > kallsyms: Hide layout > module: Reorder functions > > .../admin-guide/kernel-parameters.txt | 7 + > Documentation/security/fgkaslr.rst | 172 ++++ > Documentation/security/index.rst | 1 + > Makefile | 6 +- > arch/x86/Kconfig | 4 + > arch/x86/Makefile | 5 + > arch/x86/boot/compressed/Makefile | 9 +- > arch/x86/boot/compressed/fgkaslr.c | 811 ++++++++++++++++++ > arch/x86/boot/compressed/kaslr.c | 4 - > arch/x86/boot/compressed/misc.c | 157 +++- > arch/x86/boot/compressed/misc.h | 30 + > arch/x86/boot/compressed/utils.c | 11 + > arch/x86/boot/compressed/vmlinux.symbols | 17 + > arch/x86/include/asm/boot.h | 15 +- > arch/x86/kernel/vmlinux.lds.S | 17 +- > arch/x86/lib/kaslr.c | 18 +- > arch/x86/tools/relocs.c | 143 ++- > arch/x86/tools/relocs.h | 4 +- > arch/x86/tools/relocs_common.c | 15 +- > include/asm-generic/vmlinux.lds.h | 18 +- > include/linux/decompress/mm.h | 12 +- > include/uapi/linux/elf.h | 1 + > init/Kconfig | 26 + > kernel/kallsyms.c | 163 +++- > kernel/module.c | 81 ++ > tools/objtool/elf.c | 8 +- > 26 files changed, 1670 insertions(+), 85 deletions(-) > create mode 100644 Documentation/security/fgkaslr.rst > create mode 100644 arch/x86/boot/compressed/fgkaslr.c > create mode 100644 arch/x86/boot/compressed/utils.c > create mode 100644 arch/x86/boot/compressed/vmlinux.symbols > > > base-commit: 11ba468877bb23f28956a35e896356252d63c983 > -- > 2.20.1 > Apologies in advance if this has already been discussed elsewhere, but I did finally get around to testing the 
patchset against the livepatching kselftests. The livepatching kselftests fail as all livepatches stall their transitions. It appears that reliable (ORC) stack unwinding is broken when fgkaslr is enabled. Relevant config options: CONFIG_ARCH_HAS_FG_KASLR=y CONFIG_ARCH_STACKWALK=y CONFIG_FG_KASLR=y CONFIG_HAVE_LIVEPATCH=y CONFIG_HAVE_RELIABLE_STACKTRACE=y CONFIG_LIVEPATCH=y CONFIG_MODULE_FG_KASLR=y CONFIG_TEST_LIVEPATCH=m CONFIG_UNWINDER_ORC=y The livepatch transitions are stuck along this call path: klp_check_stack stack_trace_save_tsk_reliable arch_stack_walk_reliable /* Check for stack corruption */ if (unwind_error(&state)) return -EINVAL; where the unwinder error is set by unwind_next_frame(): arch/x86/kernel/unwind_orc.c bool unwind_next_frame(struct unwind_state *state) sometimes here: /* End-of-stack check for kernel threads: */ if (orc->sp_reg == ORC_REG_UNDEFINED) { if (!orc->end) goto err; goto the_end; } or here: /* Prevent a recursive loop due to bad ORC data: */ if (state->stack_info.type == prev_type && on_stack(&state->stack_info, (void *)state->sp, sizeof(long)) && state->sp <= prev_sp) { orc_warn_current("stack going in the wrong direction? at %pB\n", (void *)orig_ip); goto err; } (and probably other places the ORC unwinder gets confused.) It also manifests itself in other, more visible ways. For example, a kernel module that calls dump_stack() in its init function or even /proc/<pid>/stack: (fgkaslr on) ------------ Call Trace: ? dump_stack+0x57/0x73 ? 0xffffffffc0850000 ? mymodule_init+0xa/0x1000 [dumpstack] ? do_one_initcall+0x46/0x1f0 ? free_unref_page_commit+0x91/0x100 ? _cond_resched+0x15/0x30 ? kmem_cache_alloc_trace+0x14b/0x210 ? do_init_module+0x5a/0x220 ? load_module+0x1912/0x1b20 ? __do_sys_finit_module+0xa8/0x110 ? __do_sys_finit_module+0xa8/0x110 ? do_syscall_64+0x47/0x80 ? 
entry_SYSCALL_64_after_hwframe+0x44/0xa9 % sudo cat /proc/$$/stack [<0>] do_wait+0x1c3/0x230 [<0>] kernel_wait4+0xa6/0x140 fgkaslr=off ----------- Call Trace: dump_stack+0x57/0x73 ? 0xffffffffc04f2000 mymodule_init+0xa/0x1000 [readonly] do_one_initcall+0x46/0x1f0 ? free_unref_page_commit+0x91/0x100 ? _cond_resched+0x15/0x30 ? kmem_cache_alloc_trace+0x14b/0x210 do_init_module+0x5a/0x220 load_module+0x1912/0x1b20 ? __do_sys_finit_module+0xa8/0x110 __do_sys_finit_module+0xa8/0x110 do_syscall_64+0x47/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xa9 % sudo cat /proc/$$/stack [<0>] do_wait+0x1c3/0x230 [<0>] kernel_wait4+0xa6/0x140 [<0>] __do_sys_wait4+0x83/0x90 [<0>] do_syscall_64+0x47/0x80 [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 I would think fixing and verifying these latter cases would be easier than chasing livepatch transitions (but would still probably fix klp case, too). Perhaps Josh or someone has other ORC unwinder tests that could be used? -- Joe
* Kristen Carlson Accardi <kristen@linux.intel.com> wrote: > Function Granular Kernel Address Space Layout Randomization (fgkaslr) > --------------------------------------------------------------------- > > This patch set is an implementation of finer grained kernel address space > randomization. It rearranges your kernel code at load time > on a per-function level granularity, with only around a second added to > boot time. This is a very nice feature IMO, and it should be far more effective at randomizing the kernel, due to the sheer number of randomization bits that kernel function granular randomization presents. If this is a good approximation of fg-kaslr randomization depth: thule:~/tip> grep ' [tT] ' /proc/kallsyms | wc -l 88488 ... then that's 80K bits of randomization instead of the mere handful of kaslr bits we have today. Very nice! > In order to hide our new layout, symbols reported through > /proc/kallsyms will be displayed in a random order. Neat. :-) > Performance Impact > ------------------ > * Run time > The performance impact at run-time of function reordering varies by workload. > Using kcbench, a kernel compilation benchmark, the performance of a kernel > build with finer grained KASLR was about 1% slower than a kernel with standard > KASLR. Analysis with perf showed a slightly higher percentage of > L1-icache-load-misses. Other workloads were examined as well, with varied > results. Some workloads performed significantly worse under FGKASLR, while > others stayed the same or were mysteriously better. In general, it will > depend on the code flow whether or not finer grained KASLR will impact > your workload, and how the underlying code was designed. Because the layout > changes per boot, each time a system is rebooted the performance of a workload > may change. I'd guess that the biggest performance impact comes from tearing apart 'groups' of functions that particular workloads are using. 
In that sense it might be worthwhile to add a '__kaslr_group' function tag to key functions, which would keep certain performance critical functions next to each other. This shouldn't really be a problem, as even with a generous amount of grouping the number of randomization bits is incredibly large. > Future work could identify hot areas that may not be randomized and either > leave them in the .text section or group them together into a single section > that may be randomized. If grouping things together helps, one other thing to > consider is that if we could identify text blobs that should be grouped together > to benefit a particular code flow, it could be interesting to explore > whether this security feature could also be used as a performance > feature if you are interested in optimizing your kernel layout for a > particular workload at boot time. Optimizing function layout for a particular > workload has been researched and proven effective - for more information > read the Facebook paper "Optimizing Function Placement for Large-Scale > Data-Center Applications" (see references section below). I'm pretty sure the 'grouping' solution would address any real slowdowns. I'd also suggest allowing the passing in of a boot-time pseudo-random generator seed number, which would allow the creation of a pseudo-randomized but repeatable layout across reboots. > Image Size > ---------- > Adding additional section headers as a result of compiling with > -ffunction-sections will increase the size of the vmlinux ELF file. > With a standard distro config, the resulting vmlinux was increased by > about 3%. The compressed image is also increased due to the additional section headers, > as well as the extra relocations that must be added. You can expect fgkaslr > to increase the size of the compressed image by about 15%. What is the increase of the resulting raw kernel image?
Additional relocations might increase its size (unless I'm missing something) - it would be nice to measure this effect. I'd expect this to be really low. vmlinux or compressed kernel size doesn't really matter on x86-64, it's a boot time only expense well within typical system resource limits. > Disabling > --------- > Disabling normal KASLR using the nokaslr command line option also disables > fgkaslr. It is also possible to disable fgkaslr separately by booting with > fgkaslr=off on the commandline. I'd suggest to also add a 'nofgkaslr' boot option if it doesn't yet exist, to keep usage symmetric with kaslr. Likewise, there should probably be a 'kaslr=off' option as well. The less random our user interfaces are, the better ... > arch/x86/boot/compressed/Makefile | 9 +- > arch/x86/boot/compressed/fgkaslr.c | 811 ++++++++++++++++++ > arch/x86/boot/compressed/kaslr.c | 4 - > arch/x86/boot/compressed/misc.c | 157 +++- > arch/x86/boot/compressed/misc.h | 30 + > arch/x86/boot/compressed/utils.c | 11 + > arch/x86/boot/compressed/vmlinux.symbols | 17 + > arch/x86/include/asm/boot.h | 15 +- > arch/x86/kernel/vmlinux.lds.S | 17 +- > arch/x86/lib/kaslr.c | 18 +- > arch/x86/tools/relocs.c | 143 ++- > arch/x86/tools/relocs.h | 4 +- > arch/x86/tools/relocs_common.c | 15 +- > include/asm-generic/vmlinux.lds.h | 18 +- > include/linux/decompress/mm.h | 12 +- > include/uapi/linux/elf.h | 1 + > init/Kconfig | 26 + > kernel/kallsyms.c | 163 +++- > kernel/module.c | 81 ++ > tools/objtool/elf.c | 8 +- > 26 files changed, 1670 insertions(+), 85 deletions(-) > create mode 100644 Documentation/security/fgkaslr.rst > create mode 100644 arch/x86/boot/compressed/fgkaslr.c > create mode 100644 arch/x86/boot/compressed/utils.c > create mode 100644 arch/x86/boot/compressed/vmlinux.symbols This looks surprisingly lean overall. Thanks, Ingo
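To put the randomization-bits estimate above in context: if n sections are shuffled uniformly, there are n! possible orderings, so naming one layout takes log2(n!) bits. The sketch below computes that figure. It is an idealized upper bound that assumes an attacker must recover the entire layout. Code left in the fixed .text contributes nothing, and an attacker who needs only a few specific functions faces far fewer bits. log2 is computed with a short series so the example needs no libm. The helper names are hypothetical.

```c
#include <stdint.h>

#define LN2 0.69314718055994530942

/*
 * Hypothetical helper: log2(x) for x >= 1 without libm.  Split off the power
 * of two, then use ln(m) = 2 * atanh((m - 1) / (m + 1)) for the mantissa m
 * in [1, 2).
 */
static double log2_u64(uint64_t x)
{
	int k = 0;
	double m, z, z2, term, sum = 0.0;

	while ((x >> k) > 1)
		k++;
	m = (double)x / (double)(UINT64_C(1) << k);
	z = (m - 1.0) / (m + 1.0);	/* 0 <= z < 1/3, so the series converges fast */
	z2 = z * z;
	term = z;
	for (int i = 1; i < 60; i += 2) {
		sum += term / i;
		term *= z2;
	}
	return k + 2.0 * sum / LN2;
}

/* Idealized layout entropy of a uniform shuffle of n sections: log2(n!). */
static double permutation_entropy_bits(uint64_t n)
{
	double bits = 0.0;

	for (uint64_t i = 2; i <= n; i++)
		bits += log2_u64(i);
	return bits;
}
```

For example, permutation_entropy_bits(4) is log2(24), roughly 4.585 bits.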
Hi Mingo, thanks for taking a look, I am glad you like the idea. Some replies below: On Thu, 2020-08-06 at 17:32 +0200, Ingo Molnar wrote: > * Kristen Carlson Accardi <kristen@linux.intel.com> wrote: > > > Function Granular Kernel Address Space Layout Randomization > > (fgkaslr) > > ----------------------------------------------------------------- > > ---- > > > > This patch set is an implementation of finer grained kernel address > > space > > randomization. It rearranges your kernel code at load time > > on a per-function level granularity, with only around a second > > added to > > boot time. > > This is a very nice feature IMO, and it should be far more effective > at randomizing the kernel, due to the sheer number of randomization > bits that kernel function granular randomization presents. > > If this is a good approximation of fg-kaslr randomization depth: > > thule:~/tip> grep ' [tT] ' /proc/kallsyms | wc -l > 88488 > > ... then that's 80K bits of randomization instead of the mere > handful > of kaslr bits we have today. Very nice! > > > In order to hide our new layout, symbols reported through > > /proc/kallsyms will be displayed in a random order. > > Neat. :-) > > > Performance Impact > > ------------------ > > * Run time > > The performance impact at run-time of function reordering varies by > > workload. > > Using kcbench, a kernel compilation benchmark, the performance of a > > kernel > > build with finer grained KASLR was about 1% slower than a kernel > > with standard > > KASLR. Analysis with perf showed a slightly higher percentage of > > L1-icache-load-misses. Other workloads were examined as well, with > > varied > > results. Some workloads performed significantly worse under > > FGKASLR, while > > others stayed the same or were mysteriously better. In general, it > > will > > depend on the code flow whether or not finer grained KASLR will > > impact > > your workload, and how the underlying code was designed. 
Because > > the layout > > changes per boot, each time a system is rebooted the performance of > > a workload > > may change. > > I'd guess that the biggest performance impact comes from tearing > apart > 'groups' of functions that particular workloads are using. > > In that sense it might be worthwhile to add a '__kaslr_group' > function > tag to key functions, which would keep certain performance critical > functions next to each other. > > This shouldn't really be a problem, as even with a generous amount of > grouping the number of randomization bits is incredibly large. So my strategy so far was to try to get a very basic, non-performance-optimized fgkaslr mode merged first, then add performance-optimized options as a next step. For example, a user might pass in fgkaslr="group" to the fgkaslr kernel parameter to select a layout which groups some things by whatever criteria we want to mitigate some of the performance impact of full randomization, or they might choose fgkaslr="full", which just randomizes everything (the current implementation). If people think it's worth adding the performance optimizations for the initial merge, I can certainly work on those, but I thought it might be better to keep it super simple at first. > > > Future work could identify hot areas that may not be randomized and > > either > > leave them in the .text section or group them together into a > > single section > > that may be randomized. If grouping things together helps, one > > other thing to > > consider is that if we could identify text blobs that should be > > grouped together > > to benefit a particular code flow, it could be interesting to > > explore > > whether this security feature could also be used as a > > performance > > feature if you are interested in optimizing your kernel layout for > > a > > particular workload at boot time.
Optimizing function layout for a > > particular > > workload has been researched and proven effective - for more > > information > > read the Facebook paper "Optimizing Function Placement for Large- > > Scale > > Data-Center Applications" (see references section below). > > I'm pretty sure the 'grouping' solution would address any real > slowdowns. > > I'd also suggest allowing the passing in of a boot-time pseudo- > random > generator seed number, which would allow the creation of a > pseudo-randomized but repeatable layout across reboots. We talked during the RFC stage of porting the chacha20 code to this early boot stage to use as a prand generator. Ultimately, this means you now have a secret you have to protect (the seed), and so I've dropped this for now. I could see maybe having this as a debug option? I certainly use a prand myself even now when I'm still debugging functional issues (although the one I use for my own debugging isn't suitable for merging). > > > Image Size > > ---------- > > Adding additional section headers as a result of compiling with > > -ffunction-sections will increase the size of the vmlinux ELF file. > > With a standard distro config, the resulting vmlinux was increased > > by > > about 3%. The compressed image is also increased due to the header > > files, > > as well as the extra relocations that must be added. You can expect > > fgkaslr > > to increase the size of the compressed image by about 15%. > > What is the increase of the resulting raw kernel image? Additional > relocations might increase its size (unless I'm missing something) - > it would be nice to measure this effect. I'd expect this to be > really > low. By raw kernel image, do you mean just what eventually gets copied into memory after decompression minus the relocation table? 
If so, there is almost no difference - the only difference is a small change in the padding between sections compared with the non-randomized kernel, because of alignment differences with the new layout. So you wind up with a few extra bytes, give or take. > > vmlinux or compressed kernel size doesn't really matter on x86-64, > it's a boot time only expense well within typical system resource > limits. > > > Disabling > > --------- > > Disabling normal KASLR using the nokaslr command line option also > > disables > > fgkaslr. It is also possible to disable fgkaslr separately by > > booting with > > fgkaslr=off on the commandline. > > I'd suggest to also add a 'nofgkaslr' boot option if it doesn't yet > exist, to keep usage symmetric with kaslr. > > Likewise, there should probably be a 'kaslr=off' option as well. > > The less random our user interfaces are, the better ... > > > arch/x86/boot/compressed/Makefile | 9 +- > > arch/x86/boot/compressed/fgkaslr.c | 811 ++++++++++++++++++ > > arch/x86/boot/compressed/kaslr.c | 4 - > > arch/x86/boot/compressed/misc.c | 157 +++- > > arch/x86/boot/compressed/misc.h | 30 + > > arch/x86/boot/compressed/utils.c | 11 + > > arch/x86/boot/compressed/vmlinux.symbols | 17 + > > arch/x86/include/asm/boot.h | 15 +- > > arch/x86/kernel/vmlinux.lds.S | 17 +- > > arch/x86/lib/kaslr.c | 18 +- > > arch/x86/tools/relocs.c | 143 ++- > > arch/x86/tools/relocs.h | 4 +- > > arch/x86/tools/relocs_common.c | 15 +- > > include/asm-generic/vmlinux.lds.h | 18 +- > > include/linux/decompress/mm.h | 12 +- > > include/uapi/linux/elf.h | 1 + > > init/Kconfig | 26 + > > kernel/kallsyms.c | 163 +++- > > kernel/module.c | 81 ++ > > tools/objtool/elf.c | 8 +- > > 26 files changed, 1670 insertions(+), 85 deletions(-) > > create mode 100644 Documentation/security/fgkaslr.rst > > create mode 100644 arch/x86/boot/compressed/fgkaslr.c > > create mode 100644 arch/x86/boot/compressed/utils.c > > create mode 100644
arch/x86/boot/compressed/vmlinux.symbols > > This looks surprisingly lean overall. Most of the changes outside of fgkaslr.c, module.c, and kallsyms.c were little tweaks here and there to accommodate using -ffunction-sections and handling >64K ELF sections; otherwise, yes, I tried to keep it very self-contained and non-invasive.
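A repeatable layout of the kind discussed above, where a seed reproduces the same shuffle across reboots, could look like the sketch below. It assumes a debug-only mode. seeded_permutation() is hypothetical. splitmix64 is a fast, non-cryptographic generator chosen only for reproducibility. As noted in the thread, anyone who learns the seed learns the layout, so the seed has to be treated as a secret.

```c
#include <stddef.h>
#include <stdint.h>

/* splitmix64: fast and reproducible, NOT cryptographically secure. */
static uint64_t splitmix64(uint64_t *state)
{
	uint64_t z = (*state += 0x9e3779b97f4a7c15ULL);

	z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
	z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
	return z ^ (z >> 31);
}

/*
 * Hypothetical debug-only helper: the same seed always yields the same
 * section order, so a layout-dependent bug can be reproduced across boots.
 */
static void seeded_permutation(size_t *perm, size_t n, uint64_t seed)
{
	uint64_t state = seed;

	for (size_t i = 0; i < n; i++)
		perm[i] = i;
	for (size_t i = n; i > 1; i--) {	/* Fisher-Yates */
		size_t j = (size_t)(splitmix64(&state) % i);
		size_t tmp = perm[i - 1];

		perm[i - 1] = perm[j];
		perm[j] = tmp;
	}
}
```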
On Thu, Aug 06, 2020 at 05:32:58PM +0200, Ingo Molnar wrote: > * Kristen Carlson Accardi <kristen@linux.intel.com> wrote: > > [...] > > Performance Impact > > ------------------ > > > * Run time > > The performance impact at run-time of function reordering varies by workload. > > Using kcbench, a kernel compilation benchmark, the performance of a kernel > > build with finer grained KASLR was about 1% slower than a kernel with standard > > KASLR. Analysis with perf showed a slightly higher percentage of > > L1-icache-load-misses. Other workloads were examined as well, with varied > > results. Some workloads performed significantly worse under FGKASLR, while > > others stayed the same or were mysteriously better. In general, it will > > depend on the code flow whether or not finer grained KASLR will impact > > your workload, and how the underlying code was designed. Because the layout > > changes per boot, each time a system is rebooted the performance of a workload > > may change. > > I'd guess that the biggest performance impact comes from tearing apart > 'groups' of functions that particular workloads are using. > > In that sense it might be worthwhile to add a '__kaslr_group' function > tag to key functions, which would keep certain performance critical > functions next to each other. We kind of already do this manually for things like the scheduler, etc., using macros like ".whatever.text", so we might be able to create a more generalized approach for those. Right now they require a "section" macro usage and a linker script __start* and __end* marker, etc.:

#define SCHED_TEXT				\
		ALIGN_FUNCTION();		\
		__sched_text_start = .;		\
		*(.sched.text)			\
		__sched_text_end = .;

Manually collecting each whatever_TEXT define and building out each __whatever_start, etc., is annoying.
It'd be really cool to have the linker script support wildcard replacements for building a syntax like this, based on the presence of matching input sections:

	.%.text : {
		__%_start = .;
		*(.%.text.hot)
		*(.%.text)
		*(.%.text.*)
		*(.%.text.unlikely)
		__%_end = .;
	}

> I'd also suggest allowing the passing in of a boot-time pseudo-random > generator seed number, which would allow the creation of a > pseudo-randomized but repeatable layout across reboots. This was present in earlier versions of the series.
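A related userspace illustration of the grouping idea follows. For section names that are valid C identifiers, GNU ld and LLVM lld already define __start_<name> and __stop_<name> automatically. That means a sketch of a "group" plus an in_sched_functions()-style range check needs no linker-script edits. The section name hot_group and the helpers are hypothetical. In vmlinux, input sections like these are folded into the output .text by the linker script, and dotted names such as .sched.text are not C identifiers anyway. That is why the kernel spells out markers like __sched_text_start by hand.

```c
#include <stdint.h>

/*
 * Hypothetical "group": both functions land in the output section hot_group.
 * Because the name is a valid C identifier, the linker provides
 * __start_hot_group and __stop_hot_group without any linker-script change.
 */
#define HOT_GROUP __attribute__((section("hot_group"), noinline, used))

HOT_GROUP int hot_a(int x) { return x + 1; }
HOT_GROUP int hot_b(int x) { return x * 2; }

/* An ordinary function, placed in .text as usual. */
int cold_fn(int x) { return x - 1; }

extern const char __start_hot_group[];
extern const char __stop_hot_group[];

/* Hypothetical range check in the spirit of in_sched_functions(). */
static int in_hot_group(uintptr_t addr)
{
	return addr >= (uintptr_t)__start_hot_group &&
	       addr < (uintptr_t)__stop_hot_group;
}
```

This only shows the marker mechanism. A kernel-side '__kaslr_group' would still need the boot-time shuffler to treat each such group as a single unit.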
On Tue, 2020-08-04 at 14:23 -0400, Joe Lawrence wrote: > On Fri, Jul 17, 2020 at 09:59:57AM -0700, Kristen Carlson Accardi > wrote: > > Function Granular Kernel Address Space Layout Randomization > > (fgkaslr) > > ----------------------------------------------------------------- > > ---- > > > > This patch set is an implementation of finer grained kernel address > > space > > randomization. It rearranges your kernel code at load time > > on a per-function level granularity, with only around a second > > added to > > boot time. > > > > Changes in v4: > > ------------- > > * dropped the patch to split out change to STATIC definition in > > x86/boot/compressed/misc.c and replaced with a patch authored > > by Kees Cook to avoid the duplicate malloc definitions > > * Added a section to Documentation/admin-guide/kernel- > > parameters.txt > > to document the fgkaslr boot option. > > * redesigned the patch to hide the new layout when reading > > /proc/kallsyms. The previous implementation utilized a > > dynamically > > allocated linked list to display the kernel and module symbols > > in alphabetical order. The new implementation uses a randomly > > shuffled index array to display the kernel and module symbols > > in a random order. 
> >
> > Changes in v3:
> > -------------
> > * Makefile changes to accommodate CONFIG_LD_DEAD_CODE_DATA_ELIMINATION
> > * removal of extraneous ALIGN_PAGE from _etext changes
> > * changed variable names in x86/tools/relocs to be less confusing
> > * split out change to STATIC definition in x86/boot/compressed/misc.c
> > * Updates to Documentation to make it more clear what is preserved in .text
> > * much more detailed commit message for function granular KASLR patch
> > * minor tweaks and changes that make for more readable code
> > * this cover letter updated slightly to add additional details
> >
> > Changes in v2:
> > --------------
> > * Fix to address i386 build failure
> > * Allow module reordering patch to be configured separately so that
> >   arm (or other non-x86_64 arches) can take advantage of module function
> >   reordering. This support has not been tested by me, but smoke tested by
> >   Ard Biesheuvel <ardb@kernel.org> on arm.
> > * Fix build issue when building on arm as reported by
> >   Ard Biesheuvel <ardb@kernel.org>
> >
> > Patches to objtool are included because they are dependencies for this
> > patchset; however, they have been submitted by their maintainer separately.
> >
> > Background
> > ----------
> > KASLR was merged into the kernel with the objective of increasing the
> > difficulty of code reuse attacks. Code reuse attacks reuse existing code
> > snippets to get around existing memory protections. They exploit software bugs
> > which expose addresses of useful code snippets to control the flow of
> > execution for their own nefarious purposes. KASLR moves the entire kernel
> > code text as a unit at boot time in order to make addresses less predictable.
> > The order of the code within the segment is unchanged - only the base address
> > is shifted. There are a few shortcomings to this algorithm.
> >
> > 1. Low Entropy - there are only so many locations the kernel can fit in.
> >    This means an attacker could guess without too much trouble.
> > 2. Knowledge of a single address can reveal the offset of the base address,
> >    exposing all other locations for a published/known kernel image.
> > 3. Info leaks abound.
> >
> > Finer grained ASLR has been proposed as a way to make ASLR more resistant
> > to info leaks. It is not a new concept at all, and there are many variations
> > possible. Function reordering is an implementation of finer grained ASLR
> > which randomizes the layout of an address space on a function level
> > granularity. We use the term "fgkaslr" in this document to refer to the
> > technique of function reordering when used with KASLR, as well as finer grained
> > KASLR in general.
> >
> > Proposed Improvement
> > --------------------
> > This patch set proposes adding function reordering on top of the existing
> > KASLR base address randomization. The over-arching objective is incremental
> > improvement over what we already have. It is designed to work in combination
> > with the existing solution. The implementation is really pretty simple, and
> > there are 2 main areas where changes occur:
> >
> > * Build time
> >
> > GCC has had an option (-ffunction-sections) to place functions into
> > individual .text sections for many years now. This option can be used to
> > implement function reordering at load time. The final compiled vmlinux
> > retains all the section headers, which can be used to help find the address
> > ranges of each function. Using this information and an expanded table of
> > relocation addresses, individual text sections can be shuffled immediately
> > after decompression. Some data tables inside the kernel that have
> > assumptions about order require re-sorting after being updated when
> > applying relocations.
> > In order to modify these tables, a few key symbols are excluded from the
> > objcopy symbol stripping process for use after shuffling the text segments.
> >
> > Some highlights from the build time changes to look for:
> >
> > The top level kernel Makefile was modified to add the gcc flag if it
> > is supported. Currently, I am applying this flag to everything it is
> > possible to randomize. Anything that is written in C and not present in a
> > special input section is randomized. The final binary segment 0 retains a
> > consolidated .text section, as well as all the individual .text.* sections.
> > Future work could turn off these flags for selected files or even entire
> > subsystems, although obviously at the cost of security.
> >
> > The relocs tool is updated to add relative relocations. This information
> > previously wasn't included because it wasn't necessary when moving the
> > entire .text segment as a unit.
> >
> > A new file was created to contain a list of symbols that objcopy should
> > keep. We use those symbols at load time as described below.
> >
> > * Load time
> >
> > The boot kernel was modified to parse the vmlinux elf file after
> > decompression to check for our interesting symbols that we kept, and to
> > look for any .text.* sections to randomize. The consolidated .text section
> > is skipped and not moved. The sections are shuffled randomly, and copied
> > into memory following the .text section in a new random order. The existing
> > code which updated relocation addresses was modified to account for
> > not just a fixed delta from the load address, but the offset that the function
> > section was moved to. This requires inspection of each address to see if
> > it was impacted by a randomization. We use a bsearch to make this less
> > horrible on performance. Any tables that need to be modified with new
> > addresses or resorted are updated using the symbol addresses parsed from the
> > elf symbol table.
> >
> > In order to hide our new layout, symbols reported through /proc/kallsyms
> > will be displayed in a random order.
> >
> > Security Considerations
> > -----------------------
> > The objective of this patch set is to improve a technology that is already
> > merged into the kernel (KASLR). This code will not prevent all attacks,
> > but should instead be considered as one of several tools that can be used.
> > In particular, this code is meant to make KASLR more effective in the presence
> > of info leaks.
> >
> > How much entropy we are adding to the existing entropy of standard KASLR will
> > depend on a few variables. Firstly and most obviously, the number of functions
> > that are randomized matters. This implementation keeps the existing .text
> > section for code that cannot be randomized - for example, because it was
> > assembly code. The fewer sections to randomize, the less entropy. In addition,
> > due to alignment (16 bytes for x86_64), the number of bits in an address that
> > the attacker needs to guess is reduced, as the lower bits are identical.
> >
> > Performance Impact
> > ------------------
> > There are two areas where function reordering can impact performance: boot
> > time latency, and run time performance.
> >
> > * Boot time latency
> > This implementation of finer grained KASLR impacts the boot time of the kernel
> > in several places. It requires additional parsing of the kernel ELF file to
> > obtain the section headers of the sections to be randomized. It calls the
> > random number generator for each section to be randomized to determine that
> > section's new memory location.
> > It copies the decompressed kernel into a new
> > area of memory to avoid corruption when laying out the newly randomized
> > sections. It increases the number of relocations the kernel has to perform at
> > boot time vs. standard KASLR, and it also requires a lookup on each address
> > that needs to be relocated to see if it was in a randomized section and needs
> > to be adjusted by a new offset. Finally, it re-sorts a few data tables that
> > are required to be sorted by address.
> >
> > Booting a test VM on a modern, well appointed system showed an increase in
> > latency of approximately 1 second.
> >
> > * Run time
> > The performance impact at run-time of function reordering varies by workload.
> > Using kcbench, a kernel compilation benchmark, the performance of a kernel
> > build with finer grained KASLR was about 1% slower than a kernel with standard
> > KASLR. Analysis with perf showed a slightly higher percentage of
> > L1-icache-load-misses. Other workloads were examined as well, with varied
> > results. Some workloads performed significantly worse under FGKASLR, while
> > others stayed the same or were mysteriously better. In general, it will
> > depend on the code flow whether or not finer grained KASLR will impact
> > your workload, and how the underlying code was designed. Because the layout
> > changes per boot, each time a system is rebooted the performance of a workload
> > may change.
> >
> > Future work could identify hot areas that may not be randomized and either
> > leave them in the .text section or group them together into a single section
> > that may be randomized. If grouping things together helps, one other thing to
> > consider is that if we could identify text blobs that should be grouped together
> > to benefit a particular code flow, it could be interesting to explore
> > whether this security feature could also be used as a performance
> > feature if you are interested in optimizing your kernel layout for a
> > particular workload at boot time. Optimizing function layout for a particular
> > workload has been researched and proven effective - for more information
> > read the Facebook paper "Optimizing Function Placement for Large-Scale
> > Data-Center Applications" (see references section below).
> >
> > Image Size
> > ----------
> > Adding additional section headers as a result of compiling with
> > -ffunction-sections will increase the size of the vmlinux ELF file.
> > With a standard distro config, the resulting vmlinux was increased by
> > about 3%. The compressed image is also increased due to the section headers,
> > as well as the extra relocations that must be added. You can expect fgkaslr
> > to increase the size of the compressed image by about 15%.
> >
> > Memory Usage
> > ------------
> > fgkaslr increases the amount of heap that is required at boot time,
> > although this extra memory is released when the kernel has finished
> > decompression. As a result, it may not be appropriate to use this feature on
> > systems without much memory.
> >
> > Building
> > --------
> > To enable fine grained KASLR, you need to have the following config options
> > set (including all the ones you would use to build normal KASLR)
> >
> > CONFIG_FG_KASLR=y
> >
> > In addition, fgkaslr is only supported for the X86_64 architecture.
> >
> > Modules
> > -------
> > Modules are randomized similarly to the rest of the kernel by shuffling
> > the sections at load time prior to moving them into memory.
> > The module must
> > also have been built with the -ffunction-sections compiler option.
> >
> > Although fgkaslr for the kernel is only supported for the X86_64 architecture,
> > it is possible to use fgkaslr with modules on other architectures. To enable
> > this feature, select
> >
> > CONFIG_MODULE_FG_KASLR=y
> >
> > This option is selected automatically for X86_64 when CONFIG_FG_KASLR is set.
> >
> > Disabling
> > ---------
> > Disabling normal KASLR using the nokaslr command line option also disables
> > fgkaslr. It is also possible to disable fgkaslr separately by booting with
> > fgkaslr=off on the commandline.
> >
> > References
> > ----------
> > There are a lot of academic papers which explore finer grained ASLR.
> > This paper in particular contributed the most to my implementation design
> > as well as my overall understanding of the problem space:
> >
> > Selfrando: Securing the Tor Browser against De-anonymization Exploits,
> > M. Conti, S. Crane, T. Frassetto, et al.
> >
> > For more information on how function layout impacts performance, see:
> >
> > Optimizing Function Placement for Large-Scale Data-Center Applications,
> > G. Ottoni, B. Maher
> >
> > Kees Cook (2):
> >   x86/boot: Allow a "silent" kaslr random byte fetch
> >   x86/boot/compressed: Avoid duplicate malloc() implementations
> >
> > Kristen Carlson Accardi (8):
> >   objtool: Do not assume order of parent/child functions
> >   x86: tools/relocs: Support >64K section headers
> >   x86: Makefile: Add build and config option for CONFIG_FG_KASLR
> >   x86: Make sure _etext includes function sections
> >   x86/tools: Add relative relocs for randomized functions
> >   x86: Add support for function granular KASLR
> >   kallsyms: Hide layout
> >   module: Reorder functions
> >
> >  .../admin-guide/kernel-parameters.txt    |   7 +
> >  Documentation/security/fgkaslr.rst       | 172 ++++
> >  Documentation/security/index.rst         |   1 +
> >  Makefile                                 |   6 +-
> >  arch/x86/Kconfig                         |   4 +
> >  arch/x86/Makefile                        |   5 +
> >  arch/x86/boot/compressed/Makefile        |   9 +-
> >  arch/x86/boot/compressed/fgkaslr.c       | 811 ++++++++++++++++++
> >  arch/x86/boot/compressed/kaslr.c         |   4 -
> >  arch/x86/boot/compressed/misc.c          | 157 +++-
> >  arch/x86/boot/compressed/misc.h          |  30 +
> >  arch/x86/boot/compressed/utils.c         |  11 +
> >  arch/x86/boot/compressed/vmlinux.symbols |  17 +
> >  arch/x86/include/asm/boot.h              |  15 +-
> >  arch/x86/kernel/vmlinux.lds.S            |  17 +-
> >  arch/x86/lib/kaslr.c                     |  18 +-
> >  arch/x86/tools/relocs.c                  | 143 ++-
> >  arch/x86/tools/relocs.h                  |   4 +-
> >  arch/x86/tools/relocs_common.c           |  15 +-
> >  include/asm-generic/vmlinux.lds.h        |  18 +-
> >  include/linux/decompress/mm.h            |  12 +-
> >  include/uapi/linux/elf.h                 |   1 +
> >  init/Kconfig                             |  26 +
> >  kernel/kallsyms.c                        | 163 +++-
> >  kernel/module.c                          |  81 ++
> >  tools/objtool/elf.c                      |   8 +-
> >  26 files changed, 1670 insertions(+), 85 deletions(-)
> >  create mode 100644 Documentation/security/fgkaslr.rst
> >  create mode 100644 arch/x86/boot/compressed/fgkaslr.c
> >  create mode 100644 arch/x86/boot/compressed/utils.c
> >  create mode 100644 arch/x86/boot/compressed/vmlinux.symbols
> >
> >
> > base-commit: 11ba468877bb23f28956a35e896356252d63c983
> > --
> > 2.20.1
> >
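The v4 change to /proc/kallsyms keeps the symbol data as it is and only walks it through a randomly shuffled index array. A minimal sketch of that idea, assuming a simplified, hypothetical `struct sym` table and using the C library's rand() in place of the kernel's random source; this is an illustration, not the code in kernel/kallsyms.c.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified symbol entry; the table stays sorted by addr. */
struct sym {
	unsigned long addr;
	const char *name;
};

/*
 * Fill 'idx' with a random permutation of 0..n-1 (Fisher-Yates).
 * rand() stands in for the kernel's random source; modulo bias is
 * ignored in this sketch.
 */
static void make_display_order(unsigned int *idx, unsigned int n)
{
	assert(n == 0 || idx != NULL);

	for (unsigned int i = 0; i < n; i++)
		idx[i] = i;
	for (unsigned int i = n; i > 1; i--) {
		unsigned int j = (unsigned int)rand() % i;	/* 0 <= j < i */
		unsigned int tmp = idx[i - 1];

		idx[i - 1] = idx[j];
		idx[j] = tmp;
	}
}

/*
 * The i-th entry shown to the reader (think of one line of /proc/kallsyms).
 * Only the display order goes through the shuffled index; the table itself
 * is untouched, so address-ordered lookups elsewhere keep working.
 */
static const struct sym *display_entry(const struct sym *table,
				       const unsigned int *idx, unsigned int i)
{
	assert(table != NULL && idx != NULL);
	return &table[idx[i]];
}
```

After srand() and make_display_order(idx, n), iterating i from 0 to n-1 through display_entry() visits every symbol exactly once in shuffled order, while the address-sorted table is left as it was.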
> Apologies in advance if this has already been discussed elsewhere, but I
> did finally get around to testing the patchset against the livepatching
> kselftests.
>
> The livepatching kselftests fail as all livepatches stall their
> transitions. It appears that reliable (ORC) stack unwinding is broken
> when fgkaslr is enabled.
>
> Relevant config options:
>
>   CONFIG_ARCH_HAS_FG_KASLR=y
>   CONFIG_ARCH_STACKWALK=y
>   CONFIG_FG_KASLR=y
>   CONFIG_HAVE_LIVEPATCH=y
>   CONFIG_HAVE_RELIABLE_STACKTRACE=y
>   CONFIG_LIVEPATCH=y
>   CONFIG_MODULE_FG_KASLR=y
>   CONFIG_TEST_LIVEPATCH=m
>   CONFIG_UNWINDER_ORC=y
>
> The livepatch transitions are stuck along this call path:
>
>   klp_check_stack
>     stack_trace_save_tsk_reliable
>       arch_stack_walk_reliable
>
>         /* Check for stack corruption */
>         if (unwind_error(&state))
>                 return -EINVAL;
>
> where the unwinder error is set by unwind_next_frame():
>
>   arch/x86/kernel/unwind_orc.c
>   bool unwind_next_frame(struct unwind_state *state)
>
> sometimes here:
>
>         /* End-of-stack check for kernel threads: */
>         if (orc->sp_reg == ORC_REG_UNDEFINED) {
>                 if (!orc->end)
>                         goto err;
>
>                 goto the_end;
>         }
>
> or here:
>
>         /* Prevent a recursive loop due to bad ORC data: */
>         if (state->stack_info.type == prev_type &&
>             on_stack(&state->stack_info, (void *)state->sp, sizeof(long)) &&
>             state->sp <= prev_sp) {
>                 orc_warn_current("stack going in the wrong direction? at %pB\n",
>                                  (void *)orig_ip);
>                 goto err;
>         }
>
> (and probably other places the ORC unwinder gets confused.)
>
>
> It also manifests itself in other, more visible ways. For example, a
> kernel module that calls dump_stack() in its init function or even
> /proc/<pid>/stack:
>
> (fgkaslr on)
> ------------
>
> Call Trace:
>  ? dump_stack+0x57/0x73
>  ? 0xffffffffc0850000
>  ? mymodule_init+0xa/0x1000 [dumpstack]
>  ? do_one_initcall+0x46/0x1f0
>  ? free_unref_page_commit+0x91/0x100
>  ? _cond_resched+0x15/0x30
>  ? kmem_cache_alloc_trace+0x14b/0x210
>  ? do_init_module+0x5a/0x220
>  ? load_module+0x1912/0x1b20
>  ? __do_sys_finit_module+0xa8/0x110
>  ? __do_sys_finit_module+0xa8/0x110
>  ? do_syscall_64+0x47/0x80
>  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> % sudo cat /proc/$$/stack
> [<0>] do_wait+0x1c3/0x230
> [<0>] kernel_wait4+0xa6/0x140
>
>
> fgkaslr=off
> -----------
>
> Call Trace:
>  dump_stack+0x57/0x73
>  ? 0xffffffffc04f2000
>  mymodule_init+0xa/0x1000 [readonly]
>  do_one_initcall+0x46/0x1f0
>  ? free_unref_page_commit+0x91/0x100
>  ? _cond_resched+0x15/0x30
>  ? kmem_cache_alloc_trace+0x14b/0x210
>  do_init_module+0x5a/0x220
>  load_module+0x1912/0x1b20
>  ? __do_sys_finit_module+0xa8/0x110
>  __do_sys_finit_module+0xa8/0x110
>  do_syscall_64+0x47/0x80
>  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> % sudo cat /proc/$$/stack
> [<0>] do_wait+0x1c3/0x230
> [<0>] kernel_wait4+0xa6/0x140
> [<0>] __do_sys_wait4+0x83/0x90
> [<0>] do_syscall_64+0x47/0x80
> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
>
> I would think fixing and verifying these latter cases would be easier than
> chasing livepatch transitions (but would still probably fix klp case, too).
> Perhaps Josh or someone has other ORC unwinder tests that could be used?
>
> -- Joe
>

Hi Joe,

Thanks for testing. Yes, Josh and I have been discussing the orc_unwind
issues. I've root caused one issue already, in that objtool places an
orc_unwind_ip address just outside the section, so my algorithm fails
to relocate this address. There are other issues as well that I still
haven't root caused. I'll be addressing this in v5 and plan to have
something that passes livepatch testing with that version.

Kristen
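The cover letter describes checking every relocated address with a bsearch to find the function section it falls in, and Kristen traces one ORC failure to an orc_unwind_ip entry that sits just outside its section. The sketch below models that lookup with hypothetical names (`struct moved_section`, `adjust_address()`), assuming sections are sorted by original start address and matched as half-open ranges [start, end); it illustrates the failure mode and is not the patch's implementation.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical record of one shuffled function section. */
struct moved_section {
	unsigned long start;	/* original start address */
	unsigned long end;	/* original end address (one past last byte) */
	long offset;		/* how far the section moved */
};

/* bsearch() comparator: key is an address, element is a section range. */
static int cmp_addr_section(const void *key, const void *elem)
{
	unsigned long addr = *(const unsigned long *)key;
	const struct moved_section *s = elem;

	if (addr < s->start)
		return -1;
	if (addr >= s->end)	/* half-open: 'end' belongs to no section */
		return 1;
	return 0;
}

/*
 * Return where 'addr' ends up after function sections were shuffled.
 * 'secs' must be sorted by start and non-overlapping. Addresses outside
 * every randomized section are returned unchanged (only the usual
 * base-address KASLR delta would apply to them).
 */
static unsigned long adjust_address(const struct moved_section *secs,
				    size_t n, unsigned long addr)
{
	const struct moved_section *s;

	assert(n == 0 || secs != NULL);
	s = bsearch(&addr, secs, n, sizeof(*secs), cmp_addr_section);
	return s ? addr + s->offset : addr;
}
```

With sections [0x1000, 0x1040) moved by +0x200 and [0x1040, 0x1100) moved by -0x40, adjust_address() maps 0x1010 to 0x1210, but 0x1100, one byte past the end of the second section, matches nothing and comes back unchanged even though the code it trails has moved. An address equal to the first section's end (0x1040) is instead attributed to whichever section followed it in the original layout. In both cases the result no longer sits just past the function it was describing.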
On Fri, Aug 07, 2020 at 09:38:11AM -0700, Kristen Carlson Accardi wrote:
> Thanks for testing. Yes, Josh and I have been discussing the orc_unwind
> issues. I've root caused one issue already, in that objtool places an
> orc_unwind_ip address just outside the section, so my algorithm fails
> to relocate this address. There are other issues as well that I still
> haven't root caused. I'll be addressing this in v5 and plan to have
> something that passes livepatch testing with that version.

FWIW, I'm okay with seeing fgkaslr be developed progressively. Getting it
working with !livepatching would be fine as a first step. There's value in
getting the general behavior landed, and then continuing to improve it.
On Fri, 2020-08-07 at 10:20 -0700, Kees Cook wrote:
> On Fri, Aug 07, 2020 at 09:38:11AM -0700, Kristen Carlson Accardi wrote:
> > Thanks for testing. Yes, Josh and I have been discussing the orc_unwind
> > issues. I've root caused one issue already, in that objtool places an
> > orc_unwind_ip address just outside the section, so my algorithm fails
> > to relocate this address. There are other issues as well that I still
> > haven't root caused. I'll be addressing this in v5 and plan to have
> > something that passes livepatch testing with that version.
>
> FWIW, I'm okay with seeing fgkaslr be developed progressively. Getting
> it working with !livepatching would be fine as a first step. There's
> value in getting the general behavior landed, and then continuing to
> improve it.
>

In this case, part of the issue with livepatching appears to be a more
general issue with objtool and how it creates the orc unwind entries when
you have >64K sections. So livepatching is a good test case for making
sure that the orc tables are actually correct. However, the other issue
with livepatching (the duplicate symbols) might be worth deferring if the
solution is complex - I will keep that in mind as I look at it more
closely.
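Kristen's note about objtool and >64K sections touches ELF's extended section numbering. With one section per function, vmlinux can exceed the range of the 16-bit e_shnum, e_shstrndx and st_shndx fields (the escape applies from SHN_LORESERVE, 0xff00, upward), and the ELF gABI then stores the real values elsewhere. The series already carries "x86: tools/relocs: Support >64K section headers" for the relocs tool. Below is a sketch of how a tool can read those values using glibc's <elf.h> types; the helper names are hypothetical and this is not objtool's or relocs' code.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Number of section headers. When the count does not fit in e_shnum,
 * e_shnum is 0 and the real count is in sh_size of section header 0.
 */
static uint64_t elf_section_count(const Elf64_Ehdr *eh, const Elf64_Shdr *sh0)
{
	if (eh->e_shoff == 0)
		return 0;		/* no section header table at all */
	if (eh->e_shnum != 0)
		return eh->e_shnum;
	assert(sh0 != NULL);
	return sh0->sh_size;
}

/*
 * Index of the section-name string table. If it is too large for
 * e_shstrndx, that field holds SHN_XINDEX and the real index is in
 * sh_link of section header 0.
 */
static uint32_t elf_shstrndx(const Elf64_Ehdr *eh, const Elf64_Shdr *sh0)
{
	if (eh->e_shstrndx != SHN_XINDEX)
		return eh->e_shstrndx;
	assert(sh0 != NULL);
	return sh0->sh_link;
}

/*
 * Section a symbol belongs to. st_shndx is only 16 bits wide; for large
 * indexes it holds SHN_XINDEX and the real index is in the
 * SHT_SYMTAB_SHNDX section, an array with one Elf32_Word per symbol.
 */
static uint32_t elf_symbol_shndx(const Elf64_Sym *sym, size_t sym_idx,
				 const Elf32_Word *symtab_shndx)
{
	if (sym->st_shndx != SHN_XINDEX)
		return sym->st_shndx;
	assert(symtab_shndx != NULL);
	return symtab_shndx[sym_idx];
}
```

For an object with 70000 sections, e_shnum reads 0 and elf_section_count() returns sh_size of section 0, while a symbol whose st_shndx is SHN_XINDEX gets its real index from the parallel SHT_SYMTAB_SHNDX array. A tool that reads only the raw 16-bit fields would resolve such symbols to the wrong section, or to the reserved value SHN_XINDEX itself.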
On Tue, 2020-08-04 at 14:23 -0400, Joe Lawrence wrote: > On Fri, Jul 17, 2020 at 09:59:57AM -0700, Kristen Carlson Accardi > wrote: > > Function Granular Kernel Address Space Layout Randomization > > (fgkaslr) > > ----------------------------------------------------------------- > > ---- > > > > This patch set is an implementation of finer grained kernel address > > space > > randomization. It rearranges your kernel code at load time > > on a per-function level granularity, with only around a second > > added to > > boot time. > > > > Changes in v4: > > ------------- > > * dropped the patch to split out change to STATIC definition in > > x86/boot/compressed/misc.c and replaced with a patch authored > > by Kees Cook to avoid the duplicate malloc definitions > > * Added a section to Documentation/admin-guide/kernel- > > parameters.txt > > to document the fgkaslr boot option. > > * redesigned the patch to hide the new layout when reading > > /proc/kallsyms. The previous implementation utilized a > > dynamically > > allocated linked list to display the kernel and module symbols > > in alphabetical order. The new implementation uses a randomly > > shuffled index array to display the kernel and module symbols > > in a random order. 
> > > > Changes in v3: > > ------------- > > * Makefile changes to accommodate > > CONFIG_LD_DEAD_CODE_DATA_ELIMINATION > > * removal of extraneous ALIGN_PAGE from _etext changes > > * changed variable names in x86/tools/relocs to be less confusing > > * split out change to STATIC definition in > > x86/boot/compressed/misc.c > > * Updates to Documentation to make it more clear what is preserved > > in .text > > * much more detailed commit message for function granular KASLR > > patch > > * minor tweaks and changes that make for more readable code > > * this cover letter updated slightly to add additional details > > > > Changes in v2: > > -------------- > > * Fix to address i386 build failure > > * Allow module reordering patch to be configured separately so that > > arm (or other non-x86_64 arches) can take advantage of module > > function > > reordering. This support has not be tested by me, but smoke > > tested by > > Ard Biesheuvel <ardb@kernel.org> on arm. > > * Fix build issue when building on arm as reported by > > Ard Biesheuvel <ardb@kernel.org> > > > > Patches to objtool are included because they are dependencies for > > this > > patchset, however they have been submitted by their maintainer > > separately. > > > > Background > > ---------- > > KASLR was merged into the kernel with the objective of increasing > > the > > difficulty of code reuse attacks. Code reuse attacks reused > > existing code > > snippets to get around existing memory protections. They exploit > > software bugs > > which expose addresses of useful code snippets to control the flow > > of > > execution for their own nefarious purposes. KASLR moves the entire > > kernel > > code text as a unit at boot time in order to make addresses less > > predictable. > > The order of the code within the segment is unchanged - only the > > base address > > is shifted. There are a few shortcomings to this algorithm. > > > > 1. Low Entropy - there are only so many locations the kernel can > > fit in. 
This > > means an attacker could guess without too much trouble. > > 2. Knowledge of a single address can reveal the offset of the base > > address, > > exposing all other locations for a published/known kernel image. > > 3. Info leaks abound. > > > > Finer grained ASLR has been proposed as a way to make ASLR more > > resistant > > to info leaks. It is not a new concept at all, and there are many > > variations > > possible. Function reordering is an implementation of finer grained > > ASLR > > which randomizes the layout of an address space on a function level > > granularity. We use the term "fgkaslr" in this document to refer to > > the > > technique of function reordering when used with KASLR, as well as > > finer grained > > KASLR in general. > > > > Proposed Improvement > > -------------------- > > This patch set proposes adding function reordering on top of the > > existing > > KASLR base address randomization. The over-arching objective is > > incremental > > improvement over what we already have. It is designed to work in > > combination > > with the existing solution. The implementation is really pretty > > simple, and > > there are 2 main area where changes occur: > > > > * Build time > > > > GCC has had an option to place functions into individual .text > > sections for > > many years now. This option can be used to implement function > > reordering at > > load time. The final compiled vmlinux retains all the section > > headers, which > > can be used to help find the address ranges of each function. Using > > this > > information and an expanded table of relocation addresses, > > individual text > > sections can be suffled immediately after decompression. Some data > > tables > > inside the kernel that have assumptions about order require re- > > sorting > > after being updated when applying relocations. 
In order to modify > > these tables, > > a few key symbols are excluded from the objcopy symbol stripping > > process for > > use after shuffling the text segments. > > > > Some highlights from the build time changes to look for: > > > > The top level kernel Makefile was modified to add the gcc flag if > > it > > is supported. Currently, I am applying this flag to everything it > > is > > possible to randomize. Anything that is written in C and not > > present in a > > special input section is randomized. The final binary segment 0 > > retains a > > consolidated .text section, as well as all the individual .text.* > > sections. > > Future work could turn off this flags for selected files or even > > entire > > subsystems, although obviously at the cost of security. > > > > The relocs tool is updated to add relative relocations. This > > information > > previously wasn't included because it wasn't necessary when moving > > the > > entire .text segment as a unit. > > > > A new file was created to contain a list of symbols that objcopy > > should > > keep. We use those symbols at load time as described below. > > > > * Load time > > > > The boot kernel was modified to parse the vmlinux elf file after > > decompression to check for our interesting symbols that we kept, > > and to > > look for any .text.* sections to randomize. The consolidated .text > > section > > is skipped and not moved. The sections are shuffled randomly, and > > copied > > into memory following the .text section in a new random order. The > > existing > > code which updated relocation addresses was modified to account for > > not just a fixed delta from the load address, but the offset that > > the function > > section was moved to. This requires inspection of each address to > > see if > > it was impacted by a randomization. We use a bsearch to make this > > less > > horrible on performance. 
Any tables that need to be modified with > > new > > addresses or resorted are updated using the symbol addresses parsed > > from the > > elf symbol table. > > > > In order to hide our new layout, symbols reported through > > /proc/kallsyms > > will be displayed in a random order. > > > > Security Considerations > > ----------------------- > > The objective of this patch set is to improve a technology that is > > already > > merged into the kernel (KASLR). This code will not prevent all > > attacks, > > but should instead be considered as one of several tools that can > > be used. > > In particular, this code is meant to make KASLR more effective in > > the presence > > of info leaks. > > > > How much entropy we are adding to the existing entropy of standard > > KASLR will > > depend on a few variables. Firstly and most obviously, the number > > of functions > > that are randomized matters. This implementation keeps the existing > > .text > > section for code that cannot be randomized - for example, because > > it was > > assembly code. The less sections to randomize, the less entropy. In > > addition, > > due to alignment (16 bytes for x86_64), the number of bits in a > > address that > > the attacker needs to guess is reduced, as the lower bits are > > identical. > > > > Performance Impact > > ------------------ > > There are two areas where function reordering can impact > > performance: boot > > time latency, and run time performance. > > > > * Boot time latency > > This implementation of finer grained KASLR impacts the boot time of > > the kernel > > in several places. It requires additional parsing of the kernel ELF > > file to > > obtain the section headers of the sections to be randomized. It > > calls the > > random number generator for each section to be randomized to > > determine that > > section's new memory location. 
It copies the decompressed kernel > > into a new > > area of memory to avoid corruption when laying out the newly > > randomized > > sections. It increases the number of relocations the kernel has to > > perform at > > boot time vs. standard KASLR, and it also requires a lookup on each > > address > > that needs to be relocated to see if it was in a randomized section > > and needs > > to be adjusted by a new offset. Finally, it re-sorts a few data > > tables that > > are required to be sorted by address. > > > > Booting a test VM on a modern, well appointed system showed an > > increase in > > latency of approximately 1 second. > > > > * Run time > > The performance impact at run-time of function reordering varies by > > workload. > > Using kcbench, a kernel compilation benchmark, the performance of a > > kernel > > build with finer grained KASLR was about 1% slower than a kernel > > with standard > > KASLR. Analysis with perf showed a slightly higher percentage of > > L1-icache-load-misses. Other workloads were examined as well, with > > varied > > results. Some workloads performed significantly worse under > > FGKASLR, while > > others stayed the same or were mysteriously better. In general, it > > will > > depend on the code flow whether or not finer grained KASLR will > > impact > > your workload, and how the underlying code was designed. Because > > the layout > > changes per boot, each time a system is rebooted the performance of > > a workload > > may change. > > > > Future work could identify hot areas that may not be randomized and > > either > > leave them in the .text section or group them together into a > > single section > > that may be randomized. 
If grouping things together helps, one > > other thing to > > consider is that if we could identify text blobs that should be > > grouped together > > to benefit a particular code flow, it could be interesting to > > explore > > whether this security feature could be also be used as a > > performance > > feature if you are interested in optimizing your kernel layout for > > a > > particular workload at boot time. Optimizing function layout for a > > particular > > workload has been researched and proven effective - for more > > information > > read the Facebook paper "Optimizing Function Placement for Large- > > Scale > > Data-Center Applications" (see references section below). > > > > Image Size > > ---------- > > Adding additional section headers as a result of compiling with > > -ffunction-sections will increase the size of the vmlinux ELF file. > > With a standard distro config, the resulting vmlinux was increased > > by > > about 3%. The compressed image is also increased due to the header > > files, > > as well as the extra relocations that must be added. You can expect > > fgkaslr > > to increase the size of the compressed image by about 15%. > > > > Memory Usage > > ------------ > > fgkaslr increases the amount of heap that is required at boot time, > > although this extra memory is released when the kernel has finished > > decompression. As a result, it may not be appropriate to use this > > feature on > > systems without much memory. > > > > Building > > -------- > > To enable fine grained KASLR, you need to have the following config > > options > > set (including all the ones you would use to build normal KASLR) > > > > CONFIG_FG_KASLR=y > > > > In addition, fgkaslr is only supported for the X86_64 architecture. > > > > Modules > > ------- > > Modules are randomized similarly to the rest of the kernel by > > shuffling > > the sections at load time prior to moving them into memory. 
> > The module must also have been built with the -ffunction-sections compiler option.
> >
> > Although fgkaslr for the kernel is only supported for the X86_64 architecture, it is possible to use fgkaslr with modules on other architectures. To enable this feature, select
> >
> > CONFIG_MODULE_FG_KASLR=y
> >
> > This option is selected automatically for X86_64 when CONFIG_FG_KASLR is set.
> >
> > Disabling
> > ---------
> > Disabling normal KASLR using the nokaslr command line option also disables fgkaslr. It is also possible to disable fgkaslr separately by booting with fgkaslr=off on the commandline.
> >
> > References
> > ----------
> > There are a lot of academic papers which explore finer grained ASLR. This paper in particular contributed the most to my implementation design as well as my overall understanding of the problem space:
> >
> > Selfrando: Securing the Tor Browser against De-anonymization Exploits,
> > M. Conti, S. Crane, T. Frassetto, et al.
> >
> > For more information on how function layout impacts performance, see:
> >
> > Optimizing Function Placement for Large-Scale Data-Center Applications,
> > G. Ottoni, B. Maher
> >
> > Kees Cook (2):
> >   x86/boot: Allow a "silent" kaslr random byte fetch
> >   x86/boot/compressed: Avoid duplicate malloc() implementations
> >
> > Kristen Carlson Accardi (8):
> >   objtool: Do not assume order of parent/child functions
> >   x86: tools/relocs: Support >64K section headers
> >   x86: Makefile: Add build and config option for CONFIG_FG_KASLR
> >   x86: Make sure _etext includes function sections
> >   x86/tools: Add relative relocs for randomized functions
> >   x86: Add support for function granular KASLR
> >   kallsyms: Hide layout
> >   module: Reorder functions
> >
> >  .../admin-guide/kernel-parameters.txt    |    7 +
> >  Documentation/security/fgkaslr.rst       |  172 ++++
> >  Documentation/security/index.rst         |    1 +
> >  Makefile                                 |    6 +-
> >  arch/x86/Kconfig                         |    4 +
> >  arch/x86/Makefile                        |    5 +
> >  arch/x86/boot/compressed/Makefile        |    9 +-
> >  arch/x86/boot/compressed/fgkaslr.c       |  811 ++++++++++++++++++
> >  arch/x86/boot/compressed/kaslr.c         |    4 -
> >  arch/x86/boot/compressed/misc.c          |  157 +++-
> >  arch/x86/boot/compressed/misc.h          |   30 +
> >  arch/x86/boot/compressed/utils.c         |   11 +
> >  arch/x86/boot/compressed/vmlinux.symbols |   17 +
> >  arch/x86/include/asm/boot.h              |   15 +-
> >  arch/x86/kernel/vmlinux.lds.S            |   17 +-
> >  arch/x86/lib/kaslr.c                     |   18 +-
> >  arch/x86/tools/relocs.c                  |  143 ++-
> >  arch/x86/tools/relocs.h                  |    4 +-
> >  arch/x86/tools/relocs_common.c           |   15 +-
> >  include/asm-generic/vmlinux.lds.h        |   18 +-
> >  include/linux/decompress/mm.h            |   12 +-
> >  include/uapi/linux/elf.h                 |    1 +
> >  init/Kconfig                             |   26 +
> >  kernel/kallsyms.c                        |  163 +++-
> >  kernel/module.c                          |   81 ++
> >  tools/objtool/elf.c                      |    8 +-
> >  26 files changed, 1670 insertions(+), 85 deletions(-)
> >  create mode 100644 Documentation/security/fgkaslr.rst
> >  create mode 100644 arch/x86/boot/compressed/fgkaslr.c
> >  create mode 100644 arch/x86/boot/compressed/utils.c
> >  create mode 100644 arch/x86/boot/compressed/vmlinux.symbols
> >
> >
> > base-commit: 11ba468877bb23f28956a35e896356252d63c983
> > --
> > 2.20.1
> >
>
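The cover letter's boot-time section says every address that needs relocating must be checked against the randomized sections and, if it falls inside one, shifted by that section's new offset. A minimal sketch of such a lookup, assuming a table of moved sections sorted by their original start address; the struct and function names are hypothetical and not taken from fgkaslr.c:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record for one randomized function section. */
struct moved_section {
	uint64_t old_start;  /* link-time start address */
	uint64_t size;       /* section size in bytes */
	uint64_t new_start;  /* start address after shuffling */
};

/*
 * Return the relocated value of addr. secs[] must be sorted by old_start
 * and non-overlapping. Addresses outside every randomized section are
 * returned unchanged (the ordinary KASLR base offset is not modelled).
 */
uint64_t adjust_address(const struct moved_section *secs, size_t n,
			uint64_t addr)
{
	size_t lo = 0, hi = n;

	assert(secs != NULL || n == 0);

	/* Binary search: lo ends as the count of sections with old_start <= addr. */
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (secs[mid].old_start <= addr)
			lo = mid + 1;
		else
			hi = mid;
	}
	if (lo == 0)
		return addr;  /* below the first randomized section */

	const struct moved_section *s = &secs[lo - 1];

	if (addr - s->old_start < s->size)
		return s->new_start + (addr - s->old_start);
	return addr;          /* in a gap between randomized sections */
}
```

The binary search keeps the per-relocation cost logarithmic in the number of function sections, which matters when there are tens of thousands of them.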
> Apologies in advance if this has already been discussed elsewhere, but I did finally get around to testing the patchset against the livepatching kselftests.
>
> The livepatching kselftests fail as all livepatches stall their transitions. It appears that reliable (ORC) stack unwinding is broken when fgkaslr is enabled.
>
> Relevant config options:
>
>   CONFIG_ARCH_HAS_FG_KASLR=y
>   CONFIG_ARCH_STACKWALK=y
>   CONFIG_FG_KASLR=y
>   CONFIG_HAVE_LIVEPATCH=y
>   CONFIG_HAVE_RELIABLE_STACKTRACE=y
>   CONFIG_LIVEPATCH=y
>   CONFIG_MODULE_FG_KASLR=y
>   CONFIG_TEST_LIVEPATCH=m
>   CONFIG_UNWINDER_ORC=y
>
> The livepatch transitions are stuck along this call path:
>
>   klp_check_stack
>     stack_trace_save_tsk_reliable
>       arch_stack_walk_reliable
>
>         /* Check for stack corruption */
>         if (unwind_error(&state))
>                 return -EINVAL;
>
> where the unwinder error is set by unwind_next_frame():
>
>   arch/x86/kernel/unwind_orc.c
>   bool unwind_next_frame(struct unwind_state *state)
>
> sometimes here:
>
>         /* End-of-stack check for kernel threads: */
>         if (orc->sp_reg == ORC_REG_UNDEFINED) {
>                 if (!orc->end)
>                         goto err;
>
>                 goto the_end;
>         }
>
> or here:
>
>         /* Prevent a recursive loop due to bad ORC data: */
>         if (state->stack_info.type == prev_type &&
>             on_stack(&state->stack_info, (void *)state->sp, sizeof(long)) &&
>             state->sp <= prev_sp) {
>                 orc_warn_current("stack going in the wrong direction? at %pB\n",
>                                  (void *)orig_ip);
>                 goto err;
>         }
>
> (and probably other places the ORC unwinder gets confused.)
>
>
> It also manifests itself in other, more visible ways. For example, a kernel module that calls dump_stack() in its init function or even /proc/<pid>/stack:
>
> (fgkaslr on)
> ------------
>
> Call Trace:
>  ? dump_stack+0x57/0x73
>  ? 0xffffffffc0850000
>  ? mymodule_init+0xa/0x1000 [dumpstack]
>  ? do_one_initcall+0x46/0x1f0
>  ? free_unref_page_commit+0x91/0x100
>  ? _cond_resched+0x15/0x30
>  ? kmem_cache_alloc_trace+0x14b/0x210
>  ? do_init_module+0x5a/0x220
>  ? load_module+0x1912/0x1b20
>  ? __do_sys_finit_module+0xa8/0x110
>  ? __do_sys_finit_module+0xa8/0x110
>  ? do_syscall_64+0x47/0x80
>  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> % sudo cat /proc/$$/stack
> [<0>] do_wait+0x1c3/0x230
> [<0>] kernel_wait4+0xa6/0x140
>
>
> fgkaslr=off
> -----------
>
> Call Trace:
>  dump_stack+0x57/0x73
>  ? 0xffffffffc04f2000
>  mymodule_init+0xa/0x1000 [readonly]
>  do_one_initcall+0x46/0x1f0
>  ? free_unref_page_commit+0x91/0x100
>  ? _cond_resched+0x15/0x30
>  ? kmem_cache_alloc_trace+0x14b/0x210
>  do_init_module+0x5a/0x220
>  load_module+0x1912/0x1b20
>  ? __do_sys_finit_module+0xa8/0x110
>  __do_sys_finit_module+0xa8/0x110
>  do_syscall_64+0x47/0x80
>  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> % sudo cat /proc/$$/stack
> [<0>] do_wait+0x1c3/0x230
> [<0>] kernel_wait4+0xa6/0x140
> [<0>] __do_sys_wait4+0x83/0x90
> [<0>] do_syscall_64+0x47/0x80
> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
>
> I would think fixing and verifying these latter cases would be easier than chasing livepatch transitions (but would still probably fix klp case, too). Perhaps Josh or someone has other ORC unwinder tests that could be used?
>
> -- Joe
>

OK, I have root caused these failures: the relocs for the orc_unwind_ip table, which I use to adjust the values of the orc_unwind_ip table after randomization, were incorrect because the table is now sorted at build time, which makes the relocs invalid. I can fix this either by turning off BUILDTIME_TABLE_SORT (and then the relocs are fine), or by ignoring any relocs in the orc_unwind_ip table and adjusting without relocs. I think it makes sense to just unset BUILDTIME_TABLE_SORT if CONFIG_FG_KASLR and continue to rely on relocs, since the build-time sort is a useless step anyway.
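The second option Kristen mentions, adjusting the table without relocs, comes down to rewriting each entry in place and then restoring address order, because shuffling breaks whatever order the build-time sort produced. A minimal sketch, assuming a simplified table of absolute IPs paired with their unwind records; the real orc_unwind_ip table stores 32-bit offsets relative to each entry, and every name below is hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified stand-in for an unwind record; the real struct orc_entry differs. */
struct unwind_rec {
	int sp_offset;
};

/* One IP paired with its unwind record, so the sort cannot separate them. */
struct ip_entry {
	uint64_t ip;
	struct unwind_rec rec;
};

/* Hypothetical adjustment: move the range [start, end) by delta bytes. */
struct range_shift {
	uint64_t start, end;
	int64_t delta;
};

uint64_t shift_if_in_range(uint64_t ip, void *ctx)
{
	const struct range_shift *rs = ctx;

	if (ip >= rs->start && ip < rs->end)
		return ip + (uint64_t)rs->delta;
	return ip;
}

static int cmp_ip(const void *a, const void *b)
{
	uint64_t x = ((const struct ip_entry *)a)->ip;
	uint64_t y = ((const struct ip_entry *)b)->ip;

	return (x > y) - (x < y);
}

/*
 * Adjust every entry directly (no relocs involved), then restore
 * ascending IP order so a binary search over the table works again.
 */
void fixup_ip_table(struct ip_entry *tab, size_t n,
		    uint64_t (*adjust)(uint64_t ip, void *ctx), void *ctx)
{
	assert(tab != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		tab[i].ip = adjust(tab[i].ip, ctx);
	if (n > 1)
		qsort(tab, n, sizeof(*tab), cmp_ip);
}
```

Keeping each IP and its record in one struct is a simplification; with two parallel arrays, as the kernel uses, the sort's swap step has to move both arrays together.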
On Wed, 2020-07-22 at 16:33 -0500, Josh Poimboeuf wrote:
> On Wed, Jul 22, 2020 at 12:56:10PM -0700, Kristen Carlson Accardi wrote:
> > On Wed, 2020-07-22 at 12:42 -0700, Kees Cook wrote:
> > > On Wed, Jul 22, 2020 at 11:07:30AM -0500, Josh Poimboeuf wrote:
> > > > On Wed, Jul 22, 2020 at 07:39:55AM -0700, Kees Cook wrote:
> > > > > On Wed, Jul 22, 2020 at 11:27:30AM +0200, Miroslav Benes wrote:
> > > > > > Let me CC live-patching ML, because from a quick glance this is something which could impact live patching code. At least it invalidates assumptions which "sympos" is based on.
> > > > >
> > > > > In a quick skim, it looks like the symbol resolution is using kallsyms_on_each_symbol(), so I think this is safe? What's a good selftest for live-patching?
> > > >
> > > > The problem is duplicate symbols. If there are two static functions named 'foo' then livepatch needs a way to distinguish them.
> > > >
> > > > Our current approach to that problem is "sympos". We rely on the fact that the second foo() always comes after the first one in the symbol list and kallsyms. So they're referred to as foo,1 and foo,2.
> > >
> > > Ah. Fun. In that case, perhaps the LTO series has some solutions. I think builds with LTO end up renaming duplicate symbols like that, so it'll be back to being unique.
> > >
> >
> > Well, glad to hear there might be some precedent for how to solve this, as I wasn't able to think of something reasonable off the top of my head. Are you speaking of the Clang LTO series?
> > https://lore.kernel.org/lkml/20200624203200.78870-1-samitolvanen@google.com/
>
> I'm not sure how LTO does it, but a few more (half-brained) ideas that could work:
>
> 1) Add a field in kallsyms to keep track of a symbol's original offset before randomization/re-sorting.
>    Livepatch could use that field to determine the original sympos.
>
> 2) In fgkaslr code, go through all the sections and mark the ones which have duplicates (i.e. same name). Then when shuffling the sections, skip a shuffle if it involves a duplicate section. That way all the duplicates would retain their original sympos.
>
> 3) Livepatch could uniquely identify symbols by some feature other than sympos. For example:
>
>    Symbol/function size - obviously this would only work if duplicately named symbols have different sizes.
>
>    Checksum - as part of a separate feature we're also looking at giving each function its own checksum, calculated based on its instruction opcodes. Though calculating checksums at runtime could be complicated by IP-relative addressing.
>
> I'm thinking #1 or #2 wouldn't be too bad. #3 might be harder.
>

Hi there! I was trying to find a super easy way to address this, so I thought the best thing would be if there were a compiler or linker switch to just eliminate any duplicate symbols at compile time for vmlinux. I filed this question on the binutils bugzilla looking to see if there were existing flags that might do this, but H.J. Lu went ahead and created a new one "-z unique", that seems to do what we would need it to do.

https://sourceware.org/bugzilla/show_bug.cgi?id=26391

When I use this option, it renames any duplicate symbols with an extension - for example duplicatefunc.1 or duplicatefunc.2. You could either match on the full unique name of the specific binary you are trying to patch, or you match the base name and use the extension to determine original position. Do you think this solution would work? If so, I can modify livepatch to refuse to patch on duplicated symbols if CONFIG_FG_KASLR and when this option is merged into the tool chain I can add it to KBUILD_LDFLAGS when CONFIG_FG_KASLR and livepatching should work in all cases.
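Josh's description of sympos (the Nth symbol with a given name, counted in symbol-table order) and his option #1 can be contrasted in a small sketch. It assumes a hypothetical symbol array with an orig_idx field recorded before shuffling; kallsyms has no such field today, and the lookup functions are illustrative only:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical symbol record; orig_idx models Josh's option #1. */
struct sym {
	const char *name;
	unsigned long addr;
	size_t orig_idx;  /* position in the symbol table before randomization */
};

/*
 * sympos by current order: index of the pos-th (1-based) symbol named
 * name, or -1. The answer changes whenever the duplicates are reordered.
 */
long find_by_current_order(const struct sym *syms, size_t n,
			   const char *name, unsigned pos)
{
	unsigned seen = 0;

	for (size_t i = 0; i < n; i++)
		if (strcmp(syms[i].name, name) == 0 && ++seen == pos)
			return (long)i;
	return -1;
}

/*
 * sympos by original order: among symbols named name, pick the one with
 * exactly pos-1 smaller orig_idx values, i.e. the pos-th in link order,
 * independent of the current (randomized) layout.
 */
long find_by_original_order(const struct sym *syms, size_t n,
			    const char *name, unsigned pos)
{
	assert(pos >= 1);
	for (size_t i = 0; i < n; i++) {
		unsigned rank = 1;

		if (strcmp(syms[i].name, name) != 0)
			continue;
		for (size_t j = 0; j < n; j++)
			if (strcmp(syms[j].name, name) == 0 &&
			    syms[j].orig_idx < syms[i].orig_idx)
				rank++;
		if (rank == pos)
			return (long)i;
	}
	return -1;
}
```

With two foo() entries swapped by a shuffle, find_by_current_order() returns the wrong copy for foo,1 while find_by_original_order() keeps the link-order numbering; the cost would be one extra field per kallsyms entry.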
On 8/21/20 7:02 PM, Kristen Carlson Accardi wrote:
> On Wed, 2020-07-22 at 16:33 -0500, Josh Poimboeuf wrote:
>> On Wed, Jul 22, 2020 at 12:56:10PM -0700, Kristen Carlson Accardi wrote:
>>> On Wed, 2020-07-22 at 12:42 -0700, Kees Cook wrote:
>>>> On Wed, Jul 22, 2020 at 11:07:30AM -0500, Josh Poimboeuf wrote:
>>>>> On Wed, Jul 22, 2020 at 07:39:55AM -0700, Kees Cook wrote:
>>>>>> On Wed, Jul 22, 2020 at 11:27:30AM +0200, Miroslav Benes wrote:
>>>>>>> Let me CC live-patching ML, because from a quick glance this is something which could impact live patching code. At least it invalidates assumptions which "sympos" is based on.
>>>>>>
>>>>>> In a quick skim, it looks like the symbol resolution is using kallsyms_on_each_symbol(), so I think this is safe? What's a good selftest for live-patching?
>>>>>
>>>>> The problem is duplicate symbols. If there are two static functions named 'foo' then livepatch needs a way to distinguish them.
>>>>>
>>>>> Our current approach to that problem is "sympos". We rely on the fact that the second foo() always comes after the first one in the symbol list and kallsyms. So they're referred to as foo,1 and foo,2.
>>>>
>>>> Ah. Fun. In that case, perhaps the LTO series has some solutions. I think builds with LTO end up renaming duplicate symbols like that, so it'll be back to being unique.
>>>>
>>>
>>> Well, glad to hear there might be some precedent for how to solve this, as I wasn't able to think of something reasonable off the top of my head. Are you speaking of the Clang LTO series?
>>> https://lore.kernel.org/lkml/20200624203200.78870-1-samitolvanen@google.com/
>>
>> I'm not sure how LTO does it, but a few more (half-brained) ideas that could work:
>>
>> 1) Add a field in kallsyms to keep track of a symbol's original offset before randomization/re-sorting.
>>    Livepatch could use that field to determine the original sympos.
>>
>> 2) In fgkaslr code, go through all the sections and mark the ones which have duplicates (i.e. same name). Then when shuffling the sections, skip a shuffle if it involves a duplicate section. That way all the duplicates would retain their original sympos.
>>
>> 3) Livepatch could uniquely identify symbols by some feature other than sympos. For example:
>>
>>    Symbol/function size - obviously this would only work if duplicately named symbols have different sizes.
>>
>>    Checksum - as part of a separate feature we're also looking at giving each function its own checksum, calculated based on its instruction opcodes. Though calculating checksums at runtime could be complicated by IP-relative addressing.
>>
>> I'm thinking #1 or #2 wouldn't be too bad. #3 might be harder.
>>
>
> Hi there! I was trying to find a super easy way to address this, so I thought the best thing would be if there were a compiler or linker switch to just eliminate any duplicate symbols at compile time for vmlinux. I filed this question on the binutils bugzilla looking to see if there were existing flags that might do this, but H.J. Lu went ahead and created a new one "-z unique", that seems to do what we would need it to do.
>
> https://sourceware.org/bugzilla/show_bug.cgi?id=26391
>
> When I use this option, it renames any duplicate symbols with an extension - for example duplicatefunc.1 or duplicatefunc.2.

I tried out H.J. Lu's branch and built some of the livepatch selftests with -z unique-symbol and indeed observe the following pattern: foo, foo.1, foo.2, etc. for homonym symbol names.

> You could either match on the full unique name of the specific binary you are trying to patch, or you match the base name and use the extension to determine original position. Do you think this solution would work?

I think it could work for klp-relocations.
As a quick test, I was able to hack the WIP klp-convert branch [1] to generate klp-relocations with the following hack:

  const char *foo(void) __asm__("foo.1");

when building foo's target with -z unique-symbol. (The target contained two static foo() functions.) The asm rename trick exercised the klp-convert implicit conversion feature, as the symbol was now uniquely named and included a non-valid C symbol character. User-defined klp-convert annotation support will require some refactoring, but shouldn't be too difficult to support as well.

> If so, I can modify livepatch to refuse to patch on duplicated symbols if CONFIG_FG_KASLR and when this option is merged into the tool chain I can add it to KBUILD_LDFLAGS when CONFIG_FG_KASLR and livepatching should work in all cases.
>

I don't have a grasp on how complicated the alternatives might be, so I'll let others comment on best paths forward. I just wanted to note that -z unique-symbol looks like it could reasonably work well for this niche case.

[1] https://github.com/joe-lawrence/linux/tree/klp-convert-v5-expanded-v5.8 (not modified for -z unique-symbol, but noted for reference)

-- Joe
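Kristen's second matching strategy (base name plus the linker-added extension as the original position) needs a way to split a name like foo.2. The helper below is a hypothetical sketch that treats only a trailing `.<digits>` as that extension. This is inherently ambiguous: GCC itself emits local names such as `foo.constprop.0` or `foo.isra.0`, so a numeric suffix does not prove the linker added it, which is one reason full-name matching may be the simpler choice.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: split "foo.2" into base "foo" (*base_len = 3) and
 * index 2. A name without a trailing ".<digits>" is treated as the first,
 * unrenamed copy: index 0 and *base_len = strlen(name).
 * Returns -1 if the numeric suffix does not fit in a long.
 */
long split_unique_name(const char *name, size_t *base_len)
{
	size_t len, i;
	long idx = 0;

	assert(name != NULL && base_len != NULL);
	len = strlen(name);
	i = len;

	/* Walk back over trailing digits. */
	while (i > 0 && isdigit((unsigned char)name[i - 1]))
		i--;

	/* Require at least one digit and a '.' with a non-empty base before it. */
	*base_len = len;
	if (i == len || i < 2 || name[i - 1] != '.')
		return 0;

	for (size_t k = i; k < len; k++) {
		if (idx > (LONG_MAX - 9) / 10)
			return -1;
		idx = idx * 10 + (name[k] - '0');
	}
	*base_len = i - 1;
	return idx;
}
```

For example, `duplicatefunc.1` splits into base length 13 and index 1, while `foo.constprop.0` splits into base `foo.constprop` with index 0, which shows the ambiguity described above.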
Leaving Josh's proposals here for reference...

> > I'm not sure how LTO does it, but a few more (half-brained) ideas that could work:
> >
> > 1) Add a field in kallsyms to keep track of a symbol's original offset before randomization/re-sorting. Livepatch could use that field to determine the original sympos.
> >
> > 2) In fgkaslr code, go through all the sections and mark the ones which have duplicates (i.e. same name). Then when shuffling the sections, skip a shuffle if it involves a duplicate section. That way all the duplicates would retain their original sympos.
> >
> > 3) Livepatch could uniquely identify symbols by some feature other than sympos. For example:
> >
> >    Symbol/function size - obviously this would only work if duplicately named symbols have different sizes.
> >
> >    Checksum - as part of a separate feature we're also looking at giving each function its own checksum, calculated based on its instruction opcodes. Though calculating checksums at runtime could be complicated by IP-relative addressing.
> >
> > I'm thinking #1 or #2 wouldn't be too bad. #3 might be harder.
> >
>
> Hi there! I was trying to find a super easy way to address this, so I thought the best thing would be if there were a compiler or linker switch to just eliminate any duplicate symbols at compile time for vmlinux. I filed this question on the binutils bugzilla looking to see if there were existing flags that might do this, but H.J. Lu went ahead and created a new one "-z unique", that seems to do what we would need it to do.
>
> https://sourceware.org/bugzilla/show_bug.cgi?id=26391
>
> When I use this option, it renames any duplicate symbols with an extension - for example duplicatefunc.1 or duplicatefunc.2.
> You could either match on the full unique name of the specific binary you are trying to patch, or you match the base name and use the extension to determine original position. Do you think this solution would work?

Yes, I think so (thanks, Joe, for testing!).

It looks cleaner to me than the options above, but it may just be a matter of taste. Anyway, I'd go with full name matching, because -z unique-symbol would allow us to remove sympos altogether, which is appealing.

> If so, I can modify livepatch to refuse to patch on duplicated symbols if CONFIG_FG_KASLR and when this option is merged into the tool chain I can add it to KBUILD_LDFLAGS when CONFIG_FG_KASLR and livepatching should work in all cases.

Ok.

Josh, Petr, would this work for you too?

Thanks
Miroslav
On Fri, Aug 28, 2020 at 12:21:13PM +0200, Miroslav Benes wrote:
> > Hi there! I was trying to find a super easy way to address this, so I thought the best thing would be if there were a compiler or linker switch to just eliminate any duplicate symbols at compile time for vmlinux. I filed this question on the binutils bugzilla looking to see if there were existing flags that might do this, but H.J. Lu went ahead and created a new one "-z unique", that seems to do what we would need it to do.
> >
> > https://sourceware.org/bugzilla/show_bug.cgi?id=26391
> >
> > When I use this option, it renames any duplicate symbols with an extension - for example duplicatefunc.1 or duplicatefunc.2. You could either match on the full unique name of the specific binary you are trying to patch, or you match the base name and use the extension to determine original position. Do you think this solution would work?
>
> Yes, I think so (thanks, Joe, for testing!).
>
> It looks cleaner to me than the options above, but it may just be a matter of taste. Anyway, I'd go with full name matching, because -z unique-symbol would allow us to remove sympos altogether, which is appealing.
>
> > If so, I can modify livepatch to refuse to patch on duplicated symbols if CONFIG_FG_KASLR and when this option is merged into the tool chain I can add it to KBUILD_LDFLAGS when CONFIG_FG_KASLR and livepatching should work in all cases.
>
> Ok.
>
> Josh, Petr, would this work for you too?

Sounds good to me. Kristen, thanks for finding a solution!
On 2020-08-28, Josh Poimboeuf wrote:
>On Fri, Aug 28, 2020 at 12:21:13PM +0200, Miroslav Benes wrote:
>> > Hi there! I was trying to find a super easy way to address this, so I thought the best thing would be if there were a compiler or linker switch to just eliminate any duplicate symbols at compile time for vmlinux. I filed this question on the binutils bugzilla looking to see if there were existing flags that might do this, but H.J. Lu went ahead and created a new one "-z unique", that seems to do what we would need it to do.
>> >
>> > https://sourceware.org/bugzilla/show_bug.cgi?id=26391
>> >
>> > When I use this option, it renames any duplicate symbols with an extension - for example duplicatefunc.1 or duplicatefunc.2. You could either match on the full unique name of the specific binary you are trying to patch, or you match the base name and use the extension to determine original position. Do you think this solution would work?
>>
>> Yes, I think so (thanks, Joe, for testing!).
>>
>> It looks cleaner to me than the options above, but it may just be a matter of taste. Anyway, I'd go with full name matching, because -z unique-symbol would allow us to remove sympos altogether, which is appealing.
>>
>> > If so, I can modify livepatch to refuse to patch on duplicated symbols if CONFIG_FG_KASLR and when this option is merged into the tool chain I can add it to KBUILD_LDFLAGS when CONFIG_FG_KASLR and livepatching should work in all cases.
>>
>> Ok.
>>
>> Josh, Petr, would this work for you too?
>
>Sounds good to me. Kristen, thanks for finding a solution!

(I am not subscribed. I came here via https://sourceware.org/bugzilla/show_bug.cgi?id=26391 (ld -z unique-symbol))

> This works great after randomization because it always receives the current address at runtime rather than relying on any kind of buildtime address.
> The issue is with the live-patching code's algorithm for resolving duplicate symbol names. If they request a symbol by name from the kernel and there are 3 symbols with the same name, they use the symbol's position in the built binary image to select the correct symbol.

If a.o, b.o and c.o define local symbol 'foo'. By position, do you mean that

* the live-patching code uses something like (findall("foo")[0], findall("foo")[1], findall("foo")[2]) ?
* shuffling a.o/b.o/c.o will make the returned triple different

Local symbols are not required to be unique. Instead of patching the toolchain, have you thought about making the live-patching code smarter? (Depending on the duplicates, such a linker option can increase the link time/binary size considerably AND I don't know in what other cases such an option will be useful)

For the following example,

https://sourceware.org/bugzilla/show_bug.cgi?id=26822

  # RUN: split-file %s %t
  # RUN: gcc -c %t/a.s -o %t/a.o
  # RUN: gcc -c %t/b.s -o %t/b.o
  # RUN: gcc -c %t/c.s -o %t/c.o
  # RUN: ld-new %t/a.o %t/b.o %t/c.o -z unique-symbol -o %t.exe
  #--- a.s
  a: a.1: a.2: nop
  #--- b.s
  a: nop
  #--- c.s
  a: nop

readelf -Ws output:

  Symbol table '.symtab' contains 13 entries:
     Num:    Value          Size Type    Bind   Vis      Ndx Name
       0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
       1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS a.o
       2: 0000000000401000     0 NOTYPE  LOCAL  DEFAULT    1 a
       3: 0000000000401000     0 NOTYPE  LOCAL  DEFAULT    1 a.1
       4: 0000000000401000     0 NOTYPE  LOCAL  DEFAULT    1 a.2
       5: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS b.o
       6: 0000000000401001     0 NOTYPE  LOCAL  DEFAULT    1 a.1
       7: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS c.o
       8: 0000000000401002     0 NOTYPE  LOCAL  DEFAULT    1 a.2
       9: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND _start
      10: 0000000000402000     0 NOTYPE  GLOBAL DEFAULT    1 __bss_start
      11: 0000000000402000     0 NOTYPE  GLOBAL DEFAULT    1 _edata
      12: 0000000000402000     0 NOTYPE  GLOBAL DEFAULT    1 _end

Note that you have STT_FILE SHN_ABS symbols.
If the compiler does not produce them, they will be synthesized by GNU ld.

https://sourceware.org/bugzilla/show_bug.cgi?id=26822
  ld.bfd copies non-STT_SECTION local symbols from input object files. If an object file does not have STT_FILE symbols (no .file directive) but has non-STT_SECTION local symbols, ld.bfd synthesizes a STT_FILE symbol.

The filenames are usually base names, so "a.o" and "a.o" in two directories will be indistinguishable. The live-patching code can possibly work around this by not changing the relative order of the two "a.o".
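Fangrui's readelf output can be checked mechanically: even after `-z unique-symbol`, `a.1` and `a.2` each occur twice, because the renamed copies from b.o and c.o collide with labels that already existed in a.o. A small sketch that counts such collisions, assuming the non-FILE local symbol names have already been extracted into a plain string array (standing in for a parsed .symtab):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Count distinct names that occur more than once in names[0..n-1].
 * O(n^2), which is fine for an illustration but not for vmlinux-sized
 * tables, where sorting or hashing would be the usual choice.
 */
size_t count_duplicate_names(const char *const *names, size_t n)
{
	size_t dups = 0;

	assert(names != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		int seen_before = 0, seen_after = 0;

		for (size_t j = 0; j < i && !seen_before; j++)
			seen_before = strcmp(names[i], names[j]) == 0;
		if (seen_before)
			continue;  /* already counted at its first occurrence */
		for (size_t j = i + 1; j < n && !seen_after; j++)
			seen_after = strcmp(names[i], names[j]) == 0;
		if (seen_after)
			dups++;
	}
	return dups;
}
```

Fed the five local labels from the readelf listing in table order (`a`, `a.1`, `a.2`, `a.1`, `a.2`), it reports two duplicated names, so a linker-side rename alone does not guarantee uniqueness when source code already uses dotted names.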
On Sat, Jan 23, 2021 at 02:59:28PM -0800, Fangrui Song wrote:
> On 2020-08-28, Josh Poimboeuf wrote:
> > On Fri, Aug 28, 2020 at 12:21:13PM +0200, Miroslav Benes wrote:
> > > > Hi there! I was trying to find a super easy way to address this, so I thought the best thing would be if there were a compiler or linker switch to just eliminate any duplicate symbols at compile time for vmlinux. I filed this question on the binutils bugzilla looking to see if there were existing flags that might do this, but H.J. Lu went ahead and created a new one "-z unique", that seems to do what we would need it to do.
> > > >
> > > > https://sourceware.org/bugzilla/show_bug.cgi?id=26391
> > > >
> > > > When I use this option, it renames any duplicate symbols with an extension - for example duplicatefunc.1 or duplicatefunc.2. You could either match on the full unique name of the specific binary you are trying to patch, or you match the base name and use the extension to determine original position. Do you think this solution would work?
> > >
> > > Yes, I think so (thanks, Joe, for testing!).
> > >
> > > It looks cleaner to me than the options above, but it may just be a matter of taste. Anyway, I'd go with full name matching, because -z unique-symbol would allow us to remove sympos altogether, which is appealing.
> > >
> > > > If so, I can modify livepatch to refuse to patch on duplicated symbols if CONFIG_FG_KASLR and when this option is merged into the tool chain I can add it to KBUILD_LDFLAGS when CONFIG_FG_KASLR and livepatching should work in all cases.
> > >
> > > Ok.
> > >
> > > Josh, Petr, would this work for you too?
> >
> > Sounds good to me. Kristen, thanks for finding a solution!
>
> (I am not subscribed.
> I came here via https://sourceware.org/bugzilla/show_bug.cgi?id=26391 (ld -z unique-symbol))
>
> > This works great after randomization because it always receives the current address at runtime rather than relying on any kind of buildtime address. The issue is with the live-patching code's algorithm for resolving duplicate symbol names. If they request a symbol by name from the kernel and there are 3 symbols with the same name, they use the symbol's position in the built binary image to select the correct symbol.
>
> If a.o, b.o and c.o define local symbol 'foo'.
> By position, do you mean that
>
> * the live-patching code uses something like (findall("foo")[0], findall("foo")[1], findall("foo")[2]) ?

Yes, it depends on their order in the symbol table of the linked binary (vmlinux).

> * shuffling a.o/b.o/c.o will make the returned triple different

Yes, though it's actually functions that get shuffled.

> Local symbols are not required to be unique. Instead of patching the toolchain, have you thought about making the live-patching code smarter?

It's a possibility (more on that below).

> (Depending on the duplicates, such a linker option can increase the link time/binary size considerably

Have you tried it on vmlinux? Just wondering what the time/size impact would be in real-world numbers.

Duplicate symbols make up a very small percentage of all symbols in the kernel, so I would think the binary size change (to the strtab?) would be insignificant?

> AND I don't know in what other cases such an option will be useful)

I believe some other kernel components (tracing, kprobes, bpf) have the same problem as livepatch with respect to disambiguating duplicate symbols, for the purposes of tracing/debugging. So I'm thinking it would be a nice overall improvement to the kernel.
> For the following example,
>
> https://sourceware.org/bugzilla/show_bug.cgi?id=26822
>
>   # RUN: split-file %s %t
>   # RUN: gcc -c %t/a.s -o %t/a.o
>   # RUN: gcc -c %t/b.s -o %t/b.o
>   # RUN: gcc -c %t/c.s -o %t/c.o
>   # RUN: ld-new %t/a.o %t/b.o %t/c.o -z unique-symbol -o %t.exe
>   #--- a.s
>   a: a.1: a.2: nop
>   #--- b.s
>   a: nop
>   #--- c.s
>   a: nop
>
> readelf -Ws output:
>
>   Symbol table '.symtab' contains 13 entries:
>      Num:    Value          Size Type    Bind   Vis      Ndx Name
>        0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
>        1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS a.o
>        2: 0000000000401000     0 NOTYPE  LOCAL  DEFAULT    1 a
>        3: 0000000000401000     0 NOTYPE  LOCAL  DEFAULT    1 a.1
>        4: 0000000000401000     0 NOTYPE  LOCAL  DEFAULT    1 a.2
>        5: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS b.o
>        6: 0000000000401001     0 NOTYPE  LOCAL  DEFAULT    1 a.1
>        7: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS c.o
>        8: 0000000000401002     0 NOTYPE  LOCAL  DEFAULT    1 a.2
>        9: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND _start
>       10: 0000000000402000     0 NOTYPE  GLOBAL DEFAULT    1 __bss_start
>       11: 0000000000402000     0 NOTYPE  GLOBAL DEFAULT    1 _edata
>       12: 0000000000402000     0 NOTYPE  GLOBAL DEFAULT    1 _end
>
> Note that you have STT_FILE SHN_ABS symbols.
> If the compiler does not produce them, they will be synthesized by GNU ld.
>
> https://sourceware.org/bugzilla/show_bug.cgi?id=26822
>   ld.bfd copies non-STT_SECTION local symbols from input object files. If an object file does not have STT_FILE symbols (no .file directive) but has non-STT_SECTION local symbols, ld.bfd synthesizes a STT_FILE symbol.

Right, I see what you're getting at. As far as I can tell, there are potentially two ways for fgkaslr to handle this:

a) shuffle files, not functions.

   i.e. keep the functions' order intact within the STT_FILE group, shuffling the file groups themselves.

   (NOTE: this may have an additional benefit of improving i-cache performance, compared to the current fgkaslr implementation.)

or

b) shuffle functions, keeping track of what file they belonged to.
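Option a) can be sketched as a Fisher-Yates shuffle over file groups instead of individual functions: each STT_FILE group moves as a unit, so functions keep their relative order inside it. This is an illustration only; the group representation, the function names and the toy xorshift64 generator are assumptions, and a real boot-time implementation would draw from a proper random source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A contiguous run of functions that came from one object file. */
struct file_group {
	size_t first;  /* index of the group's first function */
	size_t count;  /* number of functions in the group */
};

/* Toy xorshift64 PRNG, for illustration only (not cryptographically secure). */
static uint64_t xorshift64(uint64_t *state)
{
	uint64_t x = *state;

	x ^= x << 13;
	x ^= x >> 7;
	x ^= x << 17;
	return *state = x;
}

/*
 * Shuffle groups[] in place with Fisher-Yates, then write the resulting
 * function order into out[] (length = total number of functions).
 * Functions inside a group are emitted in their original order.
 */
void shuffle_file_groups(struct file_group *groups, size_t ngroups,
			 size_t *out, uint64_t seed)
{
	uint64_t state = seed ? seed : 1;  /* xorshift must not start at 0 */
	size_t k = 0;

	assert(groups != NULL || ngroups == 0);
	for (size_t i = ngroups; i > 1; i--) {
		size_t j = (size_t)(xorshift64(&state) % i);  /* slight modulo bias */
		struct file_group tmp = groups[i - 1];

		groups[i - 1] = groups[j];
		groups[j] = tmp;
	}
	for (size_t g = 0; g < ngroups; g++)
		for (size_t f = 0; f < groups[g].count; f++)
			out[k++] = groups[g].first + f;
}
```

Shuffling whole groups keeps a file-relative position stable, at the cost of far fewer possible layouts than per-function shuffling.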
Maybe Kristen could comment on the feasibility of either of these options.

I believe the STT_FILE symbols are not currently available to the kernel at runtime. They would need to be made available to both fgkaslr and livepatch code.

Overall "ld -z unique-symbol" would be much easier from a kernel standpoint, and would benefit multiple components as I mentioned above.

> The filenames are usually base names, so "a.o" and "a.o" in two directories will be indistinguishable. The live-patching code can possibly work around this by not changing the relative order of the two "a.o".

Right, there are some file:func duplicates so this case would indeed need to be handled somehow.

  $ readelf -s --wide vmlinux |awk '$4 == "FILE" {file=$8; next} $4 == "FUNC" {printf "%s:%s\n", file, $8}' |sort |uniq -d
  bus.c:new_id_store
  core.c:cmask_show
  core.c:edge_show
  core.c:event_show
  core.c:inv_show
  core.c:paravirt_read_msr
  core.c:paravirt_read_msr_safe
  core.c:type_show
  core.c:umask_show
  hid-core.c:hid_exit
  hid-core.c:hid_init
  inode.c:init_once
  inode.c:remove_one
  msr.c:msr_init
  proc.c:c_next
  proc.c:c_start
  proc.c:c_stop
  raw.c:dst_output
  raw.c:raw_ioctl
  route.c:dst_discard
  super.c:init_once
  udp.c:udp_lib_close
  udp.c:udp_lib_hash
  udp.c:udplite_getfrag
  udplite.c:udp_lib_close
  udplite.c:udp_lib_hash
  udplite.c:udplite_sk_init
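The awk pipeline above keys each FUNC symbol by the most recent preceding FILE symbol and then lists the keys that repeat. The same grouping is sketched below in C, assuming the symbols are already parsed into (type, name) pairs in .symtab order; a real tool would read them with libelf or a similar ELF reader, and the enum, struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

enum sym_type { SYM_FILE, SYM_FUNC, SYM_OTHER };

struct symtab_entry {
	enum sym_type type;
	const char *name;
};

#define MAX_KEYS 64
#define KEY_LEN  128

/*
 * Mirror of:
 *   awk '$4 == "FILE" {file=$8; next}
 *        $4 == "FUNC" {printf "%s:%s\n", file, $8}' | sort | uniq -d
 * Returns the number of distinct "file:func" keys seen more than once.
 * Fixed-size buffers keep the sketch short; keys beyond MAX_KEYS are ignored.
 */
size_t count_file_func_dups(const struct symtab_entry *syms, size_t n)
{
	char keys[MAX_KEYS][KEY_LEN];
	int hits[MAX_KEYS] = { 0 };
	size_t nkeys = 0, dups = 0;
	const char *file = "";

	assert(syms != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (syms[i].type == SYM_FILE) {
			file = syms[i].name;  /* start of a new STT_FILE group */
			continue;
		}
		if (syms[i].type != SYM_FUNC)
			continue;

		char key[KEY_LEN];
		size_t k;

		snprintf(key, sizeof(key), "%s:%s", file, syms[i].name);
		for (k = 0; k < nkeys; k++)
			if (strcmp(keys[k], key) == 0)
				break;
		if (k == nkeys) {
			if (nkeys == MAX_KEYS)
				continue;
			strcpy(keys[nkeys++], key);
		}
		if (++hits[k] == 2)
			dups++;  /* count each duplicated key once */
	}
	return dups;
}
```

Because the key uses only the base file name, two distinct `core.c` files collapse into one group here exactly as they do in the awk output, which is the case Fangrui points out still needs separate handling.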