| Message ID | 20220420004241.2093-2-joao@overdrivepizza.com (mailing list archive) |
|---|---|
| State | Changes Requested |
| Series | Kernel FineIBT Support |
On Tue, Apr 19, 2022 at 05:42:31PM -0700, joao@overdrivepizza.com wrote:
> +void __noendbr __fineibt_handler(void){
> +	unsigned i;
> +	unsigned long flags;
> +	bool skip;
> +	void * ret;
> +	void * caller;
> +
> +	DO_ALL_PUSHS;

So this function isn't C ABI compliant, right?  e.g. the compiler just
calls the handler without regard for preserving registers?

If this function is going to be implemented in C, it should probably
have an asm thunk wrapper which can properly save/restore the registers
before calling into the C version.

Even better, if the compiler did an invalid op (UD2?), which I think you
mentioned elsewhere, instead of calling the handler directly, and there
were a way for the trap code to properly detect it as a FineIBT
violation, we could get rid of the pushes/pops, plus the uaccess objtool
warning from patch 7, plus I'm guessing a bunch of noinstr validation
warnings.

> +
> +	spin_lock_irqsave(&fineibt_lock, flags);
> +	skip = false;
> +
> +	asm("\t movq 0x90(%%rsp),%0" : "=r"(ret));
> +	asm("\t movq 0x98(%%rsp),%0" : "=r"(caller));

This is making some questionable assumptions about the stack layout.

I assume this function is still in the prototype stage ;-)

> +	if(!skip) {
> +		printk("FineIBT violation: %px:%px:%u\n", ret, caller,
> +		       vlts_next);
> +	}
> +	DO_ALL_POPS;
> +}

Right now this handler just does a printk if it hasn't already for this
caller/callee combo, and then resumes control.  Which is fine for
debugging, but it really needs to behave similarly to an IBT violation,
by panicking unless "ibt=warn" on the cmdline.

Not sure what would happen for "ibt=off"?  Maybe apply_ibt_endbr() could
NOP out all the FineIBT stuff.
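For illustration, a minimal sketch of the asm thunk wrapper idea suggested above, assuming a hypothetical __fineibt_handler_thunk / __fineibt_handler_c split (neither name is from the series): the thunk preserves the caller-saved registers and then enters an ordinary C-ABI handler.

/* Sketch only: preserve caller-saved registers around a plain C handler. */
asm(
"	.pushsection .text\n"
"	.type __fineibt_handler_thunk, @function\n"
"__fineibt_handler_thunk:\n"
"	push %rax;  push %rcx;  push %rdx\n"
"	push %rsi;  push %rdi\n"
"	push %r8;   push %r9;   push %r10;  push %r11\n"
"	call __fineibt_handler_c\n"
"	pop  %r11;  pop  %r10;  pop  %r9;   pop  %r8\n"
"	pop  %rdi;  pop  %rsi\n"
"	pop  %rdx;  pop  %rcx;  pop  %rax\n"
"	ret\n"
"	.size __fineibt_handler_thunk, .-__fineibt_handler_thunk\n"
"	.popsection\n"
);

void __fineibt_handler_c(void)
{
	/* ordinary C; the thunk above has already saved the registers */
}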
On 2022-04-28 18:37, Josh Poimboeuf wrote:
> On Tue, Apr 19, 2022 at 05:42:31PM -0700, joao@overdrivepizza.com wrote:
>> +void __noendbr __fineibt_handler(void){
>> +	unsigned i;
>> +	unsigned long flags;
>> +	bool skip;
>> +	void * ret;
>> +	void * caller;
>> +
>> +	DO_ALL_PUSHS;
>
> So this function isn't C ABI compliant, right?  e.g. the compiler just
> calls the handler without regard for preserving registers?
>
> If this function is going to be implemented in C, it should probably
> have an asm thunk wrapper which can properly save/restore the registers
> before calling into the C version.
>
> Even better, if the compiler did an invalid op (UD2?), which I think you
> mentioned elsewhere, instead of calling the handler directly, and there
> were a way for the trap code to properly detect it as a FineIBT
> violation, we could get rid of the pushes/pops, plus the uaccess objtool
> warning from patch 7, plus I'm guessing a bunch of noinstr validation
> warnings.

Cool, I'll try to come up with something!

>> +
>> +	spin_lock_irqsave(&fineibt_lock, flags);
>> +	skip = false;
>> +
>> +	asm("\t movq 0x90(%%rsp),%0" : "=r"(ret));
>> +	asm("\t movq 0x98(%%rsp),%0" : "=r"(caller));
>
> This is making some questionable assumptions about the stack layout.
>
> I assume this function is still in the prototype stage ;-)

Yeah, this is just a messy instrumentation to get reports about
mismatching prototypes (as the ones reported further down the series).

The issue with having the call is that it bloats the binary, so the ud2
is 3-bytes-per-function better.  Yet, we may consider a FINEIBT_DEBUG
config, which can then enable a handler.  This would be useful together
with a fuzzer or a stress tool to cover possible control-flow paths
within the kernel and map mismatching prototypes more properly, I guess.

>> +	if(!skip) {
>> +		printk("FineIBT violation: %px:%px:%u\n", ret, caller,
>> +		       vlts_next);
>> +	}
>> +	DO_ALL_POPS;
>> +}
>
> Right now this handler just does a printk if it hasn't already for this
> caller/callee combo, and then resumes control.  Which is fine for
> debugging, but it really needs to behave similarly to an IBT violation,
> by panicking unless "ibt=warn" on the cmdline.
>
> Not sure what would happen for "ibt=off"?  Maybe apply_ibt_endbr() could
> NOP out all the FineIBT stuff.

Either that, or...

I'm thinking about a way to have FineIBT interchangeable with KCFI.
Currently KCFI adds a 4 byte hash + 2 byte nops before function entry, to
allow for proper prototype checking.  After that, there should be an
ENDBR of 4 bytes.  This gives us 10 bytes in total.  Then, my
yet-to-be-properly-thought-out idea would be to patch these 10 bytes
with:

endbr
call fineibt_handler_<$HASH>
nop

and then, on the caller side, patch the "cmp <$HASH>, -0x6(%r11); je;
ud2; call" sequence with a "sub 0x6, %r11; mov $HASH, %r10; call %r11;
add 0x6, %r11".  This would then allow the kernel to verify if the CPU is
IBT capable at boot time and only then set the proper scheme.

The downsides of having something like this would be that this sub
r11/add r11 sequence is kinda meh.  We can avoid that by having two
padding nops after the original ENDBR, which will be skipped when the
function is reached directly by the linker optimization I'm working on,
and that we can convert into a JMP -offset that makes control flow reach
the padding area before the prologue and from where we can call the
fineibt_handler function.  The resulting instrumentation would be
something like:

1:
call fineibt_handler_<$HASH>
jmp 2f
<foo>
endbr
jmp 1b
2:

Also, it would prevent a paranoid user from having both schemes
simultaneously (there are reasons why people could want that).

Any thoughts?
On Mon, May 02, 2022 at 10:17:42AM -0700, Joao Moreira wrote:
> > > +	asm("\t movq 0x90(%%rsp),%0" : "=r"(ret));
> > > +	asm("\t movq 0x98(%%rsp),%0" : "=r"(caller));
> >
> > This is making some questionable assumptions about the stack layout.
> >
> > I assume this function is still in the prototype stage ;-)
>
> Yeah, this is just a messy instrumentation to get reports about
> mismatching prototypes (as the ones reported further down the series).
>
> The issue with having the call is that it bloats the binary, so the ud2
> is 3-bytes-per-function better.  Yet, we may consider a FINEIBT_DEBUG
> config, which can then enable a handler.  This would be useful together
> with a fuzzer or a stress tool to cover possible control-flow paths
> within the kernel and map mismatching prototypes more properly, I guess.

It should be possible to have a non-fatal #UD2 handler.

See for example how WARN() is implemented with __WARN_FLAGS in
arch/x86/include/asm/bug.h .

So hopefully we can just get rid of the need for the "call handler"
thing altogether.

> > Not sure what would happen for "ibt=off"?  Maybe apply_ibt_endbr() could
> > NOP out all the FineIBT stuff.
>
> Either that, or...
>
> I'm thinking about a way to have FineIBT interchangeable with KCFI.
> Currently KCFI adds a 4 byte hash + 2 byte nops before function entry, to
> allow for proper prototype checking.  After that, there should be an
> ENDBR of 4 bytes.  This gives us 10 bytes in total.  Then, my
> yet-to-be-properly-thought-out idea would be to patch these 10 bytes
> with:
>
> endbr
> call fineibt_handler_<$HASH>
> nop
>
> and then, on the caller side, patch the "cmp <$HASH>, -0x6(%r11); je;
> ud2; call" sequence with a "sub 0x6, %r11; mov $HASH, %r10; call %r11;
> add 0x6, %r11".  This would then allow the kernel to verify if the CPU is
> IBT capable at boot time and only then set the proper scheme.
>
> The downsides of having something like this would be that this sub
> r11/add r11 sequence is kinda meh.  We can avoid that by having two
> padding nops after the original ENDBR, which will be skipped when the
> function is reached directly by the linker optimization I'm working on,
> and that we can convert into a JMP -offset that makes control flow reach
> the padding area before the prologue and from where we can call the
> fineibt_handler function.  The resulting instrumentation would be
> something like:
>
> 1:
> call fineibt_handler_<$HASH>
> jmp 2f
> <foo>
> endbr
> jmp 1b
> 2:
>
> Also, it would prevent a paranoid user from having both schemes
> simultaneously (there are reasons why people could want that).
>
> Any thoughts?

I'm not really qualified to comment on this too directly since I haven't
looked very much at the variations on FineIBT/CFI/KCFI, and what the
protections and drawbacks are for each approach, and when it might even
make sense to combine them for a "paranoid user".

Since we have multiple similar and possibly competing technologies being
discussed, one thing I do want to warn against is that we as kernel
developers tend to err on the side of giving people too many choices and
combinations which *never* get used.

All those unused options can confuse the user and significantly add to
complexity and maintenance overhead for us.  Especially for invasive
features like these.

(Just to be clear, I'm not saying that's happening here, but it's
something we need to be careful about.)

Here, documentation is going to be crucial, for both reviewers and
users.  Something that describes when/why I should use X or Y or X+Y.

If we truly want to add more options/combos for different use cases then
we'll also need clear and concise documentation about which
options/combos would be used under what circumstances.
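As a rough illustration of the non-fatal #UD direction suggested above (the lookup helper is hypothetical; only the "skip the 2-byte ud2 and continue" pattern mirrors how __WARN_FLAGS-style warnings are resumed):

#include <linux/types.h>
#include <linux/printk.h>
#include <asm/ptrace.h>
#include <asm/bug.h>		/* LEN_UD2 */

/* Hypothetical: look up the faulting IP in a table of FineIBT report sites. */
bool is_fineibt_report_addr(unsigned long ip);

static bool fixup_fineibt_report(struct pt_regs *regs)
{
	if (!is_fineibt_report_addr(regs->ip))
		return false;

	pr_warn("FineIBT mismatch at %pS\n", (void *)regs->ip);
	regs->ip += LEN_UD2;	/* skip the ud2 and continue, non-fatally */
	return true;
}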
> It should be possible to have a non-fatal #UD2 handler.
>
> See for example how WARN() is implemented with __WARN_FLAGS in
> arch/x86/include/asm/bug.h .
>
> So hopefully we can just get rid of the need for the "call handler"
> thing altogether.

Nice, I'll look into it.  Tks.

>> > Not sure what would happen for "ibt=off"?  Maybe apply_ibt_endbr() could
>> > NOP out all the FineIBT stuff.
>>
>> Either that, or...
>>
>> I'm thinking about a way to have FineIBT interchangeable with KCFI.
>> Currently KCFI adds a 4 byte hash + 2 byte nops before function entry,
>> to allow for proper prototype checking.  After that, there should be an
>> ENDBR of 4 bytes.  This gives us 10 bytes in total.  Then, my
>> yet-to-be-properly-thought-out idea would be to patch these 10 bytes
>> with:
>>
>> endbr
>> call fineibt_handler_<$HASH>
>> nop
>>
>> and then, on the caller side, patch the "cmp <$HASH>, -0x6(%r11); je;
>> ud2; call" sequence with a "sub 0x6, %r11; mov $HASH, %r10; call %r11;
>> add 0x6, %r11".  This would then allow the kernel to verify if the CPU
>> is IBT capable at boot time and only then set the proper scheme.
>>
>> The downsides of having something like this would be that this sub
>> r11/add r11 sequence is kinda meh.  We can avoid that by having two
>> padding nops after the original ENDBR, which will be skipped when the
>> function is reached directly by the linker optimization I'm working on,
>> and that we can convert into a JMP -offset that makes control flow
>> reach the padding area before the prologue and from where we can call
>> the fineibt_handler function.  The resulting instrumentation would be
>> something like:
>>
>> 1:
>> call fineibt_handler_<$HASH>
>> jmp 2f
>> <foo>
>> endbr
>> jmp 1b
>> 2:
>>
>> Also, it would prevent a paranoid user from having both schemes
>> simultaneously (there are reasons why people could want that).
>>
>> Any thoughts?
>
> I'm not really qualified to comment on this too directly since I haven't
> looked very much at the variations on FineIBT/CFI/KCFI, and what the
> protections and drawbacks are for each approach, and when it might even
> make sense to combine them for a "paranoid user".
>
> Since we have multiple similar and possibly competing technologies being
> discussed, one thing I do want to warn against is that we as kernel
> developers tend to err on the side of giving people too many choices and
> combinations which *never* get used.
>
> All those unused options can confuse the user and significantly add to
> complexity and maintenance overhead for us.  Especially for invasive
> features like these.
>
> (Just to be clear, I'm not saying that's happening here, but it's
> something we need to be careful about.)
>
> Here, documentation is going to be crucial, for both reviewers and
> users.  Something that describes when/why I should use X or Y or X+Y.
>
> If we truly want to add more options/combos for different use cases then
> we'll also need clear and concise documentation about which
> options/combos would be used under what circumstances.

Yeah, I totally understand/support this concern and I feel the same way.
While, in this case, I can't see super clear reasons for X+Y, there are
aspects why someone could prefer X or Y -- so I think that using
alternatives to flip the instrumentation is a valid consideration.  In
time, taking the chance to be fair on the credits, using alternatives to
replace KCFI/FineIBT was also Peter's idea, not mine.

It looked hard to do at first sight because of the caller/callee-side
checks differences, but since Peter mentioned it, I started trying to
solve the puzzle of having the best suitable instrumentation that would
be changeable.  I haven't discussed this with anyone yet, but at this
point I think it might be doable, although not in the most performant
shape.  Anyway, I'll post something here once I have a more solid idea.

And yes, I agree that documentation will be key and I totally see your
point/understand how confusing I was in my previous e-mail.  I'll keep
that in mind for the next revision.  Thanks for pointing it out :)
On Tue, May 03, 2022 at 03:02:44PM -0700, Josh Poimboeuf wrote:
> I'm not really qualified to comment on this too directly since I haven't
> looked very much at the variations on FineIBT/CFI/KCFI, and what the
> protections and drawbacks are for each approach, and when it might even
> make sense to combine them for a "paranoid user".
>
> Since we have multiple similar and possibly competing technologies being
> discussed, one thing I do want to warn against is that we as kernel
> developers tend to err on the side of giving people too many choices and
> combinations which *never* get used.

So I don't think there's going to be a user choice here.  If there's
hardware support, FineIBT makes more sense.  That also means that kCFI no
longer needs to worry about IBT.

If we do something like:


        kCFI                                       FineIBT

__cfi_\sym:                                __cfi_\sym:
        endbr                         # 4          endbr                   # 4
        sub $hash, %r10               # 7          sub $hash, %r10         # 7
        je \sym                       # 2          je \sym                 # 2
        ud2                           # 2          ud2                     # 2
\sym:                                      \sym:


caller:                                    caller:
        cmpl $hash, -8(%r11)          # 8          movl $hash, %r10d       # 6
        je 1f                         # 2          sub 15, %r11            # 4
        ud2                           # 2          call *%r11              # 3
1:      call __x86_indirect_thunk_r11 # 5          .nop 4                  # 4 (could even fix up r11 again)


Then, all that's required is a slight tweak to apply_retpolines() to
rewrite a little more text.

Note that this also does away with having to fix up the linker, since
all direct calls will already point at \sym.  It's just the IBT indirect
calls that need to frob the pointer in order to hit the ENDBR.

On top of that, we no longer have to special case the objtool
instruction decoder, the preludes are proper instructions now.
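To make the "slight tweak to apply_retpolines()" concrete, a rough sketch (not the actual patch) of rewriting one kCFI call site into the FineIBT caller form in the diagram above; both sequences are 17 bytes, text_poke_early() is the existing early-patching helper, and the function name here is made up.

#include <linux/types.h>
#include <linux/string.h>
#include <linux/init.h>
#include <asm/text-patching.h>

#define FINEIBT_CALLER_SIZE 17	/* same size as cmpl+je+ud2+call thunk */

static void __init rewrite_caller_to_fineibt(void *site, u32 hash)
{
	u8 insn[FINEIBT_CALLER_SIZE];
	int i = 0;

	/* movl $hash, %r10d			# 6 */
	insn[i++] = 0x41; insn[i++] = 0xba;
	memcpy(&insn[i], &hash, sizeof(hash)); i += 4;

	/* sub $15, %r11			# 4 */
	insn[i++] = 0x49; insn[i++] = 0x83; insn[i++] = 0xeb; insn[i++] = 0x0f;

	/* call *%r11				# 3 */
	insn[i++] = 0x41; insn[i++] = 0xff; insn[i++] = 0xd3;

	/* 4-byte nop				# 4 (could fix up %r11 instead) */
	insn[i++] = 0x0f; insn[i++] = 0x1f; insn[i++] = 0x40; insn[i++] = 0x00;

	text_poke_early(site, insn, i);
}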
On Wed, May 04, 2022 at 12:20:19PM +0200, Peter Zijlstra wrote:
> On Tue, May 03, 2022 at 03:02:44PM -0700, Josh Poimboeuf wrote:
> > I'm not really qualified to comment on this too directly since I haven't
> > looked very much at the variations on FineIBT/CFI/KCFI, and what the
> > protections and drawbacks are for each approach, and when it might even
> > make sense to combine them for a "paranoid user".
> >
> > Since we have multiple similar and possibly competing technologies being
> > discussed, one thing I do want to warn against is that we as kernel
> > developers tend to err on the side of giving people too many choices and
> > combinations which *never* get used.
>
> So I don't think there's going to be a user choice here.  If there's
> hardware support, FineIBT makes more sense.  That also means that kCFI no
> longer needs to worry about IBT.
>
> If we do something like:
>
>
>         kCFI                                       FineIBT
>
> __cfi_\sym:                                __cfi_\sym:
>         endbr                         # 4          endbr                   # 4
>         sub $hash, %r10               # 7          sub $hash, %r10         # 7
>         je \sym                       # 2          je \sym                 # 2
>         ud2                           # 2          ud2                     # 2
> \sym:                                      \sym:
>
>
> caller:                                    caller:
>         cmpl $hash, -8(%r11)          # 8          movl $hash, %r10d       # 6
>         je 1f                         # 2          sub 15, %r11            # 4
>         ud2                           # 2          call *%r11              # 3
> 1:      call __x86_indirect_thunk_r11 # 5          .nop 4                  # 4 (could even fix up r11 again)
>
>
> Then, all that's required is a slight tweak to apply_retpolines() to
> rewrite a little more text.
>
> Note that this also does away with having to fix up the linker, since
> all direct calls will already point at \sym.  It's just the IBT indirect
> calls that need to frob the pointer in order to hit the ENDBR.
>
> On top of that, we no longer have to special case the objtool
> instruction decoder, the preludes are proper instructions now.

For kCFI this brings back the gadget problem that I mentioned here:
https://lore.kernel.org/all/Yh7fLRYl8KgMcOe5@google.com/

because the hash at the call site is 8 bytes before the call
instruction.

Peter
On Wed, May 04, 2022 at 10:04:02AM -0700, Peter Collingbourne wrote:
> On Wed, May 04, 2022 at 12:20:19PM +0200, Peter Zijlstra wrote:
> > On Tue, May 03, 2022 at 03:02:44PM -0700, Josh Poimboeuf wrote:
> > > I'm not really qualified to comment on this too directly since I haven't
> > > looked very much at the variations on FineIBT/CFI/KCFI, and what the
> > > protections and drawbacks are for each approach, and when it might even
> > > make sense to combine them for a "paranoid user".
> > >
> > > Since we have multiple similar and possibly competing technologies being
> > > discussed, one thing I do want to warn against is that we as kernel
> > > developers tend to err on the side of giving people too many choices and
> > > combinations which *never* get used.
> >
> > So I don't think there's going to be a user choice here.  If there's
> > hardware support, FineIBT makes more sense.  That also means that kCFI no
> > longer needs to worry about IBT.
> >
> > If we do something like:
> >
> >
> >         kCFI                                       FineIBT
> >
> > __cfi_\sym:                                __cfi_\sym:
> >         endbr                         # 4          endbr                   # 4
> >         sub $hash, %r10               # 7          sub $hash, %r10         # 7
> >         je \sym                       # 2          je \sym                 # 2
> >         ud2                           # 2          ud2                     # 2
> > \sym:                                      \sym:
> >
> >
> > caller:                                    caller:
> >         cmpl $hash, -8(%r11)          # 8          movl $hash, %r10d       # 6
> >         je 1f                         # 2          sub 15, %r11            # 4
> >         ud2                           # 2          call *%r11              # 3
> > 1:      call __x86_indirect_thunk_r11 # 5          .nop 4                  # 4 (could even fix up r11 again)
> >
> >
> > Then, all that's required is a slight tweak to apply_retpolines() to
> > rewrite a little more text.
> >
> > Note that this also does away with having to fix up the linker, since
> > all direct calls will already point at \sym.  It's just the IBT indirect
> > calls that need to frob the pointer in order to hit the ENDBR.
> >
> > On top of that, we no longer have to special case the objtool
> > instruction decoder, the preludes are proper instructions now.
>
> For kCFI this brings back the gadget problem that I mentioned here:
> https://lore.kernel.org/all/Yh7fLRYl8KgMcOe5@google.com/
>
> because the hash at the call site is 8 bytes before the call
> instruction.

Damn, I forgot about that.  Too subtle :-/

So Joao had another crazy idea, lemme diagram that to see if it works.

(sorry I inverted the order by accident)

        FineIBT                                    kCFI

__fineibt_\hash:
        xor \hash, %r10         # 7
        jz 1f                   # 2
        ud2                     # 2
1:      ret                     # 1
        int3                    # 1


__cfi_\sym:                                __cfi_\sym:
                                                   int3; int3                    # 2
        endbr                   # 4               mov \hash, %eax               # 5
        call __fineibt_\hash    # 5               int3; int3                    # 2
\sym:                                      \sym:
        ...                                        ...


caller:                                    caller:
        movl \hash, %r10d       # 6               cmpl \hash, -6(%r11)          # 8
        sub $9, %r11            # 4               je 1f                         # 2
        call *%r11              # 3               ud2                           # 2
        .nop 4                  # 4 (or fixup r11)  call __x86_indirect_thunk_r11 # 5

This way we also need to patch the __cfi_\sym contents, but we get a
little extra room to place the constant for kCFI in a suitable location.

It seems to preserve the properties of the last one in that direct calls
will already be correct and we don't need linker fixups, and objtool can
simply parse the preamble as regular instructions without needing
further help.
On Wed, May 4, 2022 at 11:17 AM Peter Zijlstra <peterz@infradead.org> wrote:
> __cfi_\sym:                                __cfi_\sym:
>                                                    int3; int3                    # 2
>         endbr                   # 4               mov \hash, %eax               # 5
>         call __fineibt_\hash    # 5               int3; int3                    # 2
> \sym:                                      \sym:

OK, that looks reasonable to me.

> It seems to preserve the properties of the last one in that direct calls
> will already be correct and we don't need linker fixups, and objtool can
> simply parse the preamble as regular instructions without needing
> further help.

Wouldn't objtool still print out unreachable instruction warnings here?

Sami
On Wed, May 04, 2022 at 05:28:57PM -0700, Sami Tolvanen wrote:
> On Wed, May 4, 2022 at 11:17 AM Peter Zijlstra <peterz@infradead.org> wrote:
> > __cfi_\sym:                                __cfi_\sym:
> >                                                    int3; int3                    # 2
> >         endbr                   # 4               mov \hash, %eax               # 5
> >         call __fineibt_\hash    # 5               int3; int3                    # 2
> > \sym:                                      \sym:
>
> OK, that looks reasonable to me.
>
> > It seems to preserve the properties of the last one in that direct calls
> > will already be correct and we don't need linker fixups, and objtool can
> > simply parse the preamble as regular instructions without needing
> > further help.
>
> Wouldn't objtool still print out unreachable instruction warnings here?

Depends a bit on what kind of symbol they end up being: if they're
STT_FUNC we'll probably get the complaint that it falls through into the
next symbol, while if they're STT_NOTYPE then yes, we'll get the
unreachable thing.

So either way we need to special case the __cfi_\sym things anyway.  But
that should be relatively straightforward.

I think I would lean towards making them STT_FUNC (they really are for
FineIBT anyway) and then suppressing the fallthrough warning for all
functions that start with "__cfi_".  This way we get an ORC entry for
them and the unwinder will be happy.
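A tiny sketch of what that objtool special case could look like (illustrative only, not tied to a particular objtool revision):

#include <string.h>
#include <stdbool.h>

/* __cfi_<sym> preambles deliberately fall through into <sym> itself. */
static bool is_cfi_preamble(const char *func_name)
{
	return strncmp(func_name, "__cfi_", 6) == 0;
}

/*
 * ...and in the place that warns about a function falling through into
 * the next symbol, something along the lines of:
 *
 *	if (is_cfi_preamble(func->name))
 *		return 0;	// fallthrough into \sym is intended
 */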
On Wed, May 04, 2022 at 08:16:57PM +0200, Peter Zijlstra wrote:
>         FineIBT                                    kCFI
>
> __fineibt_\hash:
>         xor \hash, %r10         # 7
>         jz 1f                   # 2
>         ud2                     # 2
> 1:      ret                     # 1
>         int3                    # 1
>
>
> __cfi_\sym:                                __cfi_\sym:
>                                                    int3; int3                    # 2
>         endbr                   # 4               mov \hash, %eax               # 5
>         call __fineibt_\hash    # 5               int3; int3                    # 2
> \sym:                                      \sym:
>         ...                                        ...
>
>
> caller:                                    caller:
>         movl \hash, %r10d       # 6               cmpl \hash, -6(%r11)          # 8
>         sub $9, %r11            # 4               je 1f                         # 2
>         call *%r11              # 3               ud2                           # 2
>         .nop 4                  # 4 (or fixup r11)  call __x86_indirect_thunk_r11 # 5

This looks good!

And just to double-check my understanding here... \sym is expected to
start with endbr with IBT + kCFI?

Random extra thoughts... feel free to ignore. :)  Given that both CFI
schemes depend on an attacker not being able to construct an executable
memory region that either starts with endbr (for FineIBT) or starts with
hash & 2 bytes (for kCFI), we should likely take another look at where
the kernel uses PAGE_KERNEL_EXEC.

It seems non-specialized use is entirely done via module_alloc().
Obviously modules need to stay as-is.  So we're left with other
module_alloc() callers: BPF JIT, ftrace, and kprobes.

Perhaps enabling CFI should tie bpf_jit_harden (which performs constant
blinding) to the value of bpf_jit_enable?  (i.e. either use BPF VM which
reads from non-exec memory, or use BPF JIT with constant blinding.)

I *think* all the kprobes and ftrace stuff ends up using constructed
direct calls, though, yes?  So if we did bounds checking, we could
"exclude" them as well as the BPF JIT.  Though I'm not sure how
controllable the content written to the kprobes and ftrace regions is?

For exclusion, we could separate actual modules from the other
module_alloc() users by maybe allocating in opposite directions from the
randomized offset and check indirect calls against the kernel text bounds
and the new modules-only bounds.  Sounds expensive, though.  Maybe PKS,
but I can't imagine 2 MSR writes per indirect call would be fast.  Hmm...
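Purely to illustrate the bpf_jit_harden coupling floated above (the CFI/IBT condition is hypothetical, not existing kernel behaviour; bpf_jit_enable and bpf_jit_harden are the existing knobs):

#include <linux/filter.h>	/* bpf_jit_enable, bpf_jit_harden */
#include <asm/ibt.h>

/* Sketch only: force constant blinding whenever a CFI scheme is active. */
static bool bpf_jit_blinding_wanted(void)
{
	if (!bpf_jit_enable)
		return false;		/* interpreter: bytecode is non-executable */

	if (IS_ENABLED(CONFIG_CFI_CLANG) || HAS_KERNEL_IBT)
		return true;		/* CFI/IBT active: always blind constants */

	return bpf_jit_harden != 0;	/* otherwise honour the sysctl */
}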
On Sun, May 08, 2022 at 01:29:13AM -0700, Kees Cook wrote:
> On Wed, May 04, 2022 at 08:16:57PM +0200, Peter Zijlstra wrote:
> >         FineIBT                                    kCFI
> >
> > __fineibt_\hash:
> >         xor \hash, %r10         # 7
> >         jz 1f                   # 2
> >         ud2                     # 2
> > 1:      ret                     # 1
> >         int3                    # 1
> >
> >
> > __cfi_\sym:                                __cfi_\sym:
> >                                                    int3; int3                    # 2
> >         endbr                   # 4               mov \hash, %eax               # 5
> >         call __fineibt_\hash    # 5               int3; int3                    # 2
> > \sym:                                      \sym:
> >         ...                                        ...
> >
> >
> > caller:                                    caller:
> >         movl \hash, %r10d       # 6               cmpl \hash, -6(%r11)          # 8
> >         sub $9, %r11            # 4               je 1f                         # 2
> >         call *%r11              # 3               ud2                           # 2
> >         .nop 4                  # 4 (or fixup r11)  call __x86_indirect_thunk_r11 # 5
>
> This looks good!
>
> And just to double-check my understanding here... \sym is expected to
> start with endbr with IBT + kCFI?

Ah, the thinking was that 'if IBT then FineIBT', so the combination of
kCFI and IBT is of no concern.  And since FineIBT will have the ENDBR in
the __cfi_\sym thing, \sym will not need it.

But thinking about this now I suppose __nocfi call symbols will still
need the ENDBR on.  Objtool IBT validation would need to either find
ENDBR or a matching __cfi_\sym I suppose.

So I was talking to Joao on IRC the other day, and I realized that if
kCFI generates code as per the above, then we can do FineIBT purely
in-kernel.  That is: have objtool generate a section of __cfi_\sym
locations.  Then use the .retpoline_sites and .cfi_sites to rewrite kCFI
into the FineIBT form in multiple passes:

 - read all the __cfi_\sym sites and collect all unique hash values
 - allocate (module) memory and write __fineibt_\hash functions for each
   unique hash value found
 - rewrite callers; nop out kCFI
 - rewrite all __cfi_\sym
 - rewrite all callers
 - enable IBT

And the same on module load I suppose.

But I've only thought about this, not actually implemented it, so who
knows what surprises are lurking there :-)

> Random extra thoughts... feel free to ignore. :)  Given that both CFI
> schemes depend on an attacker not being able to construct an executable
> memory region that either starts with endbr (for FineIBT) or starts with
> hash & 2 bytes (for kCFI), we should likely take another look at where
> the kernel uses PAGE_KERNEL_EXEC.
>
> It seems non-specialized use is entirely done via module_alloc().
> Obviously modules need to stay as-is.  So we're left with other
> module_alloc() callers: BPF JIT, ftrace, and kprobes.
>
> Perhaps enabling CFI should tie bpf_jit_harden (which performs constant
> blinding) to the value of bpf_jit_enable?  (i.e. either use BPF VM which
> reads from non-exec memory, or use BPF JIT with constant blinding.)
>
> I *think* all the kprobes and ftrace stuff ends up using constructed
> direct calls, though, yes?  So if we did bounds checking, we could
> "exclude" them as well as the BPF JIT.  Though I'm not sure how
> controllable the content written to the kprobes and ftrace regions is?

Both ftrace and kprobes only write fairly simple trampolines based off
of a template.  Neither has indirect calls.

> For exclusion, we could separate actual modules from the other
> module_alloc() users by maybe allocating in opposite directions from the
> randomized offset and check indirect calls against the kernel text bounds
> and the new modules-only bounds.  Sounds expensive, though.  Maybe PKS,
> but I can't imagine 2 MSR writes per indirect call would be fast.  Hmm...

I'm not sure what problem you're trying to solve...
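To sketch how that multi-pass rewrite could hang together (every helper below is invented for illustration; only the s32-offset site iteration mirrors how .retpoline_sites is walked today):

#include <linux/types.h>
#include <linux/init.h>

/* hypothetical helpers, named only for this sketch */
u32  read_cfi_hash(void *preamble);
void fineibt_collect_hash(u32 hash);
void fineibt_emit_stubs(void);
void rewrite_cfi_preamble(void *preamble);
void rewrite_fineibt_caller(void *site);

static void __init apply_fineibt(s32 *cfi_start, s32 *cfi_end,
				 s32 *retp_start, s32 *retp_end)
{
	s32 *s;

	/* pass 1: collect the unique hashes used by the __cfi_ preambles */
	for (s = cfi_start; s < cfi_end; s++)
		fineibt_collect_hash(read_cfi_hash((void *)s + *s));

	/* pass 2: allocate (module) memory and emit one __fineibt_<hash>
	 * checking stub per unique hash value found above */
	fineibt_emit_stubs();

	/* pass 3: rewrite each __cfi_ preamble to ENDBR + call stub */
	for (s = cfi_start; s < cfi_end; s++)
		rewrite_cfi_preamble((void *)s + *s);

	/* pass 4: rewrite the indirect call sites from the kCFI check to
	 * the FineIBT "movl $hash, %r10d; sub $9, %r11; call *%r11" form */
	for (s = retp_start; s < retp_end; s++)
		rewrite_fineibt_caller((void *)s + *s);

	/* only then enable IBT, once every landing pad is in place */
}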
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 4faac48ebec5..901b702fb16d 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -295,6 +295,7 @@ SYM_CODE_START(ret_from_fork)
 	/* kernel thread */
 	UNWIND_HINT_EMPTY
 	movq	%r12, %rdi
+	FINEIBT_HASH(0x89f22991)
 	CALL_NOSPEC rbx
 	/*
 	 * A kernel thread is allowed to return here after successfully
diff --git a/arch/x86/include/asm/ibt.h b/arch/x86/include/asm/ibt.h
index 689880eca9ba..580aa8b83bb2 100644
--- a/arch/x86/include/asm/ibt.h
+++ b/arch/x86/include/asm/ibt.h
@@ -21,6 +21,16 @@
 
 #define HAS_KERNEL_IBT	1
 
+#if defined(CONFIG_X86_KERNEL_FINEIBT) && !defined(BUILD_VDSO) && !defined(BUILD_VDSO32_64)
+#define HAS_KERNEL_FINEIBT	1
+#define FINEIBT_HASH(hash)	mov $hash, %r11d
+#define __coarseendbr	__attribute__((coarsecf_check))
+#else
+#define HAS_KERNEL_FINEIBT	0
+#define FINEIBT_HASH(hash)
+#define __coarseendbr
+#endif
+
 #ifndef __ASSEMBLY__
 
 #ifdef CONFIG_X86_64
@@ -29,6 +39,7 @@
 #define ASM_ENDBR	"endbr32\n\t"
 #endif
 
+#undef __noendbr
 #define __noendbr	__attribute__((nocf_check))
 
 static inline __attribute_const__ u32 gen_endbr(void)
@@ -80,12 +91,17 @@ extern __noendbr void ibt_restore(u64 save);
 #else /* !IBT */
 
 #define HAS_KERNEL_IBT	0
+#define HAS_KERNEL_FINEIBT	0
+#define FINEIBT_HASH(hash)
 
 #ifndef __ASSEMBLY__
 
 #define ASM_ENDBR
 
+#undef __noendbr
 #define __noendbr
+#undef __coarseendbr
+#define __coarseendbr
 
 static inline bool is_endbr(u32 val) { return false; }
 
diff --git a/arch/x86/include/asm/setup.h b/arch/x86/include/asm/setup.h
index 896e48d45828..d2a2f6456403 100644
--- a/arch/x86/include/asm/setup.h
+++ b/arch/x86/include/asm/setup.h
@@ -49,10 +49,10 @@ extern unsigned long saved_video_mode;
 extern void reserve_standard_io_resources(void);
 extern void i386_reserve_resources(void);
 
-extern unsigned long __startup_64(unsigned long physaddr, struct boot_params *bp);
-extern unsigned long __startup_secondary_64(void);
-extern void startup_64_setup_env(unsigned long physbase);
-extern void early_setup_idt(void);
+extern unsigned long __coarseendbr __startup_64(unsigned long physaddr, struct boot_params *bp);
+extern unsigned long __coarseendbr __startup_secondary_64(void);
+extern void __coarseendbr startup_64_setup_env(unsigned long physbase);
+extern void __coarseendbr early_setup_idt(void);
 extern void __init do_early_exception(struct pt_regs *regs, int trapnr);
 
 #ifdef CONFIG_X86_INTEL_MID
@@ -137,8 +137,8 @@ extern void probe_roms(void);
 asmlinkage void __init i386_start_kernel(void);
 
 #else
-asmlinkage void __init x86_64_start_kernel(char *real_mode);
-asmlinkage void __init x86_64_start_reservations(char *real_mode_data);
+asmlinkage void __init __coarseendbr x86_64_start_kernel(char *real_mode);
+asmlinkage void __init __coarseendbr x86_64_start_reservations(char *real_mode_data);
 
 #endif /* __i386__ */
 #endif /* _SETUP */
diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h
index 68c257a3de0d..7f32ef8d23f0 100644
--- a/arch/x86/include/asm/special_insns.h
+++ b/arch/x86/include/asm/special_insns.h
@@ -49,7 +49,7 @@ static inline unsigned long __native_read_cr3(void)
 	return val;
 }
 
-static inline void native_write_cr3(unsigned long val)
+static inline void __coarseendbr native_write_cr3(unsigned long val)
 {
 	asm volatile("mov %0,%%cr3": : "r" (val) : "memory");
 }
@@ -74,7 +74,7 @@ static inline unsigned long native_read_cr4(void)
 	return val;
 }
 
-void native_write_cr4(unsigned long val);
+void __coarseendbr native_write_cr4(unsigned long val);
 
 #ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
 static inline u32 rdpkru(void)
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index ed4417500700..70e94194ee2b 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -463,7 +463,7 @@ void native_write_cr0(unsigned long val)
 }
 EXPORT_SYMBOL(native_write_cr0);
 
-void __no_profile native_write_cr4(unsigned long val)
+void __no_profile __coarseendbr native_write_cr4(unsigned long val)
 {
 	unsigned long bits_changed = 0;
 
diff --git a/arch/x86/kernel/fineibt.c b/arch/x86/kernel/fineibt.c
new file mode 100644
index 000000000000..685e4308d86e
--- /dev/null
+++ b/arch/x86/kernel/fineibt.c
@@ -0,0 +1,123 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * This file contains the FineIBT handler function.
+ */
+
+#include <linux/export.h>
+#include <linux/printk.h>
+#include <linux/kernel.h>
+#include<linux/spinlock.h>
+#include <asm/ibt.h>
+
+void __noendbr __fineibt_handler(void);
+
+void __fineibt_debug(void) {
+	asm volatile ("nop\n");
+	printk("fineibt debug\n");
+};
+EXPORT_SYMBOL(__fineibt_debug);
+
+#define FINEIBT_VADDR_LEN 4096
+#define DO_ALL_PUSHS \
+	asm("nop\n\t" \
+	    "push %rsi\n\t" \
+	    "push %rdi\n\t" \
+	    "push %rdx\n\t" \
+	    "push %rcx\n\t" \
+	    "push %rbx\n\t" \
+	    "push %rax\n\t" \
+	    "push %r8\n\t" \
+	    "push %r9\n\t" \
+	    "push %r10\n\t" \
+	    "push %r11\n\t" \
+	    "push %r12\n\t" \
+	    "push %r13\n\t" \
+	    "push %r14\n\t" \
+	    "push %r15\n\t")
+
+#define DO_ALL_POPS \
+	asm("nop\n\t" \
+	    "pop %r15\n\t" \
+	    "pop %r14\n\t" \
+	    "pop %r13\n\t" \
+	    "pop %r12\n\t" \
+	    "pop %r11\n\t" \
+	    "pop %r10\n\t" \
+	    "pop %r9\n\t" \
+	    "pop %r8\n\t" \
+	    "pop %rax\n\t" \
+	    "pop %rbx\n\t" \
+	    "pop %rcx\n\t" \
+	    "pop %rdx\n\t" \
+	    "pop %rdi\n\t" \
+	    "pop %rsi\n\t")
+
+struct fineibt_violation{
+	void * vaddr;
+	void * caddr;
+	bool printed;
+};
+
+typedef struct fineibt_violation fineibt_violation;
+
+static fineibt_violation vlts[FINEIBT_VADDR_LEN];
+static unsigned long vlts_next = 0;
+static bool vlts_initialize = true;
+static DEFINE_SPINLOCK(fineibt_lock);
+
+void __noendbr __fineibt_handler(void){
+	unsigned i;
+	unsigned long flags;
+	bool skip;
+	void * ret;
+	void * caller;
+
+	DO_ALL_PUSHS;
+
+	spin_lock_irqsave(&fineibt_lock, flags);
+	skip = false;
+
+	asm("\t movq 0x90(%%rsp),%0" : "=r"(ret));
+	asm("\t movq 0x98(%%rsp),%0" : "=r"(caller));
+
+	if(vlts_initialize){
+		for(i = 0; i < FINEIBT_VADDR_LEN; i++) {
+			vlts[i].vaddr = 0;
+			vlts[i].caddr = 0;
+			vlts[i].printed = 0;
+		}
+		vlts_initialize = false;
+	}
+
+	if(vlts_next >= FINEIBT_VADDR_LEN) {
+		if(vlts_next == FINEIBT_VADDR_LEN) {
+			printk("FineIBT reached max buffer\n");
+			vlts_next++;
+		}
+		skip = true;
+	}
+
+	for(i = 0; i < vlts_next; i++){
+		if(vlts[i].vaddr == ret && vlts[i].caddr == caller) {
+			skip = true;
+			break;
+		}
+	}
+
+	if(!skip) {
+		vlts[vlts_next].vaddr = ret;
+		vlts[vlts_next].caddr = caller;
+		vlts[vlts_next].printed = 0;
+		vlts_next = vlts_next + 1;
+	}
+
+	spin_unlock_irqrestore(&fineibt_lock, flags);
+
+	if(!skip) {
+		printk("FineIBT violation: %px:%px:%u\n", ret, caller,
+		       vlts_next);
+	}
+	DO_ALL_POPS;
+}
+
+EXPORT_SYMBOL(__fineibt_handler);
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 4f5ecbbaae77..f773d771e07d 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -162,7 +162,7 @@ static unsigned long __head sme_postprocess_startup(struct boot_params *bp, pmdv
  * boot-time crashes. To work around this problem, every global pointer must
  * be adjusted using fixup_pointer().
  */
-unsigned long __head __startup_64(unsigned long physaddr,
+unsigned long __head __coarseendbr __startup_64(unsigned long physaddr,
 				  struct boot_params *bp)
 {
 	unsigned long load_delta, *p;
@@ -308,7 +308,7 @@ unsigned long __head __startup_64(unsigned long physaddr,
 	return sme_postprocess_startup(bp, pmd);
 }
 
-unsigned long __startup_secondary_64(void)
+unsigned long __coarseendbr __startup_secondary_64(void)
 {
 	/*
 	 * Return the SME encryption mask (if SME is active) to be used as a
@@ -464,8 +464,8 @@ static void __init copy_bootdata(char *real_mode_data)
 	sme_unmap_bootdata(real_mode_data);
 }
 
-asmlinkage __visible void __init x86_64_start_kernel(char * real_mode_data)
-{
+asmlinkage __visible void __init __coarseendbr
+x86_64_start_kernel(char * real_mode_data) {
 	/*
 	 * Build-time sanity checks on the kernel image and module
 	 * area mappings. (these are purely build-time and produce no code)
@@ -597,7 +597,7 @@ static void startup_64_load_idt(unsigned long physbase)
 }
 
 /* This is used when running on kernel addresses */
-void early_setup_idt(void)
+void __coarseendbr early_setup_idt(void)
 {
 	/* VMM Communication Exception */
 	if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT))
@@ -610,7 +610,7 @@ void early_setup_idt(void)
 /*
  * Setup boot CPU state needed before kernel switches to virtual addresses.
  */
-void __head startup_64_setup_env(unsigned long physbase)
+void __head __coarseendbr startup_64_setup_env(unsigned long physbase)
 {
 	/* Load GDT */
 	startup_gdt_descr.address = (unsigned long)fixup_pointer(startup_gdt, physbase);
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 2ef14772dc04..5fa17d5716bb 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -214,7 +214,7 @@ static int enable_start_cpu0;
 /*
  * Activate a secondary processor.
  */
-static void notrace start_secondary(void *unused)
+static void notrace __coarseendbr start_secondary(void *unused)
 {
 	/*
 	 * Don't put *anything* except direct CPU state initialization