Message ID: 20230808102049.465864-3-elver@google.com (mailing list archive)
State: New, archived
Series: [v3,1/3] compiler_types: Introduce the Clang __preserve_most function attribute
On Tue, Aug 08, 2023 at 12:17:27PM +0200, Marco Elver wrote: > Numerous production kernel configs (see [1, 2]) are choosing to enable > CONFIG_DEBUG_LIST, which is also being recommended by KSPP for hardened > configs [3]. The feature has never been designed with performance in > mind, yet common list manipulation is happening across hot paths all > over the kernel. > > Introduce CONFIG_DEBUG_LIST_MINIMAL, which performs list pointer > checking inline, and only upon list corruption delegates to the > reporting slow path. I'd really like to get away from calling this "DEBUG", since it's used more for hardening (CONFIG_LIST_HARDENED?). Will Deacon spent some time making this better a while back, but the series never landed. Do you have a bit of time to look through it? https://github.com/KSPP/linux/issues/10 https://lore.kernel.org/lkml/20200324153643.15527-1-will@kernel.org/ -Kees
On Tue, 8 Aug 2023 at 23:27, Kees Cook <keescook@chromium.org> wrote: > > On Tue, Aug 08, 2023 at 12:17:27PM +0200, Marco Elver wrote: > > Numerous production kernel configs (see [1, 2]) are choosing to enable > > CONFIG_DEBUG_LIST, which is also being recommended by KSPP for hardened > > configs [3]. The feature has never been designed with performance in > > mind, yet common list manipulation is happening across hot paths all > > over the kernel. > > > > Introduce CONFIG_DEBUG_LIST_MINIMAL, which performs list pointer > > checking inline, and only upon list corruption delegates to the > > reporting slow path. > > I'd really like to get away from calling this "DEBUG", since it's used > more for hardening (CONFIG_LIST_HARDENED?). Will Deacon spent some time > making this better a while back, but the series never landed. Do you > have a bit of time to look through it? > > https://github.com/KSPP/linux/issues/10 > https://lore.kernel.org/lkml/20200324153643.15527-1-will@kernel.org/ I'm fine renaming this one. But there are other issues that Will's series solves, which I don't want this series to depend on. We can try to sort them out separately. The main problem here is that DEBUG_LIST has been designed to be friendly for debugging (incl. checking poison values and NULL). Some kernel devs may still want that, but for production use it is pointless and wasteful. So what I can propose is to introduce CONFIG_LIST_HARDENED that doesn't depend on CONFIG_DEBUG_LIST, but instead selects it, because we still use that code to produce a report. If there are other list types that have similar debug checks, but where we can optimize performance by eliding some and moving them inline, we can do the same (CONFIG_*_HARDENED). If the checks are already optimized, we could just rename it to CONFIG_*_HARDENED and remove the DEBUG-name. I'm not a big fan of the 2 modes either, but that's what we get if we want to support both the debugging and hardening use cases. Inlining the full set of checks does not work, for both performance and code-size reasons.
On Wed, Aug 09, 2023 at 09:35AM +0200, Marco Elver wrote: > > I'd really like to get away from calling this "DEBUG", since it's used > > more for hardening (CONFIG_LIST_HARDENED?). Will Deacon spent some time > > making this better a while back, but the series never landed. Do you > > have a bit of time to look through it? > > > > https://github.com/KSPP/linux/issues/10 > > https://lore.kernel.org/lkml/20200324153643.15527-1-will@kernel.org/ > > I'm fine renaming this one. But there are other issues that Will's > series solves, which I don't want this series to depend on. We can try > to sort them out separately. > > The main problem here is that DEBUG_LIST has been designed to be > friendly for debugging (incl. checking poison values and NULL). Some > kernel devs may still want that, but for production use is pointless > and wasteful. > > So what I can propose is to introduce CONFIG_LIST_HARDENED that > doesn't depend on CONFIG_DEBUG_LIST, but instead selects it, because > we still use that code to produce a report. How about the below? We'll add CONFIG_HARDEN_LIST (in Kconfig.hardening), which is independent of CONFIG_DEBUG_LIST. For the implementation it selects DEBUG_LIST, but irrelevant for users. This will get us the best of both worlds: a version for hardening that should remain as fast as possible, and one for debugging with better reports. ------ >8 ------ From: Marco Elver <elver@google.com> Date: Thu, 27 Jul 2023 22:19:02 +0200 Subject: [PATCH v4 3/3] list: Introduce CONFIG_HARDEN_LIST Numerous production kernel configs (see [1, 2]) are choosing to enable CONFIG_DEBUG_LIST, which is also being recommended by KSPP for hardened configs [3]. The motivation behind this is that the option can be used as a security hardening feature (e.g. CVE-2019-2215 and CVE-2019-2025 are mitigated by the option [4]). The feature has never been designed with performance in mind, yet common list manipulation is happening across hot paths all over the kernel. 
Introduce CONFIG_HARDEN_LIST, which performs list pointer checking inline, and only upon list corruption calls the reporting slow path. To generate optimal machine code with CONFIG_HARDEN_LIST: 1. Elide checking for pointer values which upon dereference would result in an immediate access fault -- therefore "minimal" checks. The trade-off is lower-quality error reports. 2. Use the newly introduced __preserve_most function attribute (available with Clang, but not yet with GCC) to minimize the code footprint for calling the reporting slow path. As a result, function size of callers is reduced by avoiding saving registers before calling the rarely called reporting slow path. Note that all TUs in lib/Makefile already disable function tracing, including list_debug.c, and __preserve_most's implied notrace has no effect in this case. 3. Because the inline checks are a subset of the full set of checks in __list_*_valid_or_report(), always return false if the inline checks failed. This avoids redundant compare and conditional branch right after return from the slow path. As a side-effect of the checks being inline, if the compiler can prove some condition to always be true, it can completely elide some checks. Running netperf with CONFIG_HARDEN_LIST (using a Clang compiler with "preserve_most") shows throughput improvements, in my case of ~7% on average (up to 20-30% on some test cases). Link: https://r.android.com/1266735 [1] Link: https://gitlab.archlinux.org/archlinux/packaging/packages/linux/-/blob/main/config [2] Link: https://kernsec.org/wiki/index.php/Kernel_Self_Protection_Project/Recommended_Settings [3] Link: https://googleprojectzero.blogspot.com/2019/11/bad-binder-android-in-wild-exploit.html [4] Signed-off-by: Marco Elver <elver@google.com> --- v4: * Rename to CONFIG_HARDEN_LIST, which can independently be selected from CONFIG_DEBUG_LIST. v3: * Rename ___list_*_valid() to __list_*_valid_or_report(). * More comments. 
v2: * Note that lib/Makefile disables function tracing for everything and __preserve_most's implied notrace is a noop here. --- arch/arm64/kvm/hyp/nvhe/list_debug.c | 2 + include/linux/list.h | 64 +++++++++++++++++++++++++--- lib/Kconfig.debug | 12 ++++-- lib/list_debug.c | 2 + security/Kconfig.hardening | 14 ++++++ 5 files changed, 84 insertions(+), 10 deletions(-) diff --git a/arch/arm64/kvm/hyp/nvhe/list_debug.c b/arch/arm64/kvm/hyp/nvhe/list_debug.c index 16266a939a4c..46a2d4f2b3c6 100644 --- a/arch/arm64/kvm/hyp/nvhe/list_debug.c +++ b/arch/arm64/kvm/hyp/nvhe/list_debug.c @@ -26,6 +26,7 @@ static inline __must_check bool nvhe_check_data_corruption(bool v) /* The predicates checked here are taken from lib/list_debug.c. */ +__list_valid_slowpath bool __list_add_valid_or_report(struct list_head *new, struct list_head *prev, struct list_head *next) { @@ -37,6 +38,7 @@ bool __list_add_valid_or_report(struct list_head *new, struct list_head *prev, return true; } +__list_valid_slowpath bool __list_del_entry_valid_or_report(struct list_head *entry) { struct list_head *prev, *next; diff --git a/include/linux/list.h b/include/linux/list.h index 130c6a1bb45c..1c7f70b7cc7a 100644 --- a/include/linux/list.h +++ b/include/linux/list.h @@ -39,38 +39,90 @@ static inline void INIT_LIST_HEAD(struct list_head *list) } #ifdef CONFIG_DEBUG_LIST + +#ifdef CONFIG_HARDEN_LIST +# define __list_valid_slowpath __cold __preserve_most +#else +# define __list_valid_slowpath +#endif + /* * Performs the full set of list corruption checks before __list_add(). * On list corruption reports a warning, and returns false. */ -extern bool __list_add_valid_or_report(struct list_head *new, - struct list_head *prev, - struct list_head *next); +extern bool __list_valid_slowpath __list_add_valid_or_report(struct list_head *new, + struct list_head *prev, + struct list_head *next); /* * Performs list corruption checks before __list_add(). Returns false if a * corruption is detected, true otherwise. 
+ * + * With CONFIG_HARDEN_LIST set, performs minimal list integrity checking (that + * do not result in a fault) inline, and only if a corruption is detected calls + * the reporting function __list_add_valid_or_report(). */ static __always_inline bool __list_add_valid(struct list_head *new, struct list_head *prev, struct list_head *next) { - return __list_add_valid_or_report(new, prev, next); + bool ret = true; + + if (IS_ENABLED(CONFIG_HARDEN_LIST)) { + /* + * With the hardening version, elide checking if next and prev + * are NULL, since the immediate dereference of them below would + * result in a fault if NULL. + * + * With the reduced set of checks, we can afford to inline the + * checks, which also gives the compiler a chance to elide some + * of them completely if they can be proven at compile-time. If + * one of the pre-conditions does not hold, the slow-path will + * show a report which pre-condition failed. + */ + if (likely(next->prev == prev && prev->next == next && new != prev && new != next)) + return true; + ret = false; + } + + ret &= __list_add_valid_or_report(new, prev, next); + return ret; } /* * Performs the full set of list corruption checks before __list_del_entry(). * On list corruption reports a warning, and returns false. */ -extern bool __list_del_entry_valid_or_report(struct list_head *entry); +extern bool __list_valid_slowpath __list_del_entry_valid_or_report(struct list_head *entry); /* * Performs list corruption checks before __list_del_entry(). Returns false if a * corruption is detected, true otherwise. + * + * With CONFIG_HARDEN_LIST set, performs minimal list integrity checking (that + * do not result in a fault) inline, and only if a corruption is detected calls + * the reporting function __list_del_entry_valid_or_report(). 
*/ static __always_inline bool __list_del_entry_valid(struct list_head *entry) { - return __list_del_entry_valid_or_report(entry); + bool ret = true; + + if (IS_ENABLED(CONFIG_HARDEN_LIST)) { + struct list_head *prev = entry->prev; + struct list_head *next = entry->next; + + /* + * With the hardening version, elide checking if next and prev + * are NULL, LIST_POISON1 or LIST_POISON2, since the immediate + * dereference of them below would result in a fault. + */ + if (likely(prev->next == entry && next->prev == entry)) + return true; + ret = false; + } + + ret &= __list_del_entry_valid_or_report(entry); + return ret; } #else static inline bool __list_add_valid(struct list_head *new, diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index fbc89baf7de6..6b0de78fb2da 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -1672,11 +1672,15 @@ config HAVE_DEBUG_BUGVERBOSE menu "Debug kernel data structures" config DEBUG_LIST - bool "Debug linked list manipulation" - depends on DEBUG_KERNEL || BUG_ON_DATA_CORRUPTION + bool "Debug linked list manipulation" if !HARDEN_LIST + depends on DEBUG_KERNEL || BUG_ON_DATA_CORRUPTION || HARDEN_LIST help - Enable this to turn on extended checks in the linked-list - walking routines. + Enable this to turn on extended checks in the linked-list walking + routines. + + If you care about performance, you should enable CONFIG_HARDEN_LIST + instead. This option alone trades better quality error reports for + worse performance, and is more suitable for debugging. If unsure, say N. diff --git a/lib/list_debug.c b/lib/list_debug.c index 2def33b1491f..0ff547910dd0 100644 --- a/lib/list_debug.c +++ b/lib/list_debug.c @@ -17,6 +17,7 @@ * attempt). 
*/ +__list_valid_slowpath bool __list_add_valid_or_report(struct list_head *new, struct list_head *prev, struct list_head *next) { @@ -39,6 +40,7 @@ bool __list_add_valid_or_report(struct list_head *new, struct list_head *prev, } EXPORT_SYMBOL(__list_add_valid_or_report); +__list_valid_slowpath bool __list_del_entry_valid_or_report(struct list_head *entry) { struct list_head *prev, *next; diff --git a/security/Kconfig.hardening b/security/Kconfig.hardening index 0f295961e773..a8aef895f13d 100644 --- a/security/Kconfig.hardening +++ b/security/Kconfig.hardening @@ -279,6 +279,20 @@ config ZERO_CALL_USED_REGS endmenu +menu "Hardening of kernel data structures" + +config HARDEN_LIST + bool "Check integrity of linked list manipulation" + select DEBUG_LIST + help + Minimal integrity checking in the linked-list manipulation routines + to catch memory corruptions that are not guaranteed to result in an + immediate access fault. + + If unsure, say N. + +endmenu + config CC_HAS_RANDSTRUCT def_bool $(cc-option,-frandomize-layout-seed-file=/dev/null) # Randstruct was first added in Clang 15, but it isn't safe to use until
On Wed, 9 Aug 2023 11:57:19 +0200 Marco Elver <elver@google.com> wrote: > static __always_inline bool __list_add_valid(struct list_head *new, > struct list_head *prev, > struct list_head *next) > { > - return __list_add_valid_or_report(new, prev, next); > + bool ret = true; > + > + if (IS_ENABLED(CONFIG_HARDEN_LIST)) { > + /* > + * With the hardening version, elide checking if next and prev > + * are NULL, since the immediate dereference of them below would > + * result in a fault if NULL. > + * > + * With the reduced set of checks, we can afford to inline the > + * checks, which also gives the compiler a chance to elide some > + * of them completely if they can be proven at compile-time. If > + * one of the pre-conditions does not hold, the slow-path will > + * show a report which pre-condition failed. > + */ > + if (likely(next->prev == prev && prev->next == next && new != prev && new != next)) > + return true; > + ret = false; > + } > + > + ret &= __list_add_valid_or_report(new, prev, next); > + return ret; > } I would actually prefer DEBUG_LIST to select HARDEN_LIST and not the other way around. It logically doesn't make sense that HARDEN_LIST would select DEBUG_LIST. That is, I could by default want HARDEN_LIST always on, but not DEBUG_LIST (because who knows, it may add other features I don't want). But then, I may have stumbled over something and want more info, and enable DEBUG_LIST (while still having HARDEN_LIST enabled). I think you are looking at this from an implementation perspective and not the normal developer one. 
This would mean the above function should get enabled by CONFIG_HARDEN_LIST (and CONFIG_DEBUG would select CONFIG_HARDEN) and would look more like: static __always_inline bool __list_add_valid(struct list_head *new, struct list_head *prev, struct list_head *next) { bool ret = true; if (!IS_ENABLED(CONFIG_DEBUG_LIST)) { /* * With the hardening version, elide checking if next and prev * are NULL, since the immediate dereference of them below would * result in a fault if NULL. * * With the reduced set of checks, we can afford to inline the * checks, which also gives the compiler a chance to elide some * of them completely if they can be proven at compile-time. If * one of the pre-conditions does not hold, the slow-path will * show a report which pre-condition failed. */ if (likely(next->prev == prev && prev->next == next && new != prev && new != next)) return true; ret = false; } ret &= __list_add_valid_or_report(new, prev, next); return ret; } That is, if DEBUG_LIST is enabled, we always call the __list_add_valid_or_report(), but if only HARDEN_LIST is enabled, then we do the shortcut. -- Steve
On Wed, Aug 09, 2023 at 11:30AM -0400, Steven Rostedt wrote: [...] > > I would actually prefer DEBUG_LIST to select HARDEN_LIST and not the other > way around. It logically doesn't make sense that HARDEN_LIST would select > DEBUG_LIST. That is, I could by default want HARDEN_LIST always on, but not > DEBUG_LIST (because who knows, it may add other features I don't want). But > then, I may have stumbled over something and want more info, and enable > DEBUG_LIST (while still having HARDEN_LIST) enabled. > > I think you are looking at this from an implementation perspective and not > the normal developer one. > [...] > > That is, if DEBUG_LIST is enabled, we always call the > __list_add_valid_or_report(), but if only HARDEN_LIST is enabled, then we > do the shortcut. Good point - I think this is better. See below tentative v4. Kees: Does that also look more like what you had in mind? Thanks, -- Marco ------ >8 ------ From: Marco Elver <elver@google.com> Date: Thu, 27 Jul 2023 22:19:02 +0200 Subject: [PATCH] list: Introduce CONFIG_HARDEN_LIST Numerous production kernel configs (see [1, 2]) are choosing to enable CONFIG_DEBUG_LIST, which is also being recommended by KSPP for hardened configs [3]. The motivation behind this is that the option can be used as a security hardening feature (e.g. CVE-2019-2215 and CVE-2019-2025 are mitigated by the option [4]). The feature has never been designed with performance in mind, yet common list manipulation is happening across hot paths all over the kernel. Introduce CONFIG_HARDEN_LIST, which performs list pointer checking inline, and only upon list corruption calls the reporting slow path. Since DEBUG_LIST is functionally a superset of HARDEN_LIST, the Kconfig variables are designed to reflect that: DEBUG_LIST selects HARDEN_LIST, whereas HARDEN_LIST itself has no dependency on DEBUG_LIST. To generate optimal machine code with CONFIG_HARDEN_LIST: 1. 
Elide checking for pointer values which upon dereference would result in an immediate access fault -- therefore "minimal" checks. The trade-off is lower-quality error reports. 2. Use the newly introduced __preserve_most function attribute (available with Clang, but not yet with GCC) to minimize the code footprint for calling the reporting slow path. As a result, function size of callers is reduced by avoiding saving registers before calling the rarely called reporting slow path. Note that all TUs in lib/Makefile already disable function tracing, including list_debug.c, and __preserve_most's implied notrace has no effect in this case. 3. Because the inline checks are a subset of the full set of checks in __list_*_valid_or_report(), always return false if the inline checks failed. This avoids redundant compare and conditional branch right after return from the slow path. As a side-effect of the checks being inline, if the compiler can prove some condition to always be true, it can completely elide some checks. Running netperf with CONFIG_HARDEN_LIST (using a Clang compiler with "preserve_most") shows throughput improvements, in my case of ~7% on average (up to 20-30% on some test cases). Link: https://r.android.com/1266735 [1] Link: https://gitlab.archlinux.org/archlinux/packaging/packages/linux/-/blob/main/config [2] Link: https://kernsec.org/wiki/index.php/Kernel_Self_Protection_Project/Recommended_Settings [3] Link: https://googleprojectzero.blogspot.com/2019/11/bad-binder-android-in-wild-exploit.html [4] Signed-off-by: Marco Elver <elver@google.com> --- v4: * Rename to CONFIG_HARDEN_LIST, which can independently be selected from CONFIG_DEBUG_LIST. v3: * Rename ___list_*_valid() to __list_*_valid_or_report(). * More comments. v2: * Note that lib/Makefile disables function tracing for everything and __preserve_most's implied notrace is a noop here. 
--- arch/arm64/kvm/hyp/nvhe/Makefile | 2 +- arch/arm64/kvm/hyp/nvhe/list_debug.c | 2 + include/linux/list.h | 64 +++++++++++++++++++++++++--- lib/Kconfig.debug | 9 +++- lib/Makefile | 2 +- lib/list_debug.c | 5 ++- security/Kconfig.hardening | 13 ++++++ 7 files changed, 86 insertions(+), 11 deletions(-) diff --git a/arch/arm64/kvm/hyp/nvhe/Makefile b/arch/arm64/kvm/hyp/nvhe/Makefile index 9ddc025e4b86..c89c85a41ac4 100644 --- a/arch/arm64/kvm/hyp/nvhe/Makefile +++ b/arch/arm64/kvm/hyp/nvhe/Makefile @@ -25,7 +25,7 @@ hyp-obj-y := timer-sr.o sysreg-sr.o debug-sr.o switch.o tlb.o hyp-init.o host.o cache.o setup.o mm.o mem_protect.o sys_regs.o pkvm.o stacktrace.o ffa.o hyp-obj-y += ../vgic-v3-sr.o ../aarch32.o ../vgic-v2-cpuif-proxy.o ../entry.o \ ../fpsimd.o ../hyp-entry.o ../exception.o ../pgtable.o -hyp-obj-$(CONFIG_DEBUG_LIST) += list_debug.o +hyp-obj-$(CONFIG_HARDEN_LIST) += list_debug.o hyp-obj-y += $(lib-objs) ## diff --git a/arch/arm64/kvm/hyp/nvhe/list_debug.c b/arch/arm64/kvm/hyp/nvhe/list_debug.c index 16266a939a4c..46a2d4f2b3c6 100644 --- a/arch/arm64/kvm/hyp/nvhe/list_debug.c +++ b/arch/arm64/kvm/hyp/nvhe/list_debug.c @@ -26,6 +26,7 @@ static inline __must_check bool nvhe_check_data_corruption(bool v) /* The predicates checked here are taken from lib/list_debug.c. 
*/ +__list_valid_slowpath bool __list_add_valid_or_report(struct list_head *new, struct list_head *prev, struct list_head *next) { @@ -37,6 +38,7 @@ bool __list_add_valid_or_report(struct list_head *new, struct list_head *prev, return true; } +__list_valid_slowpath bool __list_del_entry_valid_or_report(struct list_head *entry) { struct list_head *prev, *next; diff --git a/include/linux/list.h b/include/linux/list.h index 130c6a1bb45c..ef899c27c68b 100644 --- a/include/linux/list.h +++ b/include/linux/list.h @@ -38,39 +38,91 @@ static inline void INIT_LIST_HEAD(struct list_head *list) WRITE_ONCE(list->prev, list); } +#ifdef CONFIG_HARDEN_LIST + #ifdef CONFIG_DEBUG_LIST +# define __list_valid_slowpath +#else +# define __list_valid_slowpath __cold __preserve_most +#endif + /* * Performs the full set of list corruption checks before __list_add(). * On list corruption reports a warning, and returns false. */ -extern bool __list_add_valid_or_report(struct list_head *new, - struct list_head *prev, - struct list_head *next); +extern bool __list_valid_slowpath __list_add_valid_or_report(struct list_head *new, + struct list_head *prev, + struct list_head *next); /* * Performs list corruption checks before __list_add(). Returns false if a * corruption is detected, true otherwise. + * + * With CONFIG_HARDEN_LIST only, performs minimal list integrity checking (that + * do not result in a fault) inline, and only if a corruption is detected calls + * the reporting function __list_add_valid_or_report(). */ static __always_inline bool __list_add_valid(struct list_head *new, struct list_head *prev, struct list_head *next) { - return __list_add_valid_or_report(new, prev, next); + bool ret = true; + + if (!IS_ENABLED(CONFIG_DEBUG_LIST)) { + /* + * With the hardening version, elide checking if next and prev + * are NULL, since the immediate dereference of them below would + * result in a fault if NULL. 
+ * + * With the reduced set of checks, we can afford to inline the + * checks, which also gives the compiler a chance to elide some + * of them completely if they can be proven at compile-time. If + * one of the pre-conditions does not hold, the slow-path will + * show a report which pre-condition failed. + */ + if (likely(next->prev == prev && prev->next == next && new != prev && new != next)) + return true; + ret = false; + } + + ret &= __list_add_valid_or_report(new, prev, next); + return ret; } /* * Performs the full set of list corruption checks before __list_del_entry(). * On list corruption reports a warning, and returns false. */ -extern bool __list_del_entry_valid_or_report(struct list_head *entry); +extern bool __list_valid_slowpath __list_del_entry_valid_or_report(struct list_head *entry); /* * Performs list corruption checks before __list_del_entry(). Returns false if a * corruption is detected, true otherwise. + * + * With CONFIG_HARDEN_LIST only, performs minimal list integrity checking (that + * do not result in a fault) inline, and only if a corruption is detected calls + * the reporting function __list_del_entry_valid_or_report(). */ static __always_inline bool __list_del_entry_valid(struct list_head *entry) { - return __list_del_entry_valid_or_report(entry); + bool ret = true; + + if (!IS_ENABLED(CONFIG_DEBUG_LIST)) { + struct list_head *prev = entry->prev; + struct list_head *next = entry->next; + + /* + * With the hardening version, elide checking if next and prev + * are NULL, LIST_POISON1 or LIST_POISON2, since the immediate + * dereference of them below would result in a fault. 
+ */ + if (likely(prev->next == entry && next->prev == entry)) + return true; + ret = false; + } + + ret &= __list_del_entry_valid_or_report(entry); + return ret; } #else static inline bool __list_add_valid(struct list_head *new, diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index fbc89baf7de6..9e38956d6f50 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -1674,9 +1674,14 @@ menu "Debug kernel data structures" config DEBUG_LIST bool "Debug linked list manipulation" depends on DEBUG_KERNEL || BUG_ON_DATA_CORRUPTION + select HARDEN_LIST help - Enable this to turn on extended checks in the linked-list - walking routines. + Enable this to turn on extended checks in the linked-list walking + routines. + + This option trades better quality error reports for performance, and + is more suitable for kernel debugging. If you care about performance, + you should only enable CONFIG_HARDEN_LIST instead. If unsure, say N. diff --git a/lib/Makefile b/lib/Makefile index 42d307ade225..a7ebc9090f04 100644 --- a/lib/Makefile +++ b/lib/Makefile @@ -161,7 +161,7 @@ obj-$(CONFIG_BTREE) += btree.o obj-$(CONFIG_INTERVAL_TREE) += interval_tree.o obj-$(CONFIG_ASSOCIATIVE_ARRAY) += assoc_array.o obj-$(CONFIG_DEBUG_PREEMPT) += smp_processor_id.o -obj-$(CONFIG_DEBUG_LIST) += list_debug.o +obj-$(CONFIG_HARDEN_LIST) += list_debug.o obj-$(CONFIG_DEBUG_OBJECTS) += debugobjects.o obj-$(CONFIG_BITREVERSE) += bitrev.o diff --git a/lib/list_debug.c b/lib/list_debug.c index 2def33b1491f..38ddc7c01eab 100644 --- a/lib/list_debug.c +++ b/lib/list_debug.c @@ -2,7 +2,8 @@ * Copyright 2006, Red Hat, Inc., Dave Jones * Released under the General Public License (GPL). * - * This file contains the linked list validation for DEBUG_LIST. + * This file contains the linked list validation and error reporting for + * HARDEN_LIST and DEBUG_LIST. */ #include <linux/export.h> @@ -17,6 +18,7 @@ * attempt). 
*/ +__list_valid_slowpath bool __list_add_valid_or_report(struct list_head *new, struct list_head *prev, struct list_head *next) { @@ -39,6 +41,7 @@ bool __list_add_valid_or_report(struct list_head *new, struct list_head *prev, } EXPORT_SYMBOL(__list_add_valid_or_report); +__list_valid_slowpath bool __list_del_entry_valid_or_report(struct list_head *entry) { struct list_head *prev, *next; diff --git a/security/Kconfig.hardening b/security/Kconfig.hardening index 0f295961e773..19aa2b7d2f64 100644 --- a/security/Kconfig.hardening +++ b/security/Kconfig.hardening @@ -279,6 +279,19 @@ config ZERO_CALL_USED_REGS endmenu +menu "Hardening of kernel data structures" + +config HARDEN_LIST + bool "Check integrity of linked list manipulation" + help + Minimal integrity checking in the linked-list manipulation routines + to catch memory corruptions that are not guaranteed to result in an + immediate access fault. + + If unsure, say N. + +endmenu + config CC_HAS_RANDSTRUCT def_bool $(cc-option,-frandomize-layout-seed-file=/dev/null) # Randstruct was first added in Clang 15, but it isn't safe to use until
On Wed, Aug 09, 2023 at 06:32:37PM +0200, Marco Elver wrote: > On Wed, Aug 09, 2023 at 11:30AM -0400, Steven Rostedt wrote: > [...] > > > > I would actually prefer DEBUG_LIST to select HARDEN_LIST and not the other > > way around. It logically doesn't make sense that HARDEN_LIST would select > > DEBUG_LIST. That is, I could by default want HARDEN_LIST always on, but not > > DEBUG_LIST (because who knows, it may add other features I don't want). But > > then, I may have stumbled over something and want more info, and enable > > DEBUG_LIST (while still having HARDEN_LIST) enabled. > > > > I think you are looking at this from an implementation perspective and not > > the normal developer one. > > > [...] > > > > That is, if DEBUG_LIST is enabled, we always call the > > __list_add_valid_or_report(), but if only HARDEN_LIST is enabled, then we > > do the shortcut. > > Good point - I think this is better. See below tentative v4. > > Kees: Does that also look more like what you had in mind? Yeah, this looks good. My only nit would be a naming one. All the other hardening features are named "HARDENED", but perhaps the "ED" is redundant in the others. Still, consistency seems nicer. What do you think of CONFIG_LIST_HARDENED ? (The modern trend for Kconfig naming tends to keep the subsystem name first and then apply optional elements after.) One note: do the LKDTM list hardening tests still pass? i.e. CORRUPT_LIST_ADD CORRUPT_LIST_DEL > [...] > + /* > + * With the hardening version, elide checking if next and prev > + * are NULL, LIST_POISON1 or LIST_POISON2, since the immediate > + * dereference of them below would result in a fault. > + */ > + if (likely(prev->next == entry && next->prev == entry)) > + return true; I'm not super excited about skipping those checks, since they are values that can be reached through kernel list management confusion. If an attacker is using a system where the zero-page has been mapped and is accessible (i.e. 
lacking SMAP etc), then attacks could still be constructed. However, I do recognize this chain of exploitation prerequisites is getting rather long, so probably this is a reasonable trade off on modern systems. -Kees
On Thu, 10 Aug 2023 at 22:12, Kees Cook <keescook@chromium.org> wrote: > > On Wed, Aug 09, 2023 at 06:32:37PM +0200, Marco Elver wrote: > > On Wed, Aug 09, 2023 at 11:30AM -0400, Steven Rostedt wrote: > > [...] > > > > > > I would actually prefer DEBUG_LIST to select HARDEN_LIST and not the other > > > way around. It logically doesn't make sense that HARDEN_LIST would select > > > DEBUG_LIST. That is, I could by default want HARDEN_LIST always on, but not > > > DEBUG_LIST (because who knows, it may add other features I don't want). But > > > then, I may have stumbled over something and want more info, and enable > > > DEBUG_LIST (while still having HARDEN_LIST) enabled. > > > > > > I think you are looking at this from an implementation perspective and not > > > the normal developer one. > > > > > [...] > > > > > > That is, if DEBUG_LIST is enabled, we always call the > > > __list_add_valid_or_report(), but if only HARDEN_LIST is enabled, then we > > > do the shortcut. > > > > Good point - I think this is better. See below tentative v4. > > > > Kees: Does that also look more like what you had in mind? > > Yeah, this looks good. My only nit would be a naming one. All the > other hardening features are named "HARDENED", but perhaps the "ED" > is redundant in the others. Still, consistency seems nicer. What do you > think of CONFIG_LIST_HARDENED ? (The modern trend for Kconfig naming tends > to keep the subsystem name first and then apply optional elements after.) Naming is a bit all over. :-/ I agree with the <subsystem>_<suboption> scheme, generally. I think initially I tried to keep the name shorter, and also find a good counter-part to DEBUG_<suboption>, therefore HARDEN_LIST. Let's just change it to CONFIG_LIST_HARDENED, given the existing "HARDENED" options. I don't have a strong preference. > One note: do the LKDTM list hardening tests still pass? i.e. > CORRUPT_LIST_ADD > CORRUPT_LIST_DEL Yes, they do. 
Though I need to also adjust BUG_ON_DATA_CORRUPTION to select
LIST_HARDENED, and the test should check for the new option (which is
implied by DEBUG_LIST now). There will be an additional patch to
adjust that.

> > [...]
> > +	/*
> > +	 * With the hardening version, elide checking if next and prev
> > +	 * are NULL, LIST_POISON1 or LIST_POISON2, since the immediate
> > +	 * dereference of them below would result in a fault.
> > +	 */
> > +	if (likely(prev->next == entry && next->prev == entry))
> > +		return true;
>
> I'm not super excited about skipping those checks, since they are
> values that can be reached through kernel list management confusion. If
> an attacker is using a system where the zero-page has been mapped
> and is accessible (i.e. lacking SMAP etc), then attacks could still
> be constructed. However, I do recognize this chain of exploitation
> prerequisites is getting rather long, so probably this is a reasonable
> trade off on modern systems.

Sure, it's a trade-off for systems which do have the bare minimum of
modern hardware security features.
On Thu, 10 Aug 2023 13:11:58 -0700
Kees Cook <keescook@chromium.org> wrote:

> > [...]
> > +	/*
> > +	 * With the hardening version, elide checking if next and prev
> > +	 * are NULL, LIST_POISON1 or LIST_POISON2, since the immediate
> > +	 * dereference of them below would result in a fault.
> > +	 */
> > +	if (likely(prev->next == entry && next->prev == entry))
> > +		return true;
>
> I'm not super excited about skipping those checks, since they are
> values that can be reached through kernel list management confusion. If
> an attacker is using a system where the zero-page has been mapped
> and is accessible (i.e. lacking SMAP etc), then attacks could still
> be constructed. However, I do recognize this chain of exploitation
> prerequisites is getting rather long, so probably this is a reasonable
> trade off on modern systems.

A totally hardened machine is one that doesn't run ;-)

Yes, hopefully when the kernel is configured with HARDENED it will
eliminate steps to a prerequisite attack. I'm sure enabling lockdep
would also help harden the system.

But there is a balance between security and performance. The more that
adding security harms performance, the less people will use that
security.

-- Steve
diff --git a/arch/arm64/kvm/hyp/nvhe/list_debug.c b/arch/arm64/kvm/hyp/nvhe/list_debug.c
index 16266a939a4c..46a2d4f2b3c6 100644
--- a/arch/arm64/kvm/hyp/nvhe/list_debug.c
+++ b/arch/arm64/kvm/hyp/nvhe/list_debug.c
@@ -26,6 +26,7 @@ static inline __must_check bool nvhe_check_data_corruption(bool v)
 
 /* The predicates checked here are taken from lib/list_debug.c. */
 
+__list_valid_slowpath
 bool __list_add_valid_or_report(struct list_head *new, struct list_head *prev,
 				struct list_head *next)
 {
@@ -37,6 +38,7 @@ bool __list_add_valid_or_report(struct list_head *new, struct list_head *prev,
 	return true;
 }
 
+__list_valid_slowpath
 bool __list_del_entry_valid_or_report(struct list_head *entry)
 {
 	struct list_head *prev, *next;
diff --git a/include/linux/list.h b/include/linux/list.h
index 130c6a1bb45c..066fe33e99bf 100644
--- a/include/linux/list.h
+++ b/include/linux/list.h
@@ -39,38 +39,90 @@ static inline void INIT_LIST_HEAD(struct list_head *list)
 }
 
 #ifdef CONFIG_DEBUG_LIST
+
+#ifdef CONFIG_DEBUG_LIST_MINIMAL
+# define __list_valid_slowpath __cold __preserve_most
+#else
+# define __list_valid_slowpath
+#endif
+
 /*
  * Performs the full set of list corruption checks before __list_add().
  * On list corruption reports a warning, and returns false.
  */
-extern bool __list_add_valid_or_report(struct list_head *new,
-				       struct list_head *prev,
-				       struct list_head *next);
+extern bool __list_valid_slowpath __list_add_valid_or_report(struct list_head *new,
+							     struct list_head *prev,
+							     struct list_head *next);
 
 /*
  * Performs list corruption checks before __list_add(). Returns false if a
  * corruption is detected, true otherwise.
+ *
+ * With CONFIG_DEBUG_LIST_MINIMAL set, performs minimal list integrity checking
+ * (that do not result in a fault) inline, and only if a corruption is detected
+ * calls the reporting function __list_add_valid_or_report().
 */
static __always_inline bool __list_add_valid(struct list_head *new,
					     struct list_head *prev,
					     struct list_head *next)
{
-	return __list_add_valid_or_report(new, prev, next);
+	bool ret = true;
+
+	if (IS_ENABLED(CONFIG_DEBUG_LIST_MINIMAL)) {
+		/*
+		 * In the minimal config, elide checking if next and prev are
+		 * NULL, since the immediate dereference of them below would
+		 * result in a fault if NULL.
+		 *
+		 * With the minimal config we can afford to inline the checks,
+		 * which also gives the compiler a chance to elide some of them
+		 * completely if they can be proven at compile-time. If one of
+		 * the pre-conditions does not hold, the slow-path will show a
+		 * report which pre-condition failed.
+		 */
+		if (likely(next->prev == prev && prev->next == next && new != prev && new != next))
+			return true;
+		ret = false;
+	}
+
+	ret &= __list_add_valid_or_report(new, prev, next);
+	return ret;
}
 
/*
 * Performs the full set of list corruption checks before __list_del_entry().
 * On list corruption reports a warning, and returns false.
 */
-extern bool __list_del_entry_valid_or_report(struct list_head *entry);
+extern bool __list_valid_slowpath __list_del_entry_valid_or_report(struct list_head *entry);
 
/*
 * Performs list corruption checks before __list_del_entry(). Returns false if a
 * corruption is detected, true otherwise.
+ *
+ * With CONFIG_DEBUG_LIST_MINIMAL set, performs minimal list integrity checking
+ * (that do not result in a fault) inline, and only if a corruption is detected
+ * calls the reporting function __list_del_entry_valid_or_report().
 */
static __always_inline bool __list_del_entry_valid(struct list_head *entry)
{
-	return __list_del_entry_valid_or_report(entry);
+	bool ret = true;
+
+	if (IS_ENABLED(CONFIG_DEBUG_LIST_MINIMAL)) {
+		struct list_head *prev = entry->prev;
+		struct list_head *next = entry->next;
+
+		/*
+		 * In the minimal config, elide checking if next and prev are
+		 * NULL, LIST_POISON1 or LIST_POISON2, since the immediate
+		 * dereference of them below would result in a fault.
+		 */
+		if (likely(prev->next == entry && next->prev == entry))
+			return true;
+		ret = false;
+	}
+
+	ret &= __list_del_entry_valid_or_report(entry);
+	return ret;
}
#else
static inline bool __list_add_valid(struct list_head *new,
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index fbc89baf7de6..e72cf08af0fa 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1680,6 +1680,21 @@ config DEBUG_LIST
 
 	  If unsure, say N.
 
+config DEBUG_LIST_MINIMAL
+	bool "Minimal linked list debug checks"
+	default !DEBUG_KERNEL
+	depends on DEBUG_LIST
+	help
+	  Only perform the minimal set of checks in the linked-list walking
+	  routines to catch corruptions that are not guaranteed to result in an
+	  immediate access fault.
+
+	  This trades lower quality error reports for improved performance: the
+	  generated code should be more optimal and provide trade-offs that may
+	  better serve safety- and performance-critical environments.
+
+	  If unsure, say Y.
+
 config DEBUG_PLIST
 	bool "Debug priority linked list manipulation"
 	depends on DEBUG_KERNEL
diff --git a/lib/list_debug.c b/lib/list_debug.c
index 2def33b1491f..0ff547910dd0 100644
--- a/lib/list_debug.c
+++ b/lib/list_debug.c
@@ -17,6 +17,7 @@
  * attempt).
  */
 
+__list_valid_slowpath
 bool __list_add_valid_or_report(struct list_head *new, struct list_head *prev,
 				struct list_head *next)
 {
@@ -39,6 +40,7 @@ bool __list_add_valid_or_report(struct list_head *new, struct list_head *prev,
 }
 EXPORT_SYMBOL(__list_add_valid_or_report);
 
+__list_valid_slowpath
 bool __list_del_entry_valid_or_report(struct list_head *entry)
 {
 	struct list_head *prev, *next;
Numerous production kernel configs (see [1, 2]) are choosing to enable
CONFIG_DEBUG_LIST, which is also being recommended by KSPP for hardened
configs [3]. The feature has never been designed with performance in
mind, yet common list manipulation is happening across hot paths all
over the kernel.

Introduce CONFIG_DEBUG_LIST_MINIMAL, which performs list pointer
checking inline, and only upon list corruption delegates to the
reporting slow path.

To generate optimal machine code with CONFIG_DEBUG_LIST_MINIMAL:

  1. Elide checking for pointer values which upon dereference would
     result in an immediate access fault -- therefore "minimal" checks.
     The trade-off is lower-quality error reports.

  2. Use the newly introduced __preserve_most function attribute
     (available with Clang, but not yet with GCC) to minimize the code
     footprint for calling the reporting slow path. As a result,
     function size of callers is reduced by avoiding saving registers
     before calling the rarely called reporting slow path.

     Note that all TUs in lib/Makefile already disable function
     tracing, including list_debug.c, and __preserve_most's implied
     notrace has no effect in this case.

  3. Because the inline checks are a subset of the full set of checks
     in __list_*_valid_or_report(), always return false if the inline
     checks failed. This avoids a redundant compare and conditional
     branch right after return from the slow path.

As a side-effect of the checks being inline, if the compiler can prove
some condition to always be true, it can completely elide some checks.

Running netperf with CONFIG_DEBUG_LIST_MINIMAL (using a Clang compiler
with "preserve_most") shows throughput improvements, in my case of ~7%
on average (up to 20-30% on some test cases).
Link: https://r.android.com/1266735 [1]
Link: https://gitlab.archlinux.org/archlinux/packaging/packages/linux/-/blob/main/config [2]
Link: https://kernsec.org/wiki/index.php/Kernel_Self_Protection_Project/Recommended_Settings [3]
Signed-off-by: Marco Elver <elver@google.com>
---
v3:
* Rename ___list_*_valid() to __list_*_valid_or_report().
* More comments.

v2:
* Note that lib/Makefile disables function tracing for everything and
  __preserve_most's implied notrace is a noop here.
---
 arch/arm64/kvm/hyp/nvhe/list_debug.c |  2 +
 include/linux/list.h                 | 64 +++++++++++++++++++++++++---
 lib/Kconfig.debug                    | 15 +++++++
 lib/list_debug.c                     |  2 +
 4 files changed, 77 insertions(+), 6 deletions(-)