Message ID | 20200515172756.27185-5-will@kernel.org (mailing list archive) |
---|---|
State | Mainlined |
Commit | 88485be531f4aee841ddc53b56e2f6e6a338854d |
Series | Clean up Shadow Call Stack patches for 5.8 |
On Fri, May 15, 2020 at 06:27:54PM +0100, Will Deacon wrote:
> There is nothing architecture-specific about scs_overflow_check() as
> it's just a trivial wrapper around scs_corrupted().
>
> For parity with task_stack_end_corrupted(), rename scs_corrupted() to
> task_scs_end_corrupted() and call it from schedule_debug() when
> CONFIG_SCHED_STACK_END_CHECK is enabled. Finally, remove the unused
> scs_overflow_check() function entirely.
>
> This has absolutely no impact on architectures that do not support SCS
> (currently arm64 only).
>
> Signed-off-by: Will Deacon <will@kernel.org>

Pulling this out of arch code seems sane to me, and the arch-specific
changes look sound. However, I have a concern with the changes within the
scheduler context-switch.

> diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
> index a35d3318492c..56be4cbf771f 100644
> --- a/arch/arm64/kernel/process.c
> +++ b/arch/arm64/kernel/process.c
> @@ -52,7 +52,6 @@
>  #include <asm/mmu_context.h>
>  #include <asm/processor.h>
>  #include <asm/pointer_auth.h>
> -#include <asm/scs.h>
>  #include <asm/stacktrace.h>
>
>  #if defined(CONFIG_STACKPROTECTOR) && !defined(CONFIG_STACKPROTECTOR_PER_TASK)
> @@ -516,7 +515,6 @@ __notrace_funcgraph struct task_struct *__switch_to(struct task_struct *prev,
>  	entry_task_switch(next);
>  	uao_thread_switch(next);
>  	ssbs_thread_switch(next);
> -	scs_overflow_check(next);

Prior to this patch, we'd never switch to a task whose SCS had already
been corrupted.

With this patch, we only check that when switching away from a task, and
only when CONFIG_SCHED_STACK_END_CHECK is selected, which at first
glance seems to weaken that.

Arguably:

* If the next task's SCS was corrupted by that task while it was
  running, we had already lost at that point.

* If the next task's SCS was corrupted by another task, then that could
  also happen immediately after the check (though timing to avoid the
  check but affect the process could be harder).

... and a VMAP'd SCS would be much nicer in this regard.

Do we think this is weakening the check, or do we think it wasn't all
that helpful to begin with?

Mark.

>
>  	/*
>  	 * Complete any pending TLB or cache maintenance on this CPU in case
> diff --git a/arch/arm64/kernel/scs.c b/arch/arm64/kernel/scs.c
> index adc97f826fab..955875dff9e1 100644
> --- a/arch/arm64/kernel/scs.c
> +++ b/arch/arm64/kernel/scs.c
> @@ -6,7 +6,7 @@
>   */
>
>  #include <linux/percpu.h>
> -#include <asm/scs.h>
> +#include <linux/scs.h>
>
>  /* Allocate a static per-CPU shadow stack */
>  #define DEFINE_SCS(name)	\
> diff --git a/include/linux/scs.h b/include/linux/scs.h
> index 0eb2485ef832..2fd3df50e93e 100644
> --- a/include/linux/scs.h
> +++ b/include/linux/scs.h
> @@ -47,7 +47,7 @@ static inline unsigned long *__scs_magic(void *s)
>  	return (unsigned long *)(s + SCS_SIZE) - 1;
>  }
>
> -static inline bool scs_corrupted(struct task_struct *tsk)
> +static inline bool task_scs_end_corrupted(struct task_struct *tsk)
>  {
>  	unsigned long *magic = __scs_magic(task_scs(tsk));
>  	unsigned long sz = task_scs_sp(tsk) - task_scs(tsk);
> @@ -60,8 +60,8 @@ static inline bool scs_corrupted(struct task_struct *tsk)
>  static inline void scs_init(void) {}
>  static inline void scs_task_reset(struct task_struct *tsk) {}
>  static inline int scs_prepare(struct task_struct *tsk, int node) { return 0; }
> -static inline bool scs_corrupted(struct task_struct *tsk) { return false; }
>  static inline void scs_release(struct task_struct *tsk) {}
> +static inline bool task_scs_end_corrupted(struct task_struct *tsk) { return false; }
>
>  #endif /* CONFIG_SHADOW_CALL_STACK */
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 934e03cfaec7..a1d815a11b90 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3878,6 +3878,9 @@ static inline void schedule_debug(struct task_struct *prev, bool preempt)
>  #ifdef CONFIG_SCHED_STACK_END_CHECK
>  	if (task_stack_end_corrupted(prev))
>  		panic("corrupted stack end detected inside scheduler\n");
> +
> +	if (task_scs_end_corrupted(prev))
> +		panic("corrupted shadow stack detected inside scheduler\n");
>  #endif
>
>  #ifdef CONFIG_DEBUG_ATOMIC_SLEEP
> diff --git a/kernel/scs.c b/kernel/scs.c
> index aea841cd7586..faf0ecd7b893 100644
> --- a/kernel/scs.c
> +++ b/kernel/scs.c
> @@ -98,7 +98,8 @@ void scs_release(struct task_struct *tsk)
>  	if (!s)
>  		return;
>
> -	WARN(scs_corrupted(tsk), "corrupted shadow stack detected when freeing task\n");
> +	WARN(task_scs_end_corrupted(tsk),
> +	     "corrupted shadow stack detected when freeing task\n");
>  	scs_check_usage(tsk);
>  	scs_free(s);
>  }
> --
> 2.26.2.761.g0e0b3e54be-goog
>
On Mon, May 18, 2020 at 01:12:10PM +0100, Mark Rutland wrote:
> On Fri, May 15, 2020 at 06:27:54PM +0100, Will Deacon wrote:
> > There is nothing architecture-specific about scs_overflow_check() as
> > it's just a trivial wrapper around scs_corrupted().
> >
> > For parity with task_stack_end_corrupted(), rename scs_corrupted() to
> > task_scs_end_corrupted() and call it from schedule_debug() when
> > CONFIG_SCHED_STACK_END_CHECK is enabled. Finally, remove the unused
> > scs_overflow_check() function entirely.
> >
> > This has absolutely no impact on architectures that do not support SCS
> > (currently arm64 only).
> >
> > Signed-off-by: Will Deacon <will@kernel.org>
>
> Pulling this out of arch code seems sane to me, and the arch-specific
> changes look sound. However, I have a concern with the changes within the
> scheduler context-switch.
>
> > diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
> > index a35d3318492c..56be4cbf771f 100644
> > --- a/arch/arm64/kernel/process.c
> > +++ b/arch/arm64/kernel/process.c
> > @@ -52,7 +52,6 @@
> >  #include <asm/mmu_context.h>
> >  #include <asm/processor.h>
> >  #include <asm/pointer_auth.h>
> > -#include <asm/scs.h>
> >  #include <asm/stacktrace.h>
> >
> >  #if defined(CONFIG_STACKPROTECTOR) && !defined(CONFIG_STACKPROTECTOR_PER_TASK)
> > @@ -516,7 +515,6 @@ __notrace_funcgraph struct task_struct *__switch_to(struct task_struct *prev,
> >  	entry_task_switch(next);
> >  	uao_thread_switch(next);
> >  	ssbs_thread_switch(next);
> > -	scs_overflow_check(next);
>
> Prior to this patch, we'd never switch to a task whose SCS had already
> been corrupted.
>
> With this patch, we only check that when switching away from a task, and
> only when CONFIG_SCHED_STACK_END_CHECK is selected, which at first
> glance seems to weaken that.

Yes, ignoring vmap'd stacks, this patch brings the SCS checking in-line with
the main stack checking when CONFIG_SCHED_STACK_END_CHECK=y.

> Arguably:
>
> * If the next task's SCS was corrupted by that task while it was
>   running, we had already lost at that point.

With this change, we'll at least catch this one sooner, and that might be
useful if a bug has caused us to overflow the SCS but not the main stack.

> * If the next task's SCS was corrupted by another task, then that could
>   also happen immediately after the check (though timing to avoid the
>   check but affect the process could be harder).

We're only checking the magic end value, so the cross-task case is basically
if you overrun your own SCS as above, but then continue to overrun entire
SCSs for other tasks as well. It's probably not very useful in that case.

> ... and a VMAP'd SCS would be much nicer in this regard.
>
> Do we think this is weakening the check, or do we think it wasn't all
> that helpful to begin with?

I see it as a debug check to catch SCS overflow, rather than a hardening
feature, and I agree that using something like vmap stack for the SCS would
be better because we could have a guard page instead.
This is something I would like to revisit, but we need more
information from Sami about why Android rejected the larger allocation
size, since I don't think there's an awful lot of point merging this
series if Android doesn't pick it up.

Will
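The "magic end value" check discussed above can be illustrated with a small userspace sketch. This is not the kernel implementation: the `demo_` names, the 128-word size, the magic constant, and the exact rule combining the two conditions are assumptions made for illustration. Only the idea of a magic value in the last slot, and of comparing the shadow stack pointer against the base, comes from the quoted diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical size and magic value, chosen for this sketch only. */
#define DEMO_SCS_WORDS 128
#define DEMO_SCS_END_MAGIC 0x5f6f7f8fUL

struct demo_scs {
	unsigned long *base;	/* lowest slot of the shadow stack */
	unsigned long *sp;	/* next free slot; the stack grows upwards */
};

/* The magic value lives in the very last slot of the allocation. */
static unsigned long *demo_scs_magic(unsigned long *base)
{
	return base + DEMO_SCS_WORDS - 1;
}

static void demo_scs_init(struct demo_scs *scs, unsigned long *buf)
{
	assert(scs != NULL && buf != NULL);
	scs->base = buf;
	scs->sp = buf;
	*demo_scs_magic(buf) = DEMO_SCS_END_MAGIC;
}

/* Push one return address; deliberately unchecked, like an overflowing task. */
static void demo_scs_push(struct demo_scs *scs, unsigned long ret_addr)
{
	*scs->sp++ = ret_addr;
}

/* True if the stack pointer ran past the end or the end magic was clobbered. */
static bool demo_scs_end_corrupted(const struct demo_scs *scs)
{
	size_t used = (size_t)(scs->sp - scs->base);

	return used >= DEMO_SCS_WORDS ||
	       *demo_scs_magic(scs->base) != DEMO_SCS_END_MAGIC;
}
```

In this model a task that pushes up to `DEMO_SCS_WORDS - 1` entries is reported as intact, and the push that lands on the last slot is reported as corruption. A write from another task is only noticed if it happens to hit that one slot, which matches the point that the check catches overflow rather than arbitrary corruption.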
On Mon, May 18, 2020 at 02:23:47PM +0100, Will Deacon wrote:
> On Mon, May 18, 2020 at 01:12:10PM +0100, Mark Rutland wrote:
> > On Fri, May 15, 2020 at 06:27:54PM +0100, Will Deacon wrote:
> > > There is nothing architecture-specific about scs_overflow_check() as
> > > it's just a trivial wrapper around scs_corrupted().
> > >
> > > For parity with task_stack_end_corrupted(), rename scs_corrupted() to
> > > task_scs_end_corrupted() and call it from schedule_debug() when
> > > CONFIG_SCHED_STACK_END_CHECK is enabled. Finally, remove the unused
> > > scs_overflow_check() function entirely.
> > >
> > > This has absolutely no impact on architectures that do not support SCS
> > > (currently arm64 only).
> > >
> > > Signed-off-by: Will Deacon <will@kernel.org>
> >
> > Pulling this out of arch code seems sane to me, and the arch-specific
> > changes look sound. However, I have a concern with the changes within the
> > scheduler context-switch.
> >
> > > diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
> > > index a35d3318492c..56be4cbf771f 100644
> > > --- a/arch/arm64/kernel/process.c
> > > +++ b/arch/arm64/kernel/process.c
> > > @@ -52,7 +52,6 @@
> > >  #include <asm/mmu_context.h>
> > >  #include <asm/processor.h>
> > >  #include <asm/pointer_auth.h>
> > > -#include <asm/scs.h>
> > >  #include <asm/stacktrace.h>
> > >
> > >  #if defined(CONFIG_STACKPROTECTOR) && !defined(CONFIG_STACKPROTECTOR_PER_TASK)
> > > @@ -516,7 +515,6 @@ __notrace_funcgraph struct task_struct *__switch_to(struct task_struct *prev,
> > >  	entry_task_switch(next);
> > >  	uao_thread_switch(next);
> > >  	ssbs_thread_switch(next);
> > > -	scs_overflow_check(next);
> >
> > Prior to this patch, we'd never switch to a task whose SCS had already
> > been corrupted.
> >
> > With this patch, we only check that when switching away from a task, and
> > only when CONFIG_SCHED_STACK_END_CHECK is selected, which at first
> > glance seems to weaken that.
>
> Yes, ignoring vmap'd stacks, this patch brings the SCS checking in-line with
> the main stack checking when CONFIG_SCHED_STACK_END_CHECK=y.
>
> > Arguably:
> >
> > * If the next task's SCS was corrupted by that task while it was
> >   running, we had already lost at that point.
>
> With this change, we'll at least catch this one sooner, and that might be
> useful if a bug has caused us to overflow the SCS but not the main stack.

Sure, but only if CONFIG_SCHED_STACK_END_CHECK is selected.

> > * If the next task's SCS was corrupted by another task, then that could
> >   also happen immediately after the check (though timing to avoid the
> >   check but affect the process could be harder).
>
> We're only checking the magic end value, so the cross-task case is basically
> if you overrun your own SCS as above, but then continue to overrun entire
> SCSs for other tasks as well. It's probably not very useful in that case.
>
> > ... and a VMAP'd SCS would be much nicer in this regard.
> >
> > Do we think this is weakening the check, or do we think it wasn't all
> > that helpful to begin with?
>
> I see it as a debug check to catch SCS overflow, rather than a hardening
> feature, and I agree that using something like vmap stack for the SCS would
> be better because we could have a guard page instead.

Fair enough. Could we put something into the commit message that more
explicitly calls out debug-not-hardening? I agree that under that model
this patch looks fine, and with something to that effect:

Reviewed-by: Mark Rutland <mark.rutland@arm.com>

Mark.

> This is something I would like to revisit, but we need more
> information from Sami about why Android rejected the larger allocation
> size, since I don't think there's an awful lot of point merging this
> series if Android doesn't pick it up.

Indeed. I'd certainly prefer the robustness of a VMAP'd SCS if we can do
that.

Mark.
On Mon, May 18, 2020 at 02:32:31PM +0100, Mark Rutland wrote:
> On Mon, May 18, 2020 at 02:23:47PM +0100, Will Deacon wrote:
> > On Mon, May 18, 2020 at 01:12:10PM +0100, Mark Rutland wrote:
> > > On Fri, May 15, 2020 at 06:27:54PM +0100, Will Deacon wrote:
> > > > There is nothing architecture-specific about scs_overflow_check() as
> > > > it's just a trivial wrapper around scs_corrupted().
> > > >
> > > > For parity with task_stack_end_corrupted(), rename scs_corrupted() to
> > > > task_scs_end_corrupted() and call it from schedule_debug() when
> > > > CONFIG_SCHED_STACK_END_CHECK is enabled. Finally, remove the unused
> > > > scs_overflow_check() function entirely.
> > > >
> > > > This has absolutely no impact on architectures that do not support SCS
> > > > (currently arm64 only).
> > > >
> > > > Signed-off-by: Will Deacon <will@kernel.org>
> > >
> > > Pulling this out of arch code seems sane to me, and the arch-specific
> > > changes look sound. However, I have a concern with the changes within the
> > > scheduler context-switch.
> > >
> > > > diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
> > > > index a35d3318492c..56be4cbf771f 100644
> > > > --- a/arch/arm64/kernel/process.c
> > > > +++ b/arch/arm64/kernel/process.c
> > > > @@ -52,7 +52,6 @@
> > > >  #include <asm/mmu_context.h>
> > > >  #include <asm/processor.h>
> > > >  #include <asm/pointer_auth.h>
> > > > -#include <asm/scs.h>
> > > >  #include <asm/stacktrace.h>
> > > >
> > > >  #if defined(CONFIG_STACKPROTECTOR) && !defined(CONFIG_STACKPROTECTOR_PER_TASK)
> > > > @@ -516,7 +515,6 @@ __notrace_funcgraph struct task_struct *__switch_to(struct task_struct *prev,
> > > >  	entry_task_switch(next);
> > > >  	uao_thread_switch(next);
> > > >  	ssbs_thread_switch(next);
> > > > -	scs_overflow_check(next);
> > >
> > > Prior to this patch, we'd never switch to a task whose SCS had already
> > > been corrupted.
> > >
> > > With this patch, we only check that when switching away from a task, and
> > > only when CONFIG_SCHED_STACK_END_CHECK is selected, which at first
> > > glance seems to weaken that.
> >
> > Yes, ignoring vmap'd stacks, this patch brings the SCS checking in-line with
> > the main stack checking when CONFIG_SCHED_STACK_END_CHECK=y.
> >
> > > Arguably:
> > >
> > > * If the next task's SCS was corrupted by that task while it was
> > >   running, we had already lost at that point.
> >
> > With this change, we'll at least catch this one sooner, and that might be
> > useful if a bug has caused us to overflow the SCS but not the main stack.
>
> Sure, but only if CONFIG_SCHED_STACK_END_CHECK is selected.
>
> > > * If the next task's SCS was corrupted by another task, then that could
> > >   also happen immediately after the check (though timing to avoid the
> > >   check but affect the process could be harder).
> >
> > We're only checking the magic end value, so the cross-task case is basically
> > if you overrun your own SCS as above, but then continue to overrun entire
> > SCSs for other tasks as well. It's probably not very useful in that case.
> >
> > > ... and a VMAP'd SCS would be much nicer in this regard.
> > >
> > > Do we think this is weakening the check, or do we think it wasn't all
> > > that helpful to begin with?
> >
> > I see it as a debug check to catch SCS overflow, rather than a hardening
> > feature, and I agree that using something like vmap stack for the SCS would
> > be better because we could have a guard page instead.
>
> Fair enough. Could we put something into the commit message that more
> explicitly calls out debug-not-hardening? I agree that under that model
> this patch looks fine, and with something to that effect:
>
> Reviewed-by: Mark Rutland <mark.rutland@arm.com>
>
> Mark.
>
> > This is something I would like to revisit, but we need more
> > information from Sami about why Android rejected the larger allocation
> > size, since I don't think there's an awful lot of point merging this
> > series if Android doesn't pick it up.
>
> Indeed. I'd certainly prefer the robustness of a VMAP'd SCS if we can do
> that.

For smaller devices, the memory overhead was too high. (i.e. 4x more
memory allocated to kernel stacks -- 4k vs 1k per thread.)

The series is much more than just a stack exhaustion defense, so I don't
think that detail needs to block the entire series.

FWIW, I'd like to have both modes (contiguous and vmap) available so
that system builders can choose their trade-off. Both will gain return
address corruption defense, but the vmap case will protect against
neighboring SCS corruption in the face of
very-unlikely-but-technically-possible stack exhaustion (remember that
with the elimination of VLAs, the stack depth compile time checking, and
the regular stack VMAP guard page, it will be quite difficult to exhaust
the SCS -- either because there is no code path to accomplish it, or
because it would trip the regular stack guard page first).
On Mon, May 18, 2020 at 08:31:49AM -0700, Kees Cook wrote:
> On Mon, May 18, 2020 at 02:32:31PM +0100, Mark Rutland wrote:
> > On Mon, May 18, 2020 at 02:23:47PM +0100, Will Deacon wrote:
> > > This is something I would like to revisit, but we need more
> > > information from Sami about why Android rejected the larger allocation
> > > size, since I don't think there's an awful lot of point merging this
> > > series if Android doesn't pick it up.
> >
> > Indeed. I'd certainly prefer the robustness of a VMAP'd SCS if we can do
> > that.
>
> For smaller devices, the memory overhead was too high. (i.e. 4x more
> memory allocated to kernel stacks -- 4k vs 1k per thread.)

I just don't see an extra 3k per thread as being a real issue (the main
stack is 16k already). Even just the CPU register state is around 1k.

But I'd be very keen to see numbers/performance data that proves me wrong.

Will
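The per-thread numbers traded in the two messages above (1k contiguous SCS, 4k vmap'd SCS, 16k main stack) can be turned into a quick back-of-the-envelope calculation. The sketch below is only arithmetic on those quoted figures; the `demo_` names are hypothetical, the thread count is whatever the caller supplies, and any additional cost of a vmap'd allocation (page tables, guard page address space) is deliberately ignored.

```c
#include <assert.h>
#include <stddef.h>

#define KIB 1024u

struct demo_scs_cost {
	size_t extra_bytes;		/* added memory across all threads */
	unsigned int growth_permille;	/* per-thread growth, in parts per 1000 */
};

/* Cost of moving from a 1k contiguous SCS to a 4k vmap'd SCS. */
static struct demo_scs_cost demo_vmap_scs_cost(size_t nr_threads)
{
	const size_t main_stack = 16 * KIB;	/* figure quoted in the thread */
	const size_t scs_contig = 1 * KIB;	/* figure quoted in the thread */
	const size_t scs_vmap = 4 * KIB;	/* figure quoted in the thread */
	const size_t extra = scs_vmap - scs_contig;
	struct demo_scs_cost cost;

	assert(nr_threads > 0);
	cost.extra_bytes = nr_threads * extra;
	/* Growth relative to what each thread already pays for its stacks. */
	cost.growth_permille =
		(unsigned int)(extra * 1000 / (main_stack + scs_contig));
	return cost;
}
```

Under these assumptions the SCS allocation itself grows fourfold, while the combined per-thread stack footprint grows from 17k to 20k. Both readings are consistent with the figures quoted; which one matters on a small device is the open question in the thread.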
diff --git a/arch/arm64/include/asm/scs.h b/arch/arm64/include/asm/scs.h
index d46efdd2060a..eaa2cd92e4c1 100644
--- a/arch/arm64/include/asm/scs.h
+++ b/arch/arm64/include/asm/scs.h
@@ -24,24 +24,6 @@
 	.endm
 #endif /* CONFIG_SHADOW_CALL_STACK */
 
-#else /* __ASSEMBLY__ */
-
-#include <linux/scs.h>
-
-#ifdef CONFIG_SHADOW_CALL_STACK
-
-static inline void scs_overflow_check(struct task_struct *tsk)
-{
-	if (unlikely(scs_corrupted(tsk)))
-		panic("corrupted shadow stack detected inside scheduler\n");
-}
-
-#else /* CONFIG_SHADOW_CALL_STACK */
-
-static inline void scs_overflow_check(struct task_struct *tsk) {}
-
-#endif /* CONFIG_SHADOW_CALL_STACK */
-
 #endif /* __ASSEMBLY __ */
 
 #endif /* _ASM_SCS_H */
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index a35d3318492c..56be4cbf771f 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -52,7 +52,6 @@
 #include <asm/mmu_context.h>
 #include <asm/processor.h>
 #include <asm/pointer_auth.h>
-#include <asm/scs.h>
 #include <asm/stacktrace.h>
 
 #if defined(CONFIG_STACKPROTECTOR) && !defined(CONFIG_STACKPROTECTOR_PER_TASK)
@@ -516,7 +515,6 @@ __notrace_funcgraph struct task_struct *__switch_to(struct task_struct *prev,
 	entry_task_switch(next);
 	uao_thread_switch(next);
 	ssbs_thread_switch(next);
-	scs_overflow_check(next);
 
 	/*
 	 * Complete any pending TLB or cache maintenance on this CPU in case
diff --git a/arch/arm64/kernel/scs.c b/arch/arm64/kernel/scs.c
index adc97f826fab..955875dff9e1 100644
--- a/arch/arm64/kernel/scs.c
+++ b/arch/arm64/kernel/scs.c
@@ -6,7 +6,7 @@
  */
 
 #include <linux/percpu.h>
-#include <asm/scs.h>
+#include <linux/scs.h>
 
 /* Allocate a static per-CPU shadow stack */
 #define DEFINE_SCS(name)	\
diff --git a/include/linux/scs.h b/include/linux/scs.h
index 0eb2485ef832..2fd3df50e93e 100644
--- a/include/linux/scs.h
+++ b/include/linux/scs.h
@@ -47,7 +47,7 @@ static inline unsigned long *__scs_magic(void *s)
 	return (unsigned long *)(s + SCS_SIZE) - 1;
 }
 
-static inline bool scs_corrupted(struct task_struct *tsk)
+static inline bool task_scs_end_corrupted(struct task_struct *tsk)
 {
 	unsigned long *magic = __scs_magic(task_scs(tsk));
 	unsigned long sz = task_scs_sp(tsk) - task_scs(tsk);
@@ -60,8 +60,8 @@ static inline bool scs_corrupted(struct task_struct *tsk)
 static inline void scs_init(void) {}
 static inline void scs_task_reset(struct task_struct *tsk) {}
 static inline int scs_prepare(struct task_struct *tsk, int node) { return 0; }
-static inline bool scs_corrupted(struct task_struct *tsk) { return false; }
 static inline void scs_release(struct task_struct *tsk) {}
+static inline bool task_scs_end_corrupted(struct task_struct *tsk) { return false; }
 
 #endif /* CONFIG_SHADOW_CALL_STACK */
 
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 934e03cfaec7..a1d815a11b90 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3878,6 +3878,9 @@ static inline void schedule_debug(struct task_struct *prev, bool preempt)
 #ifdef CONFIG_SCHED_STACK_END_CHECK
 	if (task_stack_end_corrupted(prev))
 		panic("corrupted stack end detected inside scheduler\n");
+
+	if (task_scs_end_corrupted(prev))
+		panic("corrupted shadow stack detected inside scheduler\n");
 #endif
 
 #ifdef CONFIG_DEBUG_ATOMIC_SLEEP
diff --git a/kernel/scs.c b/kernel/scs.c
index aea841cd7586..faf0ecd7b893 100644
--- a/kernel/scs.c
+++ b/kernel/scs.c
@@ -98,7 +98,8 @@ void scs_release(struct task_struct *tsk)
 	if (!s)
 		return;
 
-	WARN(scs_corrupted(tsk), "corrupted shadow stack detected when freeing task\n");
+	WARN(task_scs_end_corrupted(tsk),
+	     "corrupted shadow stack detected when freeing task\n");
 	scs_check_usage(tsk);
 	scs_free(s);
 }
There is nothing architecture-specific about scs_overflow_check() as
it's just a trivial wrapper around scs_corrupted().

For parity with task_stack_end_corrupted(), rename scs_corrupted() to
task_scs_end_corrupted() and call it from schedule_debug() when
CONFIG_SCHED_STACK_END_CHECK is enabled. Finally, remove the unused
scs_overflow_check() function entirely.

This has absolutely no impact on architectures that do not support SCS
(currently arm64 only).

Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/scs.h | 18 ------------------
 arch/arm64/kernel/process.c  |  2 --
 arch/arm64/kernel/scs.c      |  2 +-
 include/linux/scs.h          |  4 ++--
 kernel/sched/core.c          |  3 +++
 kernel/scs.c                 |  3 ++-
 6 files changed, 8 insertions(+), 24 deletions(-)
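The claim that the change has no impact on architectures without SCS rests on the usual stub pattern visible in the include/linux/scs.h hunk: when the feature is configured out, the check is an inline function returning false, so the generic caller needs no #ifdef of its own. A minimal userspace sketch of that pattern follows. The `demo_` names, the `DEMO_SHADOW_CALL_STACK` macro, and the boolean field standing in for the real magic-value check are all hypothetical, and the function returns a message instead of panicking so that it can be exercised in a test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_task {
	bool scs_end_clobbered;	/* stand-in for the real magic-value check */
};

#ifdef DEMO_SHADOW_CALL_STACK
/* Feature enabled: report the (simulated) state of the shadow stack. */
static inline bool demo_task_scs_end_corrupted(const struct demo_task *tsk)
{
	return tsk->scs_end_clobbered;
}
#else
/* Feature disabled: constant-false stub that the compiler can fold away. */
static inline bool demo_task_scs_end_corrupted(const struct demo_task *tsk)
{
	(void)tsk;
	return false;
}
#endif

/* Returns the message the caller would panic with, or NULL if all is well. */
static const char *demo_schedule_debug(const struct demo_task *prev)
{
	assert(prev != NULL);
	if (demo_task_scs_end_corrupted(prev))
		return "corrupted shadow stack detected inside scheduler";
	return NULL;
}
```

Built without `DEMO_SHADOW_CALL_STACK`, the caller always sees NULL regardless of the task's state; building with `-DDEMO_SHADOW_CALL_STACK` makes the same caller report the clobbered task.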