Message ID: 20200622193146.2985288-4-keescook@chromium.org (mailing list archive)
State: New, archived
Series: Optionally randomize kernel stack offset each syscall
On 6/22/20 12:31 PM, Kees Cook wrote: > This provides the ability for architectures to enable kernel stack base > address offset randomization. This feature is controlled by the boot > param "randomize_kstack_offset=on/off", with its default value set by > CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT. > > Co-developed-by: Elena Reshetova <elena.reshetova@intel.com> > Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> > Link: https://lore.kernel.org/r/20190415060918.3766-1-elena.reshetova@intel.com > Signed-off-by: Kees Cook <keescook@chromium.org> > --- > Makefile | 4 ++++ > arch/Kconfig | 23 ++++++++++++++++++ > include/linux/randomize_kstack.h | 40 ++++++++++++++++++++++++++++++++ > init/main.c | 23 ++++++++++++++++++ > 4 files changed, 90 insertions(+) > create mode 100644 include/linux/randomize_kstack.h Hi, Please add documentation for the new kernel boot parameter to Documentation/admin-guide/kernel-parameters.txt. > diff --git a/arch/Kconfig b/arch/Kconfig > index 1ea61290900a..1f52c9cfefca 100644 > --- a/arch/Kconfig > +++ b/arch/Kconfig > @@ -883,6 +883,29 @@ config VMAP_STACK > virtual mappings with real shadow memory, and KASAN_VMALLOC must > be enabled. > > +config HAVE_ARCH_RANDOMIZE_KSTACK_OFFSET > + def_bool n > + help > + An arch should select this symbol if it can support kernel stack > + offset randomization with calls to add_random_kstack_offset() > + during syscall entry and choose_random_kstack_offset() during > + syscall exit. Downgrading of -fstack-protector-strong to > + -fstack-protector should also be applied to the entry code and > + closely examined, as the artificial stack bump looks like an array > + to the compiler, so it will attempt to add canary checks regardless > + of the static branch state. > + > +config RANDOMIZE_KSTACK_OFFSET_DEFAULT > + bool "Randomize kernel stack offset on syscall entry" > + depends on HAVE_ARCH_RANDOMIZE_KSTACK_OFFSET > + help > + The kernel stack offset can be randomized (after pt_regs) by > + roughly 5 bits of entropy, frustrating memory corruption > + attacks that depend on stack address determinism or > + cross-syscall address exposures. This feature is controlled > + by kernel boot param "randomize_kstack_offset=on/off", and this > + config chooses the default boot state. thanks.
On Mon, Jun 22, 2020 at 9:31 PM Kees Cook <keescook@chromium.org> wrote:
> This provides the ability for architectures to enable kernel stack base
> address offset randomization. This feature is controlled by the boot
> param "randomize_kstack_offset=on/off", with its default value set by
> CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT.
[...]
> +#define add_random_kstack_offset() do {				\
> +	if (static_branch_maybe(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT,	\
> +				&randomize_kstack_offset)) {		\
> +		u32 offset = this_cpu_read(kstack_offset);		\
> +		u8 *ptr = __builtin_alloca(offset & 0x3FF);		\
> +		asm volatile("" : "=m"(*ptr));				\
> +	}								\
> +} while (0)

clang generates better code here if the mask is stack-aligned -
otherwise it needs to round the stack pointer / the offset:

$ cat alloca_align.c
#include <alloca.h>
void callee(void);

void alloca_blah(unsigned long rand) {
  asm volatile(""::"r"(alloca(rand & MASK)));
  callee();
}
$ clang -O3 -c -o alloca_align.o alloca_align.c -DMASK=0x3ff
$ objdump -d alloca_align.o
[...]
   0:	55                   	push   %rbp
   1:	48 89 e5             	mov    %rsp,%rbp
   4:	81 e7 ff 03 00 00    	and    $0x3ff,%edi
   a:	83 c7 0f             	add    $0xf,%edi
   d:	83 e7 f0             	and    $0xfffffff0,%edi
  10:	48 89 e0             	mov    %rsp,%rax
  13:	48 29 f8             	sub    %rdi,%rax
  16:	48 89 c4             	mov    %rax,%rsp
  19:	e8 00 00 00 00       	callq  1e <alloca_blah+0x1e>
  1e:	48 89 ec             	mov    %rbp,%rsp
  21:	5d                   	pop    %rbp
  22:	c3                   	retq
$ clang -O3 -c -o alloca_align.o alloca_align.c -DMASK=0x3f0
$ objdump -d alloca_align.o
[...]
   0:	55                   	push   %rbp
   1:	48 89 e5             	mov    %rsp,%rbp
   4:	48 89 e0             	mov    %rsp,%rax
   7:	81 e7 f0 03 00 00    	and    $0x3f0,%edi
   d:	48 29 f8             	sub    %rdi,%rax
  10:	48 89 c4             	mov    %rax,%rsp
  13:	e8 00 00 00 00       	callq  18 <alloca_blah+0x18>
  18:	48 89 ec             	mov    %rbp,%rsp
  1b:	5d                   	pop    %rbp
  1c:	c3                   	retq
$

(From a glance at the assembly, gcc seems to always assume that the
length may be misaligned.)

Maybe this should be something along the lines of
__builtin_alloca(offset & (0x3ff & ARCH_STACK_ALIGN_MASK)) (with
appropriate definitions of the stack alignment mask depending on the
architecture's choice of stack alignment for kernel code).
On Mon, Jun 22, 2020 at 12:40:49PM -0700, Randy Dunlap wrote:
> On 6/22/20 12:31 PM, Kees Cook wrote:
> > This provides the ability for architectures to enable kernel stack base
> > address offset randomization. This feature is controlled by the boot
> > param "randomize_kstack_offset=on/off", with its default value set by
> > CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT.
> >
> > Co-developed-by: Elena Reshetova <elena.reshetova@intel.com>
> > Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
> > Link: https://lore.kernel.org/r/20190415060918.3766-1-elena.reshetova@intel.com
> > Signed-off-by: Kees Cook <keescook@chromium.org>
> > ---
> >  Makefile                         |  4 ++++
> >  arch/Kconfig                     | 23 ++++++++++++++++++
> >  include/linux/randomize_kstack.h | 40 ++++++++++++++++++++++++++++++++
> >  init/main.c                      | 23 ++++++++++++++++++
> >  4 files changed, 90 insertions(+)
> >  create mode 100644 include/linux/randomize_kstack.h
>
> Please add documentation for the new kernel boot parameter to
> Documentation/admin-guide/kernel-parameters.txt.

Oops, yes. Thanks for the reminder! (I wonder if checkpatch can notice
"+early_param" and suggest the Doc update hmmm)
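For reference, the requested kernel-parameters.txt entry could look something
like the following; the wording is only a sketch based on the Kconfig help
text in this patch, not text taken from a later revision:

	randomize_kstack_offset=
			[KNL] Enable or disable kernel stack offset
			randomization, which provides roughly 5 bits of
			entropy, frustrating memory corruption attacks
			that depend on stack address determinism or
			cross-syscall address exposures. The default is
			set by CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT.
			Format: on/off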
On Mon, Jun 22, 2020 at 10:07:37PM +0200, Jann Horn wrote: > On Mon, Jun 22, 2020 at 9:31 PM Kees Cook <keescook@chromium.org> wrote: > > This provides the ability for architectures to enable kernel stack base > > address offset randomization. This feature is controlled by the boot > > param "randomize_kstack_offset=on/off", with its default value set by > > CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT. > [...] > > +#define add_random_kstack_offset() do { \ > > + if (static_branch_maybe(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT, \ > > + &randomize_kstack_offset)) { \ > > + u32 offset = this_cpu_read(kstack_offset); \ > > + u8 *ptr = __builtin_alloca(offset & 0x3FF); \ > > + asm volatile("" : "=m"(*ptr)); \ > > + } \ > > +} while (0) > > clang generates better code here if the mask is stack-aligned - > otherwise it needs to round the stack pointer / the offset: Interesting. I was hoping to avoid needing to know the architecture stack alignment (leaving it up to the compiler). > > $ cat alloca_align.c > #include <alloca.h> > void callee(void); > > void alloca_blah(unsigned long rand) { > asm volatile(""::"r"(alloca(rand & MASK))); > callee(); > } > $ clang -O3 -c -o alloca_align.o alloca_align.c -DMASK=0x3ff > $ objdump -d alloca_align.o > [...] > 0: 55 push %rbp > 1: 48 89 e5 mov %rsp,%rbp > 4: 81 e7 ff 03 00 00 and $0x3ff,%edi > a: 83 c7 0f add $0xf,%edi > d: 83 e7 f0 and $0xfffffff0,%edi > 10: 48 89 e0 mov %rsp,%rax > 13: 48 29 f8 sub %rdi,%rax > 16: 48 89 c4 mov %rax,%rsp > 19: e8 00 00 00 00 callq 1e <alloca_blah+0x1e> > 1e: 48 89 ec mov %rbp,%rsp > 21: 5d pop %rbp > 22: c3 retq > $ clang -O3 -c -o alloca_align.o alloca_align.c -DMASK=0x3f0 > $ objdump -d alloca_align.o > [...] > 0: 55 push %rbp > 1: 48 89 e5 mov %rsp,%rbp > 4: 48 89 e0 mov %rsp,%rax > 7: 81 e7 f0 03 00 00 and $0x3f0,%edi > d: 48 29 f8 sub %rdi,%rax > 10: 48 89 c4 mov %rax,%rsp > 13: e8 00 00 00 00 callq 18 <alloca_blah+0x18> > 18: 48 89 ec mov %rbp,%rsp > 1b: 5d pop %rbp > 1c: c3 retq > $ > > (From a glance at the assembly, gcc seems to always assume that the > length may be misaligned.) Right -- this is why I didn't bother with it, since it didn't seem to notice what I'd already done to the alloca() argument. (But from what I could measure on cycle counts, the additional ALU didn't seem to really make much difference ... it _would_ be nice to avoid it, of course.) > Maybe this should be something along the lines of > __builtin_alloca(offset & (0x3ff & ARCH_STACK_ALIGN_MASK)) (with > appropriate definitions of the stack alignment mask depending on the > architecture's choice of stack alignment for kernel code). Is that explicitly selected anywhere in the kernel? I thought the alignment was left up to the compiler (as in I've seen bugs fixed where the kernel had to deal with the alignment choices the compiler was making...)
On Mon, Jun 22, 2020 at 11:30 PM Kees Cook <keescook@chromium.org> wrote: > On Mon, Jun 22, 2020 at 10:07:37PM +0200, Jann Horn wrote: > > On Mon, Jun 22, 2020 at 9:31 PM Kees Cook <keescook@chromium.org> wrote: > > > This provides the ability for architectures to enable kernel stack base > > > address offset randomization. This feature is controlled by the boot > > > param "randomize_kstack_offset=on/off", with its default value set by > > > CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT. > > [...] > > > +#define add_random_kstack_offset() do { \ > > > + if (static_branch_maybe(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT, \ > > > + &randomize_kstack_offset)) { \ > > > + u32 offset = this_cpu_read(kstack_offset); \ > > > + u8 *ptr = __builtin_alloca(offset & 0x3FF); \ > > > + asm volatile("" : "=m"(*ptr)); \ > > > + } \ > > > +} while (0) > > > > clang generates better code here if the mask is stack-aligned - > > otherwise it needs to round the stack pointer / the offset: [...] > > Maybe this should be something along the lines of > > __builtin_alloca(offset & (0x3ff & ARCH_STACK_ALIGN_MASK)) (with > > appropriate definitions of the stack alignment mask depending on the > > architecture's choice of stack alignment for kernel code). > > Is that explicitly selected anywhere in the kernel? I thought the > alignment was left up to the compiler (as in I've seen bugs fixed where > the kernel had to deal with the alignment choices the compiler was > making...) No, at least on x86-64 and x86 Linux overrides the normal ABI. From arch/x86/Makefile: # For gcc stack alignment is specified with -mpreferred-stack-boundary, # clang has the option -mstack-alignment for that purpose. ifneq ($(call cc-option, -mpreferred-stack-boundary=4),) cc_stack_align4 := -mpreferred-stack-boundary=2 cc_stack_align8 := -mpreferred-stack-boundary=3 else ifneq ($(call cc-option, -mstack-alignment=16),) cc_stack_align4 := -mstack-alignment=4 cc_stack_align8 := -mstack-alignment=8 endif [...] ifeq ($(CONFIG_X86_32),y) [...] # Align the stack to the register width instead of using the default # alignment of 16 bytes. This reduces stack usage and the number of # alignment instructions. KBUILD_CFLAGS += $(call cc-option,$(cc_stack_align4)) [...] else [...] # By default gcc and clang use a stack alignment of 16 bytes for x86. # However the standard kernel entry on x86-64 leaves the stack on an # 8-byte boundary. If the compiler isn't informed about the actual # alignment it will generate extra alignment instructions for the # default alignment which keep the stack *mis*aligned. # Furthermore an alignment to the register width reduces stack usage # and the number of alignment instructions. KBUILD_CFLAGS += $(call cc-option,$(cc_stack_align8)) [...] endif Normal x86-64 ABI has 16-byte stack alignment; Linux kernel x86-64 ABI has 8-byte stack alignment. Similarly, the normal Linux 32-bit x86 ABI is 16-byte aligned; meanwhile Linux kernel x86 ABI has 4-byte stack alignment. This is because userspace code wants the stack to be sufficiently aligned for fancy SSE instructions and such; the kernel, on the other hand, never uses those in normal code, and cares about stack usage and such very much.
On Mon, Jun 22, 2020 at 11:42:29PM +0200, Jann Horn wrote:
> No, at least on x86-64 and x86 Linux overrides the normal ABI. From
> arch/x86/Makefile:

Ah! Thanks for the pointer.

>
> # For gcc stack alignment is specified with -mpreferred-stack-boundary,
> # clang has the option -mstack-alignment for that purpose.
> ifneq ($(call cc-option, -mpreferred-stack-boundary=4),)
> cc_stack_align4 := -mpreferred-stack-boundary=2
> cc_stack_align8 := -mpreferred-stack-boundary=3
> else ifneq ($(call cc-option, -mstack-alignment=16),)
> cc_stack_align4 := -mstack-alignment=4
> cc_stack_align8 := -mstack-alignment=8
> endif
> [...]
> ifeq ($(CONFIG_X86_32),y)
> [...]
>         # Align the stack to the register width instead of using the default
>         # alignment of 16 bytes. This reduces stack usage and the number of
>         # alignment instructions.
>         KBUILD_CFLAGS += $(call cc-option,$(cc_stack_align4))
> [...]
> else
> [...]
>         # By default gcc and clang use a stack alignment of 16 bytes for x86.
>         # However the standard kernel entry on x86-64 leaves the stack on an
>         # 8-byte boundary. If the compiler isn't informed about the actual
>         # alignment it will generate extra alignment instructions for the
>         # default alignment which keep the stack *mis*aligned.
>         # Furthermore an alignment to the register width reduces stack usage
>         # and the number of alignment instructions.
>         KBUILD_CFLAGS += $(call cc-option,$(cc_stack_align8))
> [...]
> endif

And it seems that only x86 does this. No other architecture specifies
-mpreferred-stack-boundary...

> Normal x86-64 ABI has 16-byte stack alignment; Linux kernel x86-64 ABI
> has 8-byte stack alignment.
> Similarly, the normal Linux 32-bit x86 ABI is 16-byte aligned;
> meanwhile Linux kernel x86 ABI has 4-byte stack alignment.
>
> This is because userspace code wants the stack to be sufficiently
> aligned for fancy SSE instructions and such; the kernel, on the other
> hand, never uses those in normal code, and cares about stack usage and
> such very much.

This makes it nicer for Clang:

diff --git a/include/linux/randomize_kstack.h b/include/linux/randomize_kstack.h
index 1df0dc52cadc..f7e1f68fb50c 100644
--- a/include/linux/randomize_kstack.h
+++ b/include/linux/randomize_kstack.h
@@ -10,6 +10,14 @@ DECLARE_STATIC_KEY_MAYBE(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT,
 			 randomize_kstack_offset);
 DECLARE_PER_CPU(u32, kstack_offset);
 
+#ifdef CONFIG_X86_64
+#define ARCH_STACK_ALIGN_MASK ~((1 << 8) - 1)
+#elif defined(CONFIG_X86_32)
+#define ARCH_STACK_ALIGN_MASK ~((1 << 4) - 1)
+#else
+#define ARCH_STACK_ALIGN_MASK ~(0)
+#endif
+
 /*
  * Do not use this anywhere else in the kernel. This is used here because
  * it provides an arch-agnostic way to grow the stack with correct
@@ -23,7 +31,8 @@ void *__builtin_alloca(size_t size);
 	if (static_branch_maybe(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT,	\
 				&randomize_kstack_offset)) {		\
 		u32 offset = this_cpu_read(kstack_offset);		\
-		u8 *ptr = __builtin_alloca(offset & 0x3FF);		\
+		u8 *ptr = __builtin_alloca(offset & 0x3FF &		\
+					   ARCH_STACK_ALIGN_MASK);	\
 		asm volatile("" : "=m"(*ptr));				\
 	}								\
 } while (0)

But I don't like open-coding the x86-only stack alignment... it should
be in Kconfig or something, I think?
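If the alignment were moved out of the header and into Kconfig, as suggested
above, one possible shape would be the following; this is purely illustrative,
and CONFIG_ARCH_KSTACK_ALIGN is a made-up symbol that an architecture's
Kconfig would set to its kernel stack alignment in bytes (e.g. 8 on x86-64 and
4 on x86-32, per the alignment Jann described):

/* Hypothetical: alignment supplied by a per-arch Kconfig value. */
#ifdef CONFIG_ARCH_KSTACK_ALIGN
#define ARCH_STACK_ALIGN_MASK	(~(CONFIG_ARCH_KSTACK_ALIGN - 1))
#else
#define ARCH_STACK_ALIGN_MASK	~(0)
#endif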
On Mon, Jun 22, 2020 at 12:31:44PM -0700, Kees Cook wrote:
> +
> +#define add_random_kstack_offset() do {				\
> +	if (static_branch_maybe(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT,	\
> +				&randomize_kstack_offset)) {		\
> +		u32 offset = this_cpu_read(kstack_offset);		\
> +		u8 *ptr = __builtin_alloca(offset & 0x3FF);		\
> +		asm volatile("" : "=m"(*ptr));				\
> +	}								\
> +} while (0)

This feels a little fragile. ptr doesn't escape the block, so the
compiler is free to restore the stack immediately after this block. In
fact, given that all you've said is that the asm modifies *ptr, but
nothing uses that output, the compiler could eliminate the whole thing,
no?

https://godbolt.org/z/HT43F5

gcc restores the stack immediately, if no function calls come after it.

clang completely eliminates the code if no function calls come after.

I'm not sure why function calls should affect it, though, given that
those functions can't possibly access ptr or the memory it points to.

A full memory barrier (like barrier_data) should be better -- it gives
the compiler a reason to believe that ptr might escape and be accessed
by any code following the block?
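The fragility described here can be seen with a stand-alone toy; this is a
reconstruction in the spirit of the linked godbolt example, not the exact
code behind the URL:

#include <alloca.h>

void callee(void);

void grow_stack(unsigned long rand)
{
	unsigned char *ptr = alloca(rand & 0x3FF);

	/* "=m" only declares *ptr as an asm output; nothing ever reads it. */
	asm volatile("" : "=m"(*ptr));

	/*
	 * With no later use of the allocation (and no "memory" clobber),
	 * the compiler is free to give the stack space back before the
	 * call, or to shrink the allocation away entirely.
	 */
	callee();
}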
On Mon, Jun 22, 2020 at 06:56:15PM -0400, Arvind Sankar wrote: > On Mon, Jun 22, 2020 at 12:31:44PM -0700, Kees Cook wrote: > > + > > +#define add_random_kstack_offset() do { \ > > + if (static_branch_maybe(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT, \ > > + &randomize_kstack_offset)) { \ > > + u32 offset = this_cpu_read(kstack_offset); \ > > + u8 *ptr = __builtin_alloca(offset & 0x3FF); \ > > + asm volatile("" : "=m"(*ptr)); \ > > + } \ > > +} while (0) > > This feels a little fragile. ptr doesn't escape the block, so the > compiler is free to restore the stack immediately after this block. In > fact, given that all you've said is that the asm modifies *ptr, but > nothing uses that output, the compiler could eliminate the whole thing, > no? > > https://godbolt.org/z/HT43F5 > > gcc restores the stack immediately, if no function calls come after it. > > clang completely eliminates the code if no function calls come after. nothing uses the stack in your example. And adding a barrier (which is what the "=m" is, doesn't change it. > I'm not sure why function calls should affect it, though, given that > those functions can't possibly access ptr or the memory it points to. It seems to work correctly: https://godbolt.org/z/c3W5BW
On Mon, Jun 22, 2020 at 04:07:11PM -0700, Kees Cook wrote: > On Mon, Jun 22, 2020 at 06:56:15PM -0400, Arvind Sankar wrote: > > On Mon, Jun 22, 2020 at 12:31:44PM -0700, Kees Cook wrote: > > > + > > > +#define add_random_kstack_offset() do { \ > > > + if (static_branch_maybe(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT, \ > > > + &randomize_kstack_offset)) { \ > > > + u32 offset = this_cpu_read(kstack_offset); \ > > > + u8 *ptr = __builtin_alloca(offset & 0x3FF); \ > > > + asm volatile("" : "=m"(*ptr)); \ > > > + } \ > > > +} while (0) > > > > This feels a little fragile. ptr doesn't escape the block, so the > > compiler is free to restore the stack immediately after this block. In > > fact, given that all you've said is that the asm modifies *ptr, but > > nothing uses that output, the compiler could eliminate the whole thing, > > no? > > > > https://godbolt.org/z/HT43F5 > > > > gcc restores the stack immediately, if no function calls come after it. > > > > clang completely eliminates the code if no function calls come after. > > nothing uses the stack in your example. And adding a barrier (which is > what the "=m" is, doesn't change it. Yeah, I realized that that was what's going on. And clang isn't actually DCE'ing it, it's taking advantage of the red zone since my alloca was small enough. But I still don't see anything _stopping_ the compiler from optimizing this better in the future. The "=m" is not a barrier: it just informs the compiler that the asm produces an output value in *ptr (and no other outputs). If nothing can consume that output, it doesn't stop the compiler from freeing the allocation immediately after the asm instead of at the end of the function. I'm talking about something like asm volatile("" : : "r" (ptr) : "memory"); which tells the compiler that the asm may change memory arbitrarily. Here, we don't use it really as a barrier, but to tell the compiler that the asm may have stashed the value of ptr somewhere in memory, so it's not free to reuse the space that it pointed to until the function returns (unless it can prove that nothing accesses memory, not just that nothing accesses ptr).
On Mon, Jun 22, 2020 at 08:05:10PM -0400, Arvind Sankar wrote:
> But I still don't see anything _stopping_ the compiler from optimizing
> this better in the future. The "=m" is not a barrier: it just informs
> the compiler that the asm produces an output value in *ptr (and no other
> outputs). If nothing can consume that output, it doesn't stop the
> compiler from freeing the allocation immediately after the asm instead
> of at the end of the function.

Ah, yeah, I get what you mean.

> I'm talking about something like
> 	asm volatile("" : : "r" (ptr) : "memory");
> which tells the compiler that the asm may change memory arbitrarily.

Yeah, I will adjust it.

> Here, we don't use it really as a barrier, but to tell the compiler that
> the asm may have stashed the value of ptr somewhere in memory, so it's
> not free to reuse the space that it pointed to until the function
> returns (unless it can prove that nothing accesses memory, not just that
> nothing accesses ptr).
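Putting the two suggestions together (Jann's alignment mask and Arvind's
escape/clobber barrier), the macro would presumably end up looking roughly
like this; a sketch only, since the thread does not show the final form, and
ARCH_STACK_ALIGN_MASK is as defined in the earlier diff:

#define add_random_kstack_offset() do {					\
	if (static_branch_maybe(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT,	\
				&randomize_kstack_offset)) {		\
		u32 offset = this_cpu_read(kstack_offset);		\
		u8 *ptr = __builtin_alloca(offset & 0x3FF &		\
					   ARCH_STACK_ALIGN_MASK);	\
		/* Pretend ptr escapes so the allocation stays live. */	\
		asm volatile("" : : "r"(ptr) : "memory");		\
	}								\
} while (0)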
On 22.06.2020 22:31, Kees Cook wrote: > As Linux kernel stack protections have been constantly improving > (vmap-based stack allocation with guard pages, removal of thread_info, > STACKLEAK), attackers have had to find new ways for their exploits > to work. They have done so, continuing to rely on the kernel's stack > determinism, in situations where VMAP_STACK and THREAD_INFO_IN_TASK_STRUCT > were not relevant. For example, the following recent attacks would have > been hampered if the stack offset was non-deterministic between syscalls: > > https://repositorio-aberto.up.pt/bitstream/10216/125357/2/374717.pdf > (page 70: targeting the pt_regs copy with linear stack overflow) > > https://a13xp0p0v.github.io/2020/02/15/CVE-2019-18683.html > (leaked stack address from one syscall as a target during next syscall) > > The main idea is that since the stack offset is randomized on each system > call, it is harder for an attack to reliably land in any particular place > on the thread stack, even with address exposures, as the stack base will > change on the next syscall. Also, since randomization is performed after > placing pt_regs, the ptrace-based approach[1] to discover the randomized > offset during a long-running syscall should not be possible. Hello Kees! I would recommend to disable CONFIG_STACKLEAK_METRICS if kernel stack offset randomization is enabled. It is a debugging feature that provides information about kernel stack usage. That info can be useful for calculating the random offset. I would also recommend to check: there might be other kernel features for debugging or getting statistics that can be used to disclose the random stack offset. Best regards, Alexander
From: Kees Cook > Sent: 23 June 2020 01:56 > On Mon, Jun 22, 2020 at 08:05:10PM -0400, Arvind Sankar wrote: > > But I still don't see anything _stopping_ the compiler from optimizing > > this better in the future. The "=m" is not a barrier: it just informs > > the compiler that the asm produces an output value in *ptr (and no other > > outputs). If nothing can consume that output, it doesn't stop the > > compiler from freeing the allocation immediately after the asm instead > > of at the end of the function. > > Ah, yeah, I get what you mean. > > > I'm talking about something like > > asm volatile("" : : "r" (ptr) : "memory"); > > which tells the compiler that the asm may change memory arbitrarily. > > Yeah, I will adjust it. > > > Here, we don't use it really as a barrier, but to tell the compiler that > > the asm may have stashed the value of ptr somewhere in memory, so it's > > not free to reuse the space that it pointed to until the function > > returns (unless it can prove that nothing accesses memory, not just that > > nothing accesses ptr). Do you need another asm volatile("" : : "r" (ptr) : "memory"); (or similar) at the bottom of the function - that the compiler thinks might access the memory whose address it thought got saved earlier? I wonder if it would be easier to allocate the stack space in the asm wrapper? At least as an architecture option. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
diff --git a/Makefile b/Makefile
index b46e91bf0b0e..8cb7a1388950 100644
--- a/Makefile
+++ b/Makefile
@@ -809,6 +809,10 @@ ifdef CONFIG_INIT_STACK_ALL
 KBUILD_CFLAGS	+= -ftrivial-auto-var-init=pattern
 endif
 
+# While VLAs have been removed, GCC produces unreachable stack probes
+# for the randomize_kstack_offset feature. Disable it for all compilers.
+KBUILD_CFLAGS	+= $(call cc-option,-fno-stack-clash-protection,)
+
 DEBUG_CFLAGS	:= $(call cc-option, -fno-var-tracking-assignments)
 
 ifdef CONFIG_DEBUG_INFO
diff --git a/arch/Kconfig b/arch/Kconfig
index 1ea61290900a..1f52c9cfefca 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -883,6 +883,29 @@ config VMAP_STACK
 	  virtual mappings with real shadow memory, and KASAN_VMALLOC must
 	  be enabled.
 
+config HAVE_ARCH_RANDOMIZE_KSTACK_OFFSET
+	def_bool n
+	help
+	  An arch should select this symbol if it can support kernel stack
+	  offset randomization with calls to add_random_kstack_offset()
+	  during syscall entry and choose_random_kstack_offset() during
+	  syscall exit. Downgrading of -fstack-protector-strong to
+	  -fstack-protector should also be applied to the entry code and
+	  closely examined, as the artificial stack bump looks like an array
+	  to the compiler, so it will attempt to add canary checks regardless
+	  of the static branch state.
+
+config RANDOMIZE_KSTACK_OFFSET_DEFAULT
+	bool "Randomize kernel stack offset on syscall entry"
+	depends on HAVE_ARCH_RANDOMIZE_KSTACK_OFFSET
+	help
+	  The kernel stack offset can be randomized (after pt_regs) by
+	  roughly 5 bits of entropy, frustrating memory corruption
+	  attacks that depend on stack address determinism or
+	  cross-syscall address exposures. This feature is controlled
+	  by kernel boot param "randomize_kstack_offset=on/off", and this
+	  config chooses the default boot state.
+
 config ARCH_OPTIONAL_KERNEL_RWX
 	def_bool n
 
diff --git a/include/linux/randomize_kstack.h b/include/linux/randomize_kstack.h
new file mode 100644
index 000000000000..1df0dc52cadc
--- /dev/null
+++ b/include/linux/randomize_kstack.h
@@ -0,0 +1,40 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef _LINUX_RANDOMIZE_KSTACK_H
+#define _LINUX_RANDOMIZE_KSTACK_H
+
+#include <linux/kernel.h>
+#include <linux/jump_label.h>
+#include <linux/percpu-defs.h>
+
+DECLARE_STATIC_KEY_MAYBE(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT,
+			 randomize_kstack_offset);
+DECLARE_PER_CPU(u32, kstack_offset);
+
+/*
+ * Do not use this anywhere else in the kernel. This is used here because
+ * it provides an arch-agnostic way to grow the stack with correct
+ * alignment. Also, since this use is being explicitly masked to a max of
+ * 10 bits, stack-clash style attacks are unlikely. For more details see
+ * "VLAs" in Documentation/process/deprecated.rst
+ */
+void *__builtin_alloca(size_t size);
+
+#define add_random_kstack_offset() do {					\
+	if (static_branch_maybe(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT,	\
+				&randomize_kstack_offset)) {		\
+		u32 offset = this_cpu_read(kstack_offset);		\
+		u8 *ptr = __builtin_alloca(offset & 0x3FF);		\
+		asm volatile("" : "=m"(*ptr));				\
+	}								\
+} while (0)
+
+#define choose_random_kstack_offset(rand) do {				\
+	if (static_branch_maybe(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT,	\
+				&randomize_kstack_offset)) {		\
+		u32 offset = this_cpu_read(kstack_offset);		\
+		offset ^= (rand);					\
+		this_cpu_write(kstack_offset, offset);			\
+	}								\
+} while (0)
+
+#endif
diff --git a/init/main.c b/init/main.c
index 0ead83e86b5a..fa8ae0ae3ac2 100644
--- a/init/main.c
+++ b/init/main.c
@@ -822,6 +822,29 @@ static void __init mm_init(void)
 	pti_init();
 }
 
+#ifdef CONFIG_HAVE_ARCH_RANDOMIZE_KSTACK_OFFSET
+DEFINE_STATIC_KEY_MAYBE_RO(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT,
+			   randomize_kstack_offset);
+DEFINE_PER_CPU(u32, kstack_offset);
+
+static int __init early_randomize_kstack_offset(char *buf)
+{
+	int ret;
+	bool bool_result;
+
+	ret = kstrtobool(buf, &bool_result);
+	if (ret)
+		return ret;
+
+	if (bool_result)
+		static_branch_enable(&randomize_kstack_offset);
+	else
+		static_branch_disable(&randomize_kstack_offset);
+	return 0;
+}
+early_param("randomize_kstack_offset", early_randomize_kstack_offset);
+#endif
+
 void __init __weak arch_call_rest_init(void)
 {
 	rest_init();
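For an architecture wiring this up, the Kconfig help text above describes the
intended call pattern: add_random_kstack_offset() during syscall entry and
choose_random_kstack_offset() during syscall exit. A hypothetical C-level
dispatcher might use the hooks like this; arch_dispatch_syscall() and
arch_cycle_counter() are illustrative stand-ins, not real APIs:

#include <linux/ptrace.h>
#include <linux/randomize_kstack.h>

static void do_syscall(struct pt_regs *regs)
{
	/* Consume the offset chosen at the end of the previous syscall. */
	add_random_kstack_offset();

	arch_dispatch_syscall(regs);

	/* Mix fresh per-CPU entropy in for the *next* syscall. */
	choose_random_kstack_offset(arch_cycle_counter());
}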