From patchwork Thu Dec 9 22:15:43 2021
X-Patchwork-Submitter: Peter Collingbourne
X-Patchwork-Id: 12695543
Date: Thu, 9 Dec 2021 14:15:43 -0800
In-Reply-To: <20211209221545.2333249-1-pcc@google.com>
Message-Id: <20211209221545.2333249-7-pcc@google.com>
References: <20211209221545.2333249-1-pcc@google.com>
Subject: [PATCH v4 6/7] Documentation: document uaccess logging
From: Peter Collingbourne
To: Catalin Marinas, Will Deacon, Ingo Molnar, Peter Zijlstra,
 Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Daniel Bristot de Oliveira, Thomas Gleixner, Andy Lutomirski, Kees Cook, Andrew Morton, Masahiro Yamada, Sami Tolvanen, YiFei Zhu, Mark Rutland, Frederic Weisbecker, Viresh Kumar, Andrey Konovalov, Peter Collingbourne, Gabriel Krisman Bertazi, Chris Hyser, Daniel Vetter, Chris Wilson, Arnd Bergmann, Dmitry Vyukov, Christian Brauner, "Eric W. Biederman", Alexey Gladkov, Ran Xiaokai, David Hildenbrand, Xiaofeng Cao, Cyrill Gorcunov, Thomas Cedeno, Marco Elver, Alexander Potapenko
Cc: linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org, Evgenii Stepanov

Add documentation for the uaccess logging feature.

Link: https://linux-review.googlesource.com/id/Ia626c0ca91bc0a3d8067d7f28406aa40693b65a2
Signed-off-by: Peter Collingbourne
---
v3:
- document what happens if passing NULL to prctl
- be explicit about meaning of addr and size

 Documentation/admin-guide/index.rst           |   1 +
 Documentation/admin-guide/uaccess-logging.rst | 151 ++++++++++++++++++
 2 files changed, 152 insertions(+)
 create mode 100644 Documentation/admin-guide/uaccess-logging.rst

diff --git a/Documentation/admin-guide/index.rst b/Documentation/admin-guide/index.rst
index 1bedab498104..4f6ee447ab2f 100644
--- a/Documentation/admin-guide/index.rst
+++ b/Documentation/admin-guide/index.rst
@@ -54,6 +54,7 @@ ABI will be found here.
    :maxdepth: 1
 
    sysfs-rules
+   uaccess-logging
 
 The rest of this manual consists of various unordered guides on how to
 configure specific aspects of kernel behavior to your liking.
diff --git a/Documentation/admin-guide/uaccess-logging.rst b/Documentation/admin-guide/uaccess-logging.rst
new file mode 100644
index 000000000000..24def38bbdf8
--- /dev/null
+++ b/Documentation/admin-guide/uaccess-logging.rst
@@ -0,0 +1,151 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===============
+Uaccess Logging
+===============
+
+Background
+----------
+
+Userspace tools such as sanitizers (ASan, MSan, HWASan) and tools
+making use of the ARM Memory Tagging Extension (MTE) need to
+monitor all memory accesses in a program so that they can detect
+memory errors. Furthermore, fuzzing tools such as syzkaller need to
+monitor all memory accesses so that they know which parts of memory
+to fuzz. For accesses made purely in userspace, this is achieved
+via compiler instrumentation, or for MTE, via direct hardware
+support. However, accesses made by the kernel on behalf of the user
+program via syscalls (i.e. uaccesses) are normally invisible to
+these tools.
+
+Traditionally, the sanitizers have handled this by interposing the libc
+syscall stubs with a wrapper that checks memory based on what the
+sanitizer expects the uaccesses to be. However, this creates a maintenance
+burden: each syscall must be annotated with its uaccesses in order
+to be recognized by the sanitizer, and these annotations must be
+continuously updated as the kernel changes.
+
+The kernel's uaccess logging feature provides userspace tools with
+the address and size of each userspace access, thereby allowing these
+tools to report memory errors involving these accesses without needing
+annotations for every syscall.
+
+By relying on the kernel's actual uaccesses, rather than a
+reimplementation of them, the userspace memory safety tools may
+also verify the validity of the kernel's own accesses. Even
+a sanitizer whose syscall wrappers have complete knowledge of the
+kernel's intended API may diverge from the kernel's actual uaccesses due
+to kernel bugs. A sanitizer with knowledge of the kernel's actual
+uaccesses may produce more accurate error reports that reveal such
+bugs. For example, a kernel that accesses more memory than expected
+by the userspace program could indicate that either userspace or the
+kernel has the wrong idea about which kernel functionality is being
+requested; either way, there is a bug.
+
+Interface
+---------
+
+The feature may be used via the following prctl:
+
+.. code-block:: c
+
+  uint64_t addr = 0; /* Generally will be a TLS slot or equivalent */
+  prctl(PR_SET_UACCESS_DESCRIPTOR_ADDR_ADDR, &addr, 0, 0, 0);
+
+Supplying a non-zero address as the second argument to ``prctl``
+will cause the kernel to read an address (referred to as the *uaccess
+descriptor address*) from that address on each kernel entry. Specifying
+an address of NULL as the second argument will restore the kernel's
+default behavior, i.e. no uaccess descriptor address is read.
+
+When entering the kernel with a non-zero uaccess descriptor address
+to handle a syscall, the kernel will read a data structure of type
+``struct uaccess_descriptor`` from the uaccess descriptor address.
+This structure is defined as follows:
+
+.. code-block:: c
+
+  struct uaccess_descriptor {
+    uint64_t addr, size;
+  };
+
+This data structure contains the address and size (in array elements)
+of a *uaccess buffer*, which is an array of data structures of type
+``struct uaccess_buffer_entry``. Before returning to userspace, the
+kernel will log information about uaccesses to sequential entries
+in the uaccess buffer. It will also store ``NULL`` to the address
+passed to ``prctl``, and write the address and size of the unused
+portion of the uaccess buffer back to the uaccess descriptor.
+
+The format of a uaccess buffer entry is defined as follows:
+
+.. code-block:: c
+
+  struct uaccess_buffer_entry {
+    uint64_t addr, size, flags;
+  };
+
+``addr`` and ``size`` contain the address and size of the user memory
+access. On arm64, tag bits are preserved in the ``addr`` field. There
+is currently one flag bit assignment for the ``flags`` field:
+
+.. code-block:: c
+
+  #define UACCESS_BUFFER_FLAG_WRITE 1
+
+This flag is set if the access was a write, or clear if it was a
+read. The meaning of all other flag bits is reserved.
+
+When entering the kernel with a non-zero uaccess descriptor
+address for a reason other than a syscall (for example, when
+IPI'd due to an incoming asynchronous signal), any signals other
+than ``SIGKILL`` and ``SIGSTOP`` are masked as if by calling
+``sigprocmask(SIG_SETMASK, set, NULL)`` where ``set`` has been
+initialized with ``sigfillset(set)``. This is to prevent incoming
+signals from interfering with uaccess logging.
+
+Example
+-------
+
+The following code snippet enumerates the accesses
+performed by a ``uname(2)`` syscall:
+
+.. code-block:: c
+
+  struct uaccess_buffer_entry entries[64];
+  struct uaccess_descriptor desc;
+  uint64_t desc_addr = 0;
+  prctl(PR_SET_UACCESS_DESCRIPTOR_ADDR_ADDR, &desc_addr, 0, 0, 0);
+
+  desc.addr = (uint64_t)(uintptr_t)entries;
+  desc.size = sizeof(entries) / sizeof(entries[0]);
+  desc_addr = (uint64_t)(uintptr_t)&desc;
+
+  struct utsname un;
+  uname(&un); /* desc.addr now points at the first unused entry */
+
+  struct uaccess_buffer_entry *entries_end = (struct uaccess_buffer_entry *)(uintptr_t)desc.addr;
+  for (struct uaccess_buffer_entry *entry = entries; entry != entries_end; ++entry) {
+    printf("%s at 0x%lx size 0x%lx\n", entry->flags & UACCESS_BUFFER_FLAG_WRITE ? "WRITE" : "READ",
+           (unsigned long)entry->addr, (unsigned long)entry->size);
+  }
+
+Limitations
+-----------
+
+This feature is currently only supported on the arm64, s390 and x86
+architectures.
+
+Uaccess buffers are a "best-effort" mechanism for logging uaccesses.
+Not all accesses may fit in the buffer, and, beyond that, not all
+internal kernel APIs that access userspace memory are covered.
+Therefore, userspace programs should tolerate unreported
+accesses.
+
+On the other hand, the kernel guarantees that it will not
+(intentionally) report accessing more data than it is specified
+to read. For example, if the kernel implements a syscall that is
+specified to read a data structure of size ``N`` bytes by first
+reading a page's worth of data and then only using the first ``N``
+bytes from it, the kernel will either report reading ``N`` bytes or
+not report the access at all.
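
The two guarantees in the Limitations section suggest one way a tool might consume the buffer: a missing entry is never an error, but a logged access that falls outside the memory the program expected the syscall to touch points to a bug on one side or the other. The following is a minimal C sketch of that rule, not part of the proposed interface. The helper names `logged_entries` and `count_out_of_bounds` are hypothetical, and the structure layouts are redeclared from the Interface section so the code compiles without the patched UAPI headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layouts as documented in the Interface section, redeclared here. */
struct uaccess_descriptor {
  uint64_t addr, size;
};

struct uaccess_buffer_entry {
  uint64_t addr, size, flags;
};

/*
 * Hypothetical helper: number of entries the kernel logged. `buf` is the
 * start of the uaccess buffer; `desc` is the descriptor as left by the
 * kernel, whose addr field now points at the first unused entry.
 */
static size_t logged_entries(const struct uaccess_buffer_entry *buf,
                             const struct uaccess_descriptor *desc) {
  const struct uaccess_buffer_entry *end =
      (const struct uaccess_buffer_entry *)(uintptr_t)desc->addr;
  /* The kernel fills entries sequentially from the front of the buffer. */
  assert(end >= buf);
  return (size_t)(end - buf);
}

/*
 * Hypothetical helper: count logged accesses not fully contained in
 * [region, region + region_len). Unreported accesses are simply absent
 * from the buffer, so only over-reporting can be flagged here.
 */
static size_t count_out_of_bounds(const struct uaccess_buffer_entry *buf,
                                  const struct uaccess_descriptor *desc,
                                  uint64_t region, uint64_t region_len) {
  size_t n = logged_entries(buf, desc);
  size_t bad = 0;
  for (size_t i = 0; i < n; ++i) {
    uint64_t start = buf[i].addr;
    /* Containment check written so that no subtraction can wrap around. */
    if (start < region || start - region > region_len ||
        buf[i].size > region_len - (start - region))
      ++bad;
  }
  return bad;
}
```

After a logged syscall such as the `uname(2)` call in the Example section, a tool could pass the `entries` array, the descriptor as left by the kernel, and the region it handed to the syscall (`&un` and `sizeof(un)`); a non-zero result is the kind of discrepancy the Background section describes. On arm64 the `addr` field keeps tag bits, so the sketch assumes the region is given with the same tag, since it does no untagging.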