From: Jeff Xu <jeffxu@chromium.org>
To: akpm@linux-foundation.org, keescook@chromium.org, sroettger@google.com
Cc: jeffxu@google.com, jorgelo@chromium.org, groeck@chromium.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-mm@kvack.org, jannh@google.com, surenb@google.com, alex.sierra@amd.com, apopple@nvidia.com, aneesh.kumar@linux.ibm.com, axelrasmussen@google.com, ben@decadent.org.uk, catalin.marinas@arm.com, david@redhat.com, dwmw@amazon.co.uk, ying.huang@intel.com, hughd@google.com, joey.gouly@arm.com, corbet@lwn.net, wangkefeng.wang@huawei.com, Liam.Howlett@oracle.com, torvalds@linux-foundation.org, lstoakes@gmail.com, willy@infradead.org, mawupeng1@huawei.com, linmiaohe@huawei.com, namit@vmware.com, peterx@redhat.com, peterz@infradead.org, ryan.roberts@arm.com, shr@devkernel.io, vbabka@suse.cz, xiujianfeng@huawei.com, yu.ma@intel.com, zhangpeng362@huawei.com, dave.hansen@intel.com, luto@kernel.org, linux-hardening@vger.kernel.org
Subject: [RFC PATCH v1 0/8] Introduce mseal() syscall
Date: Mon, 16 Oct 2023 14:38:19 +0000
Message-ID: <20231016143828.647848-1-jeffxu@chromium.org>

This patchset proposes a new
mseal() syscall for the Linux kernel.

Modern CPUs support memory permissions such as the RW and NX bits. Linux has supported NX since kernel version 2.6.8, released in August 2004 [1]. Memory permissions improve the security stance against memory corruption bugs: an attacker can't simply write to arbitrary memory and point execution at it, because the memory must be marked with the X bit or an exception occurs.

Memory sealing additionally protects the mapping itself against modification. This helps mitigate memory corruption issues where a corrupted pointer is passed to a memory management syscall. For example, such an attacker primitive can break control-flow integrity guarantees, since read-only memory that is supposed to be trusted can be made writable, or .text pages can be remapped. Memory sealing can be applied automatically by the runtime loader to seal .text and .rodata pages, and applications can additionally seal security-critical data at runtime.

A similar feature already exists in the XNU kernel, as the VM_FLAGS_PERMANENT flag [3], and in OpenBSD, as the mimmutable syscall [4]. Chrome also wants to adopt this feature for its CFI work [2], and this patchset has been designed to be compatible with the Chrome use case.

The new mseal() is an architecture-independent syscall with the following signature:

  mseal(void *addr, size_t len, unsigned int types, unsigned int flags)

addr/len: the memory range. It must be contiguous, allocated memory; otherwise mseal() fails and no VMA is updated. For details on acceptable arguments, please refer to the comments in mseal.c; they are also fully covered by the selftest.

types: a bitmask specifying which syscalls to seal. Currently defined:

  MM_SEAL_MSEAL     0x1
  MM_SEAL_MPROTECT  0x2
  MM_SEAL_MUNMAP    0x4
  MM_SEAL_MMAP      0x8
  MM_SEAL_MREMAP    0x10

Each bit enables sealing against one specific syscall, e.g. MM_SEAL_MPROTECT denies the mprotect syscall. A bitmask keeps the API extensible: when needed, sealing can be extended to madvise, mlock, etc. It also makes backward compatibility easy.

The kernel remembers which seal types have been applied, so the application doesn't need to repeat existing seal types in its next mseal() call. Once a seal type is applied, it can't be removed. Calling mseal() with a seal type that is already applied is a no-op, not a failure. MM_SEAL_MSEAL denies mseal() calls that try to add a new seal type.

Internally, a new field, vm_seals, is added to vm_area_struct to store the bitmask. For the affected syscalls, such as mprotect, a sealing check (can_modify_mm) is added, usually early in the syscall, before any VMA is updated. As a result, if any VMA in the given address range fails the sealing check, none of the VMAs is updated. Note that this differs from the rest of mprotect(), where some updates can take place even when mprotect() returns a failure. Since can_modify_mm only checks vm_seals in vm_area_struct, and neither walks the page tables nor updates any hardware state, all-or-nothing behavior seems a better fit here. I would like to hear the community's feedback on this.

The idea that inspired this patch comes from Stephen Röttger's work on V8 CFI [5]; the Chrome browser on ChromeOS will be the first user of this API. In addition, Stephen is working on a glibc change that adds sealing support to the dynamic linker, so that all non-writable segments are sealed at startup. Once that work is complete, all applications will automatically benefit from these new protections.
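To make the seal-type rules described above concrete, here is a small user-space model of them in C. It is a minimal sketch, not code from the patchset: struct model_vma, model_mseal(), model_denied() and mseal_model_demo() are hypothetical names, only the MM_SEAL_* values come from the list above, and the error codes (-EINVAL for unknown bits, -EPERM when MM_SEAL_MSEAL blocks a new type) are assumptions. It tracks a single VMA and ignores addr/len handling entirely.

```c
#include <assert.h>
#include <errno.h>

/* Seal-type bits as listed in the cover letter. */
#define MM_SEAL_MSEAL    0x1
#define MM_SEAL_MPROTECT 0x2
#define MM_SEAL_MUNMAP   0x4
#define MM_SEAL_MMAP     0x8
#define MM_SEAL_MREMAP   0x10
#define MM_SEAL_ALL (MM_SEAL_MSEAL | MM_SEAL_MPROTECT | MM_SEAL_MUNMAP | \
		     MM_SEAL_MMAP | MM_SEAL_MREMAP)

/* Hypothetical user-space stand-in for a VMA and its vm_seals field. */
struct model_vma {
	unsigned int vm_seals;
};

/*
 * Model of applying seal types to one VMA (hypothetical, not the patch).
 * Returns 0 on success or a negative errno; errno values are assumptions.
 */
static int model_mseal(struct model_vma *vma, unsigned int types)
{
	unsigned int new_types;

	if (types & ~MM_SEAL_ALL)
		return -EINVAL;            /* assumption: unknown bits rejected */

	new_types = types & ~vma->vm_seals;
	if (new_types == 0)
		return 0;                  /* re-applying existing seals: no-op */

	if (vma->vm_seals & MM_SEAL_MSEAL)
		return -EPERM;             /* MM_SEAL_MSEAL forbids adding types */

	vma->vm_seals |= new_types;    /* seals only accumulate; no unseal */
	return 0;
}

/* Nonzero if the syscall guarded by seal bit `type` would be denied. */
static int model_denied(const struct model_vma *vma, unsigned int type)
{
	return (vma->vm_seals & type) != 0;
}

/* Walk through the rules stated in the cover letter. */
void mseal_model_demo(void)
{
	struct model_vma vma = { 0 };

	/* Seal mprotect only; munmap is still allowed. */
	assert(model_mseal(&vma, MM_SEAL_MPROTECT) == 0);
	assert(model_denied(&vma, MM_SEAL_MPROTECT));
	assert(!model_denied(&vma, MM_SEAL_MUNMAP));

	/* Earlier seals are remembered; the caller need not repeat them. */
	assert(model_mseal(&vma, MM_SEAL_MUNMAP) == 0);
	assert(vma.vm_seals == (MM_SEAL_MPROTECT | MM_SEAL_MUNMAP));

	/* Lock the seal set, then try to extend it. */
	assert(model_mseal(&vma, MM_SEAL_MSEAL) == 0);
	assert(model_mseal(&vma, MM_SEAL_MPROTECT) == 0);  /* existing: no-op */
	assert(model_mseal(&vma, MM_SEAL_MMAP) == -EPERM); /* new type: denied */
	assert(!model_denied(&vma, MM_SEAL_MMAP));
}
```

Calling mseal_model_demo() from a main() exercises each rule in turn. Storing the seals as a plain OR-accumulated bitmask is what makes "no unseal" and "repeat is a no-op" fall out naturally in this model.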
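The all-or-nothing behavior of the sealing check can be sketched the same way. The model below is hypothetical (struct sim_vma, struct sim_mm, model_can_modify_mm(), model_mprotect() and the MODEL_PROT_* values are illustrative names, not the patch's code): like the can_modify_mm check described in the cover letter, it reads only per-VMA seal bits, and it checks every VMA overlapping the range before changing anything. The -EPERM return value is an assumption, and splitting VMAs at range boundaries is not modeled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MM_SEAL_MPROTECT 0x2   /* value from the cover letter */
#define MODEL_PROT_READ  0x1   /* hypothetical protection bits for the model */
#define MODEL_PROT_WRITE 0x2

/* Hypothetical model of a VMA: [start, end), protection, seal bits. */
struct sim_vma {
	unsigned long start, end;
	int prot;
	unsigned int vm_seals;
};

/* Hypothetical model of an address space: an array of VMAs. */
struct sim_mm {
	struct sim_vma *vmas;
	size_t nr;
};

static bool overlaps(const struct sim_vma *v, unsigned long start,
		     unsigned long end)
{
	return v->start < end && v->end > start;
}

/*
 * can_modify_mm()-style check (model): refuse if any VMA overlapping
 * [start, end) carries the seal bit `type`. Only VMA metadata is read.
 */
static bool model_can_modify_mm(const struct sim_mm *mm, unsigned long start,
				unsigned long end, unsigned int type)
{
	for (size_t i = 0; i < mm->nr; i++)
		if (overlaps(&mm->vmas[i], start, end) &&
		    (mm->vmas[i].vm_seals & type))
			return false;
	return true;
}

/* mprotect model: check the whole range first, then update. */
static int model_mprotect(struct sim_mm *mm, unsigned long start,
			  unsigned long end, int prot)
{
	if (!model_can_modify_mm(mm, start, end, MM_SEAL_MPROTECT))
		return -EPERM;          /* assumed errno; nothing changed yet */

	for (size_t i = 0; i < mm->nr; i++)
		if (overlaps(&mm->vmas[i], start, end))
			mm->vmas[i].prot = prot; /* no VMA splitting in model */
	return 0;
}

void check_before_modify_demo(void)
{
	struct sim_vma vmas[] = {
		{ 0x1000, 0x2000, MODEL_PROT_READ, 0 },
		{ 0x2000, 0x3000, MODEL_PROT_READ, MM_SEAL_MPROTECT },
	};
	struct sim_mm mm = { vmas, 2 };
	int rw = MODEL_PROT_READ | MODEL_PROT_WRITE;

	/* The range touches a sealed VMA: the call fails, nothing changes. */
	assert(model_mprotect(&mm, 0x1000, 0x3000, rw) == -EPERM);
	assert(vmas[0].prot == MODEL_PROT_READ);
	assert(vmas[1].prot == MODEL_PROT_READ);

	/* The range covers only the unsealed VMA: the call succeeds. */
	assert(model_mprotect(&mm, 0x1000, 0x2000, rw) == 0);
	assert(vmas[0].prot == rw);
	assert(vmas[1].prot == MODEL_PROT_READ);
}
```

Separating the check pass from the update pass is what yields the success-or-nothing result here; a loop that checked and updated each VMA in turn would leave earlier VMAs modified when a later one turned out to be sealed.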
[1] https://kernelnewbies.org/Linux_2_6_8
[2] https://v8.dev/blog/control-flow-integrity
[3] https://github.com/apple-oss-distributions/xnu/blob/1031c584a5e37aff177559b9f69dbd3c8c3fd30a/osfmk/mach/vm_statistics.h#L274
[4] https://man.openbsd.org/mimmutable.2
[5] https://docs.google.com/document/d/1O2jwK4dxI3nRcOJuPYkonhTkNQfbmwdvxQMyXgeaRHo/edit#heading=h.bvaojj9fu6hc

Jeff Xu (8):
  Add mseal syscall
  Wire up mseal syscall
  mseal: add can_modify_mm and can_modify_vma
  mseal: seal mprotect
  mseal munmap
  mseal mremap
  mseal mmap
  selftest mm/mseal mprotect/munmap/mremap/mmap

 arch/alpha/kernel/syscalls/syscall.tbl      |    1 +
 arch/arm/tools/syscall.tbl                  |    1 +
 arch/arm64/include/asm/unistd.h             |    2 +-
 arch/arm64/include/asm/unistd32.h           |    2 +
 arch/ia64/kernel/syscalls/syscall.tbl       |    1 +
 arch/m68k/kernel/syscalls/syscall.tbl       |    1 +
 arch/microblaze/kernel/syscalls/syscall.tbl |    1 +
 arch/mips/kernel/syscalls/syscall_n32.tbl   |    1 +
 arch/mips/kernel/syscalls/syscall_n64.tbl   |    1 +
 arch/mips/kernel/syscalls/syscall_o32.tbl   |    1 +
 arch/parisc/kernel/syscalls/syscall.tbl     |    1 +
 arch/powerpc/kernel/syscalls/syscall.tbl    |    1 +
 arch/s390/kernel/syscalls/syscall.tbl       |    1 +
 arch/sh/kernel/syscalls/syscall.tbl         |    1 +
 arch/sparc/kernel/syscalls/syscall.tbl      |    1 +
 arch/x86/entry/syscalls/syscall_32.tbl      |    1 +
 arch/x86/entry/syscalls/syscall_64.tbl      |    1 +
 arch/xtensa/kernel/syscalls/syscall.tbl     |    1 +
 fs/aio.c                                    |    5 +-
 include/linux/mm.h                          |   55 +-
 include/linux/mm_types.h                    |    7 +
 include/linux/syscalls.h                    |    2 +
 include/uapi/asm-generic/unistd.h           |    5 +-
 include/uapi/linux/mman.h                   |    6 +
 ipc/shm.c                                   |    3 +-
 kernel/sys_ni.c                             |    1 +
 mm/Kconfig                                  |    8 +
 mm/Makefile                                 |    1 +
 mm/internal.h                               |    4 +-
 mm/mmap.c                                   |   49 +-
 mm/mprotect.c                               |    6 +
 mm/mremap.c                                 |   19 +-
 mm/mseal.c                                  |  328 +++++
 mm/nommu.c                                  |    6 +-
 mm/util.c                                   |    8 +-
 tools/testing/selftests/mm/Makefile         |    1 +
 tools/testing/selftests/mm/mseal_test.c     | 1428 +++++++++++++++++++
 37 files changed, 1934 insertions(+), 28 deletions(-)
 create mode 100644 mm/mseal.c
 create mode 100644 tools/testing/selftests/mm/mseal_test.c