From patchwork Tue Dec 12 23:17:02 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Jeff Xu
X-Patchwork-Id: 13490091
From: jeffxu@chromium.org
To: akpm@linux-foundation.org, keescook@chromium.org, jannh@google.com,
 sroettger@google.com, willy@infradead.org, gregkh@linuxfoundation.org,
 torvalds@linux-foundation.org
Cc: jeffxu@google.com, jorgelo@chromium.org, groeck@chromium.org,
 linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org,
 linux-mm@kvack.org, pedro.falcato@gmail.com, dave.hansen@intel.com,
 linux-hardening@vger.kernel.org, deraadt@openbsd.org, Jeff Xu
Subject: [RFC PATCH v3 08/11] mseal: add MM_SEAL_DISCARD_RO_ANON
Date: Tue, 12 Dec 2023 23:17:02 +0000
Message-ID: <20231212231706.2680890-9-jeffxu@chromium.org>
X-Mailer: git-send-email 2.43.0.472.g3155946c3a-goog
In-Reply-To: <20231212231706.2680890-1-jeffxu@chromium.org>
References: <20231212231706.2680890-1-jeffxu@chromium.org>

From: Jeff Xu

Certain types of madvise() operations are destructive, such as
MADV_DONTNEED, which can effectively alter region contents by
discarding pages, especially when memory is anonymous.
This patch blocks such operations for anonymous memory that is not
writable by the user.

MM_SEAL_DISCARD_RO_ANON blocks such operations if the user does not
have write access to the memory and the memory is anonymous.

We do not think such sealing is useful for file-backed mappings,
because the memory contents can be repopulated from the underlying
mapped file. We also do not think it is useful if the user can write
to the memory, because then an attacker can also write.

Signed-off-by: Jeff Xu
Suggested-by: Jann Horn
Suggested-by: Stephen Röttger
---
 include/linux/mm.h                     | 19 +++++--
 include/uapi/asm-generic/mman-common.h |  2 +
 include/uapi/linux/mman.h              |  1 +
 mm/madvise.c                           | 12 +++++
 mm/mseal.c                             | 73 ++++++++++++++++++++++++--
 5 files changed, 98 insertions(+), 9 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 1f162bb5b38d..50dda474acc2 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -264,7 +264,8 @@ extern unsigned int kobjsize(const void *objp);
 #define MM_SEAL_ALL ( \
 	MM_SEAL_SEAL | \
 	MM_SEAL_BASE | \
-	MM_SEAL_PROT_PKEY)
+	MM_SEAL_PROT_PKEY | \
+	MM_SEAL_DISCARD_RO_ANON)
 
 /*
  * PROT_SEAL_ALL is all supported flags in mmap().
@@ -273,7 +274,8 @@ extern unsigned int kobjsize(const void *objp);
 #define PROT_SEAL_ALL ( \
 	PROT_SEAL_SEAL | \
 	PROT_SEAL_BASE | \
-	PROT_SEAL_PROT_PKEY)
+	PROT_SEAL_PROT_PKEY | \
+	PROT_SEAL_DISCARD_RO_ANON)
 
 /*
  * vm_flags in vm_area_struct, see mm_types.h.
@@ -3354,6 +3356,9 @@ extern bool can_modify_mm(struct mm_struct *mm, unsigned long start,
 extern bool can_modify_vma(struct vm_area_struct *vma,
 		unsigned long checkSeals);
 
+extern bool can_modify_mm_madv(struct mm_struct *mm, unsigned long start,
+		unsigned long end, int behavior);
+
 /*
  * Convert prot field of mmap to vm_seals type.
  */
@@ -3362,9 +3367,9 @@ static inline unsigned long convert_mmap_seals(unsigned long prot)
 	unsigned long seals = 0;
 
 	/*
-	 * set SEAL_PROT_PKEY implies SEAL_BASE.
+	 * set SEAL_PROT_PKEY or SEAL_DISCARD_RO_ANON implies SEAL_BASE.
 	 */
-	if (prot & PROT_SEAL_PROT_PKEY)
+	if (prot & (PROT_SEAL_PROT_PKEY | PROT_SEAL_DISCARD_RO_ANON))
 		prot |= PROT_SEAL_BASE;
 
 	/*
@@ -3407,6 +3412,12 @@ static inline bool can_modify_vma(struct vm_area_struct *vma,
 	return true;
 }
 
+static inline bool can_modify_mm_madv(struct mm_struct *mm, unsigned long start,
+		unsigned long end, int behavior)
+{
+	return true;
+}
+
 static inline void update_vma_seals(struct vm_area_struct *vma, unsigned long vm_seals)
 {
 }
diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h
index f07ad9e70b3a..bf503962409a 100644
--- a/include/uapi/asm-generic/mman-common.h
+++ b/include/uapi/asm-generic/mman-common.h
@@ -29,6 +29,8 @@
 #define PROT_SEAL_SEAL	_BITUL(PROT_SEAL_BIT_BEGIN) /* 0x04000000 seal seal */
 #define PROT_SEAL_BASE	_BITUL(PROT_SEAL_BIT_BEGIN + 1) /* 0x08000000 base for all sealing types */
 #define PROT_SEAL_PROT_PKEY	_BITUL(PROT_SEAL_BIT_BEGIN + 2) /* 0x10000000 seal prot and pkey */
+/* seal destructive madvise for non-writeable anonymous memory. */
+#define PROT_SEAL_DISCARD_RO_ANON	_BITUL(PROT_SEAL_BIT_BEGIN + 3) /* 0x20000000 */
 
 /* 0x01 - 0x03 are defined in linux/mman.h */
 #define MAP_TYPE	0x0f		/* Mask for type of mapping */
diff --git a/include/uapi/linux/mman.h b/include/uapi/linux/mman.h
index f561652886c4..3872cc118c8a 100644
--- a/include/uapi/linux/mman.h
+++ b/include/uapi/linux/mman.h
@@ -58,5 +58,6 @@ struct cachestat {
 #define MM_SEAL_SEAL		_BITUL(0)
 #define MM_SEAL_BASE		_BITUL(1)
 #define MM_SEAL_PROT_PKEY	_BITUL(2)
+#define MM_SEAL_DISCARD_RO_ANON	_BITUL(3)
 
 #endif /* _UAPI_LINUX_MMAN_H */
diff --git a/mm/madvise.c b/mm/madvise.c
index e2d219a4b6ef..ff038e323779 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -1403,6 +1403,7 @@ int madvise_set_anon_name(struct mm_struct *mm, unsigned long start,
  *  -EIO    - an I/O error occurred while paging in data.
  *  -EBADF  - map exists, but area maps something that isn't a file.
  *  -EAGAIN - a kernel resource was temporarily unavailable.
+ *  -EACCES - memory is sealed.
  */
 int do_madvise(struct mm_struct *mm, unsigned long start, size_t len_in, int behavior)
 {
@@ -1446,10 +1447,21 @@ int do_madvise(struct mm_struct *mm, unsigned long start, size_t len_in, int beh
 	start = untagged_addr_remote(mm, start);
 	end = start + len;
 
+	/*
+	 * Check if the address range is sealed for do_madvise().
+	 * can_modify_mm_madv assumes we have acquired the lock on MM.
+	 */
+	if (!can_modify_mm_madv(mm, start, end, behavior)) {
+		error = -EACCES;
+		goto out;
+	}
+
 	blk_start_plug(&plug);
 	error = madvise_walk_vmas(mm, start, end, behavior,
 			madvise_vma_behavior);
 	blk_finish_plug(&plug);
+
+out:
 	if (write)
 		mmap_write_unlock(mm);
 	else
diff --git a/mm/mseal.c b/mm/mseal.c
index 3b90dce7d20e..294f48d33db6 100644
--- a/mm/mseal.c
+++ b/mm/mseal.c
@@ -11,6 +11,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include "internal.h"
@@ -66,6 +67,55 @@ bool can_modify_mm(struct mm_struct *mm, unsigned long start, unsigned long end,
 	return true;
 }
 
+static bool is_madv_discard(int behavior)
+{
+	return behavior == MADV_FREE || behavior == MADV_DONTNEED ||
+	       behavior == MADV_DONTNEED_LOCKED || behavior == MADV_REMOVE ||
+	       behavior == MADV_DONTFORK || behavior == MADV_WIPEONFORK;
+}
+
+static bool is_ro_anon(struct vm_area_struct *vma)
+{
+	/* check anonymous mapping. */
+	if (vma->vm_file || vma->vm_flags & VM_SHARED)
+		return false;
+
+	/*
+	 * check for non-writable:
+	 * PROT=RO or PKRU is not writeable.
+	 */
+	if (!(vma->vm_flags & VM_WRITE) ||
+		!arch_vma_access_permitted(vma, true, false, false))
+		return true;
+
+	return false;
+}
+
+/*
+ * Check if the vmas of a memory range are allowed to be modified by madvise.
+ * the memory range can have a gap (unallocated memory).
+ * return true, if it is allowed.
+ */
+bool can_modify_mm_madv(struct mm_struct *mm, unsigned long start, unsigned long end,
+		int behavior)
+{
+	struct vm_area_struct *vma;
+
+	VMA_ITERATOR(vmi, mm, start);
+
+	if (!is_madv_discard(behavior))
+		return true;
+
+	/* going through each vma to check. */
+	for_each_vma_range(vmi, vma, end)
+		if (is_ro_anon(vma) && !can_modify_vma(
+				vma, MM_SEAL_DISCARD_RO_ANON))
+			return false;
+
+	/* Allow by default. */
+	return true;
+}
+
 /*
  * Check if a seal type can be added to VMA.
  */
@@ -76,6 +126,12 @@ static bool can_add_vma_seals(struct vm_area_struct *vma, unsigned long newSeals
 	    (newSeals & ~(vma_seals(vma))))
 		return false;
 
+	/*
+	 * For simplicity, we allow adding all sealing types during mmap or mseal.
+	 * The actual sealing check will happen later during particular action.
+	 * E.g. For MM_SEAL_DISCARD_RO_ANON, we always allow adding it; at the
+	 * time of the madvise() call, we check whether the sealing condition is met.
+	 */
 	return true;
 }
 
@@ -225,15 +281,22 @@ static int apply_mm_seal(unsigned long start, unsigned long end,
  *  mprotect() and pkey_mprotect() will be denied if the memory is
  *  sealed with MM_SEAL_PROT_PKEY.
  *
- *  The MM_SEAL_SEAL
- *  MM_SEAL_SEAL denies adding a new seal for an VMA.
- *
  *  The kernel will remember which seal types are applied, and the
  *  application doesn’t need to repeat all existing seal types in
  *  the next mseal().  Once a seal type is applied, it can’t be
  *  unsealed. Call mseal() on an existing seal type is a no-action,
  *  not a failure.
  *
+ *  MM_SEAL_DISCARD_RO_ANON: block some destructive madvise()
+ *  behavior, such as MADV_DONTNEED, which can effectively
+ *  alter region contents by discarding pages. Such operations
+ *  are blocked if users don't have write access to the memory and
+ *  the memory is anonymous memory.
+ *  Setting this implies MM_SEAL_BASE is also set.
+ *
+ *  The MM_SEAL_SEAL
+ *  MM_SEAL_SEAL denies adding a new seal for a VMA.
+ *
  *  flags: reserved.
  *
  *  return values:
@@ -264,8 +327,8 @@ static int do_mseal(unsigned long start, size_t len_in, unsigned long types,
 	struct mm_struct *mm = current->mm;
 	size_t len;
 
-	/* MM_SEAL_BASE is set when other seal types are set. */
-	if (types & MM_SEAL_PROT_PKEY)
+	/* MM_SEAL_BASE is set when other seal types are set */
+	if (types & (MM_SEAL_PROT_PKEY | MM_SEAL_DISCARD_RO_ANON))
 		types |= MM_SEAL_BASE;
 
 	if (!can_do_mseal(types, flags))