From: Mike Rapoport
To: linux-kernel@vger.kernel.org
Cc: Andrew Morton, Andy Lutomirski, Arnd Bergmann, Catalin Marinas,
	Christoph Hellwig, Helge Deller, Lorenzo Stoakes, Luis Chamberlain,
	Mark Rutland, Masami Hiramatsu, Mathieu Desnoyers, Michael Ellerman,
	Mike Rapoport, Palmer Dabbelt, Peter Zijlstra, Russell King, Song Liu,
	Steven Rostedt, Thomas Gleixner, Uladzislau Rezki, Will Deacon,
	bpf@vger.kernel.org, linux-arch@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org,
	linux-modules@vger.kernel.org, linux-parisc@vger.kernel.org,
	linux-riscv@lists.infradead.org, linux-trace-kernel@vger.kernel.org,
	linuxppc-dev@lists.ozlabs.org, x86@kernel.org
Subject: [RFC PATCH 6/7] execmem: add support for cache of large ROX pages
Date: Thu, 11 Apr 2024 19:05:25 +0300
Message-ID: <20240411160526.2093408-7-rppt@kernel.org>
In-Reply-To: <20240411160526.2093408-1-rppt@kernel.org>
From: "Mike Rapoport (IBM)"

Using large pages to map text areas reduces iTLB pressure and improves
performance.

Extend execmem_alloc() with the ability to use PMD_SIZE'd pages with ROX
permissions as a cache for smaller allocations.

To populate the cache, a writable large page is allocated from vmalloc with
VM_ALLOW_HUGE_VMAP, filled with invalid instructions, and then remapped as
ROX.

Portions of that large page are handed out to execmem_alloc() callers
without any changes to the permissions.

When the memory is freed with execmem_free(), it is invalidated again so
that it does not contain stale instructions.

Free and allocated ranges are tracked in two maple trees protected by a
single mutex. Adjacent free ranges are merged, and free ranges whose start
and size are both PMD-aligned are released back to vmalloc from a
workqueue.

The cache is enabled when an architecture sets the EXECMEM_ROX_CACHE flag
in the definition of an execmem_range.

Signed-off-by: Mike Rapoport (IBM)
---
 include/linux/execmem.h |   2 +
 mm/execmem.c            | 267 ++++++++++++++++++++++++++++++++++++++--
 2 files changed, 262 insertions(+), 7 deletions(-)

diff --git a/include/linux/execmem.h b/include/linux/execmem.h
index 9d22999dbd7d..06f678e6fe55 100644
--- a/include/linux/execmem.h
+++ b/include/linux/execmem.h
@@ -77,12 +77,14 @@ struct execmem_range {
 
 /**
  * struct execmem_info - architecture parameters for code allocations
+ * @invalidate: set memory to contain invalid instructions
  * @ranges: array of parameter sets defining architecture specific
  * parameters for executable memory allocations. The ranges that are not
  * explicitly initialized by an architecture use parameters defined for
  * @EXECMEM_DEFAULT.
  */
 struct execmem_info {
+	void (*invalidate)(void *ptr, size_t size, bool writable);
 	struct execmem_range ranges[EXECMEM_TYPE_MAX];
 };
 
diff --git a/mm/execmem.c b/mm/execmem.c
index c920d2b5a721..716fba68ab0e 100644
--- a/mm/execmem.c
+++ b/mm/execmem.c
@@ -1,30 +1,88 @@
 // SPDX-License-Identifier: GPL-2.0
 
 #include
+#include
 #include
 #include
+#include
 #include
 #include
 
+#include
+
+#include "internal.h"
+
 static struct execmem_info *execmem_info __ro_after_init;
 static struct execmem_info default_execmem_info __ro_after_init;
 
-static void *__execmem_alloc(struct execmem_range *range, size_t size)
+struct execmem_cache {
+	struct mutex mutex;
+	struct maple_tree busy_areas;
+	struct maple_tree free_areas;
+};
+
+static struct execmem_cache execmem_cache = {
+	.mutex = __MUTEX_INITIALIZER(execmem_cache.mutex),
+	.busy_areas = MTREE_INIT_EXT(busy_areas, MT_FLAGS_LOCK_EXTERN,
+				     execmem_cache.mutex),
+	.free_areas = MTREE_INIT_EXT(free_areas, MT_FLAGS_LOCK_EXTERN,
+				     execmem_cache.mutex),
+};
+
+static void execmem_cache_clean(struct work_struct *work)
+{
+	struct maple_tree *free_areas = &execmem_cache.free_areas;
+	struct mutex *mutex = &execmem_cache.mutex;
+	MA_STATE(mas, free_areas, 0, ULONG_MAX);
+	void *area;
+
+	mutex_lock(mutex);
+	mas_for_each(&mas, area, ULONG_MAX) {
+		size_t size;
+
+		if (!xa_is_value(area))
+			continue;
+
+		size = xa_to_value(area);
+
+		if (IS_ALIGNED(size, PMD_SIZE) && IS_ALIGNED(mas.index, PMD_SIZE)) {
+			void *ptr = (void *)mas.index;
+
+			mas_erase(&mas);
+			vfree(ptr);
+		}
+	}
+	mutex_unlock(mutex);
+}
+
+static DECLARE_WORK(execmem_cache_clean_work, execmem_cache_clean);
+
+static void execmem_invalidate(void *ptr, size_t size, bool writable)
+{
+	if (execmem_info->invalidate)
+		execmem_info->invalidate(ptr, size, writable);
+	else
+		memset(ptr, 0, size);
+}
+
+static void *execmem_vmalloc(struct execmem_range *range, size_t size,
+			     pgprot_t pgprot, unsigned long vm_flags)
 {
 	bool kasan = range->flags & EXECMEM_KASAN_SHADOW;
-	unsigned long vm_flags = VM_FLUSH_RESET_PERMS;
 	gfp_t gfp_flags = GFP_KERNEL | __GFP_NOWARN;
+	unsigned int align = range->alignment;
 	unsigned long start = range->start;
 	unsigned long end = range->end;
-	unsigned int align = range->alignment;
-	pgprot_t pgprot = range->pgprot;
 	void *p;
 
 	if (kasan)
 		vm_flags |= VM_DEFER_KMEMLEAK;
 
-	p = __vmalloc_node_range(size, align, start, end, gfp_flags,
-				 pgprot, vm_flags, NUMA_NO_NODE,
+	if (vm_flags & VM_ALLOW_HUGE_VMAP)
+		align = PMD_SIZE;
+
+	p = __vmalloc_node_range(size, align, start, end, gfp_flags, pgprot,
+				 vm_flags, NUMA_NO_NODE,
 				 __builtin_return_address(0));
 	if (!p && range->fallback_start) {
 		start = range->fallback_start;
@@ -44,6 +102,199 @@ static void *__execmem_alloc(struct execmem_range *range, size_t size)
 		return NULL;
 	}
 
+	return p;
+}
+
+static int execmem_cache_add(void *ptr, size_t size)
+{
+	struct maple_tree *free_areas = &execmem_cache.free_areas;
+	struct mutex *mutex = &execmem_cache.mutex;
+	unsigned long addr = (unsigned long)ptr;
+	MA_STATE(mas, free_areas, addr - 1, addr + 1);
+	unsigned long lower, lower_size = 0;
+	unsigned long upper, upper_size = 0;
+	unsigned long area_size;
+	void *area = NULL;
+	int err;
+
+	lower = addr;
+	upper = addr + size - 1;
+
+	mutex_lock(mutex);
+	area = mas_walk(&mas);
+	if (area && xa_is_value(area) && mas.last == addr - 1) {
+		lower = mas.index;
+		lower_size = xa_to_value(area);
+	}
+
+	area = mas_next(&mas, ULONG_MAX);
+	if (area && xa_is_value(area) && mas.index == addr + size) {
+		upper = mas.last;
+		upper_size = xa_to_value(area);
+	}
+
+	mas_set_range(&mas, lower, upper);
+	area_size = lower_size + upper_size + size;
+	err = mas_store_gfp(&mas, xa_mk_value(area_size), GFP_KERNEL);
+	mutex_unlock(mutex);
+	if (err)
+		return -ENOMEM;
+
+	return 0;
+}
+
+static void *__execmem_cache_alloc(size_t size)
+{
+	struct maple_tree *free_areas = &execmem_cache.free_areas;
+	struct maple_tree *busy_areas = &execmem_cache.busy_areas;
+	MA_STATE(mas_free, free_areas, 0, ULONG_MAX);
+	MA_STATE(mas_busy, busy_areas, 0, ULONG_MAX);
+	struct mutex *mutex = &execmem_cache.mutex;
+	unsigned long addr, last, area_size = 0;
+	void *area, *ptr = NULL;
+	int err;
+
+	mutex_lock(mutex);
+	mas_for_each(&mas_free, area, ULONG_MAX) {
+		area_size = xa_to_value(area);
+		if (area_size >= size)
+			break;
+	}
+
+	if (area_size < size)
+		goto out_unlock;
+
+	addr = mas_free.index;
+	last = mas_free.last;
+
+	/* insert allocated size to busy_areas at range [addr, addr + size) */
+	mas_set_range(&mas_busy, addr, addr + size - 1);
+	err = mas_store_gfp(&mas_busy, xa_mk_value(size), GFP_KERNEL);
+	if (err)
+		goto out_unlock;
+
+	mas_erase(&mas_free);
+	if (area_size > size) {
+		/*
+		 * re-insert remaining free size to free_areas at range
+		 * [addr + size, last]
+		 */
+		mas_set_range(&mas_free, addr + size, last);
+		size = area_size - size;
+		err = mas_store_gfp(&mas_free, xa_mk_value(size), GFP_KERNEL);
+		if (err) {
+			mas_erase(&mas_busy);
+			goto out_unlock;
+		}
+	}
+	ptr = (void *)addr;
+
+out_unlock:
+	mutex_unlock(mutex);
+	return ptr;
+}
+
+static int execmem_cache_populate(struct execmem_range *range, size_t size)
+{
+	unsigned long vm_flags = VM_FLUSH_RESET_PERMS | VM_ALLOW_HUGE_VMAP;
+	unsigned long start, end;
+	struct vm_struct *vm;
+	size_t alloc_size;
+	int err = -ENOMEM;
+	void *p;
+
+	alloc_size = round_up(size, PMD_SIZE);
+	p = execmem_vmalloc(range, alloc_size, PAGE_KERNEL, vm_flags);
+	if (!p)
+		return err;
+
+	vm = find_vm_area(p);
+	if (!vm)
+		goto err_free_mem;
+
+	/* fill memory with invalid instructions */
+	execmem_invalidate(p, alloc_size, /* writable = */ true);
+
+	start = (unsigned long)p;
+	end = start + alloc_size;
+
+	vunmap_range_noflush(start, end);
+	flush_tlb_kernel_range(start, end);
+
+	/* FIXME: handle direct map alias */
+
+	err = vmap_pages_range_noflush(start, end, range->pgprot, vm->pages,
+				       PMD_SHIFT);
+	if (err)
+		goto err_free_mem;
+
+	err = execmem_cache_add(p, alloc_size);
+	if (err)
+		goto err_free_mem;
+
+	return 0;
+
+err_free_mem:
+	vfree(p);
+	return err;
+}
+
+static void *execmem_cache_alloc(struct execmem_range *range, size_t size)
+{
+	void *p;
+	int err;
+
+	p = __execmem_cache_alloc(size);
+	if (p)
+		return p;
+
+	err = execmem_cache_populate(range, size);
+	if (err)
+		return NULL;
+
+	return __execmem_cache_alloc(size);
+}
+
+static bool execmem_cache_free(void *ptr)
+{
+	struct maple_tree *busy_areas = &execmem_cache.busy_areas;
+	struct mutex *mutex = &execmem_cache.mutex;
+	unsigned long addr = (unsigned long)ptr;
+	MA_STATE(mas, busy_areas, addr, addr);
+	size_t size;
+	void *area;
+
+	mutex_lock(mutex);
+	area = mas_walk(&mas);
+	if (!area) {
+		mutex_unlock(mutex);
+		return false;
+	}
+	size = xa_to_value(area);
+	mas_erase(&mas);
+	mutex_unlock(mutex);
+
+	execmem_invalidate(ptr, size, /* writable = */ false);
+
+	execmem_cache_add(ptr, size);
+
+	schedule_work(&execmem_cache_clean_work);
+
+	return true;
+}
+
+static void *__execmem_alloc(struct execmem_range *range, size_t size)
+{
+	bool use_cache = range->flags & EXECMEM_ROX_CACHE;
+	unsigned long vm_flags = VM_FLUSH_RESET_PERMS;
+	pgprot_t pgprot = range->pgprot;
+	void *p;
+
+	if (use_cache)
+		p = execmem_cache_alloc(range, size);
+	else
+		p = execmem_vmalloc(range, size, pgprot, vm_flags);
+
 	return kasan_reset_tag(p);
 }
 
@@ -61,7 +312,9 @@ void execmem_free(void *ptr)
 	 * supported by vmalloc.
 	 */
 	WARN_ON(in_interrupt());
-	vfree(ptr);
+
+	if (!execmem_cache_free(ptr))
+		vfree(ptr);
 }
 
 void *execmem_update_copy(void *dst, const void *src, size_t size)
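The free-range bookkeeping in this patch (first-fit carving in __execmem_cache_alloc(), merging of neighbouring free ranges in execmem_cache_add(), lookup and re-adding in execmem_cache_free(), and releasing PMD-aligned free ranges in execmem_cache_clean()) can be illustrated outside the kernel. What follows is a minimal userspace sketch, not part of the patch and not the kernel implementation. Everything in it is hypothetical: the names cache_add(), cache_alloc(), cache_free(), cache_clean(), struct area_list, MAX_AREAS and MODEL_PMD_SIZE are illustrative, sorted fixed-size arrays stand in for the two maple trees, a 2 MiB PMD size is assumed, and locking, invalidation and the actual vmalloc/vfree calls are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model: arrays replace the maple trees; nothing is mapped or locked. */
#define MAX_AREAS	64
#define MODEL_PMD_SIZE	0x200000UL	/* assumption: 2 MiB PMD, as on x86-64 */

struct area_list {
	struct { unsigned long start; size_t size; } a[MAX_AREAS];
	int n;
};

static struct area_list free_areas, busy_areas;	/* free_areas stays sorted */

static void list_insert(struct area_list *l, int pos, unsigned long start, size_t size)
{
	assert(l->n < MAX_AREAS);	/* the model never grows its arrays */
	memmove(&l->a[pos + 1], &l->a[pos], (l->n - pos) * sizeof(l->a[0]));
	l->a[pos].start = start;
	l->a[pos].size = size;
	l->n++;
}

static void list_remove(struct area_list *l, int pos)
{
	memmove(&l->a[pos], &l->a[pos + 1], (l->n - pos - 1) * sizeof(l->a[0]));
	l->n--;
}

/* Like execmem_cache_add(): insert a free range and merge its neighbours. */
static void cache_add(unsigned long start, size_t size)
{
	int i = 0;
	while (i < free_areas.n && free_areas.a[i].start < start)
		i++;
	list_insert(&free_areas, i, start, size);
	if (i + 1 < free_areas.n && start + size == free_areas.a[i + 1].start) {
		free_areas.a[i].size += free_areas.a[i + 1].size;	/* merge upper */
		list_remove(&free_areas, i + 1);
	}
	if (i > 0 && free_areas.a[i - 1].start + free_areas.a[i - 1].size == start) {
		free_areas.a[i - 1].size += free_areas.a[i].size;	/* merge lower */
		list_remove(&free_areas, i);
	}
}

/* Like __execmem_cache_alloc(): first fit; the remainder stays free. */
static unsigned long cache_alloc(size_t size)
{
	for (int i = 0; i < free_areas.n; i++) {
		unsigned long addr = free_areas.a[i].start;

		if (free_areas.a[i].size < size)
			continue;
		list_insert(&busy_areas, busy_areas.n, addr, size);
		if (free_areas.a[i].size == size) {
			list_remove(&free_areas, i);
		} else {
			free_areas.a[i].start += size;
			free_areas.a[i].size -= size;
		}
		return addr;
	}
	return 0;	/* no fit: the patch populates the cache and retries */
}

/* Like execmem_cache_free(): false means addr did not come from the cache. */
static bool cache_free(unsigned long addr)
{
	for (int i = 0; i < busy_areas.n; i++) {
		size_t size = busy_areas.a[i].size;

		if (busy_areas.a[i].start != addr)
			continue;
		list_remove(&busy_areas, i);
		cache_add(addr, size);	/* the patch invalidates the range first */
		return true;
	}
	return false;
}

/* Like execmem_cache_clean(): drop free ranges made of whole aligned PMDs. */
static int cache_clean(void)
{
	int released = 0;
	for (int i = 0; i < free_areas.n;) {
		if (free_areas.a[i].start % MODEL_PMD_SIZE == 0 &&
		    free_areas.a[i].size % MODEL_PMD_SIZE == 0) {
			list_remove(&free_areas, i);	/* the patch calls vfree() here */
			released++;
		} else {
			i++;
		}
	}
	return released;
}
```

In this model, adding one 2 MiB range and carving two small allocations from it leaves a single shrunken free range. Freeing both allocations merges the pieces back into one PMD-aligned 2 MiB range, which cache_clean() then drops, roughly as the patch hands fully free large pages back to vmalloc from its workqueue.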