From patchwork Fri Apr 22 06:01:05 2022
X-Patchwork-Submitter: Nicholas Piggin
X-Patchwork-Id: 12822763
From: Nicholas Piggin
To: Paul Menzel
Cc: Nicholas Piggin, x86@kernel.org, Song Liu, "Edgecombe, Rick P", "Torvalds, Linus", akpm@linux-foundation.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH 1/2] mm/vmalloc: huge vmalloc backing pages should be split rather than compound
Date: Fri, 22 Apr 2022 16:01:05 +1000
Message-Id: <20220422060107.781512-2-npiggin@gmail.com>
In-Reply-To: <20220422060107.781512-1-npiggin@gmail.com>
References: <20220422060107.781512-1-npiggin@gmail.com>

Huge vmalloc higher-order backing pages were allocated with __GFP_COMP in order to allow the sub-pages to be refcounted by callers such as "remap_vmalloc_page" [sic] (remap_vmalloc_range). However, a similar problem exists for the other struct page fields callers use; for example, fb_deferred_io_fault() takes a vmalloc'ed page and not only refcounts it but also uses ->lru, ->mapping, and ->index. This is not compatible with compound sub-pages. The correct approach is to use split high-order pages for the huge vmalloc backing.
These allow callers to treat them in exactly the same way as individually-allocated order-0 pages.

Signed-off-by: Nicholas Piggin
---
 mm/vmalloc.c | 36 +++++++++++++++++++++---------------
 1 file changed, 21 insertions(+), 15 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 07da85ae825b..cadfbb5155ea 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2653,15 +2653,18 @@ static void __vunmap(const void *addr, int deallocate_pages)
 	vm_remove_mappings(area, deallocate_pages);

 	if (deallocate_pages) {
-		unsigned int page_order = vm_area_page_order(area);
-		int i, step = 1U << page_order;
+		int i;

-		for (i = 0; i < area->nr_pages; i += step) {
+		for (i = 0; i < area->nr_pages; i++) {
 			struct page *page = area->pages[i];

 			BUG_ON(!page);
-			mod_memcg_page_state(page, MEMCG_VMALLOC, -step);
-			__free_pages(page, page_order);
+			mod_memcg_page_state(page, MEMCG_VMALLOC, -1);
+			/*
+			 * High-order allocs for huge vmallocs are split, so
+			 * can be freed as an array of order-0 allocations
+			 */
+			__free_pages(page, 0);
 			cond_resched();
 		}
 		atomic_long_sub(area->nr_pages, &nr_vmalloc_pages);
@@ -2914,12 +2917,7 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
 			if (nr != nr_pages_request)
 				break;
 		}
-	} else
-		/*
-		 * Compound pages required for remap_vmalloc_page if
-		 * high-order pages.
-		 */
-		gfp |= __GFP_COMP;
+	}

 	/* High-order pages or fallback path if "bulk" fails. */

@@ -2933,6 +2931,15 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
 		page = alloc_pages_node(nid, gfp, order);
 		if (unlikely(!page))
 			break;
+		/*
+		 * Higher order allocations must be able to be treated as
+		 * independent small pages by callers (as they can with
+		 * small-page vmallocs). Some drivers do their own refcounting
+		 * on vmalloc_to_page() pages, some use page->mapping,
+		 * page->lru, etc.
+		 */
+		if (order)
+			split_page(page, order);

 		/*
 		 * Careful, we allocate and map page-order pages, but
@@ -2992,11 +2999,10 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 	atomic_long_add(area->nr_pages, &nr_vmalloc_pages);
 	if (gfp_mask & __GFP_ACCOUNT) {
-		int i, step = 1U << page_order;
+		int i;

-		for (i = 0; i < area->nr_pages; i += step)
-			mod_memcg_page_state(area->pages[i], MEMCG_VMALLOC,
-					     step);
+		for (i = 0; i < area->nr_pages; i++)
+			mod_memcg_page_state(area->pages[i], MEMCG_VMALLOC, 1);
 	}

 	/*

From patchwork Fri Apr 22 06:01:06 2022
X-Patchwork-Submitter: Nicholas Piggin
X-Patchwork-Id: 12822764
From: Nicholas Piggin
To: Paul Menzel
Cc: Nicholas Piggin, x86@kernel.org, Song Liu, "Edgecombe, Rick P", "Torvalds, Linus", akpm@linux-foundation.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH 2/2] Revert "vmalloc: replace VM_NO_HUGE_VMAP with VM_ALLOW_HUGE_VMAP"
Date: Fri, 22 Apr 2022 16:01:06 +1000
Message-Id: <20220422060107.781512-3-npiggin@gmail.com>
In-Reply-To: <20220422060107.781512-1-npiggin@gmail.com>
References: <20220422060107.781512-1-npiggin@gmail.com>

This reverts commit 559089e0a93d44280ec3ab478830af319c56dbe3 ("vmalloc: replace VM_NO_HUGE_VMAP with VM_ALLOW_HUGE_VMAP"). The previous commit fixes huge vmalloc for drivers that use the vmalloc_to_page() struct pages.
Signed-off-by: Nicholas Piggin
---
 arch/Kconfig                 |  6 ++++--
 arch/powerpc/kernel/module.c |  2 +-
 arch/s390/kvm/pv.c           |  7 ++++++-
 include/linux/vmalloc.h      |  4 ++--
 mm/vmalloc.c                 | 17 +++++++----------
 5 files changed, 20 insertions(+), 16 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 31c4fdc4a4ba..29b0167c088b 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -854,8 +854,10 @@ config HAVE_ARCH_HUGE_VMAP
 #
 #  Archs that select this would be capable of PMD-sized vmaps (i.e.,
-#  arch_vmap_pmd_supported() returns true). The VM_ALLOW_HUGE_VMAP flag
-#  must be used to enable allocations to use hugepages.
+#  arch_vmap_pmd_supported() returns true), and they must make no assumptions
+#  that vmalloc memory is mapped with PAGE_SIZE ptes. The VM_NO_HUGE_VMAP flag
+#  can be used to prohibit arch-specific allocations from using hugepages to
+#  help with this (e.g., modules may require it).
 #
 config HAVE_ARCH_HUGE_VMALLOC
 	depends on HAVE_ARCH_HUGE_VMAP

diff --git a/arch/powerpc/kernel/module.c b/arch/powerpc/kernel/module.c
index 97a76a8619fb..40a583e9d3c7 100644
--- a/arch/powerpc/kernel/module.c
+++ b/arch/powerpc/kernel/module.c
@@ -101,7 +101,7 @@ __module_alloc(unsigned long size, unsigned long start, unsigned long end, bool
 	 * too.
 	 */
 	return __vmalloc_node_range(size, 1, start, end, gfp, prot,
-				    VM_FLUSH_RESET_PERMS,
+				    VM_FLUSH_RESET_PERMS | VM_NO_HUGE_VMAP,
 				    NUMA_NO_NODE, __builtin_return_address(0));
 }

diff --git a/arch/s390/kvm/pv.c b/arch/s390/kvm/pv.c
index cc7c9599f43e..7f7c0d6af2ce 100644
--- a/arch/s390/kvm/pv.c
+++ b/arch/s390/kvm/pv.c
@@ -137,7 +137,12 @@ static int kvm_s390_pv_alloc_vm(struct kvm *kvm)
 	/* Allocate variable storage */
 	vlen = ALIGN(virt * ((npages * PAGE_SIZE) / HPAGE_SIZE), PAGE_SIZE);
 	vlen += uv_info.guest_virt_base_stor_len;
-	kvm->arch.pv.stor_var = vzalloc(vlen);
+	/*
+	 * The Create Secure Configuration Ultravisor Call does not support
+	 * using large pages for the virtual memory area.
+	 * This is a hardware limitation.
+	 */
+	kvm->arch.pv.stor_var = vmalloc_no_huge(vlen);
 	if (!kvm->arch.pv.stor_var)
 		goto out_err;
 	return 0;

diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index b159c2789961..3b1df7da402d 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -26,7 +26,7 @@ struct notifier_block;		/* in notifier.h */
 #define VM_KASAN		0x00000080	/* has allocated kasan shadow memory */
 #define VM_FLUSH_RESET_PERMS	0x00000100	/* reset direct map and flush TLB on unmap, can't be freed in atomic context */
 #define VM_MAP_PUT_PAGES	0x00000200	/* put pages and free array in vfree */
-#define VM_ALLOW_HUGE_VMAP	0x00000400	/* Allow for huge pages on archs with HAVE_ARCH_HUGE_VMALLOC */
+#define VM_NO_HUGE_VMAP		0x00000400	/* force PAGE_SIZE pte mapping */

 #if (defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS)) && \
 	!defined(CONFIG_KASAN_VMALLOC)
@@ -153,7 +153,7 @@ extern void *__vmalloc_node_range(unsigned long size, unsigned long align,
 			const void *caller) __alloc_size(1);
 void *__vmalloc_node(unsigned long size, unsigned long align, gfp_t gfp_mask,
 		int node, const void *caller) __alloc_size(1);
-void *vmalloc_huge(unsigned long size, gfp_t gfp_mask) __alloc_size(1);
+void *vmalloc_no_huge(unsigned long size) __alloc_size(1);

 extern void *__vmalloc_array(size_t n, size_t size, gfp_t flags) __alloc_size(1, 2);
 extern void *vmalloc_array(size_t n, size_t size) __alloc_size(1, 2);

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index cadfbb5155ea..09470361dc03 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -3101,7 +3101,7 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align,
 		return NULL;
 	}

-	if (vmap_allow_huge && (vm_flags & VM_ALLOW_HUGE_VMAP)) {
+	if (vmap_allow_huge && !(vm_flags & VM_NO_HUGE_VMAP)) {
 		unsigned long size_per_node;

 		/*
@@ -3268,24 +3268,21 @@ void *vmalloc(unsigned long size)
 EXPORT_SYMBOL(vmalloc);

 /**
- * vmalloc_huge - allocate virtually contiguous memory, allow huge pages
- * @size:	allocation size
- * @gfp_mask:	flags for the page level allocator
+ * vmalloc_no_huge - allocate virtually contiguous memory using small pages
+ * @size:	allocation size
 *
- * Allocate enough pages to cover @size from the page level
+ * Allocate enough non-huge pages to cover @size from the page level
 * allocator and map them into contiguous kernel virtual space.
- * If @size is greater than or equal to PMD_SIZE, allow using
- * huge pages for the memory
 *
 * Return: pointer to the allocated memory or %NULL on error
 */
-void *vmalloc_huge(unsigned long size, gfp_t gfp_mask)
+void *vmalloc_no_huge(unsigned long size)
{
	return __vmalloc_node_range(size, 1, VMALLOC_START, VMALLOC_END,
-			gfp_mask, PAGE_KERNEL, VM_ALLOW_HUGE_VMAP,
+			GFP_KERNEL, PAGE_KERNEL, VM_NO_HUGE_VMAP,
			NUMA_NO_NODE, __builtin_return_address(0));
}
-EXPORT_SYMBOL_GPL(vmalloc_huge);
+EXPORT_SYMBOL(vmalloc_no_huge);

 /**
  * vzalloc - allocate virtually contiguous memory with zero fill