From patchwork Thu Aug 18 22:42:14 2022
From: Song Liu <song@kernel.org>
Subject: [RFC 1/5] vmalloc: introduce vmalloc_exec and vfree_exec
Date: Thu, 18 Aug 2022 15:42:14 -0700
Message-ID: <20220818224218.2399791-2-song@kernel.org>
In-Reply-To: <20220818224218.2399791-1-song@kernel.org>
References: <20220818224218.2399791-1-song@kernel.org>

This is a prototype to host dynamic kernel text (modules, BPF programs,
etc.) on huge pages. It is similar to the proposal by Peter in [1].

A new tree of vmap_area, the free_text_area_* tree, is introduced in
addition to free_vmap_area_* and vmap_area_*. vmalloc_exec allocates
memory from free_text_area_*. When there isn't enough space left in
free_text_area_*, new PMD_SIZE page(s) are allocated from
free_vmap_area_* and added to free_text_area_*. The new tree allows
separate handling of allocations smaller than PAGE_SIZE, as the current
vmalloc code mostly assumes PAGE_SIZE-aligned allocations. This version
of vmalloc_exec can handle BPF programs, which use 64-byte-aligned
allocations, and modules, which use PAGE_SIZE-aligned allocations.

[1] https://lore.kernel.org/bpf/Ys6cWUMHO8XwyYgr@hirez.programming.kicks-ass.net/
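For readers skimming the series, here is a minimal usage sketch of the
proposed API (not part of the patch; IS_ERR_OR_NULL is used below to
cover both the ERR_PTR() returns of the MMU implementation and the NULL
return of the nommu stub):

#include <linux/err.h>
#include <linux/vmalloc.h>

/* Minimal sketch, assuming a caller that wants RO+X memory for JITed
 * code. The 64-byte alignment matches what patch 2 uses for BPF.
 */
static void *alloc_exec_buffer(unsigned long len)
{
	void *buf = vmalloc_exec(len, 64);

	/* vmalloc_exec() returns ERR_PTR() on failure (NULL on nommu) */
	if (IS_ERR_OR_NULL(buf))
		return NULL;

	/* buf is already RO+X: write to it via text_poke()-style
	 * helpers, not plain memcpy().
	 */
	return buf;
}

static void free_exec_buffer(void *buf)
{
	vfree_exec(buf);	/* returns the range to free_text_area_* */
}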
---
 include/linux/vmalloc.h |   4 +
 mm/nommu.c              |   7 ++
 mm/vmalloc.c            | 163 +++++++++++++++++++++++++++++++++-------
 3 files changed, 147 insertions(+), 27 deletions(-)

diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 096d48aa3437..691c02ffe3db 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -35,6 +35,8 @@ struct notifier_block;		/* in notifier.h */
 #define VM_DEFER_KMEMLEAK	0
 #endif
 
+#define VM_KERNEL_EXEC		0x00001000	/* kernel text mapped as RO+X */
+
 /* bits [20..32] reserved for arch specific ioremap internals */
 
 /*
@@ -154,6 +156,8 @@ extern void *__vmalloc_node_range(unsigned long size, unsigned long align,
 void *__vmalloc_node(unsigned long size, unsigned long align, gfp_t gfp_mask,
 		int node, const void *caller) __alloc_size(1);
 void *vmalloc_huge(unsigned long size, gfp_t gfp_mask) __alloc_size(1);
+void *vmalloc_exec(unsigned long size, unsigned long align) __alloc_size(1);
+void vfree_exec(const void *addr);
 
 extern void *__vmalloc_array(size_t n, size_t size, gfp_t flags) __alloc_size(1, 2);
 extern void *vmalloc_array(size_t n, size_t size) __alloc_size(1, 2);
diff --git a/mm/nommu.c b/mm/nommu.c
index 9d7afc2d959e..11e0fc996006 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -372,6 +372,13 @@ int vm_map_pages_zero(struct vm_area_struct *vma, struct page **pages,
 }
 EXPORT_SYMBOL(vm_map_pages_zero);
 
+void *vmalloc_exec(unsigned long size, unsigned long align)
+{
+	return NULL;
+}
+
+void vfree_exec(const void *addr) { }
+
 /*
  * sys_brk() for the most part doesn't need the global kernel
  * lock, except when an application is doing something nasty
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index effd1ff6a4b4..472287e71bf1 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -753,6 +753,10 @@ static LIST_HEAD(free_vmap_area_list);
  */
 static struct rb_root free_vmap_area_root = RB_ROOT;
 
+static DEFINE_SPINLOCK(free_text_area_lock);
+static LIST_HEAD(free_text_area_list);
+static struct rb_root free_text_area_root = RB_ROOT;
+
 /*
  * Preload a CPU with one object for "no edge" split case. The
  * aim is to get rid of allocations from the atomic context, thus
@@ -814,9 +818,11 @@ static struct vmap_area *find_vmap_area_exceed_addr(unsigned long addr)
 	return va;
 }
 
-static struct vmap_area *__find_vmap_area(unsigned long addr)
+static struct vmap_area *__find_vmap_area(unsigned long addr, struct rb_node *root)
 {
-	struct rb_node *n = vmap_area_root.rb_node;
+	struct rb_node *n;
+
+	n = root ? root : vmap_area_root.rb_node;
 
 	addr = (unsigned long)kasan_reset_tag((void *)addr);
 
@@ -926,7 +932,7 @@ link_va(struct vmap_area *va, struct rb_root *root,
 	/* Insert to the rb-tree */
 	rb_link_node(&va->rb_node, parent, link);
-	if (root == &free_vmap_area_root) {
+	if (root == &free_vmap_area_root || root == &free_text_area_root) {
 		/*
 		 * Some explanation here. Just perform simple insertion
 		 * to the tree. We do not set va->subtree_max_size to
@@ -955,7 +961,7 @@ unlink_va(struct vmap_area *va, struct rb_root *root)
 	if (WARN_ON(RB_EMPTY_NODE(&va->rb_node)))
 		return;
 
-	if (root == &free_vmap_area_root)
+	if (root == &free_vmap_area_root || root == &free_text_area_root)
 		rb_erase_augmented(&va->rb_node, root,
 			&free_vmap_area_rb_augment_cb);
 	else
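Background for these hunks: both free trees are augmented rbtrees in
which every vmap_area caches subtree_max_size, the largest free block
anywhere below it, so the search can skip whole subtrees. A condensed
sketch of the descent in find_vmap_lowest_match() (simplified
illustration, not code from the patch; the real code also honors vstart
and backtracks):

static struct vmap_area *lowest_match_sketch(struct rb_node *node,
					     unsigned long length)
{
	while (node) {
		struct vmap_area *va = rb_entry(node, struct vmap_area, rb_node);

		/* If the left subtree caches a big-enough block, go
		 * left: the lowest-address fit must be there.
		 */
		if (get_subtree_max_size(node->rb_left) >= length) {
			node = node->rb_left;
		} else {
			/* Otherwise it is this node or one to the right. */
			if (va->va_end - va->va_start >= length)
				return va;
			node = node->rb_right;
		}
	}
	return NULL;
}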
@@ -1198,15 +1204,15 @@ is_within_this_va(struct vmap_area *va, unsigned long size,
  * overhead.
  */
 static __always_inline struct vmap_area *
-find_vmap_lowest_match(unsigned long size, unsigned long align,
-	unsigned long vstart, bool adjust_search_size)
+find_vmap_lowest_match(struct rb_node *root, unsigned long size,
+	unsigned long align, unsigned long vstart, bool adjust_search_size)
 {
 	struct vmap_area *va;
 	struct rb_node *node;
 	unsigned long length;
 
 	/* Start from the root. */
-	node = free_vmap_area_root.rb_node;
+	node = root;
 
 	/* Adjust the search size for alignment overhead. */
 	length = adjust_search_size ? size + align - 1 : size;
@@ -1290,8 +1296,9 @@ find_vmap_lowest_match_check(unsigned long size, unsigned long align)
 	get_random_bytes(&rnd, sizeof(rnd));
 	vstart = VMALLOC_START + rnd;
 
-	va_1 = find_vmap_lowest_match(size, align, vstart, false);
-	va_2 = find_vmap_lowest_linear_match(size, align, vstart);
+	va_1 = find_vmap_lowest_match(free_vmap_area_root.rb_node, size,
+				      align, vstart, false);
+	va_2 = find_vmap_lowest_linear_match(root, size, align, vstart);
 
 	if (va_1 != va_2)
 		pr_emerg("not lowest: t: 0x%p, l: 0x%p, v: 0x%lx\n",
@@ -1334,7 +1341,8 @@ classify_va_fit_type(struct vmap_area *va,
 }
 
 static __always_inline int
-adjust_va_to_fit_type(struct vmap_area *va,
+adjust_va_to_fit_type(struct rb_root *root, struct list_head *head,
+	struct vmap_area *va,
 	unsigned long nva_start_addr, unsigned long size,
 	enum fit_type type)
 {
@@ -1348,7 +1356,7 @@ adjust_va_to_fit_type(struct vmap_area *va,
 		 * V      NVA      V
 		 * |---------------|
 		 */
-		unlink_va(va, &free_vmap_area_root);
+		unlink_va(va, root);
 		kmem_cache_free(vmap_area_cachep, va);
 	} else if (type == LE_FIT_TYPE) {
 		/*
@@ -1426,8 +1434,7 @@ adjust_va_to_fit_type(struct vmap_area *va,
 		augment_tree_propagate_from(va);
 
 		if (lva)	/* type == NE_FIT_TYPE */
-			insert_vmap_area_augment(lva, &va->rb_node,
-				&free_vmap_area_root, &free_vmap_area_list);
+			insert_vmap_area_augment(lva, &va->rb_node, root, head);
 	}
 
 	return 0;
@@ -1459,7 +1466,8 @@ __alloc_vmap_area(unsigned long size, unsigned long align,
 	if (align <= PAGE_SIZE || (align > PAGE_SIZE && (vend - vstart) == size))
 		adjust_search_size = false;
 
-	va = find_vmap_lowest_match(size, align, vstart, adjust_search_size);
+	va = find_vmap_lowest_match(free_vmap_area_root.rb_node,
+				    size, align, vstart, adjust_search_size);
 	if (unlikely(!va))
 		return vend;
 
@@ -1478,7 +1486,8 @@ __alloc_vmap_area(unsigned long size, unsigned long align,
 		return vend;
 
 	/* Update the free vmap_area. */
-	ret = adjust_va_to_fit_type(va, nva_start_addr, size, type);
+	ret = adjust_va_to_fit_type(&free_vmap_area_root, &free_vmap_area_list,
+				    va, nva_start_addr, size, type);
 	if (ret)
 		return vend;
 
@@ -1539,7 +1548,7 @@ preload_this_cpu_lock(spinlock_t *lock, gfp_t gfp_mask, int node)
 static struct vmap_area *alloc_vmap_area(unsigned long size,
 				unsigned long align,
 				unsigned long vstart, unsigned long vend,
-				int node, gfp_t gfp_mask)
+				int node, unsigned long vm_flags, gfp_t gfp_mask)
 {
 	struct vmap_area *va;
 	unsigned long freed;
@@ -1583,9 +1592,17 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
 	va->va_end = addr + size;
 	va->vm = NULL;
 
-	spin_lock(&vmap_area_lock);
-	insert_vmap_area(va, &vmap_area_root, &vmap_area_list);
-	spin_unlock(&vmap_area_lock);
+	if (vm_flags & VM_KERNEL_EXEC) {
+		spin_lock(&free_text_area_lock);
+		insert_vmap_area(va, &free_text_area_root, &free_text_area_list);
+		/* update subtree_max_size now as we need this soon */
+		augment_tree_propagate_from(va);
+		spin_unlock(&free_text_area_lock);
+	} else {
+		spin_lock(&vmap_area_lock);
+		insert_vmap_area(va, &vmap_area_root, &vmap_area_list);
+		spin_unlock(&vmap_area_lock);
+	}
 
 	BUG_ON(!IS_ALIGNED(va->va_start, align));
 	BUG_ON(va->va_start < vstart);
@@ -1803,7 +1820,7 @@ struct vmap_area *find_vmap_area(unsigned long addr)
 	struct vmap_area *va;
 
 	spin_lock(&vmap_area_lock);
-	va = __find_vmap_area(addr);
+	va = __find_vmap_area(addr, vmap_area_root.rb_node);
 	spin_unlock(&vmap_area_lock);
 
 	return va;
@@ -1912,8 +1929,8 @@ static void *new_vmap_block(unsigned int order, gfp_t gfp_mask)
 		return ERR_PTR(-ENOMEM);
 
 	va = alloc_vmap_area(VMAP_BLOCK_SIZE, VMAP_BLOCK_SIZE,
-					VMALLOC_START, VMALLOC_END,
-					node, gfp_mask);
+					VMALLOC_START, VMALLOC_END,
+					node, 0, gfp_mask);
 	if (IS_ERR(va)) {
 		kfree(vb);
 		return ERR_CAST(va);
@@ -2209,8 +2226,8 @@ void *vm_map_ram(struct page **pages, unsigned int count, int node)
 		addr = (unsigned long)mem;
 	} else {
 		struct vmap_area *va;
-		va = alloc_vmap_area(size, PAGE_SIZE,
-				VMALLOC_START, VMALLOC_END, node, GFP_KERNEL);
+		va = alloc_vmap_area(size, PAGE_SIZE, VMALLOC_START, VMALLOC_END,
+				node, 0, GFP_KERNEL);
 		if (IS_ERR(va))
 			return NULL;
 
@@ -2450,7 +2467,7 @@ static struct vm_struct *__get_vm_area_node(unsigned long size,
 	if (!(flags & VM_NO_GUARD))
 		size += PAGE_SIZE;
 
-	va = alloc_vmap_area(size, align, start, end, node, gfp_mask);
+	va = alloc_vmap_area(size, align, start, end, node, flags, gfp_mask);
 	if (IS_ERR(va)) {
 		kfree(area);
 		return NULL;
@@ -2546,7 +2563,7 @@ struct vm_struct *remove_vm_area(const void *addr)
 	might_sleep();
 
 	spin_lock(&vmap_area_lock);
-	va = __find_vmap_area((unsigned long)addr);
+	va = __find_vmap_area((unsigned long)addr, vmap_area_root.rb_node);
 	if (va && va->vm) {
 		struct vm_struct *vm = va->vm;
 
@@ -3265,6 +3282,97 @@ void *vmalloc(unsigned long size)
 }
 EXPORT_SYMBOL(vmalloc);
 
+void *vmalloc_exec(unsigned long size, unsigned long align)
+{
+	struct vmap_area *va, *tmp;
+	unsigned long addr;
+	enum fit_type type;
+	int ret;
+
+	va = kmem_cache_alloc_node(vmap_area_cachep, GFP_KERNEL, NUMA_NO_NODE);
+	if (unlikely(!va))
+		return ERR_PTR(-ENOMEM);
+
+again:
+	preload_this_cpu_lock(&free_text_area_lock, GFP_KERNEL, NUMA_NO_NODE);
+	tmp = find_vmap_lowest_match(free_text_area_root.rb_node,
+				     size, align, 1, false);
+
+	if (!tmp) {
+		unsigned long alloc_size;
+		void *ptr;
+
+		spin_unlock(&free_text_area_lock);
+
+		alloc_size = roundup(size, PMD_SIZE * num_online_nodes());
+		ptr = __vmalloc_node_range(alloc_size, PMD_SIZE, MODULES_VADDR,
+					   MODULES_END, GFP_KERNEL,
+					   PAGE_KERNEL,
+					   VM_KERNEL_EXEC | VM_ALLOW_HUGE_VMAP | VM_NO_GUARD,
+					   NUMA_NO_NODE, __builtin_return_address(0));
+		if (unlikely(!ptr)) {
+			ret = -ENOMEM;
+			goto err_out;
+		}
+		memset(ptr, 0, alloc_size);
+		set_memory_ro((unsigned long)ptr, alloc_size >> PAGE_SHIFT);
+		set_memory_x((unsigned long)ptr, alloc_size >> PAGE_SHIFT);
+
+		goto again;
+	}
+
+	addr = roundup(tmp->va_start, align);
+	type = classify_va_fit_type(tmp, addr, size);
+	if (WARN_ON_ONCE(type == NOTHING_FIT)) {
+		addr = -ENOMEM;
+		goto err_out;
+	}
+
+	ret = adjust_va_to_fit_type(&free_text_area_root, &free_text_area_list,
+				    tmp, addr, size, type);
+	if (ret) {
+		addr = ret;
+		goto err_out;
+	}
+	spin_unlock(&free_text_area_lock);
+
+	va->va_start = addr;
+	va->va_end = addr + size;
+	va->vm = tmp->vm;
+
+	spin_lock(&vmap_area_lock);
+	insert_vmap_area(va, &vmap_area_root, &vmap_area_list);
+	spin_unlock(&vmap_area_lock);
+
+	return (void *)addr;
+
+err_out:
+	spin_unlock(&free_text_area_lock);
+	return ERR_PTR(ret);
+}
+
+void vfree_exec(const void *addr)
+{
+	struct vmap_area *va;
+
+	might_sleep();
+
+	spin_lock(&vmap_area_lock);
+	va = __find_vmap_area((unsigned long)addr, vmap_area_root.rb_node);
+	if (WARN_ON_ONCE(!va)) {
+		spin_unlock(&vmap_area_lock);
+		return;
+	}
+
+	unlink_va(va, &vmap_area_root);
+	spin_unlock(&vmap_area_lock);
+
+	spin_lock(&free_text_area_lock);
+	merge_or_add_vmap_area_augment(va,
+		&free_text_area_root, &free_text_area_list);
+	spin_unlock(&free_text_area_lock);
+	/* TODO: when the whole vm_struct is not in use, free it */
+}
+
 /**
  * vmalloc_huge - allocate virtually contiguous memory, allow huge pages
  * @size:      allocation size
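One subtlety worth spelling out: when the free tree has no fit, the
backing allocation is rounded up to one PMD per online node, so even a
small first request reserves a full set of huge pages that later calls
carve up. A worked example of that rounding (user-space illustration;
the 2-node figure is an assumption):

#include <stdio.h>

#define PMD_SIZE	(2UL << 20)	/* 2MB huge page, as on x86-64 */

static unsigned long roundup_to(unsigned long x, unsigned long step)
{
	return ((x + step - 1) / step) * step;
}

int main(void)
{
	unsigned long num_online_nodes = 2;	/* assumed NUMA node count */
	unsigned long size = 4096;		/* a small first request */
	unsigned long alloc_size = roundup_to(size, PMD_SIZE * num_online_nodes);

	/* prints 4194304: one 2MB page per node, made RO+X only once */
	printf("alloc_size = %lu\n", alloc_size);
	return 0;
}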
@@ -3851,7 +3959,8 @@ struct vm_struct **pcpu_get_vm_areas(const unsigned long *offsets,
 			/* It is a BUG(), but trigger recovery instead. */
 			goto recovery;
 
-		ret = adjust_va_to_fit_type(va, start, size, type);
+		ret = adjust_va_to_fit_type(&free_vmap_area_root, &free_vmap_area_list,
+					    va, start, size, type);
 		if (unlikely(ret))
 			goto recovery;

From patchwork Thu Aug 18 22:42:15 2022
From: Song Liu <song@kernel.org>
Subject: [RFC 2/5] bpf: use vmalloc_exec
Date: Thu, 18 Aug 2022 15:42:15 -0700
Message-ID: <20220818224218.2399791-3-song@kernel.org>
In-Reply-To: <20220818224218.2399791-1-song@kernel.org>
References: <20220818224218.2399791-1-song@kernel.org>

Use vmalloc_exec and vfree_exec instead of bpf_prog_pack_[alloc|free].
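The substitution is mechanical; a condensed sketch of the JIT
allocation path after this patch (illustration only, not code from the
patch itself):

/* Condensed from bpf_jit_binary_pack_alloc() below. */
static struct bpf_binary_header *jit_alloc_sketch(unsigned int size)
{
	struct bpf_binary_header *ro_header;

	/* was: bpf_prog_pack_alloc(size, bpf_fill_ill_insns) */
	ro_header = vmalloc_exec(size, BPF_PROG_EXEC_ALIGN);
	if (!ro_header)
		return NULL;

	/* No set_memory_ro()/set_memory_x() here: vmalloc_exec() hands
	 * out memory that is already RO+X, and there is no fill-with-
	 * illegal-insns callback any more (new huge pages are zeroed
	 * by patch 1 instead).
	 */
	return ro_header;
}

static void jit_free_sketch(struct bpf_binary_header *ro_header)
{
	vfree_exec(ro_header);	/* was: bpf_prog_pack_free(ro_header) */
}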
"prvs=9229e92ca3=songliubraving@fb.com" designates 67.231.153.30 as permitted sender) smtp.mailfrom="prvs=9229e92ca3=songliubraving@fb.com" ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1660863800; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=uwWHJ4ZcDX4SP5HmDGOK5+ApqhqPFsZopKdG5F16ah0=; b=HPHoxoVG1KjLcsv54Thm0XYp9CjkPE04+e2f6UuHLeEqfWeVV53jr4bswvrMZuajgFtsPI HUMiHg+u2mPSiF/Qh7vFdQcO1e3tJUW7fqPVUa9z2HUdLIcx+5aItLygExIdL+bR18YfMP YOgQjB4neBT8QrrCBT1+DeDIM53Mrf4= Authentication-Results: imf09.hostedemail.com; dkim=none; dmarc=fail reason="SPF not aligned (relaxed), No valid DKIM" header.from=kernel.org (policy=none); spf=pass (imf09.hostedemail.com: domain of "prvs=9229e92ca3=songliubraving@fb.com" designates 67.231.153.30 as permitted sender) smtp.mailfrom="prvs=9229e92ca3=songliubraving@fb.com" X-Rspam-User: X-Stat-Signature: 4ofwhhwxkiuqrju3hw6qc7s4kk19m3ub X-Rspamd-Queue-Id: EEB25141244 X-Rspamd-Server: rspam03 X-HE-Tag: 1660863799-697768 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Use vmalloc_exec and vfree_exec instead of bpf_prog_pack_[alloc|free]. --- kernel/bpf/core.c | 155 +++------------------------------------------- 1 file changed, 10 insertions(+), 145 deletions(-) diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c index c1e10d088dbb..834cce7e1ef2 100644 --- a/kernel/bpf/core.c +++ b/kernel/bpf/core.c @@ -806,144 +806,6 @@ int bpf_jit_add_poke_descriptor(struct bpf_prog *prog, return slot; } -/* - * BPF program pack allocator. - * - * Most BPF programs are pretty small. Allocating a hole page for each - * program is sometime a waste. Many small bpf program also adds pressure - * to instruction TLB. To solve this issue, we introduce a BPF program pack - * allocator. The prog_pack allocator uses HPAGE_PMD_SIZE page (2MB on x86) - * to host BPF programs. - */ -#define BPF_PROG_CHUNK_SHIFT 6 -#define BPF_PROG_CHUNK_SIZE (1 << BPF_PROG_CHUNK_SHIFT) -#define BPF_PROG_CHUNK_MASK (~(BPF_PROG_CHUNK_SIZE - 1)) - -struct bpf_prog_pack { - struct list_head list; - void *ptr; - unsigned long bitmap[]; -}; - -#define BPF_PROG_SIZE_TO_NBITS(size) (round_up(size, BPF_PROG_CHUNK_SIZE) / BPF_PROG_CHUNK_SIZE) - -static DEFINE_MUTEX(pack_mutex); -static LIST_HEAD(pack_list); - -/* PMD_SIZE is not available in some special config, e.g. ARCH=arm with - * CONFIG_MMU=n. Use PAGE_SIZE in these cases. 
- */
-#ifdef PMD_SIZE
-#define BPF_PROG_PACK_SIZE	(PMD_SIZE * num_possible_nodes())
-#else
-#define BPF_PROG_PACK_SIZE	PAGE_SIZE
-#endif
-
-#define BPF_PROG_CHUNK_COUNT	(BPF_PROG_PACK_SIZE / BPF_PROG_CHUNK_SIZE)
-
-static struct bpf_prog_pack *alloc_new_pack(bpf_jit_fill_hole_t bpf_fill_ill_insns)
-{
-	struct bpf_prog_pack *pack;
-
-	pack = kzalloc(struct_size(pack, bitmap, BITS_TO_LONGS(BPF_PROG_CHUNK_COUNT)),
-		       GFP_KERNEL);
-	if (!pack)
-		return NULL;
-	pack->ptr = module_alloc(BPF_PROG_PACK_SIZE);
-	if (!pack->ptr) {
-		kfree(pack);
-		return NULL;
-	}
-	bpf_fill_ill_insns(pack->ptr, BPF_PROG_PACK_SIZE);
-	bitmap_zero(pack->bitmap, BPF_PROG_PACK_SIZE / BPF_PROG_CHUNK_SIZE);
-	list_add_tail(&pack->list, &pack_list);
-
-	set_vm_flush_reset_perms(pack->ptr);
-	set_memory_ro((unsigned long)pack->ptr, BPF_PROG_PACK_SIZE / PAGE_SIZE);
-	set_memory_x((unsigned long)pack->ptr, BPF_PROG_PACK_SIZE / PAGE_SIZE);
-	return pack;
-}
-
-static void *bpf_prog_pack_alloc(u32 size, bpf_jit_fill_hole_t bpf_fill_ill_insns)
-{
-	unsigned int nbits = BPF_PROG_SIZE_TO_NBITS(size);
-	struct bpf_prog_pack *pack;
-	unsigned long pos;
-	void *ptr = NULL;
-
-	mutex_lock(&pack_mutex);
-	if (size > BPF_PROG_PACK_SIZE) {
-		size = round_up(size, PAGE_SIZE);
-		ptr = module_alloc(size);
-		if (ptr) {
-			bpf_fill_ill_insns(ptr, size);
-			set_vm_flush_reset_perms(ptr);
-			set_memory_ro((unsigned long)ptr, size / PAGE_SIZE);
-			set_memory_x((unsigned long)ptr, size / PAGE_SIZE);
-		}
-		goto out;
-	}
-	list_for_each_entry(pack, &pack_list, list) {
-		pos = bitmap_find_next_zero_area(pack->bitmap, BPF_PROG_CHUNK_COUNT, 0,
-						 nbits, 0);
-		if (pos < BPF_PROG_CHUNK_COUNT)
-			goto found_free_area;
-	}
-
-	pack = alloc_new_pack(bpf_fill_ill_insns);
-	if (!pack)
-		goto out;
-
-	pos = 0;
-
-found_free_area:
-	bitmap_set(pack->bitmap, pos, nbits);
-	ptr = (void *)(pack->ptr) + (pos << BPF_PROG_CHUNK_SHIFT);
-
-out:
-	mutex_unlock(&pack_mutex);
-	return ptr;
-}
-
-static void bpf_prog_pack_free(struct bpf_binary_header *hdr)
-{
-	struct bpf_prog_pack *pack = NULL, *tmp;
-	unsigned int nbits;
-	unsigned long pos;
-
-	mutex_lock(&pack_mutex);
-	if (hdr->size > BPF_PROG_PACK_SIZE) {
-		module_memfree(hdr);
-		goto out;
-	}
-
-	list_for_each_entry(tmp, &pack_list, list) {
-		if ((void *)hdr >= tmp->ptr && (tmp->ptr + BPF_PROG_PACK_SIZE) > (void *)hdr) {
-			pack = tmp;
-			break;
-		}
-	}
-
-	if (WARN_ONCE(!pack, "bpf_prog_pack bug\n"))
-		goto out;
-
-	nbits = BPF_PROG_SIZE_TO_NBITS(hdr->size);
-	pos = ((unsigned long)hdr - (unsigned long)pack->ptr) >> BPF_PROG_CHUNK_SHIFT;
-
-	WARN_ONCE(bpf_arch_text_invalidate(hdr, hdr->size),
-		  "bpf_prog_pack bug: missing bpf_arch_text_invalidate?\n");
-
-	bitmap_clear(pack->bitmap, pos, nbits);
-	if (bitmap_find_next_zero_area(pack->bitmap, BPF_PROG_CHUNK_COUNT, 0,
-				       BPF_PROG_CHUNK_COUNT, 0) == 0) {
-		list_del(&pack->list);
-		module_memfree(pack->ptr);
-		kfree(pack);
-	}
-out:
-	mutex_unlock(&pack_mutex);
-}
-
 static atomic_long_t bpf_jit_current;
 
 /* Can be overridden by an arch's JIT compiler if it has a custom,
@@ -1043,6 +905,9 @@ void bpf_jit_binary_free(struct bpf_binary_header *hdr)
 	bpf_jit_uncharge_modmem(size);
 }
 
+#define BPF_PROG_EXEC_ALIGN	64
+#define BPF_PROG_EXEC_MASK	(~(BPF_PROG_EXEC_ALIGN - 1))
+
 /* Allocate jit binary from bpf_prog_pack allocator.
  * Since the allocated memory is RO+X, the JIT engine cannot write directly
  * to the memory. To solve this problem, a RW buffer is also allocated at
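Since program starts stay 64-byte aligned, masking any address inside a
program's header with BPF_PROG_EXEC_MASK recovers the header start,
which is how bpf_jit_binary_pack_hdr() works further down. A quick
worked example (user-space illustration with a made-up address):

#include <stdio.h>

#define BPF_PROG_EXEC_ALIGN	64
#define BPF_PROG_EXEC_MASK	(~(BPF_PROG_EXEC_ALIGN - 1))

int main(void)
{
	/* hypothetical fp->bpf_func address, for illustration only */
	unsigned long real_start = 0xffffffffc04001d0UL;
	unsigned long hdr = real_start & BPF_PROG_EXEC_MASK;

	/* prints 0xffffffffc04001c0: the enclosing 64-byte chunk */
	printf("header at %#lx\n", hdr);
	return 0;
}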
@@ -1065,11 +930,11 @@ bpf_jit_binary_pack_alloc(unsigned int proglen, u8 **image_ptr,
 		     alignment > BPF_IMAGE_ALIGNMENT);
 
 	/* add 16 bytes for a random section of illegal instructions */
-	size = round_up(proglen + sizeof(*ro_header) + 16, BPF_PROG_CHUNK_SIZE);
+	size = round_up(proglen + sizeof(*ro_header) + 16, BPF_PROG_EXEC_ALIGN);
 	if (bpf_jit_charge_modmem(size))
 		return NULL;
 
-	ro_header = bpf_prog_pack_alloc(size, bpf_fill_ill_insns);
+	ro_header = vmalloc_exec(size, BPF_PROG_EXEC_ALIGN);
 	if (!ro_header) {
 		bpf_jit_uncharge_modmem(size);
 		return NULL;
@@ -1078,7 +943,7 @@ bpf_jit_binary_pack_alloc(unsigned int proglen, u8 **image_ptr,
 	*rw_header = kvmalloc(size, GFP_KERNEL);
 	if (!*rw_header) {
 		bpf_arch_text_copy(&ro_header->size, &size, sizeof(size));
-		bpf_prog_pack_free(ro_header);
+		vfree_exec(ro_header);
 		bpf_jit_uncharge_modmem(size);
 		return NULL;
 	}
@@ -1088,7 +953,7 @@ bpf_jit_binary_pack_alloc(unsigned int proglen, u8 **image_ptr,
 	(*rw_header)->size = size;
 
 	hole = min_t(unsigned int, size - (proglen + sizeof(*ro_header)),
-		     BPF_PROG_CHUNK_SIZE - sizeof(*ro_header));
+		     BPF_PROG_EXEC_ALIGN - sizeof(*ro_header));
 	start = (get_random_int() % hole) & ~(alignment - 1);
 
 	*image_ptr = &ro_header->image[start];
@@ -1109,7 +974,7 @@ int bpf_jit_binary_pack_finalize(struct bpf_prog *prog,
 	kvfree(rw_header);
 
 	if (IS_ERR(ptr)) {
-		bpf_prog_pack_free(ro_header);
+		vfree_exec(ro_header);
 		return PTR_ERR(ptr);
 	}
 	return 0;
@@ -1130,7 +995,7 @@ void bpf_jit_binary_pack_free(struct bpf_binary_header *ro_header,
 {
 	u32 size = ro_header->size;
 
-	bpf_prog_pack_free(ro_header);
+	vfree_exec(ro_header);
 	kvfree(rw_header);
 	bpf_jit_uncharge_modmem(size);
 }
@@ -1141,7 +1006,7 @@ bpf_jit_binary_pack_hdr(const struct bpf_prog *fp)
 	unsigned long real_start = (unsigned long)fp->bpf_func;
 	unsigned long addr;
 
-	addr = real_start & BPF_PROG_CHUNK_MASK;
+	addr = real_start & BPF_PROG_EXEC_MASK;
 	return (void *)addr;
 }
From patchwork Thu Aug 18 22:42:16 2022
From: Song Liu <song@kernel.org>
Subject: [RFC 3/5] modules, x86: use vmalloc_exec for module core
Date: Thu, 18 Aug 2022 15:42:16 -0700
Message-ID: <20220818224218.2399791-4-song@kernel.org>
In-Reply-To: <20220818224218.2399791-1-song@kernel.org>
References: <20220818224218.2399791-1-song@kernel.org>

This is a prototype that allows modules to share 2MB text pages with
other modules and BPF programs. The current version only covers
core_layout.
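Because module core text is now RO+X from the moment it is allocated,
the alternative.c hunks below all follow one pattern: use
text_poke_early() only while the system is still booting, and otherwise
take text_mutex and go through text_poke(). Condensed (a sketch, not
code from the patch; note that one hunk checks
system_state < SYSTEM_RUNNING rather than == SYSTEM_BOOTING):

static void patch_text_sketch(void *addr, const void *insn, size_t len)
{
	if (system_state == SYSTEM_BOOTING) {
		/* text is still writable this early in boot */
		text_poke_early(addr, insn, len);
	} else {
		/* RO+X text: go through the text_poke() machinery */
		mutex_lock(&text_mutex);
		text_poke(addr, insn, len);
		mutex_unlock(&text_mutex);
	}
}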
---
 arch/x86/Kconfig              |  1 +
 arch/x86/kernel/alternative.c | 30 ++++++++++++++++++++++++------
 arch/x86/kernel/module.c      |  1 +
 kernel/module/main.c          | 23 +++++++++++++----------
 kernel/module/strict_rwx.c    |  3 ---
 kernel/trace/ftrace.c         |  3 ++-
 6 files changed, 41 insertions(+), 20 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index fb5900e2c29a..e932bceb7f23 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -91,6 +91,7 @@ config X86
 	select ARCH_HAS_SET_DIRECT_MAP
 	select ARCH_HAS_STRICT_KERNEL_RWX
 	select ARCH_HAS_STRICT_MODULE_RWX
+	select ARCH_WANTS_MODULES_DATA_IN_VMALLOC	if X86_64
 	select ARCH_HAS_SYNC_CORE_BEFORE_USERMODE
 	select ARCH_HAS_SYSCALL_WRAPPER
 	select ARCH_HAS_UBSAN_SANITIZE_ALL
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 62f6b8b7c4a5..c83888ec232b 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -332,7 +332,13 @@ void __init_or_module noinline apply_alternatives(struct alt_instr *start,
 
 		DUMP_BYTES(insn_buff, insn_buff_sz, "%px: final_insn: ", instr);
 
-		text_poke_early(instr, insn_buff, insn_buff_sz);
+		if (system_state < SYSTEM_RUNNING) {
+			text_poke_early(instr, insn_buff, insn_buff_sz);
+		} else {
+			mutex_lock(&text_mutex);
+			text_poke(instr, insn_buff, insn_buff_sz);
+			mutex_unlock(&text_mutex);
+		}
 
 next:
 		optimize_nops(instr, a->instrlen);
@@ -503,7 +509,13 @@ void __init_or_module noinline apply_retpolines(s32 *start, s32 *end)
 			optimize_nops(bytes, len);
 			DUMP_BYTES(((u8*)addr),  len, "%px: orig: ", addr);
 			DUMP_BYTES(((u8*)bytes), len, "%px: repl: ", addr);
-			text_poke_early(addr, bytes, len);
+			if (system_state == SYSTEM_BOOTING) {
+				text_poke_early(addr, bytes, len);
+			} else {
+				mutex_lock(&text_mutex);
+				text_poke(addr, bytes, len);
+				mutex_unlock(&text_mutex);
+			}
 		}
 	}
 }
@@ -568,7 +580,13 @@ void __init_or_module noinline apply_returns(s32 *start, s32 *end)
 		if (len == insn.length) {
 			DUMP_BYTES(((u8*)addr),  len, "%px: orig: ", addr);
 			DUMP_BYTES(((u8*)bytes), len, "%px: repl: ", addr);
-			text_poke_early(addr, bytes, len);
+			if (unlikely(system_state == SYSTEM_BOOTING)) {
+				text_poke_early(addr, bytes, len);
+			} else {
+				mutex_lock(&text_mutex);
+				text_poke(addr, bytes, len);
+				mutex_unlock(&text_mutex);
+			}
 		}
 	}
 }
@@ -609,7 +627,7 @@ void __init_or_module noinline apply_ibt_endbr(s32 *start, s32 *end)
 		 */
 		DUMP_BYTES(((u8*)addr), 4, "%px: orig: ", addr);
 		DUMP_BYTES(((u8*)&poison), 4, "%px: repl: ", addr);
-		text_poke_early(addr, &poison, 4);
+		text_poke(addr, &poison, 4);
 	}
 }
 
@@ -791,7 +809,7 @@ void __init_or_module apply_paravirt(struct paravirt_patch_site *start,
 		/* Pad the rest with nops */
 		add_nops(insn_buff + used, p->len - used);
-		text_poke_early(p->instr, insn_buff, p->len);
+		text_poke(p->instr, insn_buff, p->len);
 	}
 }
 extern struct paravirt_patch_site __start_parainstructions[],
@@ -1698,7 +1716,7 @@ void __ref text_poke_bp(void *addr, const void *opcode, size_t len, const void *
 	struct text_poke_loc tp;
 
 	if (unlikely(system_state == SYSTEM_BOOTING)) {
-		text_poke_early(addr, opcode, len);
+		text_poke(addr, opcode, len);
 		return;
 	}
 
diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c
index 100446ffdc1d..570af623e28f 100644
--- a/arch/x86/kernel/module.c
+++ b/arch/x86/kernel/module.c
@@ -229,6 +229,7 @@ int apply_relocate_add(Elf64_Shdr *sechdrs,
 	bool early = me->state == MODULE_STATE_UNFORMED;
 	void *(*write)(void *, const void *, size_t) = memcpy;
 
+	early = false;
 	if (!early) {
 		write = text_poke;
 		mutex_lock(&text_mutex);
diff --git a/kernel/module/main.c b/kernel/module/main.c
index 57fc2821be63..c51dafa1089a 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -53,6 +53,7 @@
 #include
 #include
 #include
+#include <linux/bpf.h>
 #include
 #include
 #include "internal.h"
@@ -1198,7 +1199,7 @@ static void free_module(struct module *mod)
 	lockdep_free_key_range(mod->data_layout.base, mod->data_layout.size);
 
 	/* Finally, free the core (containing the module structure) */
-	module_memfree(mod->core_layout.base);
+	vfree_exec(mod->core_layout.base);
 #ifdef CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC
 	vfree(mod->data_layout.base);
 #endif
@@ -1316,7 +1317,8 @@ static int simplify_symbols(struct module *mod, const struct load_info *info)
 			ksym = resolve_symbol_wait(mod, info, name);
 			/* Ok if resolved.  */
 			if (ksym && !IS_ERR(ksym)) {
-				sym[i].st_value = kernel_symbol_value(ksym);
+				unsigned long val = kernel_symbol_value(ksym);
+				bpf_arch_text_copy(&sym[i].st_value, &val, sizeof(val));
 				break;
 			}
 
@@ -1337,7 +1339,8 @@ static int simplify_symbols(struct module *mod, const struct load_info *info)
 				secbase = (unsigned long)mod_percpu(mod);
 			else
 				secbase = info->sechdrs[sym[i].st_shndx].sh_addr;
-			sym[i].st_value += secbase;
+			secbase += sym[i].st_value;
+			bpf_arch_text_copy(&sym[i].st_value, &secbase, sizeof(secbase));
 			break;
 		}
 	}
@@ -2118,7 +2121,7 @@ static int move_module(struct module *mod, struct load_info *info)
 	void *ptr;
 
 	/* Do the allocs. */
-	ptr = module_alloc(mod->core_layout.size);
+	ptr = vmalloc_exec(mod->core_layout.size, PAGE_SIZE);
 	/*
 	 * The pointer to this block is stored in the module structure
 	 * which is inside the block. Just mark it as not being a
@@ -2128,7 +2131,7 @@ static int move_module(struct module *mod, struct load_info *info)
 	if (!ptr)
 		return -ENOMEM;
 
-	memset(ptr, 0, mod->core_layout.size);
+/*	memset(ptr, 0, mod->core_layout.size); */
 	mod->core_layout.base = ptr;
 
 	if (mod->init_layout.size) {
@@ -2141,7 +2144,7 @@ static int move_module(struct module *mod, struct load_info *info)
 		 */
 		kmemleak_ignore(ptr);
 		if (!ptr) {
-			module_memfree(mod->core_layout.base);
+			vfree_exec(mod->core_layout.base);
 			return -ENOMEM;
 		}
 		memset(ptr, 0, mod->init_layout.size);
@@ -2151,7 +2154,7 @@ static int move_module(struct module *mod, struct load_info *info)
 
 #ifdef CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC
 	/* Do the allocs. */
-	ptr = vmalloc(mod->data_layout.size);
+	ptr = module_alloc(mod->data_layout.size);
 	/*
 	 * The pointer to this block is stored in the module structure
 	 * which is inside the block. Just mark it as not being a
@@ -2159,7 +2162,7 @@ static int move_module(struct module *mod, struct load_info *info)
 	 */
 	kmemleak_not_leak(ptr);
 	if (!ptr) {
-		module_memfree(mod->core_layout.base);
+		vfree_exec(mod->core_layout.base);
 		module_memfree(mod->init_layout.base);
 		return -ENOMEM;
 	}
@@ -2185,7 +2188,7 @@ static int move_module(struct module *mod, struct load_info *info)
 			dest = mod->core_layout.base + shdr->sh_entsize;
 
 		if (shdr->sh_type != SHT_NOBITS)
-			memcpy(dest, (void *)shdr->sh_addr, shdr->sh_size);
+			bpf_arch_text_copy(dest, (void *)shdr->sh_addr, shdr->sh_size);
 		/* Update sh_addr to point to copy in image. */
 		shdr->sh_addr = (unsigned long)dest;
 		pr_debug("\t0x%lx %s\n",
@@ -2341,7 +2344,7 @@ static void module_deallocate(struct module *mod, struct load_info *info)
 	percpu_modfree(mod);
 	module_arch_freeing_init(mod);
 	module_memfree(mod->init_layout.base);
-	module_memfree(mod->core_layout.base);
+	vfree_exec(mod->core_layout.base);
 #ifdef CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC
 	vfree(mod->data_layout.base);
 #endif
diff --git a/kernel/module/strict_rwx.c b/kernel/module/strict_rwx.c
index 14fbea66f12f..d392eb7bf574 100644
--- a/kernel/module/strict_rwx.c
+++ b/kernel/module/strict_rwx.c
@@ -85,7 +85,6 @@ void module_enable_x(const struct module *mod)
 	    !PAGE_ALIGNED(mod->init_layout.base))
 		return;
 
-	frob_text(&mod->core_layout, set_memory_x);
 	frob_text(&mod->init_layout, set_memory_x);
 }
 
@@ -98,9 +97,7 @@ void module_enable_ro(const struct module *mod, bool after_init)
 		return;
 #endif
 
-	set_vm_flush_reset_perms(mod->core_layout.base);
 	set_vm_flush_reset_perms(mod->init_layout.base);
-	frob_text(&mod->core_layout, set_memory_ro);
 	frob_rodata(&mod->data_layout, set_memory_ro);
 
 	frob_text(&mod->init_layout, set_memory_ro);
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index bc921a3f7ea8..8cd31dc9ac84 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -3177,6 +3177,7 @@ static int ftrace_update_code(struct module *mod, struct ftrace_page *new_pgs)
 	if (mod)
 		rec_flags |= FTRACE_FL_DISABLED;
 
+	ftrace_arch_code_modify_prepare();
 	for (pg = new_pgs; pg; pg = pg->next) {
 
 		for (i = 0; i < pg->index; i++) {
@@ -3198,7 +3199,7 @@ static int ftrace_update_code(struct module *mod, struct ftrace_page *new_pgs)
 			update_cnt++;
 		}
 	}
-
+	ftrace_arch_code_modify_post_process();
 	stop = ftrace_now(raw_smp_processor_id());
 	ftrace_update_time = stop - start;
 	ftrace_update_tot_cnt += update_cnt;
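Why the ftrace hunk matters: module text patched at load time is now
RO+X, so the update pass must be bracketed by the arch hooks. A rough
outline (pseudocode placeholders, and the text_mutex behavior is an
assumption to verify against arch/x86/kernel/ftrace.c):

/*
 * Outline of ftrace_update_code() after the hunk above:
 *
 *	ftrace_arch_code_modify_prepare();      // on x86: takes text_mutex
 *	for each new ftrace record:
 *		patch the callsite via text poke  // module text is RO+X now
 *	ftrace_arch_code_modify_post_process(); // on x86: drops text_mutex
 */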
From patchwork Thu Aug 18 22:42:17 2022
From: Song Liu <song@kernel.org>
Subject: [RFC 4/5] vmalloc_exec: share a huge page with kernel text
Date: Thu, 18 Aug 2022 15:42:17 -0700
Message-ID: <20220818224218.2399791-5-song@kernel.org>
In-Reply-To: <20220818224218.2399791-1-song@kernel.org>
References: <20220818224218.2399791-1-song@kernel.org>

On x86, we allocate 2MB pages for kernel text up to
round_down(_etext, 2MB), so some of the kernel text near _etext still
sits on 4kB pages. With vmalloc_exec, we can map 2MB pages up to
round_up(_etext, 2MB) and hand the tail of the last huge page to
modules and BPF programs.
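To make the rounding concrete, here is the arithmetic for the _etext
value shown in the example below (user-space illustration; the macros
hard-code 4kB/2MB page sizes):

#include <stdio.h>

#define PFN_ALIGN(x)	(((x) + 0xfffUL) & ~0xfffUL)		/* 4kB round-up */
#define PMD_ALIGN(x)	(((x) + 0x1fffffUL) & ~0x1fffffUL)	/* 2MB round-up */

int main(void)
{
	unsigned long etext_addr = 0xffffffff82202a08UL; /* from the example */
	unsigned long start = PFN_ALIGN(etext_addr);	/* 0xffffffff82203000 */
	unsigned long end = PMD_ALIGN(etext_addr);	/* 0xffffffff82400000 */

	/* prints 2036 kB: nearly the whole 2MB tail becomes reusable */
	printf("tail region: %lu kB\n", (end - start) >> 10);
	return 0;
}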
Here is an example:

[root@eth50-1 ~]# grep _etext /proc/kallsyms
ffffffff82202a08 T _etext
[root@eth50-1 ~]# grep bpf_prog_ /proc/kallsyms | tail -n 3
ffffffff8220f920 t bpf_prog_cc61a5364ac11d93_handle__sched_wakeup      [bpf]
ffffffff8220fa28 t bpf_prog_cc61a5364ac11d93_handle__sched_wakeup_new  [bpf]
ffffffff8220fad4 t bpf_prog_3bf73fa16f5e3d92_handle__sched_switch      [bpf]
[root@eth50-1 ~]# grep 0xffffffff82200000 /sys/kernel/debug/page_tables/kernel
0xffffffff82200000-0xffffffff82400000    2M  ro  PSE  x  pmd
[root@eth50-1 ~]# grep xfs_flush_inodes /proc/kallsyms
ffffffff822ba910 t xfs_flush_inodes_worker     [xfs]
ffffffff822bc580 t xfs_flush_inodes    [xfs]

ffffffff82200000-ffffffff82400000 is a 2MB page, serving kernel text,
the xfs module, and BPF programs.

---
 arch/x86/mm/init_64.c |  3 ++-
 mm/vmalloc.c          | 27 +++++++++++++++++++++++++++
 2 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 39c5246964a9..d27d0af5beb5 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1367,12 +1367,13 @@ int __init deferred_page_init_max_threads(const struct cpumask *node_cpumask)
 
 int kernel_set_to_readonly;
 
+#define PMD_ALIGN(x)	(((unsigned long)(x) + (PMD_SIZE - 1)) & PMD_MASK)
 void mark_rodata_ro(void)
 {
 	unsigned long start = PFN_ALIGN(_text);
 	unsigned long rodata_start = PFN_ALIGN(__start_rodata);
 	unsigned long end = (unsigned long)__end_rodata_hpage_align;
-	unsigned long text_end = PFN_ALIGN(_etext);
+	unsigned long text_end = PMD_ALIGN(_etext);
 	unsigned long rodata_end = PFN_ALIGN(__end_rodata);
 	unsigned long all_end;
 
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 472287e71bf1..5f3b5df9313f 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -72,6 +72,11 @@ early_param("nohugevmalloc", set_nohugevmalloc);
 static const bool vmap_allow_huge = false;
 #endif	/* CONFIG_HAVE_ARCH_HUGE_VMALLOC */
 
+#define PMD_ALIGN(x)	(((unsigned long)(x) + (PMD_SIZE - 1)) & PMD_MASK)
+
+static struct vm_struct text_tail_vm;
+static struct vmap_area text_tail_va;
+
 bool is_vmalloc_addr(const void *x)
 {
 	unsigned long addr = (unsigned long)kasan_reset_tag(x);
@@ -634,6 +639,8 @@ int is_vmalloc_or_module_addr(const void *x)
 	unsigned long addr = (unsigned long)kasan_reset_tag(x);
 	if (addr >= MODULES_VADDR && addr < MODULES_END)
 		return 1;
+	if (addr >= text_tail_va.va_start && addr < text_tail_va.va_end)
+		return 1;
 #endif
 	return is_vmalloc_addr(x);
 }
@@ -2371,6 +2378,25 @@ static void vmap_init_free_space(void)
 	}
 }
 
+static void register_text_tail_vm(void)
+{
+	unsigned long start = PFN_ALIGN(_etext);
+	unsigned long end = PMD_ALIGN(_etext);
+	struct vmap_area *va;
+
+	va = kmem_cache_zalloc(vmap_area_cachep, GFP_NOWAIT);
+	if (WARN_ON_ONCE(!va))
+		return;
+	text_tail_vm.addr = (void *)start;
+	text_tail_vm.size = end - start;
+	text_tail_vm.flags = VM_KERNEL_EXEC;
+	text_tail_va.va_start = start;
+	text_tail_va.va_end = end;
+	text_tail_va.vm = &text_tail_vm;
+	memcpy(va, &text_tail_va, sizeof(*va));
+	insert_vmap_area(va, &free_text_area_root, &free_text_area_list);
+}
+
 void __init vmalloc_init(void)
 {
 	struct vmap_area *va;
@@ -2381,6 +2407,7 @@ void __init vmalloc_init(void)
 	 * Create the cache for vmap_area objects.
 	 */
 	vmap_area_cachep = KMEM_CACHE(vmap_area, SLAB_PANIC);
+	register_text_tail_vm();
 
 	for_each_possible_cpu(i) {
 		struct vmap_block_queue *vbq;

From patchwork Thu Aug 18 22:42:18 2022
From: Song Liu <song@kernel.org>
Subject: [RFC 5/5] vmalloc: vfree_exec: free unused vm_struct
Date: Thu, 18 Aug 2022 15:42:18 -0700
Message-ID: <20220818224218.2399791-6-song@kernel.org>
In-Reply-To: <20220818224218.2399791-1-song@kernel.org>
References: <20220818224218.2399791-1-song@kernel.org>

This is clearly not done yet, but it won't be too hard.

I would like to highlight that we need both subtree_max_size and vm for
a vmap_area in the free_text tree. Therefore, we cannot keep the union
in vmap_area.
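To see why the union must be split, here is a sketch of the resulting
struct (illustration of the hunk below; the 8-byte figure assumes a
64-bit build):

struct vmap_area {
	unsigned long va_start;
	unsigned long va_end;

	struct rb_node rb_node;		/* address sorted rbtree */
	struct list_head list;		/* address sorted list */

	/* was: union { subtree_max_size; vm; }. Splitting it grows each
	 * vmap_area by 8 bytes, the price of letting free_text areas
	 * carry both the augmented subtree size and a backpointer to
	 * their vm_struct.
	 */
	unsigned long subtree_max_size;
	struct vm_struct *vm;
};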
"prvs=9229e92ca3=songliubraving@fb.com" designates 67.231.145.42 as permitted sender) smtp.mailfrom="prvs=9229e92ca3=songliubraving@fb.com" ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1660864879; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Cfxk5IpxpBr5oFfQgYxTLNpH3zMv0ldk1OTQ1M6swNU=; b=HeuJQJfVdDK2xw/GP5QZFpTziF6Mqyx+3GTdQO3OwBkteZ2w0y2nYoOwTbrsrARONorx06 HVDPzmLxRAJoBb762JDasEfBmQQ8ieB4Jp7a/IdeAfy7gMzpGfdD6zzJJMbf79G/PkarMh MwldIo2X4A6N1wnhVNLq+jKInH1mjg4= X-Stat-Signature: 65jmdf89fzs6xq1dtpy9974r8eqa74xu X-Rspamd-Queue-Id: 3118321437 Authentication-Results: imf31.hostedemail.com; dkim=none; dmarc=fail reason="SPF not aligned (relaxed), No valid DKIM" header.from=kernel.org (policy=none); spf=pass (imf31.hostedemail.com: domain of "prvs=9229e92ca3=songliubraving@fb.com" designates 67.231.145.42 as permitted sender) smtp.mailfrom="prvs=9229e92ca3=songliubraving@fb.com" X-Rspamd-Server: rspam04 X-Rspam-User: X-HE-Tag: 1660864877-998968 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: This is clearly not done yet, but it won't be too hard. I would like to highlight that, we need both subtree_max_size and vm for vmap_area in free_text tree. Therefore, we cannot keep the union in vmam_area. --- include/linux/vmalloc.h | 12 ++---------- mm/vmalloc.c | 14 ++++++++++++-- 2 files changed, 14 insertions(+), 12 deletions(-) diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h index 691c02ffe3db..de7731caadc0 100644 --- a/include/linux/vmalloc.h +++ b/include/linux/vmalloc.h @@ -68,16 +68,8 @@ struct vmap_area { struct rb_node rb_node; /* address sorted rbtree */ struct list_head list; /* address sorted list */ - /* - * The following two variables can be packed, because - * a vmap_area object can be either: - * 1) in "free" tree (root is free_vmap_area_root) - * 2) or "busy" tree (root is vmap_area_root) - */ - union { - unsigned long subtree_max_size; /* in "free" tree */ - struct vm_struct *vm; /* in "busy" tree */ - }; + unsigned long subtree_max_size; + struct vm_struct *vm; }; /* archs that select HAVE_ARCH_HUGE_VMAP should override one or more of these */ diff --git a/mm/vmalloc.c b/mm/vmalloc.c index 5f3b5df9313f..57dd18882d37 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -1428,6 +1428,7 @@ adjust_va_to_fit_type(struct rb_root *root, struct list_head *head, */ lva->va_start = va->va_start; lva->va_end = nva_start_addr; + lva->vm = va->vm; /* * Shrink this VA to remaining size. @@ -3394,10 +3395,19 @@ void vfree_exec(const void *addr) spin_unlock(&vmap_area_lock); spin_lock(&free_text_area_lock); - merge_or_add_vmap_area_augment(va, + va = merge_or_add_vmap_area_augment(va, &free_text_area_root, &free_text_area_list); + if (va) { + struct vm_struct *vm = va->vm; + + if (vm != &text_tail_vm) { + va = __find_vmap_area((unsigned long)vm->addr, + free_text_area_root.rb_node); + if (va->va_start == (unsigned long)vm->addr) + pr_info("%s TODO: free vm->addr %px\n", __func__, vm->addr); + } + } spin_unlock(&free_text_area_lock); - /* TODO: when the whole vm_struct is not in use, free it */ } /**