From patchwork Fri Oct 7 23:43:15 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Song Liu
X-Patchwork-Id: 13001541
From: Song Liu
Subject: [RFC v2 4/4] vmalloc_exec: share a huge page with kernel text
Date: Fri, 7 Oct 2022 16:43:15 -0700
Message-ID: <20221007234315.2877365-5-song@kernel.org>
X-Mailer: git-send-email 2.30.2
In-Reply-To: <20221007234315.2877365-1-song@kernel.org>
References: <20221007234315.2877365-1-song@kernel.org>

On x86, the kernel maps kernel text with 2MB pages only up to
round_down(_etext, 2MB), so the last part of the kernel text is still
mapped with 4kB pages. With vmalloc_exec, we can map 2MB pages up to
round_up(_etext, 2MB) and use the rest of that last page for modules
and BPF programs.

Here is an example:

[root@eth50-1 ~]# grep _etext /proc/kallsyms
ffffffff82202a08 T _etext

[root@eth50-1 ~]# grep bpf_prog_ /proc/kallsyms | tail -n 3
ffffffff8220f920 t bpf_prog_cc61a5364ac11d93_handle__sched_wakeup [bpf]
ffffffff8220fa28 t bpf_prog_cc61a5364ac11d93_handle__sched_wakeup_new [bpf]
ffffffff8220fad4 t bpf_prog_3bf73fa16f5e3d92_handle__sched_switch [bpf]

[root@eth50-1 ~]# grep 0xffffffff82200000 /sys/kernel/debug/page_tables/kernel
0xffffffff82200000-0xffffffff82400000 2M ro PSE x pmd

[root@eth50-1 ~]# grep xfs_flush_inodes /proc/kallsyms
ffffffff822ba910 t xfs_flush_inodes_worker [xfs]
ffffffff822bc580 t xfs_flush_inodes [xfs]

ffffffff82200000-ffffffff82400000 is a single 2MB page that serves kernel
text, the xfs module, and BPF programs.

Signed-off-by: Song Liu
---
 arch/x86/mm/init_64.c |  3 ++-
 mm/vmalloc.c          | 24 ++++++++++++++++++++++++
 2 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 0fe690ebc269..d94f196c541a 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1367,12 +1367,13 @@ int __init deferred_page_init_max_threads(const struct cpumask *node_cpumask)
 
 int kernel_set_to_readonly;
 
+#define PMD_ALIGN(x)	(((unsigned long)(x) + (PMD_SIZE - 1)) & PMD_MASK)
 void mark_rodata_ro(void)
 {
 	unsigned long start = PFN_ALIGN(_text);
 	unsigned long rodata_start = PFN_ALIGN(__start_rodata);
 	unsigned long end = (unsigned long)__end_rodata_hpage_align;
-	unsigned long text_end = PFN_ALIGN(_etext);
+	unsigned long text_end = PMD_ALIGN(_etext);
 	unsigned long rodata_end = PFN_ALIGN(__end_rodata);
 	unsigned long all_end;
 
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 9212ff96b871..41509bbec583 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -75,6 +75,9 @@ static const bool vmap_allow_huge = false;
 #define PMD_ALIGN(addr) ALIGN(addr, PMD_SIZE)
 #define PMD_ALIGN_DOWN(addr) ALIGN_DOWN(addr, PMD_SIZE)
 
+static struct vm_struct text_tail_vm;
+static struct vmap_area text_tail_va;
+
 bool is_vmalloc_addr(const void *x)
 {
 	unsigned long addr = (unsigned long)kasan_reset_tag(x);
@@ -637,6 +640,8 @@ int is_vmalloc_or_module_addr(const void *x)
 	unsigned long addr = (unsigned long)kasan_reset_tag(x);
 	if (addr >= MODULES_VADDR && addr < MODULES_END)
 		return 1;
+	if (addr >= text_tail_va.va_start && addr < text_tail_va.va_end)
+		return 1;
 #endif
 	return is_vmalloc_addr(x);
 }
@@ -2422,6 +2427,24 @@ static void vmap_init_free_space(void)
 	}
 }
 
+static void register_text_tail_vm(void)
+{
+	unsigned long start = PFN_ALIGN((unsigned long)_etext);
+	unsigned long end = PMD_ALIGN((unsigned long)_etext);
+	struct vmap_area *va;
+
+	va = kmem_cache_zalloc(vmap_area_cachep, GFP_NOWAIT);
+	if (WARN_ON_ONCE(!va))
+		return;
+	text_tail_vm.addr = (void *)start;
+	text_tail_vm.size = end - start;
+	text_tail_va.va_start = start;
+	text_tail_va.va_end = end;
+	text_tail_va.vm = &text_tail_vm;
+	memcpy(va, &text_tail_va, sizeof(*va));
+	insert_vmap_area_augment(va, NULL, &free_text_area_root, &free_text_area_list);
+}
+
 void __init vmalloc_init(void)
 {
 	struct vmap_area *va;
@@ -2432,6 +2455,7 @@ void __init vmalloc_init(void)
 	 * Create the cache for vmap_area objects.
 	 */
 	vmap_area_cachep = KMEM_CACHE(vmap_area, SLAB_PANIC);
+	register_text_tail_vm();
 
 	for_each_possible_cpu(i) {
 		struct vmap_block_queue *vbq;