From patchwork Mon Aug 8 14:56:46 2022
From: Aaron Lu <aaron.lu@intel.com>
To: Dave Hansen, Rick Edgecombe
Cc: Song Liu, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [RFC PATCH 1/4] x86/mm/cpa: restore global bit when page is present
Date: Mon, 8 Aug 2022 22:56:46 +0800
Message-Id: <20220808145649.2261258-2-aaron.lu@intel.com>
In-Reply-To: <20220808145649.2261258-1-aaron.lu@intel.com>
References: <20220808145649.2261258-1-aaron.lu@intel.com>
For configs that don't have PTI enabled, or CPUs that don't need the
Meltdown mitigation, the current kernel can lose the Global bit after a
page goes through a cycle of present -> not present -> present. It
happens like this (__vunmap() does this in vm_remove_mappings()):

  original page protection: 0x8000000000000163 (NX/G/D/A/RW/P)
  set_memory_np(page, 1):   0x8000000000000062 (NX/D/A/RW)   loses G and P
  set_memory_p(page, 1):    0x8000000000000063 (NX/D/A/RW/P) restores P

In the end, the page's protection no longer has the Global bit set,
which would create a problem for the merge-small-mappings feature added
later in this series. For this reason, restore the Global bit when the
page is present on systems that do not have PTI enabled.
(pgprot_clear_protnone_bits() deserves a better name if this patch is
acceptable, but first I would like feedback on whether this is the
right way to solve the problem, so I haven't bothered with the name
yet.)

Signed-off-by: Aaron Lu <aaron.lu@intel.com>
---
 arch/x86/mm/pat/set_memory.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index 1abd5438f126..33657a54670a 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -758,6 +758,8 @@ static pgprot_t pgprot_clear_protnone_bits(pgprot_t prot)
 	 */
 	if (!(pgprot_val(prot) & _PAGE_PRESENT))
 		pgprot_val(prot) &= ~_PAGE_GLOBAL;
+	else
+		pgprot_val(prot) |= _PAGE_GLOBAL & __default_kernel_pte_mask;
 
 	return prot;
 }
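To make the changelog's cycle concrete, here is a minimal userspace
model of pgprot_clear_protnone_bits() with the fix applied. This is a
sketch only: the PTE bit values are hard-coded from arch/x86 headers,
__default_kernel_pte_mask is replaced by a plain variable, and a 64-bit
unsigned long is assumed.

	#include <stdio.h>

	/* Simplified x86 PTE bits; values as in arch/x86 pgtable_types.h. */
	#define _PAGE_PRESENT	0x001UL
	#define _PAGE_GLOBAL	0x100UL

	/*
	 * Stand-in for the kernel's __default_kernel_pte_mask: a
	 * PTI-enabled kernel clears _PAGE_GLOBAL in that mask, which
	 * makes the |= in the hunk above a no-op, so kernel mappings
	 * stay non-global under PTI.
	 */
	static unsigned long default_kernel_pte_mask = ~0UL; /* PTI off */

	static unsigned long clear_protnone_bits(unsigned long prot)
	{
		if (!(prot & _PAGE_PRESENT))
			prot &= ~_PAGE_GLOBAL;	/* existing: drop G with P */
		else
			prot |= _PAGE_GLOBAL & default_kernel_pte_mask; /* fix */
		return prot;
	}

	int main(void)
	{
		unsigned long prot = 0x8000000000000163UL; /* NX/G/D/A/RW/P */

		/* set_memory_np(): page goes not-present, G is dropped too */
		prot = clear_protnone_bits(prot & ~_PAGE_PRESENT);
		printf("not present: %#lx\n", prot); /* 0x8000000000000062 */

		/* set_memory_p(): P returns, and with the fix so does G */
		prot = clear_protnone_bits(prot | _PAGE_PRESENT);
		printf("present:     %#lx\n", prot); /* 0x8000000000000163 */
		return 0;
	}

Without the else-branch, the second call would print 0x8000000000000063
instead, i.e. the protection from the changelog that has lost G.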
From patchwork Mon Aug 8 14:56:47 2022
From: Aaron Lu <aaron.lu@intel.com>
To: Dave Hansen, Rick Edgecombe
Cc: Song Liu, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [RFC PATCH 2/4] x86/mm/cpa: merge split direct mapping when possible
Date: Mon, 8 Aug 2022 22:56:47 +0800
Message-Id: <20220808145649.2261258-3-aaron.lu@intel.com>
In-Reply-To: <20220808145649.2261258-1-aaron.lu@intel.com>
References: <20220808145649.2261258-1-aaron.lu@intel.com>

On x86_64, Linux has a direct mapping of almost all physical memory.
For performance reasons, this mapping is usually set up with large
pages, 2M or 1G depending on hardware capability, with read, write and
non-execute protection.

There are cases where some pages have to change their protection to RO
and executable, like pages that host module code or BPF programs. When
such a page's protection is changed, the large mapping covering it has
to be split into 4K mappings first, and then the individual 4K pages'
protections are changed accordingly: unaffected pages keep their
original RW and NX protection while affected pages become RO and X.

The problem with this split is that the large mapping stays split even
after the affected pages' protection is changed back to RW and NX, e.g.
when the module is unloaded or the BPF programs are freed. After the
system runs for a long time, more and more large mappings end up split,
causing more and more dTLB misses and hurting overall system
performance.

This patch tries to restore split large mappings by tracking how many
entries of a split small-mapping page table have the same protection
bits; once that number reaches PTRS_PER_PTE, the small-mapping page
table can be freed, with its upper-level page table entry pointing
directly to a large page. (A condensed model of this bookkeeping
follows the diff below.)

Testing: see patch 4 for details.
Signed-off-by: Aaron Lu <aaron.lu@intel.com>
---
 arch/x86/mm/pat/set_memory.c | 184 +++++++++++++++++++++++++++++++++--
 include/linux/mm_types.h     |   6 ++
 include/linux/page-flags.h   |   6 ++
 3 files changed, 189 insertions(+), 7 deletions(-)

diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index 33657a54670a..fea2c70ff37f 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -718,13 +718,89 @@ phys_addr_t slow_virt_to_phys(void *__virt_addr)
 }
 EXPORT_SYMBOL_GPL(slow_virt_to_phys);
 
+static void merge_splitted_mapping(struct page *pgt, int level);
+static void set_pte_adjust_nr_same_prot(pte_t *kpte, int level, pte_t pte)
+{
+	struct page *pgt = virt_to_page(kpte);
+	pgprot_t old_prot, new_prot;
+	int i;
+
+	/*
+	 * The purpose of tracking entries with same_prot is to hopefully
+	 * merge split small mappings into large ones. Since only 2M and
+	 * 1G mappings are supported, there is no need to track page
+	 * tables of level > 2M.
+	 */
+	if (!PageSplitpgt(pgt) || level > PG_LEVEL_2M) {
+		set_pte(kpte, pte);
+		return;
+	}
+
+	/* get old protection before kpte is updated */
+	if (level == PG_LEVEL_4K) {
+		old_prot = pte_pgprot(*kpte);
+		new_prot = pte_pgprot(pte);
+	} else {
+		old_prot = pmd_pgprot(*(pmd_t *)kpte);
+		new_prot = pmd_pgprot(*(pmd_t *)&pte);
+	}
+
+	set_pte(kpte, pte);
+
+	if (pgprot_val(pgt->same_prot) != pgprot_val(old_prot) &&
+	    pgprot_val(pgt->same_prot) == pgprot_val(new_prot))
+		pgt->nr_same_prot++;
+
+	if (pgprot_val(pgt->same_prot) == pgprot_val(old_prot) &&
+	    pgprot_val(pgt->same_prot) != pgprot_val(new_prot))
+		pgt->nr_same_prot--;
+
+	if (unlikely(pgt->nr_same_prot == 0)) {
+		pte_t *entry = page_address(pgt);
+
+		/*
+		 * Now all entries' prot have changed, check again
+		 * to see if all entries have the same new prot.
+		 * Use the 1st entry's prot as the new pgt->same_prot.
+		 */
+		if (level == PG_LEVEL_4K)
+			pgt->same_prot = pte_pgprot(*entry);
+		else
+			pgt->same_prot = pmd_pgprot(*(pmd_t *)entry);
+
+		for (i = 0; i < PTRS_PER_PTE; i++, entry++) {
+			pgprot_t prot;
+
+			if (level == PG_LEVEL_4K)
+				prot = pte_pgprot(*entry);
+			else
+				prot = pmd_pgprot(*(pmd_t *)entry);
+
+			if (pgprot_val(prot) == pgprot_val(pgt->same_prot))
+				pgt->nr_same_prot++;
+		}
+	}
+
+	/*
+	 * If this split page table's entries all have the same
+	 * protection now, try to merge it. Note that for a PMD-level
+	 * page table, if all entries are pointing to PTE page tables,
+	 * no merge can be done.
+	 */
+	if (unlikely(pgt->nr_same_prot == PTRS_PER_PTE &&
+		     (pgprot_val(pgt->same_prot) & _PAGE_PRESENT) &&
+		     (level == PG_LEVEL_4K ||
+		      pgprot_val(pgt->same_prot) & _PAGE_PSE)))
+		merge_splitted_mapping(pgt, level);
+}
+
 /*
  * Set the new pmd in all the pgds we know about:
  */
-static void __set_pmd_pte(pte_t *kpte, unsigned long address, pte_t pte)
+static void __set_pmd_pte(pte_t *kpte, int level, unsigned long address, pte_t pte)
 {
 	/* change init_mm */
-	set_pte_atomic(kpte, pte);
+	set_pte_adjust_nr_same_prot(kpte, level, pte);
 
 #ifdef CONFIG_X86_32
 	if (!SHARED_KERNEL_PMD) {
 		struct page *page;
@@ -739,12 +815,68 @@ static void __set_pmd_pte(pte_t *kpte, unsigned long address, pte_t pte)
 			p4d = p4d_offset(pgd, address);
 			pud = pud_offset(p4d, address);
 			pmd = pmd_offset(pud, address);
-			set_pte_atomic((pte_t *)pmd, pte);
+			set_pte_adjust_nr_same_prot((pte_t *)pmd, level, pte);
 		}
 	}
 #endif
 }
 
+static void merge_splitted_mapping(struct page *pgt, int level)
+{
+	pte_t *kpte = page_address(pgt);
+	pgprot_t pte_prot, pmd_prot;
+	unsigned long address;
+	unsigned long pfn;
+	pte_t pte;
+	pud_t pud;
+
+	switch (level) {
+	case PG_LEVEL_4K:
+		pte_prot = pte_pgprot(*kpte);
+		pmd_prot = pgprot_4k_2_large(pte_prot);
+		pgprot_val(pmd_prot) |= _PAGE_PSE;
+		pfn = pte_pfn(*kpte);
+		pte = pfn_pte(pfn, pmd_prot);
+
+		/*
+		 * Update the upper-level kpte.
+		 * Note that a further merge can happen if all of the PMD
+		 * table's entries have the same protection bits after
+		 * this change.
+		 */
+		address = (unsigned long)page_address(pfn_to_page(pfn));
+		__set_pmd_pte(pgt->upper_kpte, level + 1, address, pte);
+		break;
+	case PG_LEVEL_2M:
+		pfn = pmd_pfn(*(pmd_t *)kpte);
+		pmd_prot = pmd_pgprot(*(pmd_t *)kpte);
+		pud = pfn_pud(pfn, pmd_prot);
+		set_pud(pgt->upper_kpte, pud);
+		break;
+	default:
+		WARN_ON_ONCE(1);
+		return;
+	}
+
+	/*
+	 * The current kernel does flush_tlb_all() when splitting a large
+	 * page inside pgd_lock because of:
+	 * - an erratum of Atom AAH41; and
+	 * - avoiding another CPU simultaneously changing the just-split
+	 *   large page's attributes.
+	 * The first does not require a full TLB flush according to
+	 * commit 211b3d03c7400 ("x86: work around Fedora-11 x86-32 kernel
+	 * failures on Intel Atom CPUs"), while the second is already
+	 * achieved by cpa_lock. Commit c0a759abf5a68 ("x86/mm/cpa: Move
+	 * flush_tlb_all()") simplified the code by doing a full TLB flush
+	 * inside pgd_lock. For the same reason, also do a full TLB
+	 * flush inside pgd_lock after doing a merge.
+	 */
+	flush_tlb_all();
+
+	__ClearPageSplitpgt(pgt);
+	__free_page(pgt);
+}
+
 static pgprot_t pgprot_clear_protnone_bits(pgprot_t prot)
 {
 	/*
@@ -901,9 +1033,10 @@ static int __should_split_large_page(pte_t *kpte, unsigned long address,
 	/* All checks passed. Update the large page mapping. */
 	new_pte = pfn_pte(old_pfn, new_prot);
-	__set_pmd_pte(kpte, address, new_pte);
+	__set_pmd_pte(kpte, level, address, new_pte);
 	cpa->flags |= CPA_FLUSHTLB;
 	cpa_inc_lp_preserved(level);
+
 	return 0;
 }
@@ -1023,6 +1156,11 @@ __split_large_page(struct cpa_data *cpa, pte_t *kpte, unsigned long address,
 	for (i = 0; i < PTRS_PER_PTE; i++, pfn += pfninc, lpaddr += lpinc)
 		split_set_pte(cpa, pbase + i, pfn, ref_prot, lpaddr, lpinc);
 
+	__SetPageSplitpgt(base);
+	base->upper_kpte = kpte;
+	base->same_prot = ref_prot;
+	base->nr_same_prot = PTRS_PER_PTE;
+
 	if (virt_addr_valid(address)) {
 		unsigned long pfn = PFN_DOWN(__pa(address));
@@ -1037,7 +1175,7 @@ __split_large_page(struct cpa_data *cpa, pte_t *kpte, unsigned long address,
 	 * pagetable protections, the actual ptes set above control the
 	 * primary protection behavior:
	 */
-	__set_pmd_pte(kpte, address, mk_pte(base, __pgprot(_KERNPG_TABLE)));
+	__set_pmd_pte(kpte, level, address, mk_pte(base, __pgprot(_KERNPG_TABLE)));
 
 	/*
 	 * Do a global flush tlb after splitting the large page
@@ -1508,6 +1646,23 @@ static int __cpa_process_fault(struct cpa_data *cpa, unsigned long vaddr,
 	}
 }
 
+/*
+ * When debug_pagealloc_enabled():
+ * - the direct map will not use large page mappings;
+ * - the kernel highmap can still use large mappings.
+ * When !debug_pagealloc_enabled(): both the direct map and the kernel
+ * highmap can use large page mappings.
+ *
+ * When a large page mapping is used, it can be split due to reasons
+ * like protection changes, and thus it is also possible for a merge
+ * to happen for that split small-mapping page table page.
+ */
+static bool subject_to_merge(unsigned long addr)
+{
+	return !debug_pagealloc_enabled() ||
+		within(addr, (unsigned long)_text, _brk_end);
+}
+
 static int __change_page_attr(struct cpa_data *cpa, int primary)
 {
 	unsigned long address;
@@ -1526,10 +1681,23 @@ static int __change_page_attr(struct cpa_data *cpa, int primary)
 		return __cpa_process_fault(cpa, address, primary);
 
 	if (level == PG_LEVEL_4K) {
-		pte_t new_pte;
+		pte_t new_pte, *tmp;
 		pgprot_t new_prot = pte_pgprot(old_pte);
 		unsigned long pfn = pte_pfn(old_pte);
 
+		if (subject_to_merge(address)) {
+			spin_lock(&pgd_lock);
+			/*
+			 * Check for races, another CPU might have merged
+			 * this page up already.
+			 */
+			tmp = _lookup_address_cpa(cpa, address, &level);
+			if (tmp != kpte) {
+				spin_unlock(&pgd_lock);
+				goto repeat;
+			}
+		}
+
 		pgprot_val(new_prot) &= ~pgprot_val(cpa->mask_clr);
 		pgprot_val(new_prot) |= pgprot_val(cpa->mask_set);
@@ -1551,10 +1719,12 @@ static int __change_page_attr(struct cpa_data *cpa, int primary)
 		 * Do we really change anything ?
 		 */
 		if (pte_val(old_pte) != pte_val(new_pte)) {
-			set_pte_atomic(kpte, new_pte);
+			set_pte_adjust_nr_same_prot(kpte, level, new_pte);
 			cpa->flags |= CPA_FLUSHTLB;
 		}
 		cpa->numpages = 1;
+		if (subject_to_merge(address))
+			spin_unlock(&pgd_lock);
 		return 0;
 	}
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index c29ab4c0cd5c..6124c575fdad 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -160,6 +160,12 @@ struct page {
 			spinlock_t ptl;
 #endif
 		};
+		struct {	/* split page table pages */
+			void *upper_kpte;		/* compound_head */
+			int nr_same_prot;
+			unsigned long _split_pt_pad;	/* mapping */
+			pgprot_t same_prot;
+		};
 		struct {	/* ZONE_DEVICE pages */
 			/** @pgmap: Points to the hosting device page map. */
 			struct dev_pagemap *pgmap;
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index e66f7aa3191d..3fe395dd7dfc 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -942,6 +942,7 @@ static inline bool is_page_hwpoison(struct page *page)
 #define PG_offline	0x00000100
 #define PG_table	0x00000200
 #define PG_guard	0x00000400
+#define PG_splitpgt	0x00000800
 
 #define PageType(page, flag)						\
 	((page->page_type & (PAGE_TYPE_BASE | flag)) == PAGE_TYPE_BASE)
@@ -1012,6 +1013,11 @@ PAGE_TYPE_OPS(Table, table)
  */
 PAGE_TYPE_OPS(Guard, guard)
 
+/*
+ * Marks pages in use as split page tables
+ */
+PAGE_TYPE_OPS(Splitpgt, splitpgt)
+
 extern bool is_free_buddy_page(struct page *page);
 
 PAGEFLAG(Isolated, isolated, PF_ANY);
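Stripped of kernel types, locking and TLB flushing, the bookkeeping in
set_pte_adjust_nr_same_prot() reduces to a handful of lines. The
following is a hand-written model, not kernel code; the field names
mirror the struct page additions above, and protections are modeled as
plain integers:

	#include <assert.h>
	#include <stdbool.h>

	#define PTRS_PER_PTE 512	/* x86_64: entries per page table */

	/* Model of the fields the patch adds for split page tables. */
	struct split_pgt {
		unsigned long same_prot;  /* protection value being counted */
		int nr_same_prot;         /* entries that carry same_prot */
		unsigned long prot[PTRS_PER_PTE];
	};

	/*
	 * Mirror of the counting step: adjust the tally when an entry
	 * joins or leaves the reference protection, and report when the
	 * table has become uniform, i.e. is a merge candidate. 512
	 * matching 4K entries make one 2M page; same ratio for 2M -> 1G.
	 */
	static bool update_entry(struct split_pgt *pgt, int idx,
				 unsigned long new_prot)
	{
		unsigned long old_prot = pgt->prot[idx];

		pgt->prot[idx] = new_prot;
		if (old_prot != pgt->same_prot && new_prot == pgt->same_prot)
			pgt->nr_same_prot++;
		else if (old_prot == pgt->same_prot && new_prot != pgt->same_prot)
			pgt->nr_same_prot--;

		return pgt->nr_same_prot == PTRS_PER_PTE;
	}

	int main(void)
	{
		static struct split_pgt pgt; /* zeroed: all entries match 0 */

		pgt.nr_same_prot = PTRS_PER_PTE;

		/* One entry flips away and back, as an RO+X cycle does. */
		assert(!update_entry(&pgt, 7, 0x1));
		assert(update_entry(&pgt, 7, 0x0)); /* uniform: mergeable */
		return 0;
	}

The real code additionally re-elects a new same_prot when the count
drops to zero, so the tally can recover after every entry has changed.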
From patchwork Mon Aug 8 14:56:48 2022
From: Aaron Lu <aaron.lu@intel.com>
To: Dave Hansen, Rick Edgecombe
Cc: Song Liu, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [RFC PATCH 3/4] x86/mm/cpa: add merge event counter
Date: Mon, 8 Aug 2022 22:56:48 +0800
Message-Id: <20220808145649.2261258-4-aaron.lu@intel.com>
In-Reply-To: <20220808145649.2261258-1-aaron.lu@intel.com>
References: <20220808145649.2261258-1-aaron.lu@intel.com>

Like the existing split event counters, add counters for merge events.
Signed-off-by: Aaron Lu <aaron.lu@intel.com>
---
 arch/x86/mm/pat/set_memory.c  | 19 +++++++++++++++++++
 include/linux/vm_event_item.h |  2 ++
 mm/vmstat.c                   |  2 ++
 3 files changed, 23 insertions(+)

diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index fea2c70ff37f..1be9aab42c79 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -105,6 +105,23 @@ static void split_page_count(int level)
 	direct_pages_count[level - 1] += PTRS_PER_PTE;
 }
 
+static void merge_page_count(int level)
+{
+	if (direct_pages_count[level] < PTRS_PER_PTE) {
+		WARN_ON_ONCE(1);
+		return;
+	}
+
+	direct_pages_count[level] -= PTRS_PER_PTE;
+	if (system_state == SYSTEM_RUNNING) {
+		if (level == PG_LEVEL_4K)
+			count_vm_event(DIRECT_MAP_LEVEL1_MERGE);
+		else if (level == PG_LEVEL_2M)
+			count_vm_event(DIRECT_MAP_LEVEL2_MERGE);
+	}
+	direct_pages_count[level + 1]++;
+}
+
 void arch_report_meminfo(struct seq_file *m)
 {
 	seq_printf(m, "DirectMap4k:    %8lu kB\n",
@@ -875,6 +892,8 @@ static void merge_splitted_mapping(struct page *pgt, int level)
 
 	__ClearPageSplitpgt(pgt);
 	__free_page(pgt);
+
+	merge_page_count(level);
 }
 
 static pgprot_t pgprot_clear_protnone_bits(pgprot_t prot)
diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
index 404024486fa5..00a9a435af49 100644
--- a/include/linux/vm_event_item.h
+++ b/include/linux/vm_event_item.h
@@ -143,6 +143,8 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
 #ifdef CONFIG_X86
 		DIRECT_MAP_LEVEL2_SPLIT,
 		DIRECT_MAP_LEVEL3_SPLIT,
+		DIRECT_MAP_LEVEL1_MERGE,
+		DIRECT_MAP_LEVEL2_MERGE,
 #endif
 		NR_VM_EVENT_ITEMS
 };
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 373d2730fcf2..1a4287a4d614 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1403,6 +1403,8 @@ const char * const vmstat_text[] = {
 #ifdef CONFIG_X86
 	"direct_map_level2_splits",
 	"direct_map_level3_splits",
+	"direct_map_level1_merges",
+	"direct_map_level2_merges",
 #endif
 #endif /* CONFIG_VM_EVENT_COUNTERS || CONFIG_MEMCG */
 };
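With this patch applied, the merge counters show up in /proc/vmstat
next to the existing split counters. A small userspace sketch that
filters them out (the *_merges strings are exactly the ones added to
vmstat_text above):

	#include <stdio.h>
	#include <string.h>

	/* Print the direct-map split/merge event counters. */
	int main(void)
	{
		char line[128];
		FILE *f = fopen("/proc/vmstat", "r");

		if (!f) {
			perror("fopen");
			return 1;
		}
		while (fgets(line, sizeof(line), f)) {
			if (strstr(line, "direct_map_level") &&
			    (strstr(line, "_splits") || strstr(line, "_merges")))
				fputs(line, stdout);
		}
		fclose(f);
		return 0;
	}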
From patchwork Mon Aug 8 14:56:49 2022
From: Aaron Lu <aaron.lu@intel.com>
To: Dave Hansen, Rick Edgecombe
Cc: Song Liu, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [TEST NOT_FOR_MERGE 4/4] x86/mm/cpa: add a test interface to split direct map
Date: Mon, 8 Aug 2022 22:56:49 +0800
Message-Id: <20220808145649.2261258-5-aaron.lu@intel.com>
In-Reply-To: <20220808145649.2261258-1-aaron.lu@intel.com>
References: <20220808145649.2261258-1-aaron.lu@intel.com>

To test this functionality, a debugfs interface is added:
/sys/kernel/debug/x86/split_mapping

There are three test modes.

mode 0: allocate $page_nr pages and set each page's protection first to
RO and X and then back to RW and NX.
This is used to test multiple CPUs dealing with different address
ranges.

mode 1: allocate several pages and create $nr_cpu kthreads that
simultaneously change those pages' protection with a fixed pattern.
This is used to test multiple CPUs dealing with the same address range.

mode 2: same as mode 0 except using alloc_pages() instead of vmalloc(),
because the vmalloc space is too small on x86_32/PAE.

On an x86_64 VM, I started mode0.sh and mode1.sh at the same time:

mode0.sh:

  mode=0
  page_nr=200000
  nr_cpu=16

  function test_one()
  {
          echo $mode $page_nr > /sys/kernel/debug/x86/split_mapping
  }

  while true; do
          for i in `seq $nr_cpu`; do
                  test_one &
          done
          wait
  done

mode1.sh:

  mode=1
  page_nr=1
  echo $mode $page_nr > /sys/kernel/debug/x86/split_mapping

After 5 hours, no problem occurred, with some millions of splits and
merges. For x86_32 and x86_32 PAE, the mode 2 test was used and no
problem was found either.

Signed-off-by: Aaron Lu <aaron.lu@intel.com>
---
 arch/x86/mm/pat/set_memory.c | 206 +++++++++++++++++++++++++++++++++++
 1 file changed, 206 insertions(+)

diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index 1be9aab42c79..4deea4de73e7 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -20,6 +20,9 @@
 #include <linux/kernel.h>
 #include <linux/cc_platform.h>
 #include <linux/set_memory.h>
+#include <linux/kthread.h>
+#include <linux/random.h>
+#include <linux/delay.h>
 
 #include <asm/e820/api.h>
 #include <asm/processor.h>
@@ -2556,6 +2559,209 @@ int __init kernel_unmap_pages_in_pgd(pgd_t *pgd, unsigned long address,
 	return retval;
 }
 
+static int split_mapping_mode0_test(int page_nr)
+{
+	void **addr_buff;
+	void *addr;
+	int i, j;
+
+	addr_buff = kvmalloc(sizeof(void *) * page_nr, GFP_KERNEL);
+	if (!addr_buff) {
+		pr_err("addr_buff: no memory\n");
+		return -ENOMEM;
+	}
+
+	for (i = 0; i < page_nr; i++) {
+		addr = vmalloc(PAGE_SIZE);
+		if (!addr) {
+			pr_err("no memory\n");
+			break;
+		}
+
+		set_memory_ro((unsigned long)addr, 1);
+		set_memory_x((unsigned long)addr, 1);
+
+		addr_buff[i] = addr;
+	}
+
+	for (j = 0; j < i; j++) {
+		set_memory_nx((unsigned long)addr_buff[j], 1);
+		set_memory_rw((unsigned long)addr_buff[j], 1);
+		vfree(addr_buff[j]);
+	}
+
+	kvfree(addr_buff);
+
+	return 0;
+}
+
+struct split_mapping_mode1_data {
+	unsigned long addr;
+	int page_nr;
+};
+
+static int split_mapping_set_prot(void *data)
+{
+	struct split_mapping_mode1_data *d = data;
+	unsigned long addr = d->addr;
+	int page_nr = d->page_nr;
+	int m;
+
+	m = get_random_int() % 100;
+	msleep(m);
+
+	while (!kthread_should_stop()) {
+		set_memory_ro(addr, page_nr);
+		set_memory_x(addr, page_nr);
+		set_memory_rw(addr, page_nr);
+		set_memory_nx(addr, page_nr);
+		cond_resched();
+	}
+
+	return 0;
+}
+
+static int split_mapping_mode1_test(int page_nr)
+{
+	int nr_kthreads = num_online_cpus();
+	struct split_mapping_mode1_data d;
+	struct task_struct **kthreads;
+	int i, j, ret;
+	void *addr;
+
+	addr = vmalloc(PAGE_SIZE * page_nr);
+	if (!addr)
+		return -ENOMEM;
+
+	kthreads = kmalloc(nr_kthreads * sizeof(struct task_struct *), GFP_KERNEL);
+	if (!kthreads) {
+		vfree(addr);
+		return -ENOMEM;
+	}
+
+	d.addr = (unsigned long)addr;
+	d.page_nr = page_nr;
+	for (i = 0; i < nr_kthreads; i++) {
+		kthreads[i] = kthread_run(split_mapping_set_prot, &d, "split_mappingd%d", i);
+		if (IS_ERR(kthreads[i])) {
+			for (j = 0; j < i; j++)
+				kthread_stop(kthreads[j]);
+			ret = PTR_ERR(kthreads[i]);
+			goto out;
+		}
+	}
+
+	while (1) {
+		if (signal_pending(current)) {
+			for (i = 0; i < nr_kthreads; i++)
+				kthread_stop(kthreads[i]);
+			ret = 0;
+			break;
+		}
+		msleep(1000);
+	}
+
+out:
+	kfree(kthreads);
+	vfree(addr);
+	return ret;
+}
+
+static int split_mapping_mode2_test(int page_nr)
+{
+	struct page *p, *t;
+	unsigned long addr;
+	int i;
+
+	LIST_HEAD(head);
+
+	for (i = 0; i < page_nr; i++) {
+		p = alloc_pages(GFP_KERNEL | GFP_DMA32, 0);
+		if (!p) {
+			pr_err("no memory\n");
+			break;
+		}
+
+		addr = (unsigned long)page_address(p);
+		BUG_ON(!addr);
+
+		set_memory_ro(addr, 1);
+		set_memory_x(addr, 1);
+
+		list_add(&p->lru, &head);
+	}
+
+	list_for_each_entry_safe(p, t, &head, lru) {
+		addr = (unsigned long)page_address(p);
+		set_memory_nx(addr, 1);
+		set_memory_rw(addr, 1);
+
+		list_del(&p->lru);
+		__free_page(p);
+	}
+
+	return 0;
+}
+
+static ssize_t split_mapping_write_file(struct file *file, const char __user *buf,
+					size_t count, loff_t *ppos)
+{
+	unsigned int mode = 0, page_nr = 0;
+	char buffer[64];
+	int ret;
+
+	if (count > 64)
+		return -EINVAL;
+
+	if (copy_from_user(buffer, buf, count))
+		return -EFAULT;
+	sscanf(buffer, "%u %u", &mode, &page_nr);
+
+	/*
+	 * There are 3 test modes.
+	 * mode 0: each thread allocates $page_nr pages and sets each page's
+	 *         protection first to RO and X and then back to RW and NX.
+	 *         This is used to test multiple CPUs dealing with different
+	 *         pages.
+	 * mode 1: allocate several pages and create $nr_cpu kthreads to
+	 *         simultaneously change those pages' protection with a fixed
+	 *         pattern. This is used to test multiple CPUs dealing with
+	 *         the same pages' protection.
+	 * mode 2: like mode 0 but directly use alloc_pages() because the
+	 *         vmalloc area on x86_32 is too small, only 128M.
+	 */
+	if (mode > 2)
+		return -EINVAL;
+
+	if (page_nr == 0)
+		return -EINVAL;
+
+	if (mode == 0)
+		ret = split_mapping_mode0_test(page_nr);
+	else if (mode == 1)
+		ret = split_mapping_mode1_test(page_nr);
+	else
+		ret = split_mapping_mode2_test(page_nr);
+
+	return ret ? ret : count;
+}
+
+static const struct file_operations split_mapping_fops = {
+	.write = split_mapping_write_file,
+};
+
+static int __init split_mapping_init(void)
+{
+	struct dentry *d = debugfs_create_file("split_mapping", S_IWUSR, arch_debugfs_dir, NULL,
+					       &split_mapping_fops);
+	if (IS_ERR(d)) {
+		pr_err("create split_mapping failed: %ld\n", PTR_ERR(d));
+		return PTR_ERR(d);
+	}
+
+	return 0;
+}
+late_initcall(split_mapping_init);
+
 /*
  * The testcases use internal knowledge of the implementation that shouldn't
  * be exposed to the rest of the kernel. Include these directly here.
  */
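For reference, the same write the shell scripts above perform can also
be issued from a C program; a minimal sketch, assuming debugfs is
mounted at /sys/kernel/debug and a kernel carrying this test patch:

	#include <fcntl.h>
	#include <stdio.h>
	#include <string.h>
	#include <unistd.h>

	/* Drive the test interface: "<mode> <page_nr>", as in mode0.sh. */
	int main(void)
	{
		const char *cmd = "0 1000";	/* mode 0, 1000 pages */
		int fd = open("/sys/kernel/debug/x86/split_mapping", O_WRONLY);

		if (fd < 0) {
			perror("open");
			return 1;
		}
		if (write(fd, cmd, strlen(cmd)) < 0)
			perror("write");
		close(fd);
		return 0;
	}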