From patchwork Mon Jul  1 08:46:46 2024
X-Patchwork-Submitter: Qi Zheng <zhengqi.arch@bytedance.com>
X-Patchwork-Id: 13717671
From: Qi Zheng <zhengqi.arch@bytedance.com>
To: david@redhat.com, hughd@google.com, willy@infradead.org, mgorman@suse.de,
    muchun.song@linux.dev, akpm@linux-foundation.org
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    Qi Zheng <zhengqi.arch@bytedance.com>
Subject: [RFC PATCH 5/7] x86: mm: free page table pages by RCU instead of semi RCU
Date: Mon, 1 Jul 2024 16:46:46 +0800
Message-Id: <1a27215790293face83242cfd703e910aa0c5ce8.1719570849.git.zhengqi.arch@bytedance.com>
Now, if CONFIG_MMU_GATHER_RCU_TABLE_FREE is selected, the page table
pages will be freed by semi RCU, that is:

 - batch table freeing: asynchronous free by RCU
 - single table freeing: IPI + synchronous free

In this way, the page table can be traversed locklessly by disabling
IRQs in paths such as fast GUP. But this is not enough to free the
empty PTE page table pages in paths other than the munmap and
exit_mmap paths, because the IPI cannot be synchronized with
rcu_read_lock() in pte_offset_map{_lock}().
In preparation for supporting reclamation of empty PTE page table
pages, let single table freeing also go through RCU, like batch table
freeing. Then we can also use pte_offset_map() etc. to prevent the PTE
page from being freed.

Like pte_free_defer(), we can also safely use ptdesc->pt_rcu_head to
free the page table pages:

 - The pt_rcu_head is unioned with pt_list and pmd_huge_pte.

 - For pt_list, it is used to manage the PGD page on x86. Fortunately,
   tlb_remove_table() will not be used to free PGD pages, so it is
   safe to use pt_rcu_head.

 - For pmd_huge_pte, we will do zap_deposited_table() before freeing
   the PMD page, so it is also safe.

Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
---
 arch/x86/include/asm/tlb.h | 23 +++++++++++++++++++++++
 arch/x86/kernel/paravirt.c |  7 +++++++
 arch/x86/mm/pgtable.c      |  2 +-
 mm/mmu_gather.c            |  2 +-
 4 files changed, 32 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/tlb.h b/arch/x86/include/asm/tlb.h
index 580636cdc257..9182db1e0264 100644
--- a/arch/x86/include/asm/tlb.h
+++ b/arch/x86/include/asm/tlb.h
@@ -34,4 +34,27 @@ static inline void __tlb_remove_table(void *table)
 	free_page_and_swap_cache(table);
 }
 
+#ifndef CONFIG_PT_RECLAIM
+static inline void __tlb_remove_table_one(void *table)
+{
+	free_page_and_swap_cache(table);
+}
+#else
+static inline void __tlb_remove_table_one_rcu(struct rcu_head *head)
+{
+	struct page *page;
+
+	page = container_of(head, struct page, rcu_head);
+	free_page_and_swap_cache(page);
+}
+
+static inline void __tlb_remove_table_one(void *table)
+{
+	struct page *page;
+
+	page = table;
+	call_rcu(&page->rcu_head, __tlb_remove_table_one_rcu);
+}
+#endif /* CONFIG_PT_RECLAIM */
+
 #endif /* _ASM_X86_TLB_H */
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index 5358d43886ad..199b9a3813b4 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -60,10 +60,17 @@ void __init native_pv_lock_init(void)
 		static_branch_disable(&virt_spin_lock_key);
 }
 
+#ifndef CONFIG_PT_RECLAIM
 static void native_tlb_remove_table(struct mmu_gather *tlb, void *table)
 {
 	tlb_remove_page(tlb, table);
 }
+#else
+static void native_tlb_remove_table(struct mmu_gather *tlb, void *table)
+{
+	tlb_remove_table(tlb, table);
+}
+#endif
 
 struct static_key paravirt_steal_enabled;
 struct static_key paravirt_steal_rq_enabled;
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 93e54ba91fbf..cd5bf2157611 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -18,7 +18,7 @@ EXPORT_SYMBOL(physical_mask);
 #define PGTABLE_HIGHMEM 0
 #endif
 
-#ifndef CONFIG_PARAVIRT
+#if !defined(CONFIG_PARAVIRT) && !defined(CONFIG_PT_RECLAIM)
 static inline
 void paravirt_tlb_remove_table(struct mmu_gather *tlb, void *table)
 {
diff --git a/mm/mmu_gather.c b/mm/mmu_gather.c
index 99b3e9408aa0..1a8f7b8781a2 100644
--- a/mm/mmu_gather.c
+++ b/mm/mmu_gather.c
@@ -314,7 +314,7 @@ static inline void tlb_table_invalidate(struct mmu_gather *tlb)
 static void tlb_remove_table_one(void *table)
 {
 	tlb_remove_table_sync_one();
-	__tlb_remove_table(table);
+	__tlb_remove_table_one(table);
 }
 
 static void tlb_table_flush(struct mmu_gather *tlb)