From patchwork Wed Nov 10 08:40:43 2021
X-Patchwork-Submitter: Qi Zheng
X-Patchwork-Id: 12611591
From: Qi Zheng <zhengqi.arch@bytedance.com>
To: akpm@linux-foundation.org, tglx@linutronix.de, kirill.shutemov@linux.intel.com, mika.penttila@nextfour.com, david@redhat.com, jgg@nvidia.com
Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, songmuchun@bytedance.com, zhouchengming@bytedance.com, Qi Zheng <zhengqi.arch@bytedance.com>
Subject: [PATCH v3 01/15] mm: do code cleanups to filemap_map_pmd()
Date: Wed, 10 Nov 2021 16:40:43 +0800
Message-Id: <20211110084057.27676-2-zhengqi.arch@bytedance.com>
X-Mailer: git-send-email 2.24.3 (Apple Git-128)
In-Reply-To: <20211110084057.27676-1-zhengqi.arch@bytedance.com>
References: <20211110084057.27676-1-zhengqi.arch@bytedance.com>

Currently the same few lines of cleanup code are repeated twice in
filemap_map_pmd(). Deduplicate them and fix some code style issues.

Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
---
 mm/filemap.c | 19 +++++++++----------
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index daa0e23a6ee6..07c654202870 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3203,11 +3203,8 @@ static bool filemap_map_pmd(struct vm_fault *vmf, struct page *page)
 	struct mm_struct *mm = vmf->vma->vm_mm;
 
 	/* Huge page is mapped? No need to proceed. */
-	if (pmd_trans_huge(*vmf->pmd)) {
-		unlock_page(page);
-		put_page(page);
-		return true;
-	}
+	if (pmd_trans_huge(*vmf->pmd))
+		goto out;
 
 	if (pmd_none(*vmf->pmd) && PageTransHuge(page)) {
 		vm_fault_t ret = do_set_pmd(vmf, page);
@@ -3222,13 +3219,15 @@ static bool filemap_map_pmd(struct vm_fault *vmf, struct page *page)
 		pmd_install(mm, vmf->pmd, &vmf->prealloc_pte);
 
 	/* See comment in handle_pte_fault() */
-	if (pmd_devmap_trans_unstable(vmf->pmd)) {
-		unlock_page(page);
-		put_page(page);
-		return true;
-	}
+	if (pmd_devmap_trans_unstable(vmf->pmd))
+		goto out;
 
 	return false;
+
+out:
+	unlock_page(page);
+	put_page(page);
+	return true;
 }
 
 static struct page *next_uptodate_page(struct page *page,
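The refactor above collapses two copies of the unlock/put/return sequence into a single exit label. A minimal user-space C sketch of that single-exit pattern (the types, fields, and `map_attempt` function here are invented for illustration; this is not the kernel code):

```c
#include <assert.h>

/* Toy stand-ins for the page lock/refcount pair that the early-return
 * paths of filemap_map_pmd() have to release. */
struct page { int locked; int refcount; };

static void unlock_page(struct page *page) { page->locked = 0; }
static void put_page(struct page *page)    { page->refcount--; }

/* Before the patch, every early-return path repeated the same cleanup.
 * After (shown here), each path jumps to one "out" label, so the cleanup
 * sequence exists exactly once. */
static int map_attempt(struct page *page, int huge_mapped, int unstable)
{
	if (huge_mapped)
		goto out;

	if (unstable)
		goto out;

	return 0;	/* caller continues with the mapping */

out:
	unlock_page(page);
	put_page(page);
	return 1;	/* done; cleanup ran exactly once */
}
```

The single label also means a later change to the cleanup sequence is made in one place instead of being patched into every early return.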
From patchwork Wed Nov 10 08:40:44 2021
X-Patchwork-Submitter: Qi Zheng
X-Patchwork-Id: 12611593
From: Qi Zheng <zhengqi.arch@bytedance.com>
To: akpm@linux-foundation.org, tglx@linutronix.de, kirill.shutemov@linux.intel.com, mika.penttila@nextfour.com, david@redhat.com, jgg@nvidia.com
Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, songmuchun@bytedance.com, zhouchengming@bytedance.com, Qi Zheng <zhengqi.arch@bytedance.com>
Subject: [PATCH v3 02/15] mm: introduce is_huge_pmd() helper
Date: Wed, 10 Nov 2021 16:40:44 +0800
Message-Id: <20211110084057.27676-3-zhengqi.arch@bytedance.com>
In-Reply-To: <20211110084057.27676-1-zhengqi.arch@bytedance.com>
References: <20211110084057.27676-1-zhengqi.arch@bytedance.com>

The following check is currently repeated in several places:

	is_swap_pmd(*pmd) || pmd_trans_huge(*pmd) || pmd_devmap(*pmd)

It determines whether *pmd is a huge pmd, so introduce the is_huge_pmd()
helper to deduplicate these checks.

Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
---
 include/linux/huge_mm.h | 10 +++++++---
 mm/huge_memory.c        |  3 +--
 mm/memory.c             |  5 ++---
 mm/mprotect.c           |  2 +-
 mm/mremap.c             |  3 +--
 5 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index f280f33ff223..b37a89180846 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -199,8 +199,7 @@ void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 #define split_huge_pmd(__vma, __pmd, __address)				\
 	do {								\
 		pmd_t *____pmd = (__pmd);				\
-		if (is_swap_pmd(*____pmd) || pmd_trans_huge(*____pmd)	\
-					|| pmd_devmap(*____pmd))	\
+		if (is_huge_pmd(*____pmd))				\
 			__split_huge_pmd(__vma, __pmd, __address,	\
 						false, NULL);		\
 	}  while (0)
@@ -232,11 +231,16 @@ static inline int is_swap_pmd(pmd_t pmd)
 	return !pmd_none(pmd) && !pmd_present(pmd);
 }
 
+static inline int is_huge_pmd(pmd_t pmd)
+{
+	return is_swap_pmd(pmd) || pmd_trans_huge(pmd) || pmd_devmap(pmd);
+}
+
 /* mmap_lock must be held on entry */
 static inline spinlock_t *pmd_trans_huge_lock(pmd_t *pmd,
 		struct vm_area_struct *vma)
 {
-	if (is_swap_pmd(*pmd) || pmd_trans_huge(*pmd) || pmd_devmap(*pmd))
+	if (is_huge_pmd(*pmd))
 		return __pmd_trans_huge_lock(pmd, vma);
 	else
 		return NULL;
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index e5483347291c..e76ee2e1e423 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1832,8 +1832,7 @@ spinlock_t *__pmd_trans_huge_lock(pmd_t *pmd, struct vm_area_struct *vma)
 {
 	spinlock_t *ptl;
 	ptl = pmd_lock(vma->vm_mm, pmd);
-	if (likely(is_swap_pmd(*pmd) || pmd_trans_huge(*pmd) ||
-			pmd_devmap(*pmd)))
+	if (likely(is_huge_pmd(*pmd)))
 		return ptl;
 	spin_unlock(ptl);
 	return NULL;
diff --git a/mm/memory.c b/mm/memory.c
index 855486fff526..b00cd60fc368 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1146,8 +1146,7 @@ copy_pmd_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma,
 	src_pmd = pmd_offset(src_pud, addr);
 	do {
 		next = pmd_addr_end(addr, end);
-		if (is_swap_pmd(*src_pmd) || pmd_trans_huge(*src_pmd)
-			|| pmd_devmap(*src_pmd)) {
+		if (is_huge_pmd(*src_pmd)) {
 			int err;
 			VM_BUG_ON_VMA(next-addr != HPAGE_PMD_SIZE, src_vma);
 			err = copy_huge_pmd(dst_mm, src_mm, dst_pmd, src_pmd,
@@ -1441,7 +1440,7 @@ static inline unsigned long zap_pmd_range(struct mmu_gather *tlb,
 	pmd = pmd_offset(pud, addr);
 	do {
 		next = pmd_addr_end(addr, end);
-		if (is_swap_pmd(*pmd) || pmd_trans_huge(*pmd) || pmd_devmap(*pmd)) {
+		if (is_huge_pmd(*pmd)) {
 			if (next - addr != HPAGE_PMD_SIZE)
 				__split_huge_pmd(vma, pmd, addr, false, NULL);
 			else if (zap_huge_pmd(tlb, vma, pmd, addr))
diff --git a/mm/mprotect.c b/mm/mprotect.c
index e552f5e0ccbd..2d5064a4631c 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -257,7 +257,7 @@ static inline unsigned long change_pmd_range(struct vm_area_struct *vma,
 			mmu_notifier_invalidate_range_start(&range);
 		}
 
-		if (is_swap_pmd(*pmd) || pmd_trans_huge(*pmd) || pmd_devmap(*pmd)) {
+		if (is_huge_pmd(*pmd)) {
 			if (next - addr != HPAGE_PMD_SIZE) {
 				__split_huge_pmd(vma, pmd, addr, false, NULL);
 			} else {
diff --git a/mm/mremap.c b/mm/mremap.c
index 002eec83e91e..c6e9da09dd0a 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -532,8 +532,7 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
 		new_pmd = alloc_new_pmd(vma->vm_mm, vma, new_addr);
 		if (!new_pmd)
 			break;
-		if (is_swap_pmd(*old_pmd) || pmd_trans_huge(*old_pmd) ||
-		    pmd_devmap(*old_pmd)) {
+		if (is_huge_pmd(*old_pmd)) {
 			if (extent == HPAGE_PMD_SIZE &&
 			    move_pgt_entry(HPAGE_PMD, vma, old_addr, new_addr,
 					   old_pmd, new_pmd, need_rmap_locks))
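The helper simply gives the three-way check one name so call sites stay on one line. A minimal user-space C sketch of the same consolidation (the pmd encoding and bit values here are invented for illustration; only the structure of the helper mirrors the patch):

```c
#include <assert.h>

/* Toy pmd with three independent conditions, mirroring is_swap_pmd(),
 * pmd_trans_huge(), and pmd_devmap(). The bit layout is invented. */
typedef struct { unsigned long val; } pmd_t;

#define PMD_SWAP	0x1UL
#define PMD_TRANS_HUGE	0x2UL
#define PMD_DEVMAP	0x4UL

static inline int is_swap_pmd(pmd_t pmd)    { return !!(pmd.val & PMD_SWAP); }
static inline int pmd_trans_huge(pmd_t pmd) { return !!(pmd.val & PMD_TRANS_HUGE); }
static inline int pmd_devmap(pmd_t pmd)     { return !!(pmd.val & PMD_DEVMAP); }

/* The helper introduced by the patch: one name for the three-way check
 * that callers previously open-coded at every site. */
static inline int is_huge_pmd(pmd_t pmd)
{
	return is_swap_pmd(pmd) || pmd_trans_huge(pmd) || pmd_devmap(pmd);
}
```

Besides shortening call sites, the helper guarantees every caller tests the same three conditions; a later fourth condition would be added in one place.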
From patchwork Wed Nov 10 08:40:45 2021
X-Patchwork-Submitter: Qi Zheng
X-Patchwork-Id: 12611595
From: Qi Zheng <zhengqi.arch@bytedance.com>
To: akpm@linux-foundation.org, tglx@linutronix.de, kirill.shutemov@linux.intel.com, mika.penttila@nextfour.com, david@redhat.com, jgg@nvidia.com
Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, songmuchun@bytedance.com, zhouchengming@bytedance.com, Qi Zheng <zhengqi.arch@bytedance.com>
Subject: [PATCH v3 03/15] mm: move pte_offset_map_lock() to pgtable.h
Date: Wed, 10 Nov 2021 16:40:45 +0800
Message-Id: <20211110084057.27676-4-zhengqi.arch@bytedance.com>
In-Reply-To: <20211110084057.27676-1-zhengqi.arch@bytedance.com>
References: <20211110084057.27676-1-zhengqi.arch@bytedance.com>

pte_offset_map() is in include/linux/pgtable.h, so move its companion
pte_offset_map_lock() to pgtable.h as well. pte_offset_map_lock()
requires pte_lockptr(), so also move {pte,pmd,pud}_lockptr() to
pgtable.h.

Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
---
 include/linux/mm.h      | 149 ------------------------------------------------
 include/linux/pgtable.h | 149 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 149 insertions(+), 149 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index a7e4a9e7d807..706da081b9f8 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2284,70 +2284,6 @@ static inline pmd_t *pmd_alloc(struct mm_struct *mm, pud_t *pud, unsigned long a
 }
 #endif /* CONFIG_MMU */
 
-#if USE_SPLIT_PTE_PTLOCKS
-#if ALLOC_SPLIT_PTLOCKS
-void __init ptlock_cache_init(void);
-extern bool ptlock_alloc(struct page *page);
-extern void ptlock_free(struct page *page);
-
-static inline spinlock_t *ptlock_ptr(struct page *page)
-{
-	return page->ptl;
-}
-#else /* ALLOC_SPLIT_PTLOCKS */
-static inline void ptlock_cache_init(void)
-{
-}
-
-static inline bool ptlock_alloc(struct page *page)
-{
-	return true;
-}
-
-static inline void ptlock_free(struct page *page)
-{
-}
-
-static inline spinlock_t *ptlock_ptr(struct page *page)
-{
-	return &page->ptl;
-}
-#endif /* ALLOC_SPLIT_PTLOCKS */
-
-static inline spinlock_t *pte_lockptr(struct mm_struct *mm, pmd_t *pmd)
-{
-	return ptlock_ptr(pmd_page(*pmd));
-}
-
-static inline bool ptlock_init(struct page *page)
-{
-	/*
-	 * prep_new_page() initialize page->private (and therefore page->ptl)
-	 * with 0. Make sure nobody took it in use in between.
-	 *
-	 * It can happen if arch try to use slab for page table allocation:
-	 * slab code uses page->slab_cache, which share storage with page->ptl.
-	 */
-	VM_BUG_ON_PAGE(*(unsigned long *)&page->ptl, page);
-	if (!ptlock_alloc(page))
-		return false;
-	spin_lock_init(ptlock_ptr(page));
-	return true;
-}
-
-#else	/* !USE_SPLIT_PTE_PTLOCKS */
-/*
- * We use mm->page_table_lock to guard all pagetable pages of the mm.
- */
-static inline spinlock_t *pte_lockptr(struct mm_struct *mm, pmd_t *pmd)
-{
-	return &mm->page_table_lock;
-}
-static inline void ptlock_cache_init(void) {}
-static inline bool ptlock_init(struct page *page) { return true; }
-static inline void ptlock_free(struct page *page) {}
-#endif /* USE_SPLIT_PTE_PTLOCKS */
-
 static inline void pgtable_init(void)
 {
 	ptlock_cache_init();
@@ -2370,20 +2306,6 @@ static inline void pgtable_pte_page_dtor(struct page *page)
 	dec_lruvec_page_state(page, NR_PAGETABLE);
 }
 
-#define pte_offset_map_lock(mm, pmd, address, ptlp)	\
-({							\
-	spinlock_t *__ptl = pte_lockptr(mm, pmd);	\
-	pte_t *__pte = pte_offset_map(pmd, address);	\
-	*(ptlp) = __ptl;				\
-	spin_lock(__ptl);				\
-	__pte;						\
-})
-
-#define pte_unmap_unlock(pte, ptl)	do {		\
-	spin_unlock(ptl);				\
-	pte_unmap(pte);					\
-} while (0)
-
 #define pte_alloc(mm, pmd) (unlikely(pmd_none(*(pmd))) && __pte_alloc(mm, pmd))
 
 #define pte_alloc_map(mm, pmd, address)			\
@@ -2397,58 +2319,6 @@ static inline void pgtable_pte_page_dtor(struct page *page)
 	((unlikely(pmd_none(*(pmd))) && __pte_alloc_kernel(pmd))? \
 		NULL: pte_offset_kernel(pmd, address))
 
-#if USE_SPLIT_PMD_PTLOCKS
-
-static struct page *pmd_to_page(pmd_t *pmd)
-{
-	unsigned long mask = ~(PTRS_PER_PMD * sizeof(pmd_t) - 1);
-	return virt_to_page((void *)((unsigned long) pmd & mask));
-}
-
-static inline spinlock_t *pmd_lockptr(struct mm_struct *mm, pmd_t *pmd)
-{
-	return ptlock_ptr(pmd_to_page(pmd));
-}
-
-static inline bool pmd_ptlock_init(struct page *page)
-{
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
-	page->pmd_huge_pte = NULL;
-#endif
-	return ptlock_init(page);
-}
-
-static inline void pmd_ptlock_free(struct page *page)
-{
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
-	VM_BUG_ON_PAGE(page->pmd_huge_pte, page);
-#endif
-	ptlock_free(page);
-}
-
-#define pmd_huge_pte(mm, pmd) (pmd_to_page(pmd)->pmd_huge_pte)
-
-#else
-
-static inline spinlock_t *pmd_lockptr(struct mm_struct *mm, pmd_t *pmd)
-{
-	return &mm->page_table_lock;
-}
-
-static inline bool pmd_ptlock_init(struct page *page) { return true; }
-static inline void pmd_ptlock_free(struct page *page) {}
-
-#define pmd_huge_pte(mm, pmd) ((mm)->pmd_huge_pte)
-
-#endif
-
-static inline spinlock_t *pmd_lock(struct mm_struct *mm, pmd_t *pmd)
-{
-	spinlock_t *ptl = pmd_lockptr(mm, pmd);
-	spin_lock(ptl);
-	return ptl;
-}
-
 static inline bool pgtable_pmd_page_ctor(struct page *page)
 {
 	if (!pmd_ptlock_init(page))
@@ -2465,25 +2335,6 @@ static inline void pgtable_pmd_page_dtor(struct page *page)
 	dec_lruvec_page_state(page, NR_PAGETABLE);
 }
 
-/*
- * No scalability reason to split PUD locks yet, but follow the same pattern
- * as the PMD locks to make it easier if we decide to.  The VM should not be
- * considered ready to switch to split PUD locks yet; there may be places
- * which need to be converted from page_table_lock.
- */
-static inline spinlock_t *pud_lockptr(struct mm_struct *mm, pud_t *pud)
-{
-	return &mm->page_table_lock;
-}
-
-static inline spinlock_t *pud_lock(struct mm_struct *mm, pud_t *pud)
-{
-	spinlock_t *ptl = pud_lockptr(mm, pud);
-
-	spin_lock(ptl);
-	return ptl;
-}
-
 extern void __init pagecache_init(void);
 extern void __init free_area_init_memoryless_node(int nid);
 extern void free_initmem(void);
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index e24d2c992b11..c8f045705c1e 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -84,6 +84,141 @@ static inline unsigned long pud_index(unsigned long address)
 #define pgd_index(a)  (((a) >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1))
 #endif
 
+#if USE_SPLIT_PTE_PTLOCKS
+#if ALLOC_SPLIT_PTLOCKS
+void __init ptlock_cache_init(void);
+extern bool ptlock_alloc(struct page *page);
+extern void ptlock_free(struct page *page);
+
+static inline spinlock_t *ptlock_ptr(struct page *page)
+{
+	return page->ptl;
+}
+#else /* ALLOC_SPLIT_PTLOCKS */
+static inline void ptlock_cache_init(void)
+{
+}
+
+static inline bool ptlock_alloc(struct page *page)
+{
+	return true;
+}
+
+static inline void ptlock_free(struct page *page)
+{
+}
+
+static inline spinlock_t *ptlock_ptr(struct page *page)
+{
+	return &page->ptl;
+}
+#endif /* ALLOC_SPLIT_PTLOCKS */
+
+static inline spinlock_t *pte_lockptr(struct mm_struct *mm, pmd_t *pmd)
+{
+	return ptlock_ptr(pmd_page(*pmd));
+}
+
+static inline bool ptlock_init(struct page *page)
+{
+	/*
+	 * prep_new_page() initialize page->private (and therefore page->ptl)
+	 * with 0. Make sure nobody took it in use in between.
+	 *
+	 * It can happen if arch try to use slab for page table allocation:
+	 * slab code uses page->slab_cache, which share storage with page->ptl.
+	 */
+	VM_BUG_ON_PAGE(*(unsigned long *)&page->ptl, page);
+	if (!ptlock_alloc(page))
+		return false;
+	spin_lock_init(ptlock_ptr(page));
+	return true;
+}
+
+#else	/* !USE_SPLIT_PTE_PTLOCKS */
+/*
+ * We use mm->page_table_lock to guard all pagetable pages of the mm.
+ */
+static inline spinlock_t *pte_lockptr(struct mm_struct *mm, pmd_t *pmd)
+{
+	return &mm->page_table_lock;
+}
+static inline void ptlock_cache_init(void) {}
+static inline bool ptlock_init(struct page *page) { return true; }
+static inline void ptlock_free(struct page *page) {}
+#endif /* USE_SPLIT_PTE_PTLOCKS */
+
+#if USE_SPLIT_PMD_PTLOCKS
+
+static struct page *pmd_to_page(pmd_t *pmd)
+{
+	unsigned long mask = ~(PTRS_PER_PMD * sizeof(pmd_t) - 1);
+	return virt_to_page((void *)((unsigned long) pmd & mask));
+}
+
+static inline spinlock_t *pmd_lockptr(struct mm_struct *mm, pmd_t *pmd)
+{
+	return ptlock_ptr(pmd_to_page(pmd));
+}
+
+static inline bool pmd_ptlock_init(struct page *page)
+{
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+	page->pmd_huge_pte = NULL;
+#endif
+	return ptlock_init(page);
+}
+
+static inline void pmd_ptlock_free(struct page *page)
+{
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+	VM_BUG_ON_PAGE(page->pmd_huge_pte, page);
+#endif
+	ptlock_free(page);
+}
+
+#define pmd_huge_pte(mm, pmd) (pmd_to_page(pmd)->pmd_huge_pte)
+
+#else
+
+static inline spinlock_t *pmd_lockptr(struct mm_struct *mm, pmd_t *pmd)
+{
+	return &mm->page_table_lock;
+}
+
+static inline bool pmd_ptlock_init(struct page *page) { return true; }
+static inline void pmd_ptlock_free(struct page *page) {}
+
+#define pmd_huge_pte(mm, pmd) ((mm)->pmd_huge_pte)
+
+#endif
+
+static inline spinlock_t *pmd_lock(struct mm_struct *mm, pmd_t *pmd)
+{
+	spinlock_t *ptl = pmd_lockptr(mm, pmd);
+	spin_lock(ptl);
+	return ptl;
+}
+
+/*
+ * No scalability reason to split PUD locks yet, but follow the same pattern
+ * as the PMD locks to make it easier if we decide to.  The VM should not be
+ * considered ready to switch to split PUD locks yet; there may be places
+ * which need to be converted from page_table_lock.
+ */
+static inline spinlock_t *pud_lockptr(struct mm_struct *mm, pud_t *pud)
+{
+	return &mm->page_table_lock;
+}
+
+static inline spinlock_t *pud_lock(struct mm_struct *mm, pud_t *pud)
+{
+	spinlock_t *ptl = pud_lockptr(mm, pud);
+
+	spin_lock(ptl);
+	return ptl;
+}
+
 #ifndef pte_offset_kernel
 static inline pte_t *pte_offset_kernel(pmd_t *pmd, unsigned long address)
 {
@@ -102,6 +237,20 @@ static inline pte_t *pte_offset_kernel(pmd_t *pmd, unsigned long address)
 #define pte_unmap(pte) ((void)(pte))	/* NOP */
 #endif
 
+#define pte_offset_map_lock(mm, pmd, address, ptlp)	\
+({							\
+	spinlock_t *__ptl = pte_lockptr(mm, pmd);	\
+	pte_t *__pte = pte_offset_map(pmd, address);	\
+	*(ptlp) = __ptl;				\
+	spin_lock(__ptl);				\
+	__pte;						\
+})
+
+#define pte_unmap_unlock(pte, ptl)	do {		\
+	spin_unlock(ptl);				\
+	pte_unmap(pte);					\
+} while (0)
+
 /* Find an entry in the second-level page table.. */
 #ifndef pmd_offset
 static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address)
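For reference, the moved pte_offset_map_lock()/pte_unmap_unlock() pair follows a common shape: a statement-expression macro (a GCC/Clang extension, as used in the kernel header) that finds the right lock, takes it, hands back the mapped pointer, and reports the lock through an out-parameter, paired with an unlock macro. A user-space sketch of that shape (all names and the toy table are invented; the real macros operate on page-table spinlocks):

```c
#include <assert.h>

/* Toy table whose "lock" is an int depth counter standing in for a
 * spinlock, so the lock/unlock pairing is observable in tests. */
struct toy_table {
	int lock_depth;		/* stand-in for a spinlock */
	int entries[8];
};

/* Mirrors pte_offset_map_lock(): pick the lock, record it in *ptlp,
 * take it, and evaluate to the pointer the caller will use. */
#define entry_lock(tbl, idx, ptlp)			\
({							\
	int *__ptl = &(tbl)->lock_depth;		\
	int *__ent = &(tbl)->entries[idx];		\
	*(ptlp) = __ptl;				\
	(*__ptl)++;		/* "spin_lock" */	\
	__ent;						\
})

/* Mirrors pte_unmap_unlock(): release the lock, then "unmap". */
#define entry_unlock(ent, ptl)	do {			\
	(*(ptl))--;		/* "spin_unlock" */	\
	(void)(ent);		/* "pte_unmap" */	\
} while (0)
```

Reporting the lock through `*ptlp` matters because the caller, not the macro, decides when to drop it; the paired unlock macro keeps the release and the unmap in the right order.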
b11so2349651pld.12 for ; Wed, 10 Nov 2021 00:41:52 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance-com.20210112.gappssmtp.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=Zr0rEdADAEaJn1N2dbntb6SW/6iRFYHNNestGTjHAUw=; b=ubGR+OzFQ1TuYh0c6CA+SlJu3HO0Uah7CnGKzJCfJu6uWyLALl57f74EMNw9FiVeff CECiPlLygsN1QKh3rW+QCRe3UWyfzSQnk6/ek9aO+rJbhaurLSzt87t7wit714RzlXTe TVTZFUEn6AoWADenerWODZseHbsWy26B26IGj1PneiDRSmVfPNjHF4XlW7A2zZMQd9yQ c6K1Vyo0SOdS2IhZo1JcIlyaIQq15WH978qktCMNP4oaOM7sn+0fYLtLEJGFvUwWafnI O2wYFfuhal6E36gjjKFWMUzwu8VnPQgb7kccubvTpMpjEMUBOn4vA3eh07VpmOe0FGB1 OZJw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=Zr0rEdADAEaJn1N2dbntb6SW/6iRFYHNNestGTjHAUw=; b=WlnwvGzufoGs/PuLv9ih0TnmRr/g3naNKXbv2H3luTuqlFt4PlOyJHG61SY4yBXXAK NIJ8DnfcR/LHR5PESab5VXI9IKKJxxI6rxSh+6aYBoOux0iIJqw7ERZnfFfC0FAyTy0T HqN48tr2EhJ1sjijD24GcjEcpALNU6skv7i0hyNiDKeE1mJgAo/WERlRdKsm6cXh4PXp pEhcHd8yrQSOloCXM5ckHvCOMxX3W4Qoz5TZ9w3s2HYhZMkyAPGfqErQyF5F2i4u1M7i R4rzgFk5QvvnMPFCo1jzsyWyhNnmxQBrFRSWK2YZMgVZPYgVQ7ukmvpou6OGJoM6VwKf Nq2A== X-Gm-Message-State: AOAM533NsG7h2PNaYzoxx/IpK2OMZN5Ce3oyfDz7MKpO2RJnU73Uz5SE zQf90SbP6Q6w9TCBQe8Yu7mmfA== X-Google-Smtp-Source: ABdhPJzko2PpH9xf7ZWwbrHIa+6g5wVf4MnCYY4L1qk7EoSuHUGvbcCC2UzIffZfGim9pxrN9eZawA== X-Received: by 2002:a17:90a:6a82:: with SMTP id u2mr15066090pjj.105.1636533712190; Wed, 10 Nov 2021 00:41:52 -0800 (PST) Received: from C02DW0BEMD6R.bytedance.net ([139.177.225.251]) by smtp.gmail.com with ESMTPSA id v38sm5485368pgl.38.2021.11.10.00.41.46 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Wed, 10 Nov 2021 00:41:51 -0800 (PST) From: Qi Zheng To: akpm@linux-foundation.org, tglx@linutronix.de, kirill.shutemov@linux.intel.com, mika.penttila@nextfour.com, david@redhat.com, 
jgg@nvidia.com Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, songmuchun@bytedance.com, zhouchengming@bytedance.com, Qi Zheng Subject: [PATCH v3 04/15] mm: rework the parameter of lock_page_or_retry() Date: Wed, 10 Nov 2021 16:40:46 +0800 Message-Id: <20211110084057.27676-5-zhengqi.arch@bytedance.com> X-Mailer: git-send-email 2.24.3 (Apple Git-128) In-Reply-To: <20211110084057.27676-1-zhengqi.arch@bytedance.com> References: <20211110084057.27676-1-zhengqi.arch@bytedance.com> MIME-Version: 1.0 X-Rspamd-Server: rspam05 X-Rspamd-Queue-Id: 7E1FE104EECC X-Stat-Signature: tto737ow84pah9wgb5ugtkakn384cstd Authentication-Results: imf31.hostedemail.com; dkim=pass header.d=bytedance-com.20210112.gappssmtp.com header.s=20210112 header.b=ubGR+OzF; dmarc=pass (policy=none) header.from=bytedance.com; spf=pass (imf31.hostedemail.com: domain of zhengqi.arch@bytedance.com designates 209.85.214.176 as permitted sender) smtp.mailfrom=zhengqi.arch@bytedance.com X-HE-Tag: 1636533700-465498 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: We need the vmf in lock_page_or_retry() in the subsequent patch, so pass in it directly. 
Signed-off-by: Qi Zheng
---
 include/linux/pagemap.h | 8 +++-----
 mm/filemap.c            | 6 ++++--
 mm/memory.c             | 4 ++--
 3 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 6a30916b76e5..94f9547b4411 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -709,8 +709,7 @@ static inline bool wake_page_match(struct wait_page_queue *wait_page,
 
 void __folio_lock(struct folio *folio);
 int __folio_lock_killable(struct folio *folio);
-bool __folio_lock_or_retry(struct folio *folio, struct mm_struct *mm,
-				unsigned int flags);
+bool __folio_lock_or_retry(struct folio *folio, struct vm_fault *vmf);
 void unlock_page(struct page *page);
 void folio_unlock(struct folio *folio);
@@ -772,14 +771,13 @@ static inline int lock_page_killable(struct page *page)
  * Return value and mmap_lock implications depend on flags; see
  * __folio_lock_or_retry().
  */
-static inline bool lock_page_or_retry(struct page *page, struct mm_struct *mm,
-				      unsigned int flags)
+static inline bool lock_page_or_retry(struct page *page, struct vm_fault *vmf)
 {
 	struct folio *folio;
 
 	might_sleep();
 	folio = page_folio(page);
-	return folio_trylock(folio) || __folio_lock_or_retry(folio, mm, flags);
+	return folio_trylock(folio) || __folio_lock_or_retry(folio, vmf);
 }
 
 /*
diff --git a/mm/filemap.c b/mm/filemap.c
index 07c654202870..ff8d19b7ce1d 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1695,9 +1695,11 @@ static int __folio_lock_async(struct folio *folio, struct wait_page_queue *wait)
  * If neither ALLOW_RETRY nor KILLABLE are set, will always return true
  * with the folio locked and the mmap_lock unperturbed.
  */
-bool __folio_lock_or_retry(struct folio *folio, struct mm_struct *mm,
-			   unsigned int flags)
+bool __folio_lock_or_retry(struct folio *folio, struct vm_fault *vmf)
 {
+	unsigned int flags = vmf->flags;
+	struct mm_struct *mm = vmf->vma->vm_mm;
+
 	if (fault_flag_allow_retry_first(flags)) {
 		/*
 		 * CAUTION! In this case, mmap_lock is not released
diff --git a/mm/memory.c b/mm/memory.c
index b00cd60fc368..bec6a5d5ee7c 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3443,7 +3443,7 @@ static vm_fault_t remove_device_exclusive_entry(struct vm_fault *vmf)
 	struct vm_area_struct *vma = vmf->vma;
 	struct mmu_notifier_range range;
 
-	if (!lock_page_or_retry(page, vma->vm_mm, vmf->flags))
+	if (!lock_page_or_retry(page, vmf))
 		return VM_FAULT_RETRY;
 	mmu_notifier_range_init_owner(&range, MMU_NOTIFY_EXCLUSIVE, 0, vma,
 				vma->vm_mm, vmf->address & PAGE_MASK,
@@ -3576,7 +3576,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 		goto out_release;
 	}
 
-	locked = lock_page_or_retry(page, vma->vm_mm, vmf->flags);
+	locked = lock_page_or_retry(page, vmf);
 
 	delayacct_clear_flag(current, DELAYACCT_PF_SWAPIN);
 	if (!locked) {

From patchwork Wed Nov 10 08:40:47 2021
From: Qi Zheng
To: akpm@linux-foundation.org, tglx@linutronix.de, kirill.shutemov@linux.intel.com,
    mika.penttila@nextfour.com, david@redhat.com, jgg@nvidia.com
Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    songmuchun@bytedance.com, zhouchengming@bytedance.com, Qi Zheng
Subject: [PATCH v3 05/15] mm: add pmd_installed_type return for __pte_alloc() and other friends
Date: Wed, 10 Nov 2021 16:40:47 +0800
Message-Id: <20211110084057.27676-6-zhengqi.arch@bytedance.com>
In-Reply-To: <20211110084057.27676-1-zhengqi.arch@bytedance.com>
References: <20211110084057.27676-1-zhengqi.arch@bytedance.com>

When we call __pte_alloc() or
other friends, a huge pmd might be created from a different thread. This is
why pmd_trans_unstable() has to be called after __pte_alloc() or other
friends return. This patch adds a pmd_installed_type return value to
__pte_alloc() and other friends, so that we can check for the huge pmd
through the return value instead of calling pmd_trans_unstable() again.

This patch has no functional change, just some preparation for future
patches.

Signed-off-by: Qi Zheng
---
 include/linux/mm.h    | 20 +++++++++++++++++---
 mm/debug_vm_pgtable.c |  2 +-
 mm/filemap.c          | 11 +++++++----
 mm/gup.c              |  2 +-
 mm/internal.h         |  3 ++-
 mm/memory.c           | 39 ++++++++++++++++++++++++++-------------
 mm/migrate.c          | 17 ++---------------
 mm/mremap.c           |  2 +-
 mm/userfaultfd.c      | 24 +++++++++++++++---------
 9 files changed, 72 insertions(+), 48 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 706da081b9f8..52f36fde2f11 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2306,13 +2306,27 @@ static inline void pgtable_pte_page_dtor(struct page *page)
 	dec_lruvec_page_state(page, NR_PAGETABLE);
 }
 
-#define pte_alloc(mm, pmd) (unlikely(pmd_none(*(pmd))) && __pte_alloc(mm, pmd))
+enum pmd_installed_type {
+	INSTALLED_PTE,
+	INSTALLED_HUGE_PMD,
+};
+
+static inline int pte_alloc(struct mm_struct *mm, pmd_t *pmd)
+{
+	if (unlikely(pmd_none(*(pmd))))
+		return __pte_alloc(mm, pmd);
+	if (unlikely(is_huge_pmd(*pmd)))
+		return INSTALLED_HUGE_PMD;
+
+	return INSTALLED_PTE;
+}
+#define pte_alloc pte_alloc
 
 #define pte_alloc_map(mm, pmd, address)			\
-	(pte_alloc(mm, pmd) ? NULL : pte_offset_map(pmd, address))
+	(pte_alloc(mm, pmd) < 0 ? NULL : pte_offset_map(pmd, address))
 
 #define pte_alloc_map_lock(mm, pmd, address, ptlp)	\
-	(pte_alloc(mm, pmd) ?				\
+	(pte_alloc(mm, pmd) < 0 ?
							\
	 NULL : pte_offset_map_lock(mm, pmd, address, ptlp))
 
 #define pte_alloc_kernel(pmd, address)			\

diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index 228e3954b90c..b8322c55e65d 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -1170,7 +1170,7 @@ static int __init init_args(struct pgtable_debug_args *args)
 	args->start_pmdp = pmd_offset(args->pudp, 0UL);
 	WARN_ON(!args->start_pmdp);
 
-	if (pte_alloc(args->mm, args->pmdp)) {
+	if (pte_alloc(args->mm, args->pmdp) < 0) {
 		pr_err("Failed to allocate pte entries\n");
 		ret = -ENOMEM;
 		goto error;
diff --git a/mm/filemap.c b/mm/filemap.c
index ff8d19b7ce1d..23363f8ddbbe 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3217,12 +3217,15 @@ static bool filemap_map_pmd(struct vm_fault *vmf, struct page *page)
 		}
 	}
 
-	if (pmd_none(*vmf->pmd))
-		pmd_install(mm, vmf->pmd, &vmf->prealloc_pte);
+	if (pmd_none(*vmf->pmd)) {
+		int ret = pmd_install(mm, vmf->pmd, &vmf->prealloc_pte);
 
-	/* See comment in handle_pte_fault() */
-	if (pmd_devmap_trans_unstable(vmf->pmd))
+		if (unlikely(ret == INSTALLED_HUGE_PMD))
+			goto out;
+	} else if (pmd_devmap_trans_unstable(vmf->pmd)) {
+		/* See comment in handle_pte_fault() */
 		goto out;
+	}
 
 	return false;
diff --git a/mm/gup.c b/mm/gup.c
index 2c51e9748a6a..2def775232a3 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -699,7 +699,7 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma,
 	} else {
 		spin_unlock(ptl);
 		split_huge_pmd(vma, pmd, address);
-		ret = pte_alloc(mm, pmd) ? -ENOMEM : 0;
+		ret = pte_alloc(mm, pmd) < 0 ? -ENOMEM : 0;
 	}
 
 	return ret ? ERR_PTR(ret) :
diff --git a/mm/internal.h b/mm/internal.h
index 3b79a5c9427a..474d6e3443f8 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -67,7 +67,8 @@ bool __folio_end_writeback(struct folio *folio);
 
 void free_pgtables(struct mmu_gather *tlb, struct vm_area_struct *start_vma,
 		unsigned long floor, unsigned long ceiling);
-void pmd_install(struct mm_struct *mm, pmd_t *pmd, pgtable_t *pte);
+enum pmd_installed_type pmd_install(struct mm_struct *mm, pmd_t *pmd,
+				    pgtable_t *pte);
 
 static inline bool can_madv_lru_vma(struct vm_area_struct *vma)
 {
diff --git a/mm/memory.c b/mm/memory.c
index bec6a5d5ee7c..8a39c0e58324 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -437,8 +437,10 @@ void free_pgtables(struct mmu_gather *tlb, struct vm_area_struct *vma,
 	}
 }
 
-void pmd_install(struct mm_struct *mm, pmd_t *pmd, pgtable_t *pte)
+enum pmd_installed_type pmd_install(struct mm_struct *mm, pmd_t *pmd,
+				    pgtable_t *pte)
 {
+	int ret = INSTALLED_PTE;
 	spinlock_t *ptl = pmd_lock(mm, pmd);
 
 	if (likely(pmd_none(*pmd))) {	/* Has another populated it ? */
@@ -459,20 +461,26 @@ void pmd_install(struct mm_struct *mm, pmd_t *pmd, pgtable_t *pte)
 		smp_wmb(); /* Could be smp_wmb__xxx(before|after)_spin_lock */
 		pmd_populate(mm, pmd, *pte);
 		*pte = NULL;
+	} else if (is_huge_pmd(*pmd)) {
+		/* See comment in handle_pte_fault() */
+		ret = INSTALLED_HUGE_PMD;
 	}
 	spin_unlock(ptl);
+
+	return ret;
 }
 
 int __pte_alloc(struct mm_struct *mm, pmd_t *pmd)
 {
+	enum pmd_installed_type ret;
 	pgtable_t new = pte_alloc_one(mm);
 	if (!new)
 		return -ENOMEM;
 
-	pmd_install(mm, pmd, &new);
+	ret = pmd_install(mm, pmd, &new);
 	if (new)
 		pte_free(mm, new);
-	return 0;
+	return ret;
 }
 
 int __pte_alloc_kernel(pmd_t *pmd)
@@ -1813,7 +1821,7 @@ static int insert_pages(struct vm_area_struct *vma, unsigned long addr,
 
 	/* Allocate the PTE if necessary; takes PMD lock once only. */
 	ret = -ENOMEM;
-	if (pte_alloc(mm, pmd))
+	if (pte_alloc(mm, pmd) < 0)
 		goto out;
 
 	while (pages_to_write_in_pmd) {
@@ -3713,6 +3721,7 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
 	struct page *page;
 	vm_fault_t ret = 0;
 	pte_t entry;
+	int alloc_ret;
 
 	/* File mapping without ->vm_ops ? */
 	if (vma->vm_flags & VM_SHARED)
@@ -3728,11 +3737,11 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
 	 *
 	 * Here we only have mmap_read_lock(mm).
 	 */
-	if (pte_alloc(vma->vm_mm, vmf->pmd))
+	alloc_ret = pte_alloc(vma->vm_mm, vmf->pmd);
+	if (alloc_ret < 0)
 		return VM_FAULT_OOM;
 
-	/* See comment in handle_pte_fault() */
-	if (unlikely(pmd_trans_unstable(vmf->pmd)))
+	if (unlikely(alloc_ret == INSTALLED_HUGE_PMD))
 		return 0;
 
 	/* Use the zero-page for reads */
@@ -4023,6 +4032,8 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
 	}
 
 	if (pmd_none(*vmf->pmd)) {
+		int alloc_ret;
+
 		if (PageTransCompound(page)) {
 			ret = do_set_pmd(vmf, page);
 			if (ret != VM_FAULT_FALLBACK)
@@ -4030,14 +4041,16 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
 		}
 
 		if (vmf->prealloc_pte)
-			pmd_install(vma->vm_mm, vmf->pmd, &vmf->prealloc_pte);
-		else if (unlikely(pte_alloc(vma->vm_mm, vmf->pmd)))
-			return VM_FAULT_OOM;
-	}
+			alloc_ret = pmd_install(vma->vm_mm, vmf->pmd, &vmf->prealloc_pte);
+		else
+			alloc_ret = pte_alloc(vma->vm_mm, vmf->pmd);
 
-	/* See comment in handle_pte_fault() */
-	if (pmd_devmap_trans_unstable(vmf->pmd))
+		if (unlikely(alloc_ret != INSTALLED_PTE))
+			return alloc_ret < 0 ? VM_FAULT_OOM : 0;
+	} else if (pmd_devmap_trans_unstable(vmf->pmd)) {
+		/* See comment in handle_pte_fault() */
 		return 0;
+	}
 
 	vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address,
 				       &vmf->ptl);
diff --git a/mm/migrate.c b/mm/migrate.c
index cf25b00f03c8..bdfdfd3b50be 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2731,21 +2731,8 @@ static void migrate_vma_insert_page(struct migrate_vma *migrate,
 	if (pmd_trans_huge(*pmdp) || pmd_devmap(*pmdp))
 		goto abort;
 
-	/*
-	 * Use pte_alloc() instead of pte_alloc_map(). We can't run
-	 * pte_offset_map() on pmds where a huge pmd might be created
-	 * from a different thread.
-	 *
-	 * pte_alloc_map() is safe to use under mmap_write_lock(mm) or when
-	 * parallel threads are excluded by other means.
-	 *
-	 * Here we only have mmap_read_lock(mm).
-	 */
-	if (pte_alloc(mm, pmdp))
-		goto abort;
-
-	/* See the comment in pte_alloc_one_map() */
-	if (unlikely(pmd_trans_unstable(pmdp)))
+	/* See the comment in do_anonymous_page() */
+	if (unlikely(pte_alloc(mm, pmdp) != INSTALLED_PTE))
 		goto abort;
 
 	if (unlikely(anon_vma_prepare(vma)))
diff --git a/mm/mremap.c b/mm/mremap.c
index c6e9da09dd0a..fc5c56858883 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -551,7 +551,7 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
 			continue;
 		}
 
-		if (pte_alloc(new_vma->vm_mm, new_pmd))
+		if (pte_alloc(new_vma->vm_mm, new_pmd) < 0)
 			break;
 		move_ptes(vma, old_pmd, old_addr, old_addr + extent, new_vma,
 			  new_pmd, new_addr, need_rmap_locks);
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 0780c2a57ff1..2cea08e7f076 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -592,15 +592,21 @@ static __always_inline ssize_t __mcopy_atomic(struct mm_struct *dst_mm,
 			err = -EEXIST;
 			break;
 		}
-		if (unlikely(pmd_none(dst_pmdval)) &&
-		    unlikely(__pte_alloc(dst_mm, dst_pmd))) {
-			err = -ENOMEM;
-			break;
-		}
-		/* If an huge pmd materialized from under us fail */
-		if (unlikely(pmd_trans_huge(*dst_pmd))) {
-			err = -EFAULT;
-			break;
+
+		if (unlikely(pmd_none(dst_pmdval))) {
+			int ret = __pte_alloc(dst_mm, dst_pmd);
+
+			/*
+			 * If there is not enough memory or an huge pmd
+			 * materialized from under us
+			 */
+			if (unlikely(ret < 0)) {
+				err = -ENOMEM;
+				break;
+			} else if (unlikely(ret == INSTALLED_HUGE_PMD)) {
+				err = -EFAULT;
+				break;
+			}
+		}
 
 		BUG_ON(pmd_none(*dst_pmd));

From patchwork Wed Nov 10 08:40:48 2021
From: Qi Zheng
To: akpm@linux-foundation.org, tglx@linutronix.de, kirill.shutemov@linux.intel.com,
    mika.penttila@nextfour.com, david@redhat.com, jgg@nvidia.com
Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    songmuchun@bytedance.com, zhouchengming@bytedance.com, Qi Zheng
Subject: [PATCH v3 06/15] mm: introduce refcount for user PTE page table page
Date: Wed, 10 Nov 2021 16:40:48 +0800
Message-Id: <20211110084057.27676-7-zhengqi.arch@bytedance.com>
In-Reply-To: <20211110084057.27676-1-zhengqi.arch@bytedance.com>
References: <20211110084057.27676-1-zhengqi.arch@bytedance.com>

1. Preface
==========

In pursuit of high performance, applications mostly use high-performance
user-mode memory allocators such as jemalloc or tcmalloc. These memory
allocators use madvise(MADV_DONTNEED or MADV_FREE) to release physical
memory, for the following reasons:

First of all, we should hold as few write locks of mmap_lock as possible,
since the mmap_lock semaphore has long been a contention point in the
memory management subsystem.
The mmap()/munmap() paths take the write lock, while madvise(MADV_DONTNEED
or MADV_FREE) takes the read lock, so using madvise() instead of munmap()
to release physical memory reduces contention on the mmap_lock.

Secondly, after madvise() releases the physical memory, there is no need to
build the vma and allocate page tables again when the same virtual address
is accessed again, which also saves some time.

The following is the largest amount of user PTE page table memory that can
be allocated by a single user process on a 32-bit and a 64-bit system:

+---------------------------+--------+---------+
|                           | 32-bit | 64-bit  |
+===========================+========+=========+
| user PTE page table pages | 3 MiB  | 512 GiB |
+---------------------------+--------+---------+
| user PMD page table pages | 3 KiB  | 1 GiB   |
+---------------------------+--------+---------+

(For 32-bit, take a 3 GiB user address space and 4 KiB pages as an example;
for 64-bit, take a 48-bit address width and 4 KiB pages as an example. In
the 64-bit case, each 4 KiB PTE page maps 512 * 4 KiB = 2 MiB of address
space, so covering 2^48 bytes takes 2^27 PTE pages, i.e. 512 GiB.)

After using madvise(), everything looks good, but as can be seen from the
table above, a single process can create a large number of PTE page tables
on a 64-bit system, since neither MADV_DONTNEED nor MADV_FREE releases page
table memory. And before the process exits or calls munmap(), the kernel
cannot reclaim these pages even if these PTE page tables do not map
anything.

Therefore, we decided to introduce a reference count to manage the PTE page
table life cycle, so that some free PTE page table memory in the system can
be released dynamically.

2. The reference count of user PTE page table pages
===================================================

We introduce two members for the struct page of the user PTE page table
page::

	union {
		pgtable_t pmd_huge_pte;		/* protected by page->ptl */
		pmd_t *pmd;			/* PTE page only */
	};
	union {
		struct mm_struct *pt_mm;	/* x86 pgds only */
		atomic_t pt_frag_refcount;	/* powerpc */
		atomic_t pte_refcount;		/* PTE page only */
	};

The pmd member records the pmd entry that maps the user PTE page table
page, and the pte_refcount member keeps track of how many references to the
user PTE page table page exist. The following hold a reference on the user
PTE page table page::

	Any !pte_none() entry, such as a regular page table entry that maps
	a physical page, a swap entry, a migration entry, etc.

	Any visitor to the PTE page table entries, such as a page table
	walker.

Any ``!pte_none()`` entry and any visitor can be regarded as a user of its
PTE page table page. When the ``pte_refcount`` drops to 0, it means that no
one is using the PTE page table page, and the free PTE page table page can
then be released back to the system.

3. Helpers
==========

+---------------------+-------------------------------------------------+
| pte_ref_init        | Initialize the pte_refcount and pmd             |
+---------------------+-------------------------------------------------+
| pte_to_pmd          | Get the corresponding pmd                       |
+---------------------+-------------------------------------------------+
| pte_update_pmd      | Update the corresponding pmd                    |
+---------------------+-------------------------------------------------+
| pte_get             | Increment a pte_refcount                        |
+---------------------+-------------------------------------------------+
| pte_get_many        | Add a value to a pte_refcount                   |
+---------------------+-------------------------------------------------+
| pte_get_unless_zero | Increment a pte_refcount unless it is 0         |
+---------------------+-------------------------------------------------+
| pte_try_get         | Try to increment a pte_refcount                 |
+---------------------+-------------------------------------------------+
| pte_tryget_map      | Try to increment a pte_refcount before          |
|                     | pte_offset_map()                                |
+---------------------+-------------------------------------------------+
| pte_tryget_map_lock | Try to increment a pte_refcount before          |
|                     | pte_offset_map_lock()                           |
+---------------------+-------------------------------------------------+
| pte_put             | Decrement a pte_refcount                        |
+---------------------+-------------------------------------------------+
| pte_put_many        | Subtract a value from a pte_refcount            |
+---------------------+-------------------------------------------------+
| pte_put_vmf         | Decrement a pte_refcount in the page fault path |
+---------------------+-------------------------------------------------+

4. About this commit
====================

This commit just introduces some dummy helpers; the actual logic will be
implemented in future commits.
Signed-off-by: Qi Zheng --- include/linux/mm_types.h | 6 +++- include/linux/pte_ref.h | 87 ++++++++++++++++++++++++++++++++++++++++++++++++ mm/Makefile | 4 +-- mm/pte_ref.c | 55 ++++++++++++++++++++++++++++++ 4 files changed, 149 insertions(+), 3 deletions(-) create mode 100644 include/linux/pte_ref.h create mode 100644 mm/pte_ref.c diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index bb8c6f5f19bc..c599008d54fe 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -149,11 +149,15 @@ struct page { }; struct { /* Page table pages */ unsigned long _pt_pad_1; /* compound_head */ - pgtable_t pmd_huge_pte; /* protected by page->ptl */ + union { + pgtable_t pmd_huge_pte; /* protected by page->ptl */ + pmd_t *pmd; /* PTE page only */ + }; unsigned long _pt_pad_2; /* mapping */ union { struct mm_struct *pt_mm; /* x86 pgds only */ atomic_t pt_frag_refcount; /* powerpc */ + atomic_t pte_refcount; /* PTE page only */ }; #if ALLOC_SPLIT_PTLOCKS spinlock_t *ptl; diff --git a/include/linux/pte_ref.h b/include/linux/pte_ref.h new file mode 100644 index 000000000000..b6d8335bdc59 --- /dev/null +++ b/include/linux/pte_ref.h @@ -0,0 +1,87 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (c) 2021, ByteDance. All rights reserved. + * + * Author: Qi Zheng + */ +#ifndef _LINUX_PTE_REF_H +#define _LINUX_PTE_REF_H + +#include + +enum pte_tryget_type { + TRYGET_SUCCESSED, + TRYGET_FAILED_ZERO, + TRYGET_FAILED_NONE, + TRYGET_FAILED_HUGE_PMD, +}; + +bool pte_get_unless_zero(pmd_t *pmd); +enum pte_tryget_type pte_try_get(pmd_t *pmd); +void pte_put_vmf(struct vm_fault *vmf); + +static inline void pte_ref_init(pgtable_t pte, pmd_t *pmd, int count) +{ +} + +static inline pmd_t *pte_to_pmd(pte_t *pte) +{ + return NULL; +} + +static inline void pte_update_pmd(pmd_t old_pmd, pmd_t *new_pmd) +{ +} + +static inline void pte_get_many(pmd_t *pmd, unsigned int nr) +{ +} + +/* + * pte_get - Increment refcount for the PTE page table. 
+ * @pmd: a pointer to the pmd entry corresponding to the PTE page table. + * + * Similar to the mechanism of page refcount, the user of PTE page table + * should hold a refcount to it before accessing. + */ +static inline void pte_get(pmd_t *pmd) +{ + pte_get_many(pmd, 1); +} + +static inline pte_t *pte_tryget_map(pmd_t *pmd, unsigned long address) +{ + if (pte_try_get(pmd)) + return NULL; + + return pte_offset_map(pmd, address); +} + +static inline pte_t *pte_tryget_map_lock(struct mm_struct *mm, pmd_t *pmd, + unsigned long address, spinlock_t **ptlp) +{ + if (pte_try_get(pmd)) + return NULL; + + return pte_offset_map_lock(mm, pmd, address, ptlp); +} + +static inline void pte_put_many(struct mm_struct *mm, pmd_t *pmd, + unsigned long addr, unsigned int nr) +{ +} + +/* + * pte_put - Decrement refcount for the PTE page table. + * @mm: the mm_struct of the target address space. + * @pmd: a pointer to the pmd entry corresponding to the PTE page table. + * @addr: the start address of the tlb range to be flushed. + * + * The PTE page table page will be freed when the last refcount is dropped. + */ +static inline void pte_put(struct mm_struct *mm, pmd_t *pmd, unsigned long addr) +{ + pte_put_many(mm, pmd, addr, 1); +} + +#endif diff --git a/mm/Makefile b/mm/Makefile index d6c0042e3aa0..ea679bf75a5f 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -38,8 +38,8 @@ mmu-y := nommu.o mmu-$(CONFIG_MMU) := highmem.o memory.o mincore.o \ mlock.o mmap.o mmu_gather.o mprotect.o mremap.o \ msync.o page_vma_mapped.o pagewalk.o \ - pgtable-generic.o rmap.o vmalloc.o - + pgtable-generic.o rmap.o vmalloc.o \ + pte_ref.o ifdef CONFIG_CROSS_MEMORY_ATTACH mmu-$(CONFIG_MMU) += process_vm_access.o diff --git a/mm/pte_ref.c b/mm/pte_ref.c new file mode 100644 index 000000000000..de109905bc8f --- /dev/null +++ b/mm/pte_ref.c @@ -0,0 +1,55 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (c) 2021, ByteDance. All rights reserved. 
+ *
+ * Author: Qi Zheng
+ */
+
+#include
+#include
+
+/*
+ * pte_get_unless_zero - Increment refcount for the PTE page table
+ * unless it is zero.
+ * @pmd: a pointer to the pmd entry corresponding to the PTE page table.
+ */
+bool pte_get_unless_zero(pmd_t *pmd)
+{
+	return true;
+}
+
+/*
+ * pte_try_get - Try to increment refcount for the PTE page table.
+ * @pmd: a pointer to the pmd entry corresponding to the PTE page table.
+ *
+ * Return TRYGET_SUCCESSED if the increment succeeded; otherwise return
+ * the reason for the failure.
+ *
+ * Before operating on the PTE page table, we need to hold a refcount
+ * to protect against the concurrent release of the PTE page table.
+ * But we will fail in the following cases:
+ * - The content mapped in @pmd is not a PTE page
+ * - The refcount of the PTE page table is zero, and it will be freed
+ */
+enum pte_tryget_type pte_try_get(pmd_t *pmd)
+{
+	if (unlikely(pmd_none(*pmd)))
+		return TRYGET_FAILED_NONE;
+	if (unlikely(is_huge_pmd(*pmd)))
+		return TRYGET_FAILED_HUGE_PMD;
+
+	return TRYGET_SUCCESSED;
+}
+
+/*
+ * pte_put_vmf - Decrement refcount for the PTE page table.
+ * @vmf: fault information
+ *
+ * The mmap_lock may be unlocked in advance in some cases
+ * in handle_pte_fault(), and then the pmd entry will no longer
+ * be stable. For example, the PTE page that @pmd maps may be
+ * replaced (e.g. by mremap), so we should ensure that pte_put()
+ * is performed in the critical section of the mmap_lock.
+ */
+void pte_put_vmf(struct vm_fault *vmf)
+{
+}

From patchwork Wed Nov 10 08:40:49 2021
From: Qi Zheng <zhengqi.arch@bytedance.com>
Subject: [PATCH v3 07/15] mm/pte_ref: add support for user PTE page table page allocation
Date: Wed, 10 Nov 2021 16:40:49 +0800
Message-Id: <20211110084057.27676-8-zhengqi.arch@bytedance.com>
In-Reply-To: <20211110084057.27676-1-zhengqi.arch@bytedance.com>

When the PTE page table page is allocated and installed into the pmd entry, it needs to take an initial reference count to prevent the page table page from being released by other threads, and the caller of pte_alloc() (or its friends) needs to drop this reference count.
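The allocate-then-drop lifecycle described above can be modeled in a few lines of userspace C. This is only a sketch of the refcounting scheme, not kernel code: the behavior mirrors the patch's API (the initial reference taken by pmd_install(), pte_get_unless_zero(), pte_put()), while `struct pte_page` and the `fake_*` helpers are invented here purely for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model of the pte_ref lifecycle; not kernel code. */
struct pte_page {
	atomic_int pte_refcount; /* analogous to page->pte_refcount */
	bool freed;              /* set when the last reference is dropped */
};

/* pmd_install() takes the initial reference when the table is installed. */
void fake_install(struct pte_page *p)
{
	atomic_store(&p->pte_refcount, 1);
	p->freed = false;
}

/* Mirrors pte_get_unless_zero(): succeed only while the table is live. */
bool fake_get_unless_zero(struct pte_page *p)
{
	int c = atomic_load(&p->pte_refcount);

	while (c != 0)
		if (atomic_compare_exchange_weak(&p->pte_refcount, &c, c + 1))
			return true;
	return false; /* raced with the final put; caller must retry/realloc */
}

/* Mirrors pte_put(): the page is freed when the last refcount drops. */
void fake_put(struct pte_page *p)
{
	if (atomic_fetch_sub(&p->pte_refcount, 1) == 1)
		p->freed = true;
}
```

In this model, a pte_alloc() caller would pair fake_install() with a later fake_put(), exactly the pairing the commit message requires of pte_alloc() callers.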
Signed-off-by: Qi Zheng --- include/linux/mm.h | 7 +++++-- mm/debug_vm_pgtable.c | 1 + mm/filemap.c | 8 ++++++-- mm/gup.c | 10 +++++++--- mm/memory.c | 51 +++++++++++++++++++++++++++++++++++++++++---------- mm/migrate.c | 9 ++++++--- mm/mlock.c | 1 + mm/mremap.c | 1 + mm/userfaultfd.c | 16 +++++++++++++++- 9 files changed, 83 insertions(+), 21 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index 52f36fde2f11..753a9435e0d0 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -26,6 +26,7 @@ #include #include #include +#include #include #include #include @@ -2313,9 +2314,11 @@ enum pmd_installed_type { static inline int pte_alloc(struct mm_struct *mm, pmd_t *pmd) { - if (unlikely(pmd_none(*(pmd)))) + enum pte_tryget_type ret = pte_try_get(pmd); + + if (ret == TRYGET_FAILED_NONE || ret == TRYGET_FAILED_ZERO) return __pte_alloc(mm, pmd); - if (unlikely(is_huge_pmd(*pmd))) + else if (ret == TRYGET_FAILED_HUGE_PMD) return INSTALLED_HUGE_PMD; return INSTALLED_PTE; diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c index b8322c55e65d..52f006654664 100644 --- a/mm/debug_vm_pgtable.c +++ b/mm/debug_vm_pgtable.c @@ -1048,6 +1048,7 @@ static void __init destroy_args(struct pgtable_debug_args *args) /* Free page table entries */ if (args->start_ptep) { + pte_put(args->mm, args->start_pmdp, args->vaddr); pte_free(args->mm, args->start_ptep); mm_dec_nr_ptes(args->mm); } diff --git a/mm/filemap.c b/mm/filemap.c index 23363f8ddbbe..1e7e9e4fd759 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -3217,6 +3217,7 @@ static bool filemap_map_pmd(struct vm_fault *vmf, struct page *page) } } +retry: if (pmd_none(*vmf->pmd)) { int ret = pmd_install(mm, vmf->pmd, &vmf->prealloc_pte); @@ -3225,6 +3226,8 @@ static bool filemap_map_pmd(struct vm_fault *vmf, struct page *page) } else if (pmd_devmap_trans_unstable(vmf->pmd)) { /* See comment in handle_pte_fault() */ goto out; + } else if (pte_try_get(vmf->pmd) == TRYGET_FAILED_ZERO) { + goto retry; } return false; 
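The retry logic in the filemap_map_pmd() hunk above — jump back and re-inspect the pmd whenever pte_try_get() reports TRYGET_FAILED_ZERO — can be sketched as a small decision loop. The enum values are taken from the patch; resolve_pmd() and its simulated result sequence are hypothetical, written only to show the control flow.

```c
#include <assert.h>
#include <stddef.h>

/* Outcome codes copied from the patch's enum pte_tryget_type. */
enum tryget {
	TRYGET_SUCCESSED,
	TRYGET_FAILED_ZERO,
	TRYGET_FAILED_NONE,
	TRYGET_FAILED_HUGE_PMD,
};

/*
 * Hypothetical driver for the retry loop: 'results' simulates what
 * successive pte_try_get() calls would observe. Returns the number of
 * attempts made before reaching a terminal outcome, or -1 for a huge pmd.
 */
int resolve_pmd(const enum tryget *results, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		switch (results[i]) {
		case TRYGET_SUCCESSED:   /* got a reference; safe to map PTEs */
		case TRYGET_FAILED_NONE: /* no table; caller would pmd_install() */
			return (int)(i + 1);
		case TRYGET_FAILED_ZERO:
			/* lost the race with the last pte_put(); retry */
			continue;
		case TRYGET_FAILED_HUGE_PMD:
			return -1; /* THP mapped here; see handle_pte_fault() */
		}
	}
	return -1;
}
```

The key design point the patch relies on is that TRYGET_FAILED_ZERO is transient — the table is mid-teardown, so re-reading the pmd will eventually observe either pmd_none() or a newly installed table.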
@@ -3301,7 +3304,7 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf, struct file *file = vma->vm_file; struct address_space *mapping = file->f_mapping; pgoff_t last_pgoff = start_pgoff; - unsigned long addr; + unsigned long addr, start; XA_STATE(xas, &mapping->i_pages, start_pgoff); struct page *head, *page; unsigned int mmap_miss = READ_ONCE(file->f_ra.mmap_miss); @@ -3317,7 +3320,7 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf, goto out; } - addr = vma->vm_start + ((start_pgoff - vma->vm_pgoff) << PAGE_SHIFT); + start = addr = vma->vm_start + ((start_pgoff - vma->vm_pgoff) << PAGE_SHIFT); vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, addr, &vmf->ptl); do { page = find_subpage(head, xas.xa_index); @@ -3348,6 +3351,7 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf, put_page(head); } while ((head = next_map_page(mapping, &xas, end_pgoff)) != NULL); pte_unmap_unlock(vmf->pte, vmf->ptl); + pte_put(vma->vm_mm, vmf->pmd, start); out: rcu_read_unlock(); WRITE_ONCE(file->f_ra.mmap_miss, mmap_miss); diff --git a/mm/gup.c b/mm/gup.c index 2def775232a3..e084111103f0 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -694,7 +694,7 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma, spin_unlock(ptl); ret = 0; split_huge_pmd(vma, pmd, address); - if (pmd_trans_unstable(pmd)) + if (pte_try_get(pmd) == TRYGET_FAILED_HUGE_PMD) ret = -EBUSY; } else { spin_unlock(ptl); @@ -702,8 +702,12 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma, ret = pte_alloc(mm, pmd) < 0 ? -ENOMEM : 0; } - return ret ? 
ERR_PTR(ret) : - follow_page_pte(vma, address, pmd, flags, &ctx->pgmap); + if (ret) + return ERR_PTR(ret); + + page = follow_page_pte(vma, address, pmd, flags, &ctx->pgmap); + pte_put(mm, pmd, address); + return page; } page = follow_trans_huge_pmd(vma, address, pmd, flags); spin_unlock(ptl); diff --git a/mm/memory.c b/mm/memory.c index 8a39c0e58324..0b9af38cfa11 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -441,10 +441,13 @@ enum pmd_installed_type pmd_install(struct mm_struct *mm, pmd_t *pmd, pgtable_t *pte) { int ret = INSTALLED_PTE; - spinlock_t *ptl = pmd_lock(mm, pmd); + spinlock_t *ptl; +retry: + ptl = pmd_lock(mm, pmd); if (likely(pmd_none(*pmd))) { /* Has another populated it ? */ mm_inc_nr_ptes(mm); + pte_ref_init(*pte, pmd, 1); /* * Ensure all pte setup (eg. pte page lock and page clearing) are * visible before the pte is made visible to other CPUs by being @@ -464,6 +467,9 @@ enum pmd_installed_type pmd_install(struct mm_struct *mm, pmd_t *pmd, } else if (is_huge_pmd(*pmd)) { /* See comment in handle_pte_fault() */ ret = INSTALLED_HUGE_PMD; + } else if (!pte_get_unless_zero(pmd)) { + spin_unlock(ptl); + goto retry; } spin_unlock(ptl); @@ -1028,6 +1034,7 @@ copy_pte_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma, int rss[NR_MM_COUNTERS]; swp_entry_t entry = (swp_entry_t){0}; struct page *prealloc = NULL; + unsigned long start = addr; again: progress = 0; @@ -1108,6 +1115,7 @@ copy_pte_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma, pte_unmap(orig_src_pte); add_mm_rss_vec(dst_mm, rss); pte_unmap_unlock(orig_dst_pte, dst_ptl); + pte_put(dst_mm, dst_pmd, start); cond_resched(); if (ret == -EIO) { @@ -1778,6 +1786,7 @@ static int insert_page(struct vm_area_struct *vma, unsigned long addr, goto out; retval = insert_page_into_pte_locked(mm, pte, addr, page, prot); pte_unmap_unlock(pte, ptl); + pte_put(mm, pte_to_pmd(pte), addr); out: return retval; } @@ -1810,6 +1819,7 @@ static int insert_pages(struct 
vm_area_struct *vma, unsigned long addr, unsigned long remaining_pages_total = *num; unsigned long pages_to_write_in_pmd; int ret; + unsigned long start = addr; more: ret = -EFAULT; pmd = walk_to_pmd(mm, addr); @@ -1836,7 +1846,7 @@ static int insert_pages(struct vm_area_struct *vma, unsigned long addr, pte_unmap_unlock(start_pte, pte_lock); ret = err; remaining_pages_total -= pte_idx; - goto out; + goto put; } addr += PAGE_SIZE; ++curr_page_idx; @@ -1845,9 +1855,13 @@ static int insert_pages(struct vm_area_struct *vma, unsigned long addr, pages_to_write_in_pmd -= batch_size; remaining_pages_total -= batch_size; } - if (remaining_pages_total) + if (remaining_pages_total) { + pte_put(mm, pmd, start); goto more; + } ret = 0; +put: + pte_put(mm, pmd, start); out: *num = remaining_pages_total; return ret; @@ -2075,6 +2089,7 @@ static vm_fault_t insert_pfn(struct vm_area_struct *vma, unsigned long addr, out_unlock: pte_unmap_unlock(pte, ptl); + pte_put(mm, pte_to_pmd(pte), addr); return VM_FAULT_NOPAGE; } @@ -2275,6 +2290,7 @@ static int remap_pte_range(struct mm_struct *mm, pmd_t *pmd, unsigned long addr, unsigned long end, unsigned long pfn, pgprot_t prot) { + unsigned long start = addr; pte_t *pte, *mapped_pte; spinlock_t *ptl; int err = 0; @@ -2294,6 +2310,7 @@ static int remap_pte_range(struct mm_struct *mm, pmd_t *pmd, } while (pte++, addr += PAGE_SIZE, addr != end); arch_leave_lazy_mmu_mode(); pte_unmap_unlock(mapped_pte, ptl); + pte_put(mm, pmd, start); return err; } @@ -2503,6 +2520,7 @@ static int apply_to_pte_range(struct mm_struct *mm, pmd_t *pmd, pte_fn_t fn, void *data, bool create, pgtbl_mod_mask *mask) { + unsigned long start = addr; pte_t *pte, *mapped_pte; int err = 0; spinlock_t *ptl; @@ -2536,8 +2554,11 @@ static int apply_to_pte_range(struct mm_struct *mm, pmd_t *pmd, arch_leave_lazy_mmu_mode(); - if (mm != &init_mm) + if (mm != &init_mm) { pte_unmap_unlock(mapped_pte, ptl); + if (create) + pte_put(mm, pmd, start); + } return err; } @@ -3761,7 
+3782,8 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf) /* Deliver the page fault to userland, check inside PT lock */ if (userfaultfd_missing(vma)) { pte_unmap_unlock(vmf->pte, vmf->ptl); - return handle_userfault(vmf, VM_UFFD_MISSING); + ret = handle_userfault(vmf, VM_UFFD_MISSING); + goto put; } goto setpte; } @@ -3804,7 +3826,8 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf) if (userfaultfd_missing(vma)) { pte_unmap_unlock(vmf->pte, vmf->ptl); put_page(page); - return handle_userfault(vmf, VM_UFFD_MISSING); + ret = handle_userfault(vmf, VM_UFFD_MISSING); + goto put; } inc_mm_counter_fast(vma->vm_mm, MM_ANONPAGES); @@ -3817,14 +3840,17 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf) update_mmu_cache(vma, vmf->address, vmf->pte); unlock: pte_unmap_unlock(vmf->pte, vmf->ptl); - return ret; + goto put; release: put_page(page); goto unlock; oom_free_page: put_page(page); oom: - return VM_FAULT_OOM; + ret = VM_FAULT_OOM; +put: + pte_put(vma->vm_mm, vmf->pmd, vmf->address); + return ret; } /* @@ -4031,7 +4057,9 @@ vm_fault_t finish_fault(struct vm_fault *vmf) return ret; } - if (pmd_none(*vmf->pmd)) { +retry: + ret = pte_try_get(vmf->pmd); + if (ret == TRYGET_FAILED_NONE) { int alloc_ret; if (PageTransCompound(page)) { @@ -4047,9 +4075,11 @@ vm_fault_t finish_fault(struct vm_fault *vmf) if (unlikely(alloc_ret != INSTALLED_PTE)) return alloc_ret < 0 ? 
VM_FAULT_OOM : 0; - } else if (pmd_devmap_trans_unstable(vmf->pmd)) { + } else if (ret == TRYGET_FAILED_HUGE_PMD) { /* See comment in handle_pte_fault() */ return 0; + } else if (ret == TRYGET_FAILED_ZERO) { + goto retry; } vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, @@ -4063,6 +4093,7 @@ vm_fault_t finish_fault(struct vm_fault *vmf) update_mmu_tlb(vma, vmf->address, vmf->pte); pte_unmap_unlock(vmf->pte, vmf->ptl); + pte_put(vma->vm_mm, vmf->pmd, vmf->address); return ret; } diff --git a/mm/migrate.c b/mm/migrate.c index bdfdfd3b50be..26f16a4836d8 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -2736,9 +2736,9 @@ static void migrate_vma_insert_page(struct migrate_vma *migrate, goto abort; if (unlikely(anon_vma_prepare(vma))) - goto abort; + goto put; if (mem_cgroup_charge(page_folio(page), vma->vm_mm, GFP_KERNEL)) - goto abort; + goto put; /* * The memory barrier inside __SetPageUptodate makes sure that @@ -2764,7 +2764,7 @@ static void migrate_vma_insert_page(struct migrate_vma *migrate, * device memory. 
*/ pr_warn_once("Unsupported ZONE_DEVICE page type.\n"); - goto abort; + goto put; } } else { entry = mk_pte(page, vma->vm_page_prot); @@ -2811,11 +2811,14 @@ static void migrate_vma_insert_page(struct migrate_vma *migrate, } pte_unmap_unlock(ptep, ptl); + pte_put(mm, pmdp, addr); *src = MIGRATE_PFN_MIGRATE; return; unlock_abort: pte_unmap_unlock(ptep, ptl); +put: + pte_put(mm, pmdp, addr); abort: *src &= ~MIGRATE_PFN_MIGRATE; } diff --git a/mm/mlock.c b/mm/mlock.c index e263d62ae2d0..a4ef20ba9627 100644 --- a/mm/mlock.c +++ b/mm/mlock.c @@ -398,6 +398,7 @@ static unsigned long __munlock_pagevec_fill(struct pagevec *pvec, break; } pte_unmap_unlock(pte, ptl); + pte_put(vma->vm_mm, pte_to_pmd(pte), start); return start; } diff --git a/mm/mremap.c b/mm/mremap.c index fc5c56858883..f80c628db25d 100644 --- a/mm/mremap.c +++ b/mm/mremap.c @@ -555,6 +555,7 @@ unsigned long move_page_tables(struct vm_area_struct *vma, break; move_ptes(vma, old_pmd, old_addr, old_addr + extent, new_vma, new_pmd, new_addr, need_rmap_locks); + pte_put(new_vma->vm_mm, new_pmd, new_addr); } mmu_notifier_invalidate_range_end(&range); diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c index 2cea08e7f076..37df899a1b9d 100644 --- a/mm/userfaultfd.c +++ b/mm/userfaultfd.c @@ -574,6 +574,7 @@ static __always_inline ssize_t __mcopy_atomic(struct mm_struct *dst_mm, while (src_addr < src_start + len) { pmd_t dst_pmdval; + enum pte_tryget_type tryget_type; BUG_ON(dst_addr >= dst_start + len); @@ -583,6 +584,14 @@ static __always_inline ssize_t __mcopy_atomic(struct mm_struct *dst_mm, break; } +again: + /* + * After the management of the PTE page changes to the refcount + * mode, the PTE page may be released by another thread(rcu mode), + * so the rcu lock is held here to prevent the PTE page from + * being released. 
+ */ + rcu_read_lock(); dst_pmdval = pmd_read_atomic(dst_pmd); /* * If the dst_pmd is mapped as THP don't @@ -593,7 +602,9 @@ static __always_inline ssize_t __mcopy_atomic(struct mm_struct *dst_mm, break; } - if (unlikely(pmd_none(dst_pmdval))) { + tryget_type = pte_try_get(&dst_pmdval); + rcu_read_unlock(); + if (unlikely(tryget_type == TRYGET_FAILED_NONE)) { int ret = __pte_alloc(dst_mm, dst_pmd); /* @@ -607,6 +618,8 @@ static __always_inline ssize_t __mcopy_atomic(struct mm_struct *dst_mm, err = -EFAULT; break; } + } else if (unlikely(tryget_type == TRYGET_FAILED_ZERO)) { + goto again; } BUG_ON(pmd_none(*dst_pmd)); @@ -614,6 +627,7 @@ static __always_inline ssize_t __mcopy_atomic(struct mm_struct *dst_mm, err = mfill_atomic_pte(dst_mm, dst_pmd, dst_vma, dst_addr, src_addr, &page, mcopy_mode, wp_copy); + pte_put(dst_mm, dst_pmd, dst_addr); cond_resched(); if (unlikely(err == -ENOENT)) {

From patchwork Wed Nov 10 08:40:50 2021
From: Qi Zheng <zhengqi.arch@bytedance.com>
Subject: [PATCH v3 08/15] mm/pte_ref: initialize the refcount of the withdrawn PTE page table page
Date: Wed, 10 Nov 2021 16:40:50 +0800
Message-Id: <20211110084057.27676-9-zhengqi.arch@bytedance.com>
In-Reply-To: <20211110084057.27676-1-zhengqi.arch@bytedance.com>

When we split a PMD-mapped THP into a PTE-mapped THP, we should initialize the refcount of the withdrawn PTE page table page to HPAGE_PMD_NR, which ensures that the PTE page table page can be released when it becomes free (i.e. when its refcount drops to 0).

Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> --- mm/pgtable-generic.c | 1 + 1 file changed, 1 insertion(+) diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c index 4e640baf9794..523053e09dfa 100644 --- a/mm/pgtable-generic.c +++ b/mm/pgtable-generic.c @@ -186,6 +186,7 @@ pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp) struct page, lru); if (pmd_huge_pte(mm, pmdp)) list_del(&pgtable->lru); + pte_ref_init(pgtable, pmdp, HPAGE_PMD_NR); return pgtable; } #endif

From patchwork Wed Nov 10 08:40:51 2021
From: Qi Zheng <zhengqi.arch@bytedance.com>
Subject: [PATCH v3 09/15] mm/pte_ref: add support for the map/unmap of user PTE page table page
Date: Wed, 10 Nov 2021 16:40:51 +0800
Message-Id: <20211110084057.27676-10-zhengqi.arch@bytedance.com>
In-Reply-To: <20211110084057.27676-1-zhengqi.arch@bytedance.com>
A !pte_none() entry holds a reference on the user PTE page table page, whether it is a regular page table entry that maps a physical page, a swap entry, a migration entry, etc. So when a pte_none() entry becomes mapped, the refcount of the PTE page table page needs to be increased; when a !pte_none() entry becomes none, the refcount needs to be decreased. For the swap and migration cases, which only change the content of the PTE entry, we keep the refcount unchanged. Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> --- kernel/events/uprobes.c | 2 ++ mm/filemap.c | 3 +++ mm/madvise.c | 5 +++++ mm/memory.c | 42 +++++++++++++++++++++++++++++++++++------- mm/migrate.c | 1 + mm/mremap.c | 7 +++++++ mm/rmap.c | 10 ++++++++++ mm/userfaultfd.c | 2 ++ 8 files changed, 65 insertions(+), 7 deletions(-) diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c index 6357c3580d07..96dd2959e1ac 100644 --- a/kernel/events/uprobes.c +++ b/kernel/events/uprobes.c @@ -200,6 +200,8 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr, if (new_page) set_pte_at_notify(mm, addr, pvmw.pte, mk_pte(new_page, vma->vm_page_prot)); + else + pte_put(mm, pte_to_pmd(pvmw.pte), addr); page_remove_rmap(old_page, false); if (!page_mapped(old_page)) diff --git a/mm/filemap.c b/mm/filemap.c index 1e7e9e4fd759..aa47ee11a3d8 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -3309,6 +3309,7 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf, struct page *head, *page; unsigned int mmap_miss = READ_ONCE(file->f_ra.mmap_miss); vm_fault_t ret = 0; + unsigned int nr_get = 0; rcu_read_lock(); head = first_map_page(mapping, &xas, end_pgoff); @@ -3342,6 +3343,7 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf, ret = VM_FAULT_NOPAGE; do_set_pte(vmf, page, addr); + nr_get++; /* no need to invalidate: a not-present page won't be cached */ update_mmu_cache(vma, addr, vmf->pte); unlock_page(head); @@ -3351,6 +3353,7 @@ vm_fault_t filemap_map_pages(struct
vm_fault *vmf, put_page(head); } while ((head = next_map_page(mapping, &xas, end_pgoff)) != NULL); pte_unmap_unlock(vmf->pte, vmf->ptl); + pte_get_many(vmf->pmd, nr_get); pte_put(vma->vm_mm, vmf->pmd, start); out: rcu_read_unlock(); diff --git a/mm/madvise.c b/mm/madvise.c index 0734db8d53a7..82fc40b6dcbf 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -580,6 +580,8 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr, struct page *page; int nr_swap = 0; unsigned long next; + unsigned int nr_put = 0; + unsigned long start = addr; next = pmd_addr_end(addr, end); if (pmd_trans_huge(*pmd)) @@ -612,6 +614,7 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr, nr_swap--; free_swap_and_cache(entry); pte_clear_not_present_full(mm, addr, pte, tlb->fullmm); + nr_put++; continue; } @@ -696,6 +699,8 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr, } arch_leave_lazy_mmu_mode(); pte_unmap_unlock(orig_pte, ptl); + if (nr_put) + pte_put_many(mm, pmd, start, nr_put); cond_resched(); next: return 0; diff --git a/mm/memory.c b/mm/memory.c index 0b9af38cfa11..ea4d651ac8c7 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -878,6 +878,7 @@ copy_nonpresent_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm, if (!userfaultfd_wp(dst_vma)) pte = pte_swp_clear_uffd_wp(pte); set_pte_at(dst_mm, addr, dst_pte, pte); + pte_get(pte_to_pmd(dst_pte)); return 0; } @@ -946,6 +947,7 @@ copy_present_page(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma /* Uffd-wp needs to be delivered to dest pte as well */ pte = pte_wrprotect(pte_mkuffd_wp(pte)); set_pte_at(dst_vma->vm_mm, addr, dst_pte, pte); + pte_get(pte_to_pmd(dst_pte)); return 0; } @@ -998,6 +1000,7 @@ copy_present_pte(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma, pte = pte_clear_uffd_wp(pte); set_pte_at(dst_vma->vm_mm, addr, dst_pte, pte); + pte_get(pte_to_pmd(dst_pte)); return 0; } @@ -1335,6 +1338,8 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb, 
pte_t *start_pte; pte_t *pte; swp_entry_t entry; + unsigned int nr_put = 0; + unsigned long start = addr; tlb_change_page_size(tlb, PAGE_SIZE); again: @@ -1359,6 +1364,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb, continue; ptent = ptep_get_and_clear_full(mm, addr, pte, tlb->fullmm); + nr_put++; tlb_remove_tlb_entry(tlb, pte, addr); if (unlikely(!page)) continue; @@ -1392,6 +1398,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb, if (unlikely(zap_skip_check_mapping(details, page))) continue; pte_clear_not_present_full(mm, addr, pte, tlb->fullmm); + nr_put++; rss[mm_counter(page)]--; if (is_device_private_entry(entry)) @@ -1416,6 +1423,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb, if (unlikely(!free_swap_and_cache(entry))) print_bad_pte(vma, addr, ptent, NULL); pte_clear_not_present_full(mm, addr, pte, tlb->fullmm); + nr_put++; } while (pte++, addr += PAGE_SIZE, addr != end); add_mm_rss_vec(mm, rss); @@ -1442,6 +1450,9 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb, goto again; } + if (nr_put) + pte_put_many(mm, pmd, start, nr_put); + return addr; } @@ -1759,6 +1770,7 @@ static int insert_page_into_pte_locked(struct mm_struct *mm, pte_t *pte, inc_mm_counter_fast(mm, mm_counter_file(page)); page_add_file_rmap(page, false); set_pte_at(mm, addr, pte, mk_pte(page, prot)); + pte_get(pte_to_pmd(pte)); return 0; } @@ -2085,6 +2097,7 @@ static vm_fault_t insert_pfn(struct vm_area_struct *vma, unsigned long addr, } set_pte_at(mm, addr, pte, entry); + pte_get(pte_to_pmd(pte)); update_mmu_cache(vma, addr, pte); /* XXX: why not for insert_page? 
*/ out_unlock: @@ -2291,6 +2304,7 @@ static int remap_pte_range(struct mm_struct *mm, pmd_t *pmd, unsigned long pfn, pgprot_t prot) { unsigned long start = addr; + unsigned int nr_get = 0; pte_t *pte, *mapped_pte; spinlock_t *ptl; int err = 0; @@ -2306,10 +2320,12 @@ static int remap_pte_range(struct mm_struct *mm, pmd_t *pmd, break; } set_pte_at(mm, addr, pte, pte_mkspecial(pfn_pte(pfn, prot))); + nr_get++; pfn++; } while (pte++, addr += PAGE_SIZE, addr != end); arch_leave_lazy_mmu_mode(); pte_unmap_unlock(mapped_pte, ptl); + pte_get_many(pmd, nr_get); pte_put(mm, pmd, start); return err; } @@ -2524,6 +2540,7 @@ static int apply_to_pte_range(struct mm_struct *mm, pmd_t *pmd, pte_t *pte, *mapped_pte; int err = 0; spinlock_t *ptl; + unsigned int nr_put = 0, nr_get = 0; if (create) { mapped_pte = pte = (mm == &init_mm) ? @@ -2531,6 +2548,7 @@ static int apply_to_pte_range(struct mm_struct *mm, pmd_t *pmd, pte_alloc_map_lock(mm, pmd, addr, &ptl); if (!pte) return -ENOMEM; + nr_put++; } else { mapped_pte = pte = (mm == &init_mm) ? 
pte_offset_kernel(pmd, addr) : @@ -2543,11 +2561,17 @@ static int apply_to_pte_range(struct mm_struct *mm, pmd_t *pmd, if (fn) { do { - if (create || !pte_none(*pte)) { + if (create) { err = fn(pte++, addr, data); - if (err) - break; + if (mm != &init_mm && !pte_none(*(pte-1))) + nr_get++; + } else if (!pte_none(*pte)) { + err = fn(pte++, addr, data); + if (mm != &init_mm && pte_none(*(pte-1))) + nr_put++; } + if (err) + break; } while (addr += PAGE_SIZE, addr != end); } *mask |= PGTBL_PTE_MODIFIED; @@ -2556,8 +2580,9 @@ static int apply_to_pte_range(struct mm_struct *mm, pmd_t *pmd, if (mm != &init_mm) { pte_unmap_unlock(mapped_pte, ptl); - if (create) - pte_put(mm, pmd, start); + pte_get_many(pmd, nr_get); + if (nr_put) + pte_put_many(mm, pmd, start, nr_put); } return err; } @@ -3835,6 +3860,7 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf) lru_cache_add_inactive_or_unevictable(page, vma); setpte: set_pte_at(vma->vm_mm, vmf->address, vmf->pte, entry); + pte_get(vmf->pmd); /* No need to invalidate - it was non-present before */ update_mmu_cache(vma, vmf->address, vmf->pte); @@ -4086,10 +4112,12 @@ vm_fault_t finish_fault(struct vm_fault *vmf) vmf->address, &vmf->ptl); ret = 0; /* Re-check under ptl */ - if (likely(pte_none(*vmf->pte))) + if (likely(pte_none(*vmf->pte))) { do_set_pte(vmf, page, vmf->address); - else + pte_get(vmf->pmd); + } else { ret = VM_FAULT_NOPAGE; + } update_mmu_tlb(vma, vmf->address, vmf->pte); pte_unmap_unlock(vmf->pte, vmf->ptl); diff --git a/mm/migrate.c b/mm/migrate.c index 26f16a4836d8..c03ac25f42a9 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -2807,6 +2807,7 @@ static void migrate_vma_insert_page(struct migrate_vma *migrate, } else { /* No need to invalidate - it was non-present before */ set_pte_at(mm, addr, ptep, entry); + pte_get(pmdp); update_mmu_cache(vma, addr, ptep); } diff --git a/mm/mremap.c b/mm/mremap.c index f80c628db25d..088a7a75cb4b 100644 --- a/mm/mremap.c +++ b/mm/mremap.c @@ -141,6 +141,8 @@ static void 
move_ptes(struct vm_area_struct *vma, pmd_t *old_pmd, spinlock_t *old_ptl, *new_ptl; bool force_flush = false; unsigned long len = old_end - old_addr; + unsigned int nr_put = 0, nr_get = 0; + unsigned long old_start = old_addr; /* * When need_rmap_locks is true, we take the i_mmap_rwsem and anon_vma @@ -181,6 +183,7 @@ static void move_ptes(struct vm_area_struct *vma, pmd_t *old_pmd, continue; pte = ptep_get_and_clear(mm, old_addr, old_pte); + nr_put++; /* * If we are remapping a valid PTE, make sure * to flush TLB before we drop the PTL for the @@ -197,6 +200,7 @@ static void move_ptes(struct vm_area_struct *vma, pmd_t *old_pmd, pte = move_pte(pte, new_vma->vm_page_prot, old_addr, new_addr); pte = move_soft_dirty_pte(pte); set_pte_at(mm, new_addr, new_pte, pte); + nr_get++; } arch_leave_lazy_mmu_mode(); @@ -206,6 +210,9 @@ static void move_ptes(struct vm_area_struct *vma, pmd_t *old_pmd, spin_unlock(new_ptl); pte_unmap(new_pte - 1); pte_unmap_unlock(old_pte - 1, old_ptl); + pte_get_many(new_pmd, nr_get); + if (nr_put) + pte_put_many(mm, old_pmd, old_start, nr_put); if (need_rmap_locks) drop_rmap_locks(vma); } diff --git a/mm/rmap.c b/mm/rmap.c index 2908d637bcad..630ce8a036b5 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1404,6 +1404,7 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma, bool ret = true; struct mmu_notifier_range range; enum ttu_flags flags = (enum ttu_flags)(long)arg; + unsigned int nr_put = 0; /* * When racing against e.g. 
zap_pte_range() on another cpu, @@ -1551,6 +1552,7 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma, /* We have to invalidate as we cleared the pte */ mmu_notifier_invalidate_range(mm, address, address + PAGE_SIZE); + nr_put++; } else if (PageAnon(page)) { swp_entry_t entry = { .val = page_private(subpage) }; pte_t swp_pte; @@ -1564,6 +1566,7 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma, /* We have to invalidate as we cleared the pte */ mmu_notifier_invalidate_range(mm, address, address + PAGE_SIZE); + pte_put(mm, pvmw.pmd, address); page_vma_mapped_walk_done(&pvmw); break; } @@ -1575,6 +1578,7 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma, mmu_notifier_invalidate_range(mm, address, address + PAGE_SIZE); dec_mm_counter(mm, MM_ANONPAGES); + nr_put++; goto discard; } @@ -1630,6 +1634,7 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma, * See Documentation/vm/mmu_notifier.rst */ dec_mm_counter(mm, mm_counter_file(page)); + nr_put++; } discard: /* @@ -1641,6 +1646,10 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma, */ page_remove_rmap(subpage, PageHuge(page)); put_page(page); + if (nr_put) { + pte_put_many(mm, pvmw.pmd, address, nr_put); + nr_put = 0; + } } mmu_notifier_invalidate_range_end(&range); @@ -1871,6 +1880,7 @@ static bool try_to_migrate_one(struct page *page, struct vm_area_struct *vma, /* We have to invalidate as we cleared the pte */ mmu_notifier_invalidate_range(mm, address, address + PAGE_SIZE); + pte_put(mm, pvmw.pmd, address); } else { swp_entry_t entry; pte_t swp_pte; diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c index 37df899a1b9d..b87c61b94065 100644 --- a/mm/userfaultfd.c +++ b/mm/userfaultfd.c @@ -110,6 +110,7 @@ int mfill_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd, lru_cache_add_inactive_or_unevictable(page, dst_vma); set_pte_at(dst_mm, dst_addr, dst_pte, _dst_pte); + 
pte_get(dst_pmd); /* No need to invalidate - it was non-present before */ update_mmu_cache(dst_vma, dst_addr, dst_pte); @@ -204,6 +205,7 @@ static int mfill_zeropage_pte(struct mm_struct *dst_mm, if (!pte_none(*dst_pte)) goto out_unlock; set_pte_at(dst_mm, dst_addr, dst_pte, _dst_pte); + pte_get(dst_pmd); /* No need to invalidate - it was non-present before */ update_mmu_cache(dst_vma, dst_addr, dst_pte); ret = 0;