From patchwork Fri Apr 30 19:52:16 2021
X-Patchwork-Submitter: Michel Lespinasse
X-Patchwork-Id: 12234197
From: Michel Lespinasse
To: Linux-MM, Linux-Kernel
Cc: Laurent Dufour, Peter Zijlstra, Michal Hocko, Matthew Wilcox,
 Rik van Riel, Paul McKenney, Andrew Morton, Suren Baghdasaryan,
 Joel Fernandes, Andy Lutomirski, Michel Lespinasse
Subject: [PATCH 15/29] mm: implement speculative handling in __handle_mm_fault().
Date: Fri, 30 Apr 2021 12:52:16 -0700
Message-Id: <20210430195232.30491-16-michel@lespinasse.org>
In-Reply-To: <20210430195232.30491-1-michel@lespinasse.org>
References: <20210430195232.30491-1-michel@lespinasse.org>

The speculative path calls speculative_page_walk_begin() before walking
the page table tree to prevent page table reclamation. The logic is
otherwise similar to the non-speculative path, but with additional
restrictions: in the speculative path, we do not handle huge pages or
wire up new page tables.

Signed-off-by: Michel Lespinasse
---
 include/linux/mm.h |  4 +++
 mm/memory.c        | 77 ++++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 79 insertions(+), 2 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index d5988e78e6ab..dee8a4833779 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -525,6 +525,10 @@ struct vm_fault {
 	};
 	unsigned int flags;		/* FAULT_FLAG_xxx flags
 					 * XXX: should really be 'const' */
+#ifdef CONFIG_SPECULATIVE_PAGE_FAULT
+	unsigned long seq;
+	pmd_t orig_pmd;
+#endif
 	pmd_t *pmd;			/* Pointer to pmd entry matching
 					 * the 'address' */
 	pud_t *pud;			/* Pointer to pud entry matching
diff --git a/mm/memory.c b/mm/memory.c
index 45696166b10f..3f5c3d6c0197 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4329,7 +4329,7 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
  * return value.  See filemap_fault() and __lock_page_or_retry().
  */
 static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
-		unsigned long address, unsigned int flags)
+		unsigned long address, unsigned int flags, unsigned long seq)
 {
 	struct vm_fault vmf = {
 		.vma = vma,
@@ -4344,6 +4344,79 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
 	p4d_t *p4d;
 	vm_fault_t ret;
 
+#ifdef CONFIG_SPECULATIVE_PAGE_FAULT
+	if (flags & FAULT_FLAG_SPECULATIVE) {
+		pgd_t pgdval;
+		p4d_t p4dval;
+		pud_t pudval;
+
+		vmf.seq = seq;
+
+		speculative_page_walk_begin();
+		pgd = pgd_offset(mm, address);
+		pgdval = READ_ONCE(*pgd);
+		if (pgd_none(pgdval) || unlikely(pgd_bad(pgdval)))
+			goto spf_fail;
+
+		p4d = p4d_offset(pgd, address);
+		p4dval = READ_ONCE(*p4d);
+		if (p4d_none(p4dval) || unlikely(p4d_bad(p4dval)))
+			goto spf_fail;
+
+		vmf.pud = pud_offset(p4d, address);
+		pudval = READ_ONCE(*vmf.pud);
+		if (pud_none(pudval) || unlikely(pud_bad(pudval)) ||
+		    unlikely(pud_trans_huge(pudval)) ||
+		    unlikely(pud_devmap(pudval)))
+			goto spf_fail;
+
+		vmf.pmd = pmd_offset(vmf.pud, address);
+		vmf.orig_pmd = READ_ONCE(*vmf.pmd);
+
+		/*
+		 * pmd_none could mean that a hugepage collapse is in
+		 * progress behind our back, as collapse_huge_page() marks
+		 * it before invalidating the pte (which is done once
+		 * the IPI is caught by all CPUs and we have interrupts
+		 * disabled). For this reason we cannot handle THP in
+		 * a speculative way, since we can't safely identify an
+		 * in-progress collapse operation done behind our back on
+		 * that PMD.
+		 */
+		if (unlikely(pmd_none(vmf.orig_pmd) ||
+			     is_swap_pmd(vmf.orig_pmd) ||
+			     pmd_trans_huge(vmf.orig_pmd) ||
+			     pmd_devmap(vmf.orig_pmd)))
+			goto spf_fail;
+
+		/*
+		 * The above does not allocate/instantiate page-tables because
+		 * doing so would lead to the possibility of instantiating
+		 * page-tables after free_pgtables() -- and consequently
+		 * leaking them.
+		 *
+		 * The result is that we take at least one non-speculative
+		 * fault per PMD in order to instantiate it.
+		 */
+
+		vmf.pte = pte_offset_map(vmf.pmd, address);
+		vmf.orig_pte = READ_ONCE(*vmf.pte);
+		barrier();
+		if (pte_none(vmf.orig_pte)) {
+			pte_unmap(vmf.pte);
+			vmf.pte = NULL;
+		}
+
+		speculative_page_walk_end();
+
+		return handle_pte_fault(&vmf);
+
+spf_fail:
+		speculative_page_walk_end();
+		return VM_FAULT_RETRY;
+	}
+#endif	/* CONFIG_SPECULATIVE_PAGE_FAULT */
+
 	pgd = pgd_offset(mm, address);
 	p4d = p4d_alloc(mm, pgd, address);
 	if (!p4d)
@@ -4563,7 +4636,7 @@ vm_fault_t do_handle_mm_fault(struct vm_area_struct *vma,
 	if (unlikely(is_vm_hugetlb_page(vma)))
 		ret = hugetlb_fault(vma->vm_mm, vma, address, flags);
 	else
-		ret = __handle_mm_fault(vma, address, flags);
+		ret = __handle_mm_fault(vma, address, flags, seq);
 
 	if (flags & FAULT_FLAG_USER) {
 		mem_cgroup_exit_user_fault();
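
For readers who want the shape of the speculative walk without the kernel
context, here is a minimal userspace sketch. It is a toy model under stated
assumptions, not the kernel code: entry_t, ENTRY_HUGE, read_once(),
struct toy_fault and speculative_walk() are all hypothetical names invented
for illustration, the table has three levels instead of the kernel's
pgd/p4d/pud/pmd/pte, and the speculative_page_walk_begin()/end() bracket is
not modelled. It shows only the two properties described in the commit
message: each entry is read once into a local snapshot before it is checked,
and the walk returns "retry" instead of allocating a missing table or
handling a huge entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy types; none of these are kernel APIs. */
typedef uintptr_t entry_t;            /* 0 means "none" */
#define ENTRY_HUGE ((entry_t)1)       /* low bit tags a huge mapping */
#define LEVELS 3                      /* two table levels plus a leaf level */
#define FANOUT 4                      /* entries per table */

enum walk_result { WALK_OK, WALK_RETRY };

struct toy_fault {
    entry_t *pte;      /* NULL when the leaf entry was empty */
    entry_t orig_pte;  /* snapshot of the leaf entry */
};

/* Read an entry exactly once so every check sees the same value. */
static entry_t read_once(const entry_t *p)
{
    return *(const volatile entry_t *)p;
}

/*
 * Walk without allocating: a missing or huge intermediate entry makes the
 * caller fall back to the regular (non-speculative) path.
 */
static enum walk_result speculative_walk(entry_t *top,
                                         const unsigned idx[LEVELS],
                                         struct toy_fault *f)
{
    entry_t *table = top;

    assert(top != NULL && idx != NULL && f != NULL);
    for (int level = 0; level < LEVELS - 1; level++) {
        assert(idx[level] < FANOUT);
        entry_t val = read_once(&table[idx[level]]);
        if (val == 0 || (val & ENTRY_HUGE))
            return WALK_RETRY;
        table = (entry_t *)val;       /* descend using the snapshot */
    }

    assert(idx[LEVELS - 1] < FANOUT);
    f->pte = &table[idx[LEVELS - 1]];
    f->orig_pte = read_once(f->pte);
    if (f->orig_pte == 0)
        f->pte = NULL;                /* empty leaf: nothing stays mapped */
    return WALK_OK;
}
```

An empty leaf entry is not a failure in this sketch: like the patch, it
records the snapshot, clears the pointer, and leaves the decision to the
caller (handle_pte_fault() in the kernel).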