From patchwork Wed Apr 7 01:44:46 2021
X-Patchwork-Submitter: Michel Lespinasse
X-Patchwork-Id: 12186467
From: Michel Lespinasse
To: Linux-MM
Cc: Laurent Dufour,
 Peter Zijlstra, Michal Hocko, Matthew Wilcox, Rik van Riel,
 Paul McKenney, Andrew Morton, Suren Baghdasaryan, Joel Fernandes,
 Rom Lemarchand, Linux-Kernel, Michel Lespinasse
Subject: [RFC PATCH 21/37] mm: implement speculative handling in do_swap_page()
Date: Tue, 6 Apr 2021 18:44:46 -0700
Message-Id: <20210407014502.24091-22-michel@lespinasse.org>
X-Mailer: git-send-email 2.20.1
In-Reply-To: <20210407014502.24091-1-michel@lespinasse.org>
References: <20210407014502.24091-1-michel@lespinasse.org>
MIME-Version: 1.0

If the pte is larger than an unsigned long, use pte_spinlock() to lock
the page table while verifying the pte; pte_spinlock() is necessary to
ensure that the page table is still valid at the point where we take
the lock.

Abort speculative faults if the pte is not a swap entry, or if the
desired page is not found in the swap cache, to keep things as simple
as possible.

Only use trylock when locking the swapped-in page, again to keep things
simple, and also because the usual lock_page_or_retry() would otherwise
try to release the mmap lock, which is not held in the speculative
case.

Use pte_map_lock() to ensure proper synchronization when finally
committing the faulted page to the mm address space.

Signed-off-by: Michel Lespinasse
---
 mm/memory.c | 74 ++++++++++++++++++++++++++++++-----------------------
 1 file changed, 42 insertions(+), 32 deletions(-)
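Not part of the patch, but as background for the sizeof(pte_t) >
sizeof(unsigned long) check below, here is a minimal userspace sketch of
the torn read it guards against. The fake_pte type, the ptl mutex, and
the file name are illustrative stand-ins, not kernel code. When a 64-bit
pte is updated as two 32-bit words, as on i386 with PAE, a lockless
reader can observe a mix of old and new halves; the reliable check is to
re-read under the lock, which is what the pte_spinlock() + pte_same()
sequence in the patch does.

/* torn_read.c - hypothetical demo, not kernel code.
 * A 64-bit "pte" stored as two 32-bit halves can be seen half-old,
 * half-new by a lockless reader; re-reading under the lock resolves it.
 * Build: cc -O2 -pthread torn_read.c
 */
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

struct fake_pte { volatile uint32_t lo, hi; };	/* stand-in for a PAE pte */

static struct fake_pte pte;
static pthread_mutex_t ptl = PTHREAD_MUTEX_INITIALIZER; /* stand-in lock */

static void *writer(void *arg)
{
	/* The writer keeps lo == hi whenever the lock is dropped, so any
	 * lo != hi snapshot must come from a torn lockless read. */
	for (uint32_t i = 0; i < 10000000; i++) {
		pthread_mutex_lock(&ptl);
		pte.lo = i;
		pte.hi = i;
		pthread_mutex_unlock(&ptl);
	}
	return NULL;
}

int main(void)
{
	pthread_t t;
	long torn = 0;

	pthread_create(&t, NULL, writer, NULL);
	for (long i = 0; i < 10000000; i++) {
		/* Lockless, non-atomic read: analogous to reading
		 * vmf->orig_pte without the page table lock. */
		uint32_t lo = pte.lo;
		uint32_t hi = pte.hi;

		if (lo != hi) {
			torn++;
			/* Resolution: take the lock and re-read, as the
			 * pte_spinlock()/pte_same() path does below. */
			pthread_mutex_lock(&ptl);
			lo = pte.lo;
			hi = pte.hi;
			pthread_mutex_unlock(&ptl);
		}
	}
	pthread_join(t, NULL);
	printf("torn lockless reads observed: %ld\n", torn);
	return 0;
}

On a multiprocessor machine the torn counter is typically nonzero, which
is why the speculative path cannot trust vmf->orig_pte on such
configurations without re-checking it under the lock.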
diff --git a/mm/memory.c b/mm/memory.c
index fc555fae0844..ab3160719bf3 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2637,30 +2637,6 @@ bool __pte_map_lock(struct vm_fault *vmf)
 
 #endif /* CONFIG_SPECULATIVE_PAGE_FAULT */
 
-/*
- * handle_pte_fault chooses page fault handler according to an entry which was
- * read non-atomically. Before making any commitment, on those architectures
- * or configurations (e.g. i386 with PAE) which might give a mix of unmatched
- * parts, do_swap_page must check under lock before unmapping the pte and
- * proceeding (but do_wp_page is only called after already making such a check;
- * and do_anonymous_page can safely check later on).
- */
-static inline int pte_unmap_same(struct mm_struct *mm, pmd_t *pmd,
-				pte_t *page_table, pte_t orig_pte)
-{
-	int same = 1;
-#if defined(CONFIG_SMP) || defined(CONFIG_PREEMPTION)
-	if (sizeof(pte_t) > sizeof(unsigned long)) {
-		spinlock_t *ptl = pte_lockptr(mm, pmd);
-		spin_lock(ptl);
-		same = pte_same(*page_table, orig_pte);
-		spin_unlock(ptl);
-	}
-#endif
-	pte_unmap(page_table);
-	return same;
-}
-
 static inline bool cow_user_page(struct page *dst, struct page *src,
 				 struct vm_fault *vmf)
 {
@@ -3369,12 +3345,34 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 		return VM_FAULT_RETRY;
 	}
 
-	if (!pte_unmap_same(vma->vm_mm, vmf->pmd, vmf->pte, vmf->orig_pte))
-		goto out;
+#if defined(CONFIG_SMP) || defined(CONFIG_PREEMPTION)
+	if (sizeof(pte_t) > sizeof(unsigned long)) {
+		/*
+		 * vmf->orig_pte was read non-atomically. Before making
+		 * any commitment, on those architectures or configurations
+		 * (e.g. i386 with PAE) which might give a mix of
+		 * unmatched parts, we must check under lock before
+		 * unmapping the pte and proceeding.
+		 *
+		 * (but do_wp_page is only called after already making
+		 * such a check; and do_anonymous_page can safely
+		 * check later on).
+		 */
+		if (!pte_spinlock(vmf))
+			return VM_FAULT_RETRY;
+		if (!pte_same(*vmf->pte, vmf->orig_pte))
+			goto unlock;
+		spin_unlock(vmf->ptl);
+	}
+#endif
+	pte_unmap(vmf->pte);
+	vmf->pte = NULL;
 
 	entry = pte_to_swp_entry(vmf->orig_pte);
 	if (unlikely(non_swap_entry(entry))) {
-		if (is_migration_entry(entry)) {
+		if (vmf->flags & FAULT_FLAG_SPECULATIVE) {
+			ret = VM_FAULT_RETRY;
+		} else if (is_migration_entry(entry)) {
 			migration_entry_wait(vma->vm_mm, vmf->pmd,
 					     vmf->address);
 		} else if (is_device_private_entry(entry)) {
@@ -3395,8 +3393,14 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	swapcache = page;
 
 	if (!page) {
-		struct swap_info_struct *si = swp_swap_info(entry);
+		struct swap_info_struct *si;
 
+		if (vmf->flags & FAULT_FLAG_SPECULATIVE) {
+			delayacct_clear_flag(DELAYACCT_PF_SWAPIN);
+			return VM_FAULT_RETRY;
+		}
+
+		si = swp_swap_info(entry);
 		if (data_race(si->flags & SWP_SYNCHRONOUS_IO) &&
 		    __swap_count(entry) == 1) {
 			/* skip swapcache */
@@ -3459,7 +3463,10 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 		goto out_release;
 	}
 
-	locked = lock_page_or_retry(page, vma->vm_mm, vmf->flags);
+	if (vmf->flags & FAULT_FLAG_SPECULATIVE)
+		locked = trylock_page(page);
+	else
+		locked = lock_page_or_retry(page, vma->vm_mm, vmf->flags);
 
 	delayacct_clear_flag(DELAYACCT_PF_SWAPIN);
 	if (!locked) {
@@ -3487,10 +3494,13 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	cgroup_throttle_swaprate(page, GFP_KERNEL);
 
 	/*
-	 * Back out if somebody else already faulted in this pte.
+	 * Back out if the VMA has changed behind our back during a speculative
+	 * page fault or if somebody else already faulted in this pte.
 	 */
-	vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address,
-			&vmf->ptl);
+	if (!pte_map_lock(vmf)) {
+		ret = VM_FAULT_RETRY;
+		goto out_page;
+	}
 	if (unlikely(!pte_same(*vmf->pte, vmf->orig_pte)))
 		goto out_nomap;