From patchwork Thu Apr 1 18:21:21 2021
X-Patchwork-Submitter: Suren Baghdasaryan
X-Patchwork-Id: 12179161
Date: Thu, 1 Apr 2021 11:21:21 -0700
In-Reply-To: <20210401182125.171484-1-surenb@google.com>
Message-Id: <20210401182125.171484-2-surenb@google.com>
References: <20210401182125.171484-1-surenb@google.com>
Subject: [PATCH 1/5] mm: reuse only-pte-mapped KSM page in do_wp_page()
From: Suren Baghdasaryan
To: stable@vger.kernel.org
Cc: gregkh@linuxfoundation.org, jannh@google.com, ktkhai@virtuozzo.com,
    torvalds@linux-foundation.org, shli@fb.com, namit@vmware.com,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    kernel-team@android.com, surenb@google.com, Yang Shi,
    "Kirill A. Shutemov", Hugh Dickins, Andrea Arcangeli,
    Christian Koenig, Claudio Imbrenda, Rik van Riel, Huang Ying,
    Minchan Kim, Andrew Morton

From: Kirill Tkhai

Add an optimization for KSM pages almost in the same way that we have
for ordinary anonymous pages.  If there is a write fault in a page
which is mapped by only one pte and is not in the swap cache, the page
may be reused without copying its content.

[ Note that we do not consider PageSwapCache() pages at least for now,
  since we don't want to complicate __get_ksm_page(), which has a nice
  optimization based on this (for the migration case).  Currently it
  spins on PageSwapCache() pages, waiting for their counters to be
  unfrozen (i.e., for migration to finish).  But we don't want it to
  also spin on swap cache pages that we try to reuse, since the
  probability of reusing them is not very high.  So, for now we do not
  consider PageSwapCache() pages at all. ]

So in reuse_ksm_page() we check for 1) PageSwapCache() and
2) page_stable_node(), to skip a page which KSM is currently trying to
link to the stable tree.  Then we do page_ref_freeze() to prohibit KSM
from merging one more page into the page we are reusing (a minimal
sketch of this freeze idiom follows the patch below).  After that,
nobody can refer to the page being reused: KSM skips !PageSwapCache()
pages with zero refcount, and the protection against all other
participants is the same as for reused ordinary anon pages: pte lock,
page lock and mmap_sem.

[akpm@linux-foundation.org: replace BUG_ON()s with WARN_ON()s]

Link: http://lkml.kernel.org/r/154471491016.31352.1168978849911555609.stgit@localhost.localdomain
Signed-off-by: Kirill Tkhai
Reviewed-by: Yang Shi
Cc: "Kirill A. Shutemov"
Cc: Hugh Dickins
Cc: Andrea Arcangeli
Cc: Christian Koenig
Cc: Claudio Imbrenda
Cc: Rik van Riel
Cc: Huang Ying
Cc: Minchan Kim
Cc: Kirill Tkhai
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
---
 include/linux/ksm.h |  7 +++++++
 mm/ksm.c            | 30 ++++++++++++++++++++++++++++--
 mm/memory.c         | 16 ++++++++++++++--
 3 files changed, 49 insertions(+), 4 deletions(-)

diff --git a/include/linux/ksm.h b/include/linux/ksm.h
index 161e8164abcf..e48b1e453ff5 100644
--- a/include/linux/ksm.h
+++ b/include/linux/ksm.h
@@ -53,6 +53,8 @@ struct page *ksm_might_need_to_copy(struct page *page,
 void rmap_walk_ksm(struct page *page, struct rmap_walk_control *rwc);
 void ksm_migrate_page(struct page *newpage, struct page *oldpage);
+bool reuse_ksm_page(struct page *page,
+		    struct vm_area_struct *vma, unsigned long address);
 
 #else  /* !CONFIG_KSM */
 
@@ -86,6 +88,11 @@ static inline void rmap_walk_ksm(struct page *page,
 static inline void ksm_migrate_page(struct page *newpage,
 				    struct page *oldpage)
 {
 }
+static inline bool reuse_ksm_page(struct page *page,
+		    struct vm_area_struct *vma, unsigned long address)
+{
+	return false;
+}
 #endif /* CONFIG_MMU */
 #endif /* !CONFIG_KSM */

diff --git a/mm/ksm.c b/mm/ksm.c
index d021bcf94c41..c4e95ca65d62 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -705,8 +705,9 @@ static struct page *get_ksm_page(struct stable_node *stable_node, bool lock_it)
 	 * case this node is no longer referenced, and should be freed;
 	 * however, it might mean that the page is under page_ref_freeze().
 	 * The __remove_mapping() case is easy, again the node is now stale;
-	 * but if page is swapcache in migrate_page_move_mapping(), it might
-	 * still be our page, in which case it's essential to keep the node.
+	 * the same is in reuse_ksm_page() case; but if page is swapcache
+	 * in migrate_page_move_mapping(), it might still be our page,
+	 * in which case it's essential to keep the node.
 	 */
 	while (!get_page_unless_zero(page)) {
 		/*
@@ -2648,6 +2649,31 @@ void rmap_walk_ksm(struct page *page, struct rmap_walk_control *rwc)
 		goto again;
 }
 
+bool reuse_ksm_page(struct page *page,
+		    struct vm_area_struct *vma,
+		    unsigned long address)
+{
+#ifdef CONFIG_DEBUG_VM
+	if (WARN_ON(is_zero_pfn(page_to_pfn(page))) ||
+			WARN_ON(!page_mapped(page)) ||
+			WARN_ON(!PageLocked(page))) {
+		dump_page(page, "reuse_ksm_page");
+		return false;
+	}
+#endif
+
+	if (PageSwapCache(page) || !page_stable_node(page))
+		return false;
+	/* Prohibit parallel get_ksm_page() */
+	if (!page_ref_freeze(page, 1))
+		return false;
+
+	page_move_anon_rmap(page, vma);
+	page->index = linear_page_index(vma, address);
+	page_ref_unfreeze(page, 1);
+
+	return true;
+}
+
 #ifdef CONFIG_MIGRATION
 void ksm_migrate_page(struct page *newpage, struct page *oldpage)
 {

diff --git a/mm/memory.c b/mm/memory.c
index c1a05c2484b0..3874acce1472 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2846,8 +2846,11 @@ static vm_fault_t do_wp_page(struct vm_fault *vmf)
 	 * Take out anonymous pages first, anonymous shared vmas are
 	 * not dirty accountable.
 	 */
-	if (PageAnon(vmf->page) && !PageKsm(vmf->page)) {
+	if (PageAnon(vmf->page)) {
 		int total_map_swapcount;
+		if (PageKsm(vmf->page) && (PageSwapCache(vmf->page) ||
+					   page_count(vmf->page) != 1))
+			goto copy;
 		if (!trylock_page(vmf->page)) {
 			get_page(vmf->page);
 			pte_unmap_unlock(vmf->pte, vmf->ptl);
@@ -2862,6 +2865,15 @@ static vm_fault_t do_wp_page(struct vm_fault *vmf)
 			}
 			put_page(vmf->page);
 		}
+		if (PageKsm(vmf->page)) {
+			bool reused = reuse_ksm_page(vmf->page, vmf->vma,
+						     vmf->address);
+			unlock_page(vmf->page);
+			if (!reused)
+				goto copy;
+			wp_page_reuse(vmf);
+			return VM_FAULT_WRITE;
+		}
 		if (reuse_swap_page(vmf->page, &total_map_swapcount)) {
 			if (total_map_swapcount == 1) {
 				/*
@@ -2882,7 +2894,7 @@ static vm_fault_t do_wp_page(struct vm_fault *vmf)
 				      (VM_WRITE|VM_SHARED))) {
 		return wp_page_shared(vmf);
 	}
-
+copy:
 	/*
 	 * Ok, we need to copy. Oh, well..
 	 */
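[ For illustration only, a minimal sketch of the refcount-freeze idiom
  reuse_ksm_page() relies on, simplified from the patch above;
  try_exclusive_reuse() is a hypothetical name used for this note, not
  kernel API: ]

	/* page_ref_freeze() atomically swaps the expected refcount
	 * (1, our pte mapping) for zero.  While frozen,
	 * get_page_unless_zero() fails, so get_ksm_page() cannot take
	 * a new reference and KSM cannot merge another page into this
	 * one. */
	static bool try_exclusive_reuse(struct page *page,
					struct vm_area_struct *vma,
					unsigned long address)
	{
		if (!page_ref_freeze(page, 1))
			return false;	/* someone else holds a reference */

		/* excluded: safe to retarget the rmap to this vma */
		page_move_anon_rmap(page, vma);
		page->index = linear_page_index(vma, address);

		page_ref_unfreeze(page, 1);	/* restore the refcount */
		return true;
	}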
From patchwork Thu Apr 1 18:21:22 2021
X-Patchwork-Submitter: Suren Baghdasaryan
X-Patchwork-Id: 12179163
Date: Thu, 1 Apr 2021 11:21:22 -0700
In-Reply-To: <20210401182125.171484-1-surenb@google.com>
Message-Id: <20210401182125.171484-3-surenb@google.com>
References: <20210401182125.171484-1-surenb@google.com>
Subject: [PATCH 2/5] mm: do_wp_page() simplification
From: Suren Baghdasaryan
To: stable@vger.kernel.org
Cc: gregkh@linuxfoundation.org, jannh@google.com, ktkhai@virtuozzo.com,
    torvalds@linux-foundation.org, shli@fb.com, namit@vmware.com,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    kernel-team@android.com, surenb@google.com, Peter Xu

From: Linus Torvalds

How about we just make sure we're the only possible valid user of the
page before we bother to reuse it?

Simplify, simplify, simplify.

And get rid of the nasty serialization on the page lock at the same
time.

[peterx: add subject prefix]

Signed-off-by: Linus Torvalds
Signed-off-by: Peter Xu
Signed-off-by: Linus Torvalds
---
 mm/memory.c | 58 ++++++++++++++++-------------------------------------
 1 file changed, 17 insertions(+), 41 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 3874acce1472..d95a4573a273 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2847,49 +2847,25 @@ static vm_fault_t do_wp_page(struct vm_fault *vmf)
 	 * not dirty accountable.
 	 */
 	if (PageAnon(vmf->page)) {
-		int total_map_swapcount;
-		if (PageKsm(vmf->page) && (PageSwapCache(vmf->page) ||
-					   page_count(vmf->page) != 1))
+		struct page *page = vmf->page;
+
+		/* PageKsm() doesn't necessarily raise the page refcount */
+		if (PageKsm(page) || page_count(page) != 1)
+			goto copy;
+		if (!trylock_page(page))
+			goto copy;
+		if (PageKsm(page) || page_mapcount(page) != 1 || page_count(page) != 1) {
+			unlock_page(page);
 			goto copy;
-		if (!trylock_page(vmf->page)) {
-			get_page(vmf->page);
-			pte_unmap_unlock(vmf->pte, vmf->ptl);
-			lock_page(vmf->page);
-			vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
-					vmf->address, &vmf->ptl);
-			if (!pte_same(*vmf->pte, vmf->orig_pte)) {
-				unlock_page(vmf->page);
-				pte_unmap_unlock(vmf->pte, vmf->ptl);
-				put_page(vmf->page);
-				return 0;
-			}
-			put_page(vmf->page);
-		}
-		if (PageKsm(vmf->page)) {
-			bool reused = reuse_ksm_page(vmf->page, vmf->vma,
-						     vmf->address);
-			unlock_page(vmf->page);
-			if (!reused)
-				goto copy;
-			wp_page_reuse(vmf);
-			return VM_FAULT_WRITE;
-		}
-		if (reuse_swap_page(vmf->page, &total_map_swapcount)) {
-			if (total_map_swapcount == 1) {
-				/*
-				 * The page is all ours. Move it to
-				 * our anon_vma so the rmap code will
-				 * not search our parent or siblings.
-				 * Protected against the rmap code by
-				 * the page lock.
-				 */
-				page_move_anon_rmap(vmf->page, vma);
-			}
-			unlock_page(vmf->page);
-			wp_page_reuse(vmf);
-			return VM_FAULT_WRITE;
 		}
-		unlock_page(vmf->page);
+		/*
+		 * Ok, we've got the only map reference, and the only
+		 * page count reference, and the page is locked,
+		 * it's dark out, and we're wearing sunglasses. Hit it.
+		 */
+		wp_page_reuse(vmf);
+		unlock_page(page);
+		return VM_FAULT_WRITE;
 	} else if (unlikely((vma->vm_flags & (VM_WRITE|VM_SHARED)) ==
 					(VM_WRITE|VM_SHARED))) {
 		return wp_page_shared(vmf);
From patchwork Thu Apr 1 18:21:23 2021
X-Patchwork-Submitter: Suren Baghdasaryan
X-Patchwork-Id: 12179165
Date: Thu, 1 Apr 2021 11:21:23 -0700
In-Reply-To: <20210401182125.171484-1-surenb@google.com>
Message-Id: <20210401182125.171484-4-surenb@google.com>
References: <20210401182125.171484-1-surenb@google.com>
Subject: [PATCH 3/5] mm: fix misplaced unlock_page in do_wp_page()
From: Suren Baghdasaryan
To: stable@vger.kernel.org
Cc: gregkh@linuxfoundation.org, jannh@google.com, ktkhai@virtuozzo.com,
    torvalds@linux-foundation.org, shli@fb.com, namit@vmware.com,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    kernel-team@android.com, surenb@google.com, Qian Cai, Alex Shi,
    Gerald Schaefer

From: Linus Torvalds

Commit 09854ba94c6a ("mm: do_wp_page() simplification") reorganized all
the code around the page re-use vs copy,
but in the process also moved
the final unlock_page() around to after the wp_page_reuse() call.

That normally doesn't matter - but it means that the unlock_page() is
now done after releasing the page table lock.  Again, not a big deal,
you'd think.

But it turns out that it's very wrong indeed, because once we've
released the page table lock, we've basically lost our only reference
to the page - the page tables - and it could now be free'd at any time.
We do hold the mmap_sem, so no actual unmap() can happen, but madvise
can come in and a MADV_DONTNEED will zap the page range - and free the
page.

So now the page may be free'd just as we're unlocking it, which in turn
will usually trigger a "Bad page state" error in the freeing path.  To
make matters more confusing, by the time the debug code prints out the
page state, the unlock has typically completed and everything looks
fine again.

This all doesn't happen in any normal situations, but it does trigger
with the dirtyc0w_child LTP test.  And it seems to trigger much more
easily (but not exclusively) on s390 than elsewhere, probably because
s390 doesn't do the "batch pages up for freeing after the TLB flush"
that gives the unlock_page() more time to complete and makes the race
harder to hit.  (An annotated sketch of the interleaving follows the
patch below.)

Fixes: 09854ba94c6a ("mm: do_wp_page() simplification")
Link: https://lore.kernel.org/lkml/a46e9bbef2ed4e17778f5615e818526ef848d791.camel@redhat.com/
Link: https://lore.kernel.org/linux-mm/c41149a8-211e-390b-af1d-d5eee690fecb@linux.alibaba.com/
Reported-by: Qian Cai
Reported-by: Alex Shi
Bisected-and-analyzed-by: Gerald Schaefer
Tested-by: Gerald Schaefer
Signed-off-by: Linus Torvalds
---
 mm/memory.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/memory.c b/mm/memory.c
index d95a4573a273..656d90a75cf8 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2863,8 +2863,8 @@ static vm_fault_t do_wp_page(struct vm_fault *vmf)
 	 * page count reference, and the page is locked,
 	 * it's dark out, and we're wearing sunglasses. Hit it.
 	 */
-	wp_page_reuse(vmf);
 	unlock_page(page);
+	wp_page_reuse(vmf);
 	return VM_FAULT_WRITE;
 } else if (unlikely((vma->vm_flags & (VM_WRITE|VM_SHARED)) ==
 					(VM_WRITE|VM_SHARED))) {
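[ For illustration only, a simplified before/after of the ordering in
  do_wp_page(); the comments are editorial annotations under the
  racing-MADV_DONTNEED assumption above, not kernel source: ]

	/* Before the fix: wp_page_reuse() ends with pte_unmap_unlock(),
	 * so the page table lock is already gone and the pte no longer
	 * pins the page when we unlock it.  A racing MADV_DONTNEED can
	 * zap the range and free the page in that window. */
	wp_page_reuse(vmf);	/* drops vmf->ptl */
	unlock_page(page);	/* page may already be freed: "Bad page state" */

	/* After the fix: unlock while our pte still holds the page. */
	unlock_page(page);	/* page still pinned by the page table */
	wp_page_reuse(vmf);	/* release the pte lock last */
	return VM_FAULT_WRITE;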
From patchwork Thu Apr 1 18:21:24 2021
X-Patchwork-Submitter: Suren Baghdasaryan
X-Patchwork-Id: 12179167
Date: Thu, 1 Apr 2021 11:21:24 -0700
In-Reply-To: <20210401182125.171484-1-surenb@google.com>
Message-Id: <20210401182125.171484-5-surenb@google.com>
References: <20210401182125.171484-1-surenb@google.com>
Subject: [PATCH 4/5] userfaultfd: wp: add helper for writeprotect check
From: Suren Baghdasaryan
To: stable@vger.kernel.org
Cc: gregkh@linuxfoundation.org, jannh@google.com, ktkhai@virtuozzo.com,
    torvalds@linux-foundation.org, shli@fb.com, namit@vmware.com,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    kernel-team@android.com, surenb@google.com, Andrea Arcangeli,
    Peter Xu, Andrew Morton, Jerome Glisse, Mike Rapoport,
    Rik van Riel, "Kirill A. Shutemov", Mel Gorman, Hugh Dickins,
    Johannes Weiner, Bobby Powers, Brian Geffon, David Hildenbrand,
    Denis Plotnikov, "Dr. David Alan Gilbert", Martin Cracauer,
    Marty McFadden, Maya Gokhale, Mike Kravetz, Pavel Emelyanov

From: Shaohua Li

Patch series "userfaultfd: write protection support", v6.

Overview
========

The uffd-wp work was initiated by Shaohua Li [1] and later continued by
Andrea [2].  This series is based upon Andrea's latest userfaultfd
tree, and it continues the work of both Shaohua and Andrea.  Many of
the follow-up ideas come from Andrea too.

Besides the old MISSING register mode of userfaultfd, the new uffd-wp
support provides another alternative register mode called
UFFDIO_REGISTER_MODE_WP, which can be used to listen not only to
missing page faults but also to write-protection page faults; the two
modes can even be registered together.  At the same time, the new
feature also provides a new userfaultfd ioctl called
UFFDIO_WRITEPROTECT, which allows userspace to write protect a range of
memory or fix up the write permission of faulted pages.

Please refer to the document patch "userfaultfd: wp:
UFFDIO_REGISTER_MODE_WP documentation update" for more information on
the new interface and what it can do.

The major workflow of an uffd-wp program should be (a minimal userspace
sketch of steps 1, 2 and 5 follows this changelog):

1. Register a memory region with WP mode using UFFDIO_REGISTER_MODE_WP

2. Write protect part of the whole registered region using
   UFFDIO_WRITEPROTECT, passing in UFFDIO_WRITEPROTECT_MODE_WP to show
   that we want to write protect the range.

3. Start a working thread that modifies the protected pages, meanwhile
   listening to UFFD messages.
4. When a write is detected upon the protected range, a page fault
   happens, and a UFFD message will be generated and reported to the
   page fault handling thread.

5. The page fault handler thread resolves the page fault using the new
   UFFDIO_WRITEPROTECT ioctl, but this time passing in
   !UFFDIO_WRITEPROTECT_MODE_WP to show that we want to restore the
   write permission.  Before this operation, the fault handler thread
   can do anything it wants, e.g., dump the page to persistent storage.

6. The worker thread will continue running with the write permission
   correctly restored in step 5.

Currently there are already two projects based on this new userfaultfd
feature.

QEMU Live Snapshot: The project provides a way to allow the QEMU
hypervisor to take snapshots of VMs without stopping the VM [3].

LLNL umap library: The project provides a mmap-like interface and
"allow to have an application specific buffer of pages cached from a
large file, i.e. out-of-core execution using memory map" [4][5].

Before posting the patchset, this series was smoke tested against QEMU
live snapshot and the LLNL umap library (by doing parallel quicksort
using 128 sorting threads + 80 uffd servicing threads).  My sincere
thanks to Marty McFadden and Denis Plotnikov for the help along the
way.

TODO
====

- hugetlbfs/shmem support
- performance
- more architectures
- cooperate with mprotect()-allowed processes (???)
- ...

References
==========

[1] https://lwn.net/Articles/666187/
[2] https://git.kernel.org/pub/scm/linux/kernel/git/andrea/aa.git/log/?h=userfault
[3] https://github.com/denis-plotnikov/qemu/commits/background-snapshot-kvm
[4] https://github.com/LLNL/umap
[5] https://llnl-umap.readthedocs.io/en/develop/
[6] https://git.kernel.org/pub/scm/linux/kernel/git/andrea/aa.git/commit/?h=userfault&id=b245ecf6cf59156966f3da6e6b674f6695a5ffa5
[7] https://lkml.org/lkml/2018/11/21/370
[8] https://lkml.org/lkml/2018/12/30/64

This patch (of 19): Add helper for writeprotect check.  Will use it
later.
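[ For illustration only, a minimal userspace sketch of steps 1, 2 and 5
  above.  It assumes a kernel that exposes UFFDIO_WRITEPROTECT in
  <linux/userfaultfd.h>; error handling and the message-reading loop
  are trimmed, and uffd_wp_setup()/uffd_wp_resolve() are hypothetical
  helper names used only for this note: ]

	#include <fcntl.h>
	#include <sys/ioctl.h>
	#include <sys/syscall.h>
	#include <unistd.h>
	#include <linux/userfaultfd.h>

	/* Steps 1 + 2: register a page-aligned region in WP mode, then
	 * write protect it.  Returns the uffd on success, -1 on error. */
	static int uffd_wp_setup(void *addr, size_t len)
	{
		struct uffdio_api api = { .api = UFFD_API };
		struct uffdio_register reg = {
			.range = { .start = (unsigned long)addr, .len = len },
			.mode  = UFFDIO_REGISTER_MODE_WP,
		};
		struct uffdio_writeprotect wp = {
			.range = { .start = (unsigned long)addr, .len = len },
			.mode  = UFFDIO_WRITEPROTECT_MODE_WP,
		};
		int uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);

		if (uffd < 0 || ioctl(uffd, UFFDIO_API, &api) ||
		    ioctl(uffd, UFFDIO_REGISTER, &reg) ||
		    ioctl(uffd, UFFDIO_WRITEPROTECT, &wp))
			return -1;
		return uffd;
	}

	/* Step 5: after the monitor has handled the fault (e.g. dumped
	 * the page), drop the protection so the writer can continue;
	 * mode = 0 means !UFFDIO_WRITEPROTECT_MODE_WP. */
	static int uffd_wp_resolve(int uffd, void *addr, size_t len)
	{
		struct uffdio_writeprotect wp = {
			.range = { .start = (unsigned long)addr, .len = len },
			.mode  = 0,
		};
		return ioctl(uffd, UFFDIO_WRITEPROTECT, &wp);
	}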
Signed-off-by: Shaohua Li
Signed-off-by: Andrea Arcangeli
Signed-off-by: Peter Xu
Signed-off-by: Andrew Morton
Reviewed-by: Jerome Glisse
Reviewed-by: Mike Rapoport
Cc: Rik van Riel
Cc: Kirill A. Shutemov
Cc: Mel Gorman
Cc: Hugh Dickins
Cc: Johannes Weiner
Cc: Bobby Powers
Cc: Brian Geffon
Cc: David Hildenbrand
Cc: Denis Plotnikov
Cc: "Dr. David Alan Gilbert"
Cc: Martin Cracauer
Cc: Marty McFadden
Cc: Maya Gokhale
Cc: Mike Kravetz
Cc: Pavel Emelyanov
Link: http://lkml.kernel.org/r/20200220163112.11409-2-peterx@redhat.com
Signed-off-by: Linus Torvalds
---
 include/linux/userfaultfd_k.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
index 37c9eba75c98..38f748e7186e 100644
--- a/include/linux/userfaultfd_k.h
+++ b/include/linux/userfaultfd_k.h
@@ -50,6 +50,11 @@ static inline bool userfaultfd_missing(struct vm_area_struct *vma)
 	return vma->vm_flags & VM_UFFD_MISSING;
 }
 
+static inline bool userfaultfd_wp(struct vm_area_struct *vma)
+{
+	return vma->vm_flags & VM_UFFD_WP;
+}
+
 static inline bool userfaultfd_armed(struct vm_area_struct *vma)
 {
 	return vma->vm_flags & (VM_UFFD_MISSING | VM_UFFD_WP);
@@ -94,6 +99,11 @@ static inline bool userfaultfd_missing(struct vm_area_struct *vma)
 	return false;
 }
 
+static inline bool userfaultfd_wp(struct vm_area_struct *vma)
+{
+	return false;
+}
+
 static inline bool userfaultfd_armed(struct vm_area_struct *vma)
 {
 	return false;
From patchwork Thu Apr 1 18:21:25 2021
X-Patchwork-Submitter: Suren Baghdasaryan
X-Patchwork-Id: 12179169
Date: Thu, 1 Apr 2021 11:21:25 -0700
In-Reply-To: <20210401182125.171484-1-surenb@google.com>
Message-Id: <20210401182125.171484-6-surenb@google.com>
References: <20210401182125.171484-1-surenb@google.com>
Subject: [PATCH 5/5] mm/userfaultfd: fix memory corruption due to writeprotect
From: Suren Baghdasaryan
To: stable@vger.kernel.org
Cc: gregkh@linuxfoundation.org, jannh@google.com, ktkhai@virtuozzo.com,
    torvalds@linux-foundation.org, shli@fb.com, namit@vmware.com,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    kernel-team@android.com, surenb@google.com, Yu Zhao, Peter Xu,
    Andrea Arcangeli, Andy Lutomirski, Pavel Emelyanov, Mike Kravetz,
    Mike Rapoport, Minchan Kim, Will Deacon, Peter Zijlstra,
    Andrew Morton

From: Nadav Amit

The userfaultfd self-test fails occasionally, indicating memory
corruption.  Analyzing this problem indicates that there is a real bug:
mmap_lock is only taken for read in mwriteprotect_range(), which defers
flushes, and wp_page_copy() gives insufficient consideration to
concurrent deferred TLB flushes.  Although the PTE is flushed from the
TLBs in wp_page_copy(), this flush takes place after the copy has
already been performed, and therefore changes to the page are possible
between the time of the copy and the time at which the PTE is flushed.
To make matters worse, memory-unprotection using userfaultfd also poses
a problem.  Although memory unprotection is logically a promotion of
PTE permissions, and therefore should not require a TLB flush, the
current userfaultfd code might actually cause a demotion of the
architectural PTE permission: when userfaultfd_writeprotect()
unprotects a memory region, it unintentionally *clears* the RW-bit if
it was already set.  Note that unprotecting a PTE that is not
write-protected is a valid use-case: the userfaultfd monitor might ask
to unprotect a region that holds both write-protected and
write-unprotected PTEs.

The scenario that happens in selftests/vm/userfaultfd is as follows:

  cpu0                          cpu1                    cpu2
  ----                          ----                    ----
                                                        [ Writable PTE
                                                          cached in TLB ]
  userfaultfd_writeprotect()
  [ write-*unprotect* ]
  mwriteprotect_range()
  mmap_read_lock()
  change_protection()

  change_protection_range()
  ...
  change_pte_range()
  [ *clear* "write"-bit ]
  [ defer TLB flushes ]
                                [ page-fault ]
                                ...
                                wp_page_copy()
                                 cow_user_page()
                                  [ copy page ]
                                                        [ write to old
                                                          page ]
                                ...
                                 set_pte_at_notify()

A similar scenario can happen:

  cpu0                  cpu1                    cpu2            cpu3
  ----                  ----                    ----            ----
                                                                [ Writable PTE
                                                                  cached in TLB ]
  userfaultfd_writeprotect()
  [ write-protect ]
  [ deferred TLB flush ]
                        userfaultfd_writeprotect()
                        [ write-unprotect ]
                        [ deferred TLB flush ]
                                                [ page-fault ]
                                                wp_page_copy()
                                                 cow_user_page()
                                                  [ copy page ]
                                                                [ write to page ]
                                                ...
                                                 set_pte_at_notify()

This race exists since commit 292924b26024 ("userfaultfd: wp: apply
_PAGE_UFFD_WP bit").  Yet, as Yu Zhao pointed out, these races became
apparent since commit 09854ba94c6a ("mm: do_wp_page() simplification"),
which made wp_page_copy() more likely to take place, specifically if
page_count(page) > 1.

To resolve the aforementioned races, check whether there are pending
flushes on uffd-write-protected VMAs, and if there are, perform a flush
before doing the COW.

Further optimizations will follow to avoid unnecessary PTE
write-protection and TLB flushes during uffd-write-unprotect.

Link: https://lkml.kernel.org/r/20210304095423.3825684-1-namit@vmware.com
Fixes: 09854ba94c6a ("mm: do_wp_page() simplification")
Signed-off-by: Nadav Amit
Suggested-by: Yu Zhao
Reviewed-by: Peter Xu
Tested-by: Peter Xu
Cc: Andrea Arcangeli
Cc: Andy Lutomirski
Cc: Pavel Emelyanov
Cc: Mike Kravetz
Cc: Mike Rapoport
Cc: Minchan Kim
Cc: Will Deacon
Cc: Peter Zijlstra
Cc: [5.9+]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
---
 mm/memory.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/mm/memory.c b/mm/memory.c
index 656d90a75cf8..fe6e92de9bec 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2825,6 +2825,14 @@ static vm_fault_t do_wp_page(struct vm_fault *vmf)
 {
 	struct vm_area_struct *vma = vmf->vma;
 
+	/*
+	 * Userfaultfd write-protect can defer flushes. Ensure the TLB
+	 * is flushed in this case before copying.
+	 */
+	if (unlikely(userfaultfd_wp(vmf->vma) &&
+		     mm_tlb_flush_pending(vmf->vma->vm_mm)))
+		flush_tlb_page(vmf->vma, vmf->address);
+
 	vmf->page = vm_normal_page(vma, vmf->address, vmf->orig_pte);
 	if (!vmf->page) {
 		/*