From patchwork Thu Apr 1 18:17:41 2021
X-Patchwork-Submitter: Suren Baghdasaryan
X-Patchwork-Id: 12179121
Date: Thu, 1 Apr 2021 11:17:41 -0700
In-Reply-To:
<20210401181741.168763-1-surenb@google.com>
Message-Id: <20210401181741.168763-6-surenb@google.com>
References: <20210401181741.168763-1-surenb@google.com>
Subject: [PATCH 5/5] mm/userfaultfd: fix memory corruption due to writeprotect
From: Suren Baghdasaryan
To: stable@vger.kernel.org
Cc: gregkh@linuxfoundation.org, jannh@google.com, ktkhai@virtuozzo.com, torvalds@linux-foundation.org, shli@fb.com, namit@vmware.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, kernel-team@android.com, Yu Zhao, Peter Xu, Andrea Arcangeli, Andy Lutomirski, Pavel Emelyanov, Mike Kravetz, Mike Rapoport, Minchan Kim, Will Deacon, Peter Zijlstra, Andrew Morton

From: Nadav Amit

The userfaultfd selftest fails occasionally, indicating memory corruption.

Analysis shows that this is a real bug. mwriteprotect_range() takes mmap_lock only for read and defers TLB flushes, and wp_page_copy() does not adequately account for such concurrent deferred flushes. Although wp_page_copy() does flush the PTE from the TLBs, the flush happens only after the copy has been made. The old page can therefore still be modified through a stale writable TLB entry between the copy and the flush.
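The following minimal single-threaded C sketch replays that ordering problem on a toy model, so the lost write can be seen directly. It is illustrative only and is not kernel code. Everything in it is hypothetical: `struct toy_tlb_entry`, `toy_store()`, `replay()`, and the two one-byte "pages". The sketch assumes that cpu2 holds a stale writable TLB entry for the old page. It also assumes that a store which faults is retried once cpu1 has installed the new PTE.

```c
#include <stdbool.h>

/* Toy physical memory: phys[0] is the old page, phys[1] the COW copy. */
static char phys[2];

/* A CPU's cached translation for the faulting virtual page (hypothetical). */
struct toy_tlb_entry {
	bool valid;
	bool writable;
	int pfn; /* index into phys[] */
};

/* Store through a cached translation; returns false if it would fault. */
static bool toy_store(const struct toy_tlb_entry *e, char v)
{
	if (!e->valid || !e->writable)
		return false;
	phys[e->pfn] = v;
	return true;
}

/*
 * Replay the cpu1/cpu2 interleaving on one thread and return the byte
 * visible in the new page once the COW has completed.
 */
static char replay(bool flush_before_copy)
{
	/* cpu2 still caches a writable translation of the old page. */
	struct toy_tlb_entry cpu2 = { .valid = true, .writable = true, .pfn = 0 };
	bool faulted;

	phys[0] = 'A';
	phys[1] = 0;

	/* cpu0 has cleared the write bit in the PTE but deferred the flush. */

	/* cpu1: write fault -> wp_page_copy() */
	if (flush_before_copy)
		cpu2.valid = false;	/* flush the stale entry first */
	phys[1] = phys[0];		/* cow_user_page(): copy old -> new */

	/* cpu2: store through whatever its TLB currently holds. */
	faulted = !toy_store(&cpu2, 'B');

	/*
	 * cpu1: set_pte_at_notify() maps the new page writable. In the buggy
	 * ordering the stale entry is only flushed here; model the flush plus
	 * a later TLB refill from the new PTE.
	 */
	cpu2 = (struct toy_tlb_entry){ .valid = true, .writable = true, .pfn = 1 };

	/* cpu2: a store that faulted is retried against the new mapping. */
	if (faulted)
		toy_store(&cpu2, 'B');

	return phys[1];
}
```

With the buggy ordering, `replay(false)` returns 'A'. cpu2's write went to the old page after the copy, so it is lost from the new page. Flushing first (`replay(true)`) makes cpu2 fault, and its retried write lands in the new page, which then holds 'B'.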
To make matters worse, memory unprotection using userfaultfd also poses a problem. Memory unprotection is logically a promotion of PTE permissions and therefore should not require a TLB flush. However, the current userfaultfd code might actually demote the architectural PTE permissions: when userfaultfd_writeprotect() unprotects a memory region, it unintentionally *clears* the RW bit if it was already set. Note that unprotecting a PTE that is not write-protected is a valid use case: the userfaultfd monitor might ask to unprotect a region that holds both write-protected and write-unprotected PTEs.

The scenario that happens in selftests/vm/userfaultfd is as follows:

    cpu0                        cpu1                    cpu2
    ----                        ----                    ----
                                                        [ Writable PTE
                                                          cached in TLB ]
    userfaultfd_writeprotect()
    [ write-*unprotect* ]
    mwriteprotect_range()
    mmap_read_lock()
    change_protection()
    change_protection_range()
    ...
    change_pte_range()
    [ *clear* “write”-bit ]
    [ defer TLB flushes ]
                                [ page-fault ]
                                ...
                                wp_page_copy()
                                 cow_user_page()
                                  [ copy page ]
                                                        [ write to old
                                                          page ]
                                ...
                                set_pte_at_notify()

A similar scenario can happen:

    cpu0            cpu1            cpu2            cpu3
    ----            ----            ----            ----
                                                    [ Writable PTE
                                                      cached in TLB ]
    userfaultfd_writeprotect()
    [ write-protect ]
    [ deferred TLB flush ]
                    userfaultfd_writeprotect()
                    [ write-unprotect ]
                    [ deferred TLB flush ]
                                    [ page-fault ]
                                    wp_page_copy()
                                     cow_user_page()
                                     [ copy page ]
                                     ...            [ write to page ]
                                    set_pte_at_notify()

This race has existed since commit 292924b26024 ("userfaultfd: wp: apply _PAGE_UFFD_WP bit"). However, as Yu Zhao pointed out, these races became apparent after commit 09854ba94c6a ("mm: do_wp_page() simplification"). That commit made wp_page_copy() more likely to take place, specifically when page_count(page) > 1.

To resolve the aforementioned races, check whether there are pending flushes on uffd-write-protected VMAs. If there are, perform a flush before doing the COW.

Further optimizations will follow to avoid unnecessary PTE write-protection and TLB flushes during uffd-write-unprotect.
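The unprotect problem described above can be shown with a toy bit-mask model. The flag values and the `toy_*` helpers are hypothetical. The model also assumes, as a simplification, that the unprotect path rebuilds the permission bits from a base protection that lacks the write bit. `toy_pte_modify()` only loosely mirrors the kernel's `pte_modify()`, which keeps the page frame and replaces the protection bits. `toy_unprotect_promote()` is a hypothetical comparison point, not what this patch implements.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits for a toy PTE; real architectures differ. */
#define TOY_PRESENT   (1u << 0)
#define TOY_RW        (1u << 1)
#define TOY_UFFD_WP   (1u << 2)
#define TOY_PROT_MASK (TOY_PRESENT | TOY_RW)

typedef uint32_t toy_pte_t;

/* Replace the protection bits of @pte with those of @newprot; keep the rest. */
static toy_pte_t toy_pte_modify(toy_pte_t pte, uint32_t newprot)
{
	return (pte & ~TOY_PROT_MASK) | (newprot & TOY_PROT_MASK);
}

/*
 * Unprotect that rebuilds permissions from @base_prot and then drops the
 * uffd-wp marker. If @base_prot lacks TOY_RW, a PTE that was already
 * writable loses its write bit: a demotion, not a promotion.
 */
static toy_pte_t toy_unprotect_rebuild(toy_pte_t pte, uint32_t base_prot)
{
	return toy_pte_modify(pte, base_prot) & ~TOY_UFFD_WP;
}

/* Hypothetical promotion-only unprotect: never touches an existing RW bit. */
static toy_pte_t toy_unprotect_promote(toy_pte_t pte)
{
	return pte & ~TOY_UFFD_WP;
}

/* A transition needs a TLB flush only if it removes write permission. */
static bool toy_needs_flush(toy_pte_t before, toy_pte_t after)
{
	return (before & TOY_RW) && !(after & TOY_RW);
}
```

Unprotecting an already-writable PTE through `toy_unprotect_rebuild(pte, TOY_PRESENT)` clears `TOY_RW`, so `toy_needs_flush()` reports that a flush is required. The promotion-only variant leaves the bit alone and needs no flush. The patch itself does not change the unprotect path. Instead, it adds a flush on the fault side.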
Link: https://lkml.kernel.org/r/20210304095423.3825684-1-namit@vmware.com
Fixes: 09854ba94c6a ("mm: do_wp_page() simplification")
Signed-off-by: Nadav Amit
Suggested-by: Yu Zhao
Reviewed-by: Peter Xu
Tested-by: Peter Xu
Cc: Andrea Arcangeli
Cc: Andy Lutomirski
Cc: Pavel Emelyanov
Cc: Mike Kravetz
Cc: Mike Rapoport
Cc: Minchan Kim
Cc: Will Deacon
Cc: Peter Zijlstra
Cc: [5.9+]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
---
 mm/memory.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/mm/memory.c b/mm/memory.c
index 14470ceaf3f2..3f33651a2a39 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2810,6 +2810,14 @@ static int do_wp_page(struct vm_fault *vmf)
 {
 	struct vm_area_struct *vma = vmf->vma;
 
+	/*
+	 * Userfaultfd write-protect can defer flushes. Ensure the TLB
+	 * is flushed in this case before copying.
+	 */
+	if (unlikely(userfaultfd_wp(vmf->vma) &&
+		     mm_tlb_flush_pending(vmf->vma->vm_mm)))
+		flush_tlb_page(vmf->vma, vmf->address);
+
 	vmf->page = vm_normal_page(vma, vmf->address, vmf->orig_pte);
 	if (!vmf->page) {
 		/*
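The check added to do_wp_page() relies on a per-mm count of TLB flushes that have been deferred but not yet performed. In mainline kernels, this is the `tlb_flush_pending` counter in `struct mm_struct`, read by `mm_tlb_flush_pending()`. The userspace sketch below models only that decision logic, using C11 atomics. The `toy_*` names, the `flushes` counter, and the single per-VMA flag are hypothetical. The sketch also omits the memory-ordering details that the real code depends on.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-mm state for the sketch. */
struct toy_mm {
	atomic_int tlb_flush_pending;	/* flushes deferred but not yet done */
	int flushes;			/* flushes issued by the fault path */
};

/* Bracket a batched protection change whose TLB flush is deferred. */
static void toy_inc_flush_pending(struct toy_mm *mm)
{
	atomic_fetch_add(&mm->tlb_flush_pending, 1);
}

static void toy_dec_flush_pending(struct toy_mm *mm)
{
	atomic_fetch_sub(&mm->tlb_flush_pending, 1);
}

static bool toy_flush_pending(struct toy_mm *mm)
{
	return atomic_load(&mm->tlb_flush_pending) > 0;
}

/*
 * Same shape as the added check: only VMAs registered for uffd-wp pay for
 * the extra flush, and only while some flush is still pending.
 */
static void toy_do_wp_page(struct toy_mm *mm, bool vma_uffd_wp)
{
	if (vma_uffd_wp && toy_flush_pending(mm))
		mm->flushes++;	/* stands in for flush_tlb_page() */
	/* ... the copy-on-write would follow here ... */
}
```

Testing the per-VMA flag first keeps the extra flush off the fault path for VMAs that do not use userfaultfd write-protection. In the real patch, the whole condition is also wrapped in `unlikely()`, a branch-prediction hint that the flush is expected to be rare.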