From patchwork Thu Mar 2 17:54:23 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: David Hildenbrand
X-Patchwork-Id: 13157710
From: David Hildenbrand
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, David Hildenbrand, Andrew Morton, Mike Rapoport,
 Andrea Arcangeli, Peter Xu, Jerome Glisse, Shaohua Li
Subject: [PATCH v1] mm/userfaultfd: propagate uffd-wp bit when PTE-mapping
 the huge zeropage
Date: Thu, 2 Mar 2023 18:54:23 +0100
Message-Id: <20230302175423.589164-1-david@redhat.com>

Currently, we'd lose the userfaultfd-wp marker when PTE-mapping a huge
zeropage, resulting in the next write faults in the PMD range not
triggering uffd-wp events.

Various actions (partial MADV_DONTNEED, partial mremap, partial munmap,
partial mprotect) could trigger this. However, most importantly,
un-protecting a single sub-page from the userfaultfd-wp handler when
processing a uffd-wp event will PTE-map the shared huge zeropage and
lose the uffd-wp bit for the remainder of the PMD.

Let's properly propagate the uffd-wp bit to the PMDs.

Acked-by: Peter Xu
---
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <stdbool.h>
#include <inttypes.h>
#include <fcntl.h>
#include <unistd.h>
#include <errno.h>
#include <poll.h>
#include <pthread.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

static size_t pagesize;
static int uffd;
static volatile bool uffd_triggered;

#define barrier() __asm__ __volatile__("": : :"memory")

static void uffd_wp_range(char *start, size_t size, bool wp)
{
	struct uffdio_writeprotect uffd_writeprotect;

	uffd_writeprotect.range.start = (unsigned long) start;
	uffd_writeprotect.range.len = size;
	if (wp) {
		uffd_writeprotect.mode = UFFDIO_WRITEPROTECT_MODE_WP;
	} else {
		uffd_writeprotect.mode = 0;
	}
	if (ioctl(uffd, UFFDIO_WRITEPROTECT, &uffd_writeprotect)) {
		fprintf(stderr, "UFFDIO_WRITEPROTECT failed: %d\n", errno);
		exit(1);
	}
}

static void *uffd_thread_fn(void *arg)
{
	static struct uffd_msg msg;
	ssize_t nread;

	while (1) {
		struct pollfd pollfd;
		int nready;

		pollfd.fd = uffd;
		pollfd.events = POLLIN;
		nready = poll(&pollfd, 1, -1);
		if (nready == -1) {
			fprintf(stderr, "poll() failed: %d\n", errno);
			exit(1);
		}

		nread = read(uffd, &msg, sizeof(msg));
		if (nread <= 0)
			continue;

		if (msg.event != UFFD_EVENT_PAGEFAULT ||
		    !(msg.arg.pagefault.flags & UFFD_PAGEFAULT_FLAG_WP)) {
			printf("FAIL: wrong uffd-wp event fired\n");
			exit(1);
		}

		/* un-protect the single page. */
		uffd_triggered = true;
		uffd_wp_range((char *)(uintptr_t)msg.arg.pagefault.address,
			      pagesize, false);
	}
	return arg;
}

static int setup_uffd(char *map, size_t size)
{
	struct uffdio_api uffdio_api;
	struct uffdio_register uffdio_register;
	pthread_t thread;

	uffd = syscall(__NR_userfaultfd,
		       O_CLOEXEC | O_NONBLOCK | UFFD_USER_MODE_ONLY);
	if (uffd < 0) {
		fprintf(stderr, "syscall() failed: %d\n", errno);
		return -errno;
	}

	uffdio_api.api = UFFD_API;
	uffdio_api.features = UFFD_FEATURE_PAGEFAULT_FLAG_WP;
	if (ioctl(uffd, UFFDIO_API, &uffdio_api) < 0) {
		fprintf(stderr, "UFFDIO_API failed: %d\n", errno);
		return -errno;
	}

	if (!(uffdio_api.features & UFFD_FEATURE_PAGEFAULT_FLAG_WP)) {
		fprintf(stderr, "UFFD_FEATURE_WRITEPROTECT missing\n");
		return -ENOSYS;
	}

	uffdio_register.range.start = (unsigned long) map;
	uffdio_register.range.len = size;
	uffdio_register.mode = UFFDIO_REGISTER_MODE_WP;
	if (ioctl(uffd, UFFDIO_REGISTER, &uffdio_register) < 0) {
		fprintf(stderr, "UFFDIO_REGISTER failed: %d\n", errno);
		return -errno;
	}

	pthread_create(&thread, NULL, uffd_thread_fn, NULL);

	return 0;
}

int main(void)
{
	const size_t size = 4 * 1024 * 1024ull;
	char *map, *cur;

	pagesize = getpagesize();

	map = mmap(NULL, size, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANON, -1, 0);
	if (map == MAP_FAILED) {
		fprintf(stderr, "mmap() failed\n");
		return -errno;
	}

	if (madvise(map, size, MADV_HUGEPAGE)) {
		fprintf(stderr, "MADV_HUGEPAGE failed\n");
		return -errno;
	}

	if (setup_uffd(map, size))
		return 1;

	/* Read the whole range, populating zeropages. */
	madvise(map, size, MADV_POPULATE_READ);

	/* Write-protect the whole range. */
	uffd_wp_range(map, size, true);

	/* Make sure uffd-wp triggers on each page. */
	for (cur = map; cur < map + size; cur += pagesize) {
		uffd_triggered = false;

		barrier();
		/* Trigger a write fault. */
		*cur = 1;
		barrier();

		if (!uffd_triggered) {
			printf("FAIL: uffd-wp did not trigger\n");
			return 1;
		}
	}

	printf("PASS: uffd-wp triggered\n");
	return 0;
}
---

Fixes: e06f1e1dd499 ("userfaultfd: wp: enabled write protection in userfaultfd API")
Cc: Andrew Morton
Cc: Mike Rapoport
Cc: Andrea Arcangeli
Cc: Peter Xu
Cc: Jerome Glisse
Cc: Shaohua Li
Signed-off-by: David Hildenbrand
---
 mm/huge_memory.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 4fc43859e59a..032fb0ef9cd1 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2037,7 +2037,7 @@ static void __split_huge_zero_page_pmd(struct vm_area_struct *vma,
 {
 	struct mm_struct *mm = vma->vm_mm;
 	pgtable_t pgtable;
-	pmd_t _pmd;
+	pmd_t _pmd, old_pmd;
 	int i;
 
 	/*
@@ -2048,7 +2048,7 @@ static void __split_huge_zero_page_pmd(struct vm_area_struct *vma,
 	 *
 	 * See Documentation/mm/mmu_notifier.rst
 	 */
-	pmdp_huge_clear_flush(vma, haddr, pmd);
+	old_pmd = pmdp_huge_clear_flush(vma, haddr, pmd);
 
 	pgtable = pgtable_trans_huge_withdraw(mm, pmd);
 	pmd_populate(mm, &_pmd, pgtable);
@@ -2057,6 +2057,8 @@ static void __split_huge_zero_page_pmd(struct vm_area_struct *vma,
 		pte_t *pte, entry;
 		entry = pfn_pte(my_zero_pfn(haddr), vma->vm_page_prot);
 		entry = pte_mkspecial(entry);
+		if (pmd_uffd_wp(old_pmd))
+			entry = pte_mkuffd_wp(entry);
 		pte = pte_offset_map(&_pmd, haddr);
 		VM_BUG_ON(!pte_none(*pte));
 		set_pte_at(mm, haddr, pte, entry);