From patchwork Sat Feb 26 02:26:55 2022
X-Patchwork-Submitter: Nadav Amit
X-Patchwork-Id: 12761167
From: Nadav Amit
To: linux-mm@kvack.org
Cc: Nadav Amit, Peter Xu, David Hildenbrand, Andrea Arcangeli, Mike Rapoport, Jan Kara
Subject: [PATCH v3] userfaultfd: provide unmasked address on page-fault
Date: Sat, 26 Feb 2022 02:26:55 +0000
Message-Id: <20220226022655.350562-1-namit@vmware.com>
X-Mailer: git-send-email 2.25.1
From: Nadav Amit

Userfaultfd is supposed to provide the full address (i.e., unmasked) of
the faulting access back to userspace. However, that has not been the
case for quite some time. Even running "userfaultfd_demo" from the
userfaultfd man page provides the wrong output (and contradicts the man
page). Notice that the "UFFD_EVENT_PAGEFAULT event" line shows the
masked address (7fc5e30b3000) and not the address of the first read
(0x7fc5e30b300f).

  Address returned by mmap() = 0x7fc5e30b3000

  fault_handler_thread():
      poll() returns: nready = 1; POLLIN = 1; POLLERR = 0
      UFFD_EVENT_PAGEFAULT event: flags = 0; address = 7fc5e30b3000
          (uffdio_copy.copy returned 4096)
  Read address 0x7fc5e30b300f in main(): A
  Read address 0x7fc5e30b340f in main(): A
  Read address 0x7fc5e30b380f in main(): A
  Read address 0x7fc5e30b3c0f in main(): A

The exact address is useful for various reasons and specifically for
prefetching decisions. If it is known that the memory is populated by
certain objects whose size is not page-aligned, then based on the
faulting address, the uffd-monitor can decide whether to prefetch and
prefault the adjacent page.

This bug has been in the kernel for quite some time: since commit
1a29d85eb0f1 ("mm: use vmf->address instead of of vmf->virtual_address"),
which dates back to 2016.

A concern has been raised that an existing userspace application might
rely on the old/wrong behavior in which the address is masked. Therefore,
it was suggested to provide the masked address unless the user explicitly
asks for the exact address.
Add a new userfaultfd feature UFFD_FEATURE_EXACT_ADDRESS to direct
userfaultfd to provide the exact address. Add a new "real_address" field
to vmf to hold the unmasked address. Provide the address to userspace
accordingly.

Initialize real_address in various code-paths to be consistent with
address, even when it is not used, to be on the safe side.

Cc: Andrea Arcangeli
Acked-by: Peter Xu
Acked-by: Mike Rapoport
Reviewed-by: David Hildenbrand
Reviewed-by: Jan Kara
Signed-off-by: Nadav Amit
---
v2->v3:
 * Initialize real_address on all code paths [Jan]

v1->v2:
 * Add uffd feature to selectively enable [David, Andrea]
---
 fs/userfaultfd.c                 | 5 ++++-
 include/linux/mm.h               | 3 ++-
 include/uapi/linux/userfaultfd.h | 8 +++++++-
 mm/hugetlb.c                     | 6 ++++--
 mm/memory.c                      | 1 +
 mm/swapfile.c                    | 1 +
 6 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index e26b10132d47..826927026fe7 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -198,6 +198,9 @@ static inline struct uffd_msg userfault_msg(unsigned long address,
 	struct uffd_msg msg;
 	msg_init(&msg);
 	msg.event = UFFD_EVENT_PAGEFAULT;
+
+	if (!(features & UFFD_FEATURE_EXACT_ADDRESS))
+		address &= PAGE_MASK;
 	msg.arg.pagefault.address = address;
 	/*
 	 * These flags indicate why the userfault occurred:
@@ -482,7 +485,7 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason)
 	init_waitqueue_func_entry(&uwq.wq, userfaultfd_wake_function);
 	uwq.wq.private = current;
-	uwq.msg = userfault_msg(vmf->address, vmf->flags, reason,
+	uwq.msg = userfault_msg(vmf->real_address, vmf->flags, reason,
 				ctx->features);
 	uwq.ctx = ctx;
 	uwq.waken = false;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 213cc569b192..27df0ca0a36a 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -478,7 +478,8 @@ struct vm_fault {
 		struct vm_area_struct *vma;	/* Target VMA */
 		gfp_t gfp_mask;			/* gfp mask to be used for allocations */
 		pgoff_t pgoff;			/* Logical page offset based on vma */
-		unsigned long address;		/* Faulting virtual address */
+		unsigned long address;		/* Faulting virtual address - masked */
+		unsigned long real_address;	/* Faulting virtual address - unmasked */
 	};
 	enum fault_flag flags;		/* FAULT_FLAG_xxx flags
 					 * XXX: should really be 'const' */
diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h
index 05b31d60acf6..ef739054cb1c 100644
--- a/include/uapi/linux/userfaultfd.h
+++ b/include/uapi/linux/userfaultfd.h
@@ -32,7 +32,8 @@
 			   UFFD_FEATURE_SIGBUS |		\
 			   UFFD_FEATURE_THREAD_ID |		\
 			   UFFD_FEATURE_MINOR_HUGETLBFS |	\
-			   UFFD_FEATURE_MINOR_SHMEM)
+			   UFFD_FEATURE_MINOR_SHMEM |		\
+			   UFFD_FEATURE_EXACT_ADDRESS)
 #define UFFD_API_IOCTLS				\
 	((__u64)1 << _UFFDIO_REGISTER |		\
 	 (__u64)1 << _UFFDIO_UNREGISTER |	\
@@ -189,6 +190,10 @@ struct uffdio_api {
 	 *
 	 * UFFD_FEATURE_MINOR_SHMEM indicates the same support as
	 * UFFD_FEATURE_MINOR_HUGETLBFS, but for shmem-backed pages instead.
+	 *
+	 * UFFD_FEATURE_EXACT_ADDRESS indicates that the exact address of page
+	 * faults would be provided and the offset within the page would not be
+	 * masked.
 	 */
 #define UFFD_FEATURE_PAGEFAULT_FLAG_WP		(1<<0)
 #define UFFD_FEATURE_EVENT_FORK			(1<<1)
@@ -201,6 +206,7 @@ struct uffdio_api {
 #define UFFD_FEATURE_THREAD_ID			(1<<8)
 #define UFFD_FEATURE_MINOR_HUGETLBFS		(1<<9)
 #define UFFD_FEATURE_MINOR_SHMEM		(1<<10)
+#define UFFD_FEATURE_EXACT_ADDRESS		(1<<11)
 	__u64 features;

 	__u64 ioctls;
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 61895cc01d09..16017f90568b 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5342,6 +5342,7 @@ static inline vm_fault_t hugetlb_handle_userfault(struct vm_area_struct *vma,
 						  pgoff_t idx,
 						  unsigned int flags,
 						  unsigned long haddr,
+						  unsigned long addr,
 						  unsigned long reason)
 {
 	vm_fault_t ret;
@@ -5349,6 +5350,7 @@ static inline vm_fault_t hugetlb_handle_userfault(struct vm_area_struct *vma,
 	struct vm_fault vmf = {
 		.vma = vma,
 		.address = haddr,
+		.real_address = addr,
 		.flags = flags,

 		/*
@@ -5417,7 +5419,7 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 	/* Check for page in userfault range */
 	if (userfaultfd_missing(vma)) {
 		ret = hugetlb_handle_userfault(vma, mapping, idx,
-					       flags, haddr,
+					       flags, haddr, address,
 					       VM_UFFD_MISSING);
 		goto out;
 	}
@@ -5481,7 +5483,7 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 			unlock_page(page);
 			put_page(page);
 			ret = hugetlb_handle_userfault(vma, mapping, idx,
-						       flags, haddr,
+						       flags, haddr, address,
 						       VM_UFFD_MINOR);
 			goto out;
 		}
diff --git a/mm/memory.c b/mm/memory.c
index c125c4969913..aae53fde13d9 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4622,6 +4622,7 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
 	struct vm_fault vmf = {
 		.vma = vma,
 		.address = address & PAGE_MASK,
+		.real_address = address,
 		.flags = flags,
 		.pgoff = linear_page_index(vma, address),
 		.gfp_mask = __get_fault_gfp_mask(vma),
diff --git a/mm/swapfile.c b/mm/swapfile.c
index bf0df7aa7158..33c7abb16610 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1951,6 +1951,7 @@ static int unuse_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 	struct vm_fault vmf = {
 		.vma = vma,
 		.address = addr,
+		.real_address = addr,
 		.pmd = pmd,
 	};