From patchwork Fri Feb 18 04:10:03 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Nadav Amit
X-Patchwork-Id: 12750936
From: Nadav Amit
To: Andrew Morton
Cc: linux-mm@kvack.org, Nadav Amit, David Hildenbrand, Andrea Arcangeli,
 Mike Rapoport, Peter Xu, Jan Kara
Subject: [PATCH v2] userfaultfd: provide unmasked address on page-fault
Date: Fri, 18 Feb 2022 04:10:03 +0000
Message-Id: <20220218041003.3508-1-namit@vmware.com>
X-Mailer: git-send-email 2.25.1

From: Nadav Amit

Userfaultfd is supposed to provide the full (i.e., unmasked) address of
the faulting access back to userspace. However, that has not been the
case for quite some time.

Even running "userfaultfd_demo" from the userfaultfd man page produces
the wrong output (and contradicts the man page). Notice that
"UFFD_EVENT_PAGEFAULT event" shows the masked address (7fc5e30b3000)
and not the first read address (0x7fc5e30b300f).

	Address returned by mmap() = 0x7fc5e30b3000

	fault_handler_thread():
	    poll() returns: nready = 1; POLLIN = 1; POLLERR = 0
	    UFFD_EVENT_PAGEFAULT event: flags = 0; address = 7fc5e30b3000
		(uffdio_copy.copy returned 4096)
	Read address 0x7fc5e30b300f in main(): A
	Read address 0x7fc5e30b340f in main(): A
	Read address 0x7fc5e30b380f in main(): A
	Read address 0x7fc5e30b3c0f in main(): A

The exact address is useful for various reasons and specifically for
prefetching decisions. If it is known that the memory is populated by
certain objects whose size is not page-aligned, then based on the
faulting address, the uffd-monitor can decide whether to prefetch and
prefault the adjacent page.

This bug has been in the kernel for quite some time: since commit
1a29d85eb0f1 ("mm: use vmf->address instead of of vmf->virtual_address"),
which dates back to 2016.

A concern has been raised that existing userspace applications might
rely on the old/wrong behavior in which the address is masked.
Therefore, it was suggested to provide the masked address unless the
user explicitly asks for the exact address.

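To illustrate the prefetching argument above, here is a minimal sketch
of the kind of decision a uffd-monitor could make once it sees the exact
faulting address. Everything in it is an assumption made for
illustration and is not part of this patch: the helper name
object_spills_into_next_page(), the 4096-byte DEMO_PAGE_SIZE, and the
layout in which fixed-size objects are packed back to back from
region_base.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096UL /* assumed page size for this sketch */

/*
 * Hypothetical helper: objects of obj_size bytes are packed back to back
 * starting at region_base. Returns true when the object that contains
 * fault_addr extends beyond the page that faulted, i.e. when prefaulting
 * the adjacent page is likely to pay off.
 */
bool object_spills_into_next_page(uintptr_t region_base,
				  uintptr_t fault_addr,
				  size_t obj_size)
{
	assert(obj_size > 0);
	assert(fault_addr >= region_base);

	uintptr_t index = (fault_addr - region_base) / obj_size;
	uintptr_t obj_start = region_base + index * obj_size;
	uintptr_t obj_last = obj_start + obj_size - 1;
	uintptr_t fault_page = fault_addr & ~(DEMO_PAGE_SIZE - 1);

	/* Compare the page holding the object's last byte with the faulting page. */
	return (obj_last & ~(DEMO_PAGE_SIZE - 1)) != fault_page;
}
```

With 1536-byte objects starting at 0x10000, a fault at 0x10c10 hits the
third object, which crosses into the page at 0x11000, so the helper
returns true. Given only the masked address 0x10000, the same call sees
the first object and returns false, which is the information lost by
masking.
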
Add a new userfaultfd feature UFFD_FEATURE_EXACT_ADDRESS to direct
userfaultfd to provide the exact address. Add a new "real_address"
field to vmf to hold the unmasked address. Provide the address to
userspace accordingly.

Cc: David Hildenbrand
Cc: Andrea Arcangeli
Cc: Mike Rapoport
Cc: Peter Xu
Cc: Jan Kara
Signed-off-by: Nadav Amit
Acked-by: Peter Xu
Reviewed-by: David Hildenbrand
Acked-by: Mike Rapoport
---

v1->v2:
* Add uffd feature to selectively enable [David, Andrea]

---
 fs/userfaultfd.c                 | 5 ++++-
 include/linux/mm.h               | 3 ++-
 include/uapi/linux/userfaultfd.h | 8 +++++++-
 mm/memory.c                      | 1 +
 4 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index e26b10132d47..826927026fe7 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -198,6 +198,9 @@ static inline struct uffd_msg userfault_msg(unsigned long address,
 	struct uffd_msg msg;
 	msg_init(&msg);
 	msg.event = UFFD_EVENT_PAGEFAULT;
+
+	if (!(features & UFFD_FEATURE_EXACT_ADDRESS))
+		address &= PAGE_MASK;
 	msg.arg.pagefault.address = address;
 	/*
 	 * These flags indicate why the userfault occurred:
@@ -482,7 +485,7 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason)
 
 	init_waitqueue_func_entry(&uwq.wq, userfaultfd_wake_function);
 	uwq.wq.private = current;
-	uwq.msg = userfault_msg(vmf->address, vmf->flags, reason,
+	uwq.msg = userfault_msg(vmf->real_address, vmf->flags, reason,
 				ctx->features);
 	uwq.ctx = ctx;
 	uwq.waken = false;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 213cc569b192..27df0ca0a36a 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -478,7 +478,8 @@ struct vm_fault {
 		struct vm_area_struct *vma;	/* Target VMA */
 		gfp_t gfp_mask;			/* gfp mask to be used for allocations */
 		pgoff_t pgoff;			/* Logical page offset based on vma */
-		unsigned long address;		/* Faulting virtual address */
+		unsigned long address;		/* Faulting virtual address - masked */
+		unsigned long real_address;	/* Faulting virtual address - unmasked */
 	};
 	enum fault_flag flags;		/* FAULT_FLAG_xxx flags
 					 * XXX: should really be 'const' */
diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h
index 05b31d60acf6..ef739054cb1c 100644
--- a/include/uapi/linux/userfaultfd.h
+++ b/include/uapi/linux/userfaultfd.h
@@ -32,7 +32,8 @@
 			   UFFD_FEATURE_SIGBUS |		\
 			   UFFD_FEATURE_THREAD_ID |		\
 			   UFFD_FEATURE_MINOR_HUGETLBFS |	\
-			   UFFD_FEATURE_MINOR_SHMEM)
+			   UFFD_FEATURE_MINOR_SHMEM |		\
+			   UFFD_FEATURE_EXACT_ADDRESS)
 #define UFFD_API_IOCTLS				\
 	((__u64)1 << _UFFDIO_REGISTER |		\
 	 (__u64)1 << _UFFDIO_UNREGISTER |	\
@@ -189,6 +190,10 @@ struct uffdio_api {
 	 *
 	 * UFFD_FEATURE_MINOR_SHMEM indicates the same support as
 	 * UFFD_FEATURE_MINOR_HUGETLBFS, but for shmem-backed pages instead.
+	 *
+	 * UFFD_FEATURE_EXACT_ADDRESS indicates that the exact address of page
+	 * faults would be provided and the offset within the page would not be
+	 * masked.
 	 */
 #define UFFD_FEATURE_PAGEFAULT_FLAG_WP		(1<<0)
 #define UFFD_FEATURE_EVENT_FORK			(1<<1)
@@ -201,6 +206,7 @@ struct uffdio_api {
 #define UFFD_FEATURE_THREAD_ID			(1<<8)
 #define UFFD_FEATURE_MINOR_HUGETLBFS		(1<<9)
 #define UFFD_FEATURE_MINOR_SHMEM		(1<<10)
+#define UFFD_FEATURE_EXACT_ADDRESS		(1<<11)
 	__u64 features;
 
 	__u64 ioctls;
diff --git a/mm/memory.c b/mm/memory.c
index c125c4969913..aae53fde13d9 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4622,6 +4622,7 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
 	struct vm_fault vmf = {
 		.vma = vma,
 		.address = address & PAGE_MASK,
+		.real_address = address,
 		.flags = flags,
 		.pgoff = linear_page_index(vma, address),
 		.gfp_mask = __get_fault_gfp_mask(vma),
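
For reference, a userspace monitor could opt in to the new behavior
through the usual UFFDIO_API handshake. The following is only a sketch
under stated assumptions, not code from this patch: the helper name
uffd_open_exact() and the fallback policy (retry without the feature on
a fresh descriptor) are hypothetical, and the #ifndef fallback merely
mirrors the (1<<11) value defined above for builds against older uapi
headers.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <stdbool.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Older uapi headers lack the new bit; the value is taken from this patch. */
#ifndef UFFD_FEATURE_EXACT_ADDRESS
#define UFFD_FEATURE_EXACT_ADDRESS (1 << 11)
#endif

/*
 * Hypothetical helper: open a userfaultfd and try to enable exact fault
 * addresses. Returns the descriptor, or -1 on failure. *exact reports
 * whether the kernel accepted UFFD_FEATURE_EXACT_ADDRESS.
 */
int uffd_open_exact(bool *exact)
{
	const __u64 wanted[] = { UFFD_FEATURE_EXACT_ADDRESS, 0 };

	assert(exact != NULL);
	*exact = false;

	for (int i = 0; i < 2; i++) {
		struct uffdio_api api;
		int fd = (int)syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);

		if (fd < 0)
			return -1; /* userfaultfd unavailable or not permitted */

		memset(&api, 0, sizeof(api));
		api.api = UFFD_API;
		api.features = wanted[i];
		if (ioctl(fd, UFFDIO_API, &api) == 0) {
			*exact = (wanted[i] != 0);
			return fd;
		}

		/* Handshake failed (e.g. unknown feature bit): retry on a fresh fd. */
		close(fd);
	}
	return -1;
}
```

When *exact comes back false, the monitor should keep treating
msg.arg.pagefault.address as page-aligned, which matches the behavior
existing applications already see.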