Message ID | 20220218041003.3508-1-namit@vmware.com |
---|---|
State | New |
Series | [v2] userfaultfd: provide unmasked address on page-fault |
On Fri, Feb 18, 2022 at 04:10:03AM +0000, Nadav Amit wrote:
> From: Nadav Amit <namit@vmware.com>
> 
> Userfaultfd is supposed to provide the full address (i.e., unmasked) of
> the faulting access back to userspace. However, that has not been the
> case for quite some time.
> 
> Even running "userfaultfd_demo" from the userfaultfd man page provides
> the wrong output (and contradicts the man page). Notice that
> "UFFD_EVENT_PAGEFAULT event" shows the masked address (7fc5e30b3000)
> and not the first read address (0x7fc5e30b300f).
> 
>     Address returned by mmap() = 0x7fc5e30b3000
> 
>     fault_handler_thread():
>         poll() returns: nready = 1; POLLIN = 1; POLLERR = 0
>         UFFD_EVENT_PAGEFAULT event: flags = 0; address = 7fc5e30b3000
>             (uffdio_copy.copy returned 4096)
>     Read address 0x7fc5e30b300f in main(): A
>     Read address 0x7fc5e30b340f in main(): A
>     Read address 0x7fc5e30b380f in main(): A
>     Read address 0x7fc5e30b3c0f in main(): A
> 
> The exact address is useful for various reasons, specifically for
> prefetching decisions. If it is known that the memory is populated by
> certain objects whose size is not page-aligned, then based on the
> faulting address, the uffd-monitor can decide whether to prefetch and
> prefault the adjacent page.
> 
> This bug has existed in the kernel for quite some time: since commit
> 1a29d85eb0f1 ("mm: use vmf->address instead of of vmf->virtual_address"),
> which dates back to 2016. A concern has been raised that existing
> userspace applications might rely on the old/wrong behavior in which the
> address is masked. Therefore, it was suggested to provide the masked
> address unless the user explicitly asks for the exact address.
> 
> Add a new userfaultfd feature UFFD_FEATURE_EXACT_ADDRESS to direct
> userfaultfd to provide the exact address. Add a new "real_address" field
> to vmf to hold the unmasked address. Provide the address to userspace
> accordingly.
> 
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
> Cc: Peter Xu <peterx@redhat.com>
> Cc: Jan Kara <jack@suse.cz>
> Signed-off-by: Nadav Amit <namit@vmware.com>

Acked-by: Peter Xu <peterx@redhat.com>
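The prefetching motivation in the commit message can be illustrated with a small userspace sketch. It assumes, hypothetically, that the registered region starting at `base` holds a packed array of objects whose size is not a multiple of the page size, and that the monitor receives the exact faulting address. The helper `should_prefetch_next_page()` and this object layout are illustrative assumptions, not part of the patch or of any existing monitor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical heuristic. Assumes the region starting at `base` holds a
 * packed array of objects of `obj_size` bytes. Given the exact faulting
 * address, locate the object that was touched and report whether it
 * extends past the end of the faulting page; if so, the monitor may want
 * to prefetch and prefault the next page as well.
 */
static bool should_prefetch_next_page(uint64_t base, uint64_t obj_size,
                                      uint64_t fault_addr, uint64_t page_size)
{
    assert(obj_size > 0 && fault_addr >= base);
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

    uint64_t idx = (fault_addr - base) / obj_size;   /* object touched */
    uint64_t obj_end = base + (idx + 1) * obj_size;  /* exclusive end */
    uint64_t page_end = (fault_addr & ~(page_size - 1)) + page_size;

    return obj_end > page_end;
}
```

With 1000-byte objects and 4 KiB pages, a fault at `base + 4000` touches an object that spills into the next page, while a fault at `base + 15` does not. With the masked address both faults are reported as `base`, so the heuristic cannot tell them apart.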
On 18.02.22 05:10, Nadav Amit wrote:
> From: Nadav Amit <namit@vmware.com>
> 
> [...]
> 
> Signed-off-by: Nadav Amit <namit@vmware.com>

Thanks Nadav!

Reviewed-by: David Hildenbrand <david@redhat.com>
On Fri, Feb 18, 2022 at 04:10:03AM +0000, Nadav Amit wrote:
> From: Nadav Amit <namit@vmware.com>
> 
> [...]
> 
> Signed-off-by: Nadav Amit <namit@vmware.com>

Acked-by: Mike Rapoport <rppt@linux.ibm.com>

> ---
> 
> v1->v2:
> * Add uffd feature to selectively enable [David, Andrea]
> ---
>  fs/userfaultfd.c                 | 5 ++++-
>  include/linux/mm.h               | 3 ++-
>  include/uapi/linux/userfaultfd.h | 8 +++++++-
>  mm/memory.c                      | 1 +
>  4 files changed, 14 insertions(+), 3 deletions(-)
> 
> [...]
On Fri, 18 Feb 2022 04:10:03 +0000 Nadav Amit <nadav.amit@gmail.com> wrote:

> From: Nadav Amit <namit@vmware.com>
> 
> Userfaultfd is supposed to provide the full address (i.e., unmasked) of
> the faulting access back to userspace. However, that has not been the
> case for quite some time.
> 
> Even running "userfaultfd_demo" from the userfaultfd man page provides
> the wrong output (and contradicts the man page). Notice that
> "UFFD_EVENT_PAGEFAULT event" shows the masked address (7fc5e30b3000)
> and not the first read address (0x7fc5e30b300f).

Well damn.

> [...]
> 
> Add a new userfaultfd feature UFFD_FEATURE_EXACT_ADDRESS to direct
> userfaultfd to provide the exact address. Add a new "real_address" field
> to vmf to hold the unmasked address. Provide the address to userspace
> accordingly.

Is a manpage update planned?
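For context on the documentation question, a userspace monitor would opt in to the new behaviour during the UFFDIO_API handshake. The sketch below assumes the raw `userfaultfd(2)` syscall via `syscall(SYS_userfaultfd, ...)`, the `UFFDIO_API` ioctl and `struct uffdio_api` from `<linux/userfaultfd.h>`, and falls back to the bit value from the patch (`1<<11`) when the installed headers predate it. Requesting a feature bit the running kernel does not know makes UFFDIO_API fail with EINVAL; the sketch then simply retries on a fresh descriptor without the bit. `open_uffd()` and `uffd_open_with()` are hypothetical helper names.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <stdbool.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef UFFD_FEATURE_EXACT_ADDRESS
#define UFFD_FEATURE_EXACT_ADDRESS (1 << 11) /* bit value from the patch */
#endif

/* Open a userfaultfd and perform the UFFDIO_API handshake requesting
 * `features`. Returns the fd, or -1 on failure (errno is set). */
static int uffd_open_with(__u64 features)
{
    int fd = (int)syscall(SYS_userfaultfd, O_CLOEXEC | O_NONBLOCK);
    if (fd < 0)
        return -1;

    struct uffdio_api api = { .api = UFFD_API, .features = features };
    if (ioctl(fd, UFFDIO_API, &api) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: prefer exact fault addresses, fall back to the
 * default (page-masked) behaviour if the kernel rejects the feature bit. */
static int open_uffd(bool *exact)
{
    assert(exact != NULL);

    int fd = uffd_open_with(UFFD_FEATURE_EXACT_ADDRESS);
    if (fd >= 0) {
        *exact = true;
        return fd;
    }
    *exact = false;
    return uffd_open_with(0);
}
```

A monitor that ends up with `exact == false` should keep treating `msg.arg.pagefault.address` as page-aligned. Note that the syscall itself may fail for unprivileged callers depending on the `vm.unprivileged_userfaultfd` sysctl.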
On Fri 18-02-22 04:10:03, Nadav Amit wrote:
> From: Nadav Amit <namit@vmware.com>
> 
> [...]
> 
> Signed-off-by: Nadav Amit <namit@vmware.com>

Yeah, I'm sorry for breaking this :-| The patch looks good except:

> diff --git a/mm/memory.c b/mm/memory.c
> index c125c4969913..aae53fde13d9 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4622,6 +4622,7 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
>  	struct vm_fault vmf = {
>  		.vma = vma,
>  		.address = address & PAGE_MASK,
> +		.real_address = address,
>  		.flags = flags,
>  		.pgoff = linear_page_index(vma, address),
>  		.gfp_mask = __get_fault_gfp_mask(vma),

At least mm/hugetlb.c:hugetlb_handle_userfault() also initializes vmf and
calls handle_userfault(), so shouldn't it initialize real_address too?

Also, there are a few other places that initialize vmf, but they use vmf
only for swapin, so they probably don't reach the userfault code. Still, it
seems a bit fragile not to initialize real_address there? No strong opinion
there... Ideally we would not misuse vmf in those places, but that's a
larger cleanup.

								Honza
> On Feb 22, 2022, at 1:00 AM, Jan Kara <jack@suse.cz> wrote:
> 
> On Fri 18-02-22 04:10:03, Nadav Amit wrote:
>> From: Nadav Amit <namit@vmware.com>
>> 
>> [...]
> 
> Yeah, I'm sorry for breaking this :-| The patch looks good except:
> 
>> [...]
> 
> At least mm/hugetlb.c:hugetlb_handle_userfault() also initializes vmf and
> calls handle_userfault(), so shouldn't it initialize real_address too?
> 
> Also, there are a few other places that initialize vmf, but they use vmf
> only for swapin, so they probably don't reach the userfault code. Still, it
> seems a bit fragile not to initialize real_address there? No strong opinion
> there... Ideally we would not misuse vmf in those places, but that's a
> larger cleanup.

Thanks for catching it. I will send v3. So we have:

hugetlb_handle_userfault() - will fix.

unuse_pte_range() - does not appear to be used for any actual page fault.
I will initialize real_address to be on the safe side.

__collapse_huge_page_swapin() - another abuse, and real_address is not
used, but to be on the safe side, I will initialize it.

shmem_swapin() - address is zero and not used for any fault-related
activity (although it appears to me that the page might end up on the
wrong NUMA node, that is out of the scope of this patch). I will not
change it.
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index e26b10132d47..826927026fe7 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -198,6 +198,9 @@ static inline struct uffd_msg userfault_msg(unsigned long address,
 	struct uffd_msg msg;
 	msg_init(&msg);
 	msg.event = UFFD_EVENT_PAGEFAULT;
+
+	if (!(features & UFFD_FEATURE_EXACT_ADDRESS))
+		address &= PAGE_MASK;
 	msg.arg.pagefault.address = address;
 	/*
 	 * These flags indicate why the userfault occurred:
@@ -482,7 +485,7 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason)
 
 	init_waitqueue_func_entry(&uwq.wq, userfaultfd_wake_function);
 	uwq.wq.private = current;
-	uwq.msg = userfault_msg(vmf->address, vmf->flags, reason,
+	uwq.msg = userfault_msg(vmf->real_address, vmf->flags, reason,
 			ctx->features);
 	uwq.ctx = ctx;
 	uwq.waken = false;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 213cc569b192..27df0ca0a36a 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -478,7 +478,8 @@ struct vm_fault {
 		struct vm_area_struct *vma;	/* Target VMA */
 		gfp_t gfp_mask;			/* gfp mask to be used for allocations */
 		pgoff_t pgoff;			/* Logical page offset based on vma */
-		unsigned long address;		/* Faulting virtual address */
+		unsigned long address;		/* Faulting virtual address - masked */
+		unsigned long real_address;	/* Faulting virtual address - unmasked */
 	};
 	enum fault_flag flags;		/* FAULT_FLAG_xxx flags
 					 * XXX: should really be 'const' */
diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h
index 05b31d60acf6..ef739054cb1c 100644
--- a/include/uapi/linux/userfaultfd.h
+++ b/include/uapi/linux/userfaultfd.h
@@ -32,7 +32,8 @@
 			   UFFD_FEATURE_SIGBUS |		\
 			   UFFD_FEATURE_THREAD_ID |		\
 			   UFFD_FEATURE_MINOR_HUGETLBFS |	\
-			   UFFD_FEATURE_MINOR_SHMEM)
+			   UFFD_FEATURE_MINOR_SHMEM |		\
+			   UFFD_FEATURE_EXACT_ADDRESS)
 #define UFFD_API_IOCTLS				\
 	((__u64)1 << _UFFDIO_REGISTER |		\
 	 (__u64)1 << _UFFDIO_UNREGISTER |	\
@@ -189,6 +190,10 @@ struct uffdio_api {
 	 *
 	 * UFFD_FEATURE_MINOR_SHMEM indicates the same support as
 	 * UFFD_FEATURE_MINOR_HUGETLBFS, but for shmem-backed pages instead.
+	 *
+	 * UFFD_FEATURE_EXACT_ADDRESS indicates that the exact address of page
+	 * faults would be provided and the offset within the page would not be
+	 * masked.
 	 */
 #define UFFD_FEATURE_PAGEFAULT_FLAG_WP		(1<<0)
 #define UFFD_FEATURE_EVENT_FORK			(1<<1)
@@ -201,6 +206,7 @@ struct uffdio_api {
 #define UFFD_FEATURE_THREAD_ID			(1<<8)
 #define UFFD_FEATURE_MINOR_HUGETLBFS		(1<<9)
 #define UFFD_FEATURE_MINOR_SHMEM		(1<<10)
+#define UFFD_FEATURE_EXACT_ADDRESS		(1<<11)
 	__u64 features;
 
 	__u64 ioctls;
diff --git a/mm/memory.c b/mm/memory.c
index c125c4969913..aae53fde13d9 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4622,6 +4622,7 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
 	struct vm_fault vmf = {
 		.vma = vma,
 		.address = address & PAGE_MASK,
+		.real_address = address,
 		.flags = flags,
 		.pgoff = linear_page_index(vma, address),
 		.gfp_mask = __get_fault_gfp_mask(vma),
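With UFFD_FEATURE_EXACT_ADDRESS enabled, `msg.arg.pagefault.address` may carry an in-page offset, whereas resolving ioctls such as UFFDIO_COPY still operate on whole pages: the kernel rejects a `dst` that is not page-aligned with EINVAL. A monitor that opts in therefore has to do the masking itself. The sketch assumes the UAPI definitions of `struct uffd_msg` and `struct uffdio_copy` from `<linux/userfaultfd.h>`; `prepare_copy()` and `fault_offset()` are hypothetical helpers, and the caller is assumed to pass the base page size (e.g. from `sysconf(_SC_PAGESIZE)`).

```c
#include <assert.h>
#include <linux/userfaultfd.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: fill a UFFDIO_COPY request that resolves the page
 * containing the fault described by `msg`. `src` points to one page of data
 * prepared by the monitor; `page_size` must be a power of two.
 */
static void prepare_copy(struct uffdio_copy *copy, const struct uffd_msg *msg,
                         const void *src, uint64_t page_size)
{
    assert(msg->event == UFFD_EVENT_PAGEFAULT);
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

    /* With UFFD_FEATURE_EXACT_ADDRESS the address may include the in-page
     * offset, but UFFDIO_COPY needs a page-aligned destination. */
    uint64_t fault_addr = msg->arg.pagefault.address;

    memset(copy, 0, sizeof(*copy)); /* mode = 0, copy = 0 */
    copy->dst = fault_addr & ~(page_size - 1);
    copy->src = (uint64_t)(uintptr_t)src;
    copy->len = page_size;
}

/* In-page offset of the fault, e.g. for prefetch decisions. Always 0 when
 * the feature is off and `page_size` is the base page size. */
static uint64_t fault_offset(const struct uffd_msg *msg, uint64_t page_size)
{
    return msg->arg.pagefault.address & (page_size - 1);
}
```

For the fault from the demo output (0x7fc5e30b300f), the copy destination becomes 0x7fc5e30b3000 and the offset is 0xf, which is exactly the information the default mode discards.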