Message ID | 20210511081534.3507-3-david@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Series | mm/madvise: introduce MADV_POPULATE_(READ|WRITE) to prefault page tables |
[sorry for a long silence on this]

On Tue 11-05-21 10:15:31, David Hildenbrand wrote:
[...]

Thanks for the extensive usecase description. That is certainly useful
background. I am sorry to bring this up again but I am still not
convinced that READ/WRITE variants are the best interface.

> While the use case for MADV_POPULATE_WRITE is fairly obvious (i.e.,
> preallocate memory and prefault page tables for VMs), one issue is that
> whenever we prefault pages writable, the pages have to be marked dirty,
> because the CPU could dirty them any time. While not a real problem for
> hugetlbfs or dax/pmem, it can be a problem for shared file mappings: each
> page will be marked dirty and has to be written back later when evicting.
>
> MADV_POPULATE_READ allows for optimizing this scenario: Pre-read a whole
> mapping from backend storage without marking it dirty, such that eviction
> won't have to write it back. As discussed above, shared file mappings
> might require an explicit fallocate() upfront to achieve
> preallocation+prepopulation.

This means that you want to have two different uses depending on the
underlying mapping type. MADV_POPULATE_READ seems rather weak for
anonymous/private mappings. Memory backed by zero pages seems rather
unhelpful as the PF would need to do all the heavy lifting anyway.
Or is there any actual usecase when this is desirable?

So the split into these two modes seems more like gup interface
shortcomings bubbling up to the interface. I do expect userspace only
cares about pre-faulting the address range. No matter what the backing
storage is.

Or do I still misunderstand all the usecases?
On 18.05.21 12:07, Michal Hocko wrote: > [sorry for a long silence on this] > > On Tue 11-05-21 10:15:31, David Hildenbrand wrote: > [...] > > Thanks for the extensive usecase description. That is certainly useful > background. I am sorry to bring this up again but I am still not > convinced that READ/WRITE variant are the best interface. Thanks for having time to look into this. > >> While the use case for MADV_POPULATE_WRITE is fairly obvious (i.e., >> preallocate memory and prefault page tables for VMs), one issue is that >> whenever we prefault pages writable, the pages have to be marked dirty, >> because the CPU could dirty them any time. while not a real problem for >> hugetlbfs or dax/pmem, it can be a problem for shared file mappings: each >> page will be marked dirty and has to be written back later when evicting. >> >> MADV_POPULATE_READ allows for optimizing this scenario: Pre-read a whole >> mapping from backend storage without marking it dirty, such that eviction >> won't have to write it back. As discussed above, shared file mappings >> might require an explciit fallocate() upfront to achieve >> preallcoation+prepopulation. > > This means that you want to have two different uses depending on the > underlying mapping type. MADV_POPULATE_READ seems rather weak for > anonymous/private mappings. Memory backed by zero pages seems rather > unhelpful as the PF would need to do all the heavy lifting anyway. > Or is there any actual usecase when this is desirable? Currently, userfaultfd-wp, which requires "some mapping" to be able to arm successfully. In QEMU, we currently have to prefault the shared zeropage for userfaultfd-wp to work as expected. I expect that use case might vanish over time (eventually with new kernels and updated user space), but it might stick for a bit. Apart from that, populating the shared zeropage might be relevant in some corner cases: I remember there are sparse matrix algorithms that operate heavily on the shared zeropage. 
> > So the split into these two modes seems more like gup interface > shortcomings bubbling up to the interface. I do expect userspace only > cares about pre-faulting the address range. No matter what the backing > storage is. > > Or do I still misunderstand all the usecases? Let me give you an example where we really cannot tell what would be best from a kernel perspective. a) Mapping a file into a VM to be used as RAM. We might expect the guest writing all memory immediately (e.g., booting Windows). We would want MADV_POPULATE_WRITE as we expect a write access immediately. b) Mapping a file into a VM to be used as fake-NVDIMM, for example, ROOTFS or just data storage. We expect mostly reading from this memory, thus, we would want MADV_POPULATE_READ. Instead of trying to be smart in the kernel, I think for this case it makes much more sense to provide user space the options. IMHO it doesn't really hurt to let user space decide on what it thinks is best.
On Tue 18-05-21 12:32:12, David Hildenbrand wrote: > On 18.05.21 12:07, Michal Hocko wrote: > > [sorry for a long silence on this] > > > > On Tue 11-05-21 10:15:31, David Hildenbrand wrote: > > [...] > > > > Thanks for the extensive usecase description. That is certainly useful > > background. I am sorry to bring this up again but I am still not > > convinced that READ/WRITE variant are the best interface. > > Thanks for having time to look into this. > > > > While the use case for MADV_POPULATE_WRITE is fairly obvious (i.e., > > > preallocate memory and prefault page tables for VMs), one issue is that > > > whenever we prefault pages writable, the pages have to be marked dirty, > > > because the CPU could dirty them any time. while not a real problem for > > > hugetlbfs or dax/pmem, it can be a problem for shared file mappings: each > > > page will be marked dirty and has to be written back later when evicting. > > > > > > MADV_POPULATE_READ allows for optimizing this scenario: Pre-read a whole > > > mapping from backend storage without marking it dirty, such that eviction > > > won't have to write it back. As discussed above, shared file mappings > > > might require an explciit fallocate() upfront to achieve > > > preallcoation+prepopulation. > > > > This means that you want to have two different uses depending on the > > underlying mapping type. MADV_POPULATE_READ seems rather weak for > > anonymous/private mappings. Memory backed by zero pages seems rather > > unhelpful as the PF would need to do all the heavy lifting anyway. > > Or is there any actual usecase when this is desirable? > > Currently, userfaultfd-wp, which requires "some mapping" to be able to arm > successfully. In QEMU, we currently have to prefault the shared zeropage for > userfaultfd-wp to work as expected. Just for clarification. The aim is to reduce the memory footprint at the same time, right? If that is really the case then this is worth adding. 
> I expect that use case might vanish over > time (eventually with new kernels and updated user space), but it might > stick for a bit. Could you elaborate some more please? > Apart from that, populating the shared zeropage might be relevant in some > corner cases: I remember there are sparse matrix algorithms that operate > heavily on the shared zeropage. I am not sure I see why this would be a useful interface for those? Zero page read fault is really low cost. Or are you worried about cummulative overhead by entering the kernel many times? > > So the split into these two modes seems more like gup interface > > shortcomings bubbling up to the interface. I do expect userspace only > > cares about pre-faulting the address range. No matter what the backing > > storage is. > > > > Or do I still misunderstand all the usecases? > > Let me give you an example where we really cannot tell what would be best > from a kernel perspective. > > a) Mapping a file into a VM to be used as RAM. We might expect the guest > writing all memory immediately (e.g., booting Windows). We would want > MADV_POPULATE_WRITE as we expect a write access immediately. > > b) Mapping a file into a VM to be used as fake-NVDIMM, for example, ROOTFS > or just data storage. We expect mostly reading from this memory, thus, we > would want MADV_POPULATE_READ. I am afraid I do not follow. Could you be more explicit about advantages of using those two modes for those example usecases? Is that to share resources (e.g. by not breaking CoW)? > Instead of trying to be smart in the kernel, I think for this case it makes > much more sense to provide user space the options. IMHO it doesn't really > hurt to let user space decide on what it thinks is best. I am mostly worried that this will turn out to be more confusing than helpful. People will need to grasp non trivial concepts and kernel internal implementation details about how read/write faults are handled. Thanks!
>>> This means that you want to have two different uses depending on the >>> underlying mapping type. MADV_POPULATE_READ seems rather weak for >>> anonymous/private mappings. Memory backed by zero pages seems rather >>> unhelpful as the PF would need to do all the heavy lifting anyway. >>> Or is there any actual usecase when this is desirable? >> >> Currently, userfaultfd-wp, which requires "some mapping" to be able to arm >> successfully. In QEMU, we currently have to prefault the shared zeropage for >> userfaultfd-wp to work as expected. > > Just for clarification. The aim is to reduce the memory footprint at the > same time, right? If that is really the case then this is worth adding. Yes. userfaultfd-wp is right now used in QEMU for background snapshotting of VMs. Just because you trigger a background snapshot doesn't mean that you want to COW all pages. (especially, if your VM previously inflated the balloon, was using free page reporting etc.) > >> I expect that use case might vanish over >> time (eventually with new kernels and updated user space), but it might >> stick for a bit. > > Could you elaborate some more please? After I raised that the current behavior of userfaultfd-wp is suboptimal, Peter started working on a userfaultfd-wp mode that doesn't require to prefault all pages just to have it working reliably -- getting notified when any page changes, including ones that haven't been populated yet and would have been populated with the shared zeropage on first access. Not sure what the state of that is and when we might see it. > >> Apart from that, populating the shared zeropage might be relevant in some >> corner cases: I remember there are sparse matrix algorithms that operate >> heavily on the shared zeropage. > > I am not sure I see why this would be a useful interface for those? Zero > page read fault is really low cost. Or are you worried about cummulative > overhead by entering the kernel many times? 
Yes, cumulative overhead when dealing with large, sparse matrices. Just an example where I think it could be applied in the future -- but not that I consider populating the shared zeropage a really important use case in general (besides for userfaultfd-wp right now). > >>> So the split into these two modes seems more like gup interface >>> shortcomings bubbling up to the interface. I do expect userspace only >>> cares about pre-faulting the address range. No matter what the backing >>> storage is. >>> >>> Or do I still misunderstand all the usecases? >> >> Let me give you an example where we really cannot tell what would be best >> from a kernel perspective. >> >> a) Mapping a file into a VM to be used as RAM. We might expect the guest >> writing all memory immediately (e.g., booting Windows). We would want >> MADV_POPULATE_WRITE as we expect a write access immediately. >> >> b) Mapping a file into a VM to be used as fake-NVDIMM, for example, ROOTFS >> or just data storage. We expect mostly reading from this memory, thus, we >> would want MADV_POPULATE_READ. > > I am afraid I do not follow. Could you be more explicit about advantages > of using those two modes for those example usecases? Is that to share > resources (e.g. by not breaking CoW)? I'm only talking about shared mappings "ordinary files" for now, because that's where MADV_POPULATE_READ vs MADV_POPULATE_WRITE differ in regards of "mark something dirty and write it back"; CoW doesn't apply to shared mappings, it's really just a difference in dirtying and having to write back. For things like PMEM/hugetlbfs/... we usually want MADV_POPULATE_WRITE because then we'd avoid a context switch when our VM actually writes to a page the first time -- and we don't care about dirtying, because we don't have writeback. But again, that's just one use case I have in mind coming from the VM area. I consider MADV_POPULATE_READ really only useful when we are expecting mostly read access on a mapping. 
(I assume there are other use cases for databases etc. not explored yet
where MADV_POPULATE_WRITE would not be desired for performance reasons)

>> Instead of trying to be smart in the kernel, I think for this case it makes
>> much more sense to provide user space the options. IMHO it doesn't really
>> hurt to let user space decide on what it thinks is best.
>
> I am mostly worried that this will turn out to be more confusing than
> helpful. People will need to grasp non trivial concepts and kernel
> internal implementation details about how read/write faults are handled.

And that's the point: in the simplest case (without any additional
considerations about the underlying mapping), if you end up mostly
*reading*, MADV_POPULATE_READ is the right thing. If you end up mostly
*writing*, MADV_POPULATE_WRITE is the right thing. Care only has to be
taken when you really want a "preallocation" as in "allocate backend
storage" or "don't ever use the shared zeropage". I agree that these
details require more knowledge, but so does anything that messes with
memory mappings on that level (VMs, databases, ...).

QEMU currently implements exactly these two cases manually in user space.

Anyhow, please suggest a way to handle it via a single flag in the kernel --
which would be some kind of heuristic as we know from MAP_POPULATE. Having
an alternative at hand would make it easier to discuss this topic further. I
certainly *don't* want MAP_POPULATE semantics when it comes to
MADV_POPULATE, especially when it comes to shared mappings. Not useful in
QEMU now and in the future.

We could make MADV_POPULATE act depending on the readability/writability of
a mapping. Use MADV_POPULATE_WRITE on writable mappings, use
MADV_POPULATE_READ on readable mappings. Certainly not perfect for use cases
where you have writable mappings that are mostly read only (as in the
example with fake-NVDIMMs I gave ...), but if it makes people happy, fine
with me. I mostly care about MADV_POPULATE_WRITE.
On Tue 18-05-21 14:03:52, David Hildenbrand wrote: [...] > > > I expect that use case might vanish over > > > time (eventually with new kernels and updated user space), but it might > > > stick for a bit. > > > > Could you elaborate some more please? > > After I raised that the current behavior of userfaultfd-wp is suboptimal, > Peter started working on a userfaultfd-wp mode that doesn't require to > prefault all pages just to have it working reliably -- getting notified when > any page changes, including ones that haven't been populated yet and would > have been populated with the shared zeropage on first access. Not sure what > the state of that is and when we might see it. OK, thanks for the clarification. This suggests that inventing a new interface to cover this usecase doesn't sound like the strongest justification to me. But this doesn't mean this disqualifies it either. > > > Apart from that, populating the shared zeropage might be relevant in some > > > corner cases: I remember there are sparse matrix algorithms that operate > > > heavily on the shared zeropage. > > > > I am not sure I see why this would be a useful interface for those? Zero > > page read fault is really low cost. Or are you worried about cummulative > > overhead by entering the kernel many times? > > Yes, cumulative overhead when dealing with large, sparse matrices. Just an > example where I think it could be applied in the future -- but not that I > consider populating the shared zeropage a really important use case in > general (besides for userfaultfd-wp right now). OK. [...] > Anyhow, please suggest a way to handle it via a single flag in the kernel -- > which would be some kind of heuristic as we know from MAP_POPULATE. Having > an alternative at hand would make it easier to discuss this topic further. I > certainly *don't* want MAP_POPULATE semantics when it comes to > MADV_POPULATE, especially when it comes to shared mappings. Not useful in > QEMU now and in the future. 
OK, this point is still not entirely clear to me. Elsewhere you are saying that QEMU cannot use MAP_POPULATE because it ignores errors and also it doesn't support sparse mappings because they apply to the whole mmap. These are all clear but it is less clear to me why the same semantic is not applicable for QEMU when used through madvise interface which can handle both of those. Do I get it right that you really want to emulate the full fledged write fault to a) limit another write fault when the content is actually modified and b) prevent from potential errors during the write fault (e.g. mkwrite failing on the fs data)? > We could make MADV_POPULATE act depending on the readability/writability of > a mapping. Use MADV_POPULATE_WRITE on writable mappings, use > MADV_POPULATE_READ on readable mappings. Certainly not perfect for use cases > where you have writable mappings that are mostly read only (as in the > example with fake-NVDIMMs I gave ...), but if it makes people happy, fine > with me. I mostly care about MADV_POPULATE_WRITE. Yes, this is where my thinking was going as well. Essentially define MADV_POPULATE as "Populate the mapping with the memory based on the mapping access." This looks like a straightforward semantic to me and it doesn't really require any deep knowledge of internals. Now, I was trying to compare which of those would be more tricky to understand and use and TBH I am not really convinced any of the two is much better. Separate READ/WRITE modes are explicit which can be good but it will require quite an advanced knowledge of the #PF behavior. On the other hand MADV_POPULATE would require some tricks like mmap, madvise and mprotect(to change to writable) when the data is really written to. I am not sure how much of a deal this would be for QEMU for example. So, all that being said, I am not really sure. 
I am not really happy about the READ/WRITE split but if a simpler interface is going to be a bad fit for existing usecases then I believe a proper way to go is to document the more complex interface thoroughly.
> [...] >> Anyhow, please suggest a way to handle it via a single flag in the kernel -- >> which would be some kind of heuristic as we know from MAP_POPULATE. Having >> an alternative at hand would make it easier to discuss this topic further. I >> certainly *don't* want MAP_POPULATE semantics when it comes to >> MADV_POPULATE, especially when it comes to shared mappings. Not useful in >> QEMU now and in the future. > > OK, this point is still not entirely clear to me. Elsewhere you are > saying that QEMU cannot use MAP_POPULATE because it ignores errors > and also it doesn't support sparse mappings because they apply to the > whole mmap. These are all clear but it is less clear to me why the same > semantic is not applicable for QEMU when used through madvise interface > which can handle both of those. It's a combination of things: a) MAP_POPULATE never was an option simply because of deferred "prealloc=on" handling in QEMU, happening way after we created the memmap. Further it doesn't report if there was an error, which is another reason why it's basically useless for QEMU use cases. b) QEMU uses manual read-write prefaulting for "preallocation", for example, to avoid SIGBUS on hugetlbfs or shmem at runtime. There are cases where we absolutely want to avoid crashing the VM later just because of a user error. MAP_POPULATE does *not* do what we want for shared mappings, because it triggers a read fault. c) QEMU uses the same mechanism for prefaulting in RT environments, where we want to avoid any kind of pagefault, using mlock() etc. d) MAP_POPULATE does not apply to sparse memory mappings that I'll be using more heavily in QEMU, also for the purpose of preallocation with virtio-mem. 
See the current QEMU code along with a comment in

https://github.com/qemu/qemu/blob/972e848b53970d12cb2ca64687ef8ff797fb6236/util/oslib-posix.c#L496

It's especially bad for PMEM ("wear on the storage backing"), which is why
we have to trust users not to trigger preallocation/prefaulting on PMEM;
otherwise (as already expressed via bug reports) we waste a lot of time
when backing VMs on PMEM or forwarding NVDIMMs, unnecessarily
reading/writing (slow) DAX.

> Do I get it right that you really want to emulate the full fledged write
> fault to a) limit another write fault when the content is actually
> modified and b) prevent from potential errors during the write fault
> (e.g. mkwrite failing on the fs data)?

Yes, for the use case of "preallocation" in QEMU. See the QEMU link.

But again, the thing that makes it more complicated is that I can come up
with some use cases that want to handle "shared mappings of ordinary files"
a little better. Or the userfaultfd-wp example I gave, where prefaulting via
MADV_POPULATE_READ can roughly halve the population time.

>> We could make MADV_POPULATE act depending on the readability/writability of
>> a mapping. Use MADV_POPULATE_WRITE on writable mappings, use
>> MADV_POPULATE_READ on readable mappings. Certainly not perfect for use cases
>> where you have writable mappings that are mostly read only (as in the
>> example with fake-NVDIMMs I gave ...), but if it makes people happy, fine
>> with me. I mostly care about MADV_POPULATE_WRITE.
>
> Yes, this is where my thinking was going as well. Essentially define
> MADV_POPULATE as "Populate the mapping with the memory based on the
> mapping access." This looks like a straightforward semantic to me and it
> doesn't really require any deep knowledge of internals.
>
> Now, I was trying to compare which of those would be more tricky to
> understand and use and TBH I am not really convinced any of the two is
> much better.
Separate READ/WRITE modes are explicit which can be good > but it will require quite an advanced knowledge of the #PF behavior. > On the other hand MADV_POPULATE would require some tricks like mmap, > madvise and mprotect(to change to writable) when the data is really > written to. I am not sure how much of a deal this would be for QEMU for > example. IIRC, at the time we enable background snapshotting, the VM is running and we cannot temporarily mprotect(PROT_READ) without making the guest crash. But again, uffd-wp handling is somewhat a special case because the implementation in the kernel is really suboptimal. The reason I chose MADV_POPULATE_READ + MADV_POPULATE_WRITE is because it really mimics what user space currently does to get the job done. I guess the important part to document is that "be careful when using MADV_POPULATE_READ because it might just populate the shared zeropage" and "be careful with MADV_POPULATE_WRITE because it will do the same as when writing to every page: dirty the pages such that they will have to be written back when backed by actual files". The current MAN page entry for MADV_POPULATE_READ reads: " Populate (prefault) page tables readable for the whole range without actually reading. Depending on the underlying mapping, map the shared zeropage, preallocate memory or read the underlying file. Do not generate SIGBUS when populating fails, return an error instead. If MADV_POPULATE_READ succeeds, all page tables have been populated (prefaulted) readable once. If MADV_POPULATE_READ fails, some page tables might have been populated. MADV_POPULATE_READ cannot be applied to mappings without read permissions and special mappings marked with the kernel-internal VM_PFNMAP and VM_IO. Note that with MADV_POPULATE_READ, the process can still be killed at any moment when the system runs out of memory. " > > So, all that being said, I am not really sure. 
> I am not really happy about the READ/WRITE split but if a simpler
> interface is going to be a bad fit for existing usecases then I believe
> a proper way to go is to document the more complex interface thoroughly.

I think with the split we are better off long term, without requiring
workarounds (mprotect()) to make some use cases work. But again, if there
is a good justification why a single MADV_POPULATE makes sense, I'm happy
to change it.

Again, for me, the most important thing long-term is MADV_POPULATE_WRITE
because that's really what QEMU mainly uses right now for preallocation.
But I can see use cases for MADV_POPULATE_READ as well.

Thanks for your input!
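As an illustration of the write case discussed in this thread, the following is a minimal user-space sketch, not code from QEMU or from the patch. It tries MADV_POPULATE_WRITE and, when the kernel rejects the advice with EINVAL, falls back to the manual "read one byte and write it back" touching that the thread describes. The helper name `populate_write` is hypothetical, the advice value 23 is taken from the patch below, and the fallback assumes the caller passes a readable and writable, page-aligned range.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* expose madvise() and MAP_ANONYMOUS */
#endif
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Value taken from the patch; older libc headers may lack it. */
#ifndef MADV_POPULATE_WRITE
#define MADV_POPULATE_WRITE 23
#endif

/*
 * Hypothetical helper (not from QEMU or the patch): prefault the
 * page-aligned range [addr, addr + len) writable.
 * Assumes the mapping is readable and writable.
 * Returns 0 on success or a negative errno value.
 */
static int populate_write(void *addr, size_t len)
{
	volatile unsigned char *p = addr;
	long page_size = sysconf(_SC_PAGESIZE);
	size_t off;

	if (page_size <= 0)
		return -EINVAL;

	/* Preferred path: the kernel reports failures as error codes. */
	if (madvise(addr, len, MADV_POPULATE_WRITE) == 0)
		return 0;
	if (errno != EINVAL)
		return -errno;

	/*
	 * EINVAL on a readable+writable mapping most likely means the
	 * kernel does not know the advice. Fall back to touching each
	 * page: read one byte and write it back so existing data is
	 * preserved. Unlike madvise(), this can raise SIGBUS when backend
	 * memory runs out, and it is not safe against concurrent writers
	 * to the touched byte.
	 */
	for (off = 0; off < len; off += (size_t)page_size)
		p[off] = p[off];
	return 0;
}
```

The fallback shows why the thread cares about a kernel interface: the manual loop cannot report an out-of-memory condition as an error code, whereas the madvise() path can.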
diff --git a/arch/alpha/include/uapi/asm/mman.h b/arch/alpha/include/uapi/asm/mman.h
index a18ec7f63888..56b4ee5a6c9e 100644
--- a/arch/alpha/include/uapi/asm/mman.h
+++ b/arch/alpha/include/uapi/asm/mman.h
@@ -71,6 +71,9 @@
 #define MADV_COLD	20		/* deactivate these pages */
 #define MADV_PAGEOUT	21		/* reclaim these pages */
 
+#define MADV_POPULATE_READ	22	/* populate (prefault) page tables readable */
+#define MADV_POPULATE_WRITE	23	/* populate (prefault) page tables writable */
+
 /* compatibility flags */
 #define MAP_FILE	0
 
diff --git a/arch/mips/include/uapi/asm/mman.h b/arch/mips/include/uapi/asm/mman.h
index 57dc2ac4f8bd..40b210c65a5a 100644
--- a/arch/mips/include/uapi/asm/mman.h
+++ b/arch/mips/include/uapi/asm/mman.h
@@ -98,6 +98,9 @@
 #define MADV_COLD	20		/* deactivate these pages */
 #define MADV_PAGEOUT	21		/* reclaim these pages */
 
+#define MADV_POPULATE_READ	22	/* populate (prefault) page tables readable */
+#define MADV_POPULATE_WRITE	23	/* populate (prefault) page tables writable */
+
 /* compatibility flags */
 #define MAP_FILE	0
 
diff --git a/arch/parisc/include/uapi/asm/mman.h b/arch/parisc/include/uapi/asm/mman.h
index ab78cba446ed..9e3c010c0f61 100644
--- a/arch/parisc/include/uapi/asm/mman.h
+++ b/arch/parisc/include/uapi/asm/mman.h
@@ -52,6 +52,9 @@
 #define MADV_COLD	20		/* deactivate these pages */
 #define MADV_PAGEOUT	21		/* reclaim these pages */
 
+#define MADV_POPULATE_READ	22	/* populate (prefault) page tables readable */
+#define MADV_POPULATE_WRITE	23	/* populate (prefault) page tables writable */
+
 #define MADV_MERGEABLE   65		/* KSM may merge identical pages */
 #define MADV_UNMERGEABLE 66		/* KSM may not merge identical pages */
 
diff --git a/arch/xtensa/include/uapi/asm/mman.h b/arch/xtensa/include/uapi/asm/mman.h
index e5e643752947..b3a22095371b 100644
--- a/arch/xtensa/include/uapi/asm/mman.h
+++ b/arch/xtensa/include/uapi/asm/mman.h
@@ -106,6 +106,9 @@
 #define MADV_COLD	20		/* deactivate these pages */
 #define MADV_PAGEOUT	21		/* reclaim these pages */
 
+#define MADV_POPULATE_READ	22	/* populate (prefault) page tables readable */
+#define MADV_POPULATE_WRITE	23	/* populate (prefault) page tables writable */
+
 /* compatibility flags */
 #define MAP_FILE	0
 
diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h
index f94f65d429be..1567a3294c3d 100644
--- a/include/uapi/asm-generic/mman-common.h
+++ b/include/uapi/asm-generic/mman-common.h
@@ -72,6 +72,9 @@
 #define MADV_COLD	20		/* deactivate these pages */
 #define MADV_PAGEOUT	21		/* reclaim these pages */
 
+#define MADV_POPULATE_READ	22	/* populate (prefault) page tables readable */
+#define MADV_POPULATE_WRITE	23	/* populate (prefault) page tables writable */
+
 /* compatibility flags */
 #define MAP_FILE	0
 
diff --git a/mm/gup.c b/mm/gup.c
index ef7d2da9f03f..632d12469deb 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1403,6 +1403,64 @@ long populate_vma_page_range(struct vm_area_struct *vma,
 				NULL, NULL, locked);
 }
 
+/*
+ * faultin_vma_page_range() - populate (prefault) page tables inside the
+ *			      given VMA range readable/writable
+ *
+ * This takes care of mlocking the pages, too, if VM_LOCKED is set.
+ *
+ * @vma: target vma
+ * @start: start address
+ * @end: end address
+ * @write: whether to prefault readable or writable
+ * @locked: whether the mmap_lock is still held
+ *
+ * Returns either number of processed pages in the vma, or a negative error
+ * code on error (see __get_user_pages()).
+ *
+ * vma->vm_mm->mmap_lock must be held. The range must be page-aligned and
+ * covered by the VMA.
+ *
+ * If @locked is NULL, it may be held for read or write and will be unperturbed.
+ *
+ * If @locked is non-NULL, it must held for read only and may be released. If
+ * it's released, *@locked will be set to 0.
+ */
+long faultin_vma_page_range(struct vm_area_struct *vma, unsigned long start,
+			    unsigned long end, bool write, int *locked)
+{
+	struct mm_struct *mm = vma->vm_mm;
+	unsigned long nr_pages = (end - start) / PAGE_SIZE;
+	int gup_flags;
+
+	VM_BUG_ON(!PAGE_ALIGNED(start));
+	VM_BUG_ON(!PAGE_ALIGNED(end));
+	VM_BUG_ON_VMA(start < vma->vm_start, vma);
+	VM_BUG_ON_VMA(end > vma->vm_end, vma);
+	mmap_assert_locked(mm);
+
+	/*
+	 * FOLL_TOUCH: Mark page accessed and thereby young; will also mark
+	 *	       the page dirty with FOLL_WRITE -- which doesn't make a
+	 *	       difference with !FOLL_FORCE, because the page is writable
+	 *	       in the page table.
+	 * FOLL_HWPOISON: Return -EHWPOISON instead of -EFAULT when we hit
+	 *		  a poisoned page.
+	 * FOLL_POPULATE: Always populate memory with VM_LOCKONFAULT.
+	 * !FOLL_FORCE: Require proper access permissions.
+	 */
+	gup_flags = FOLL_TOUCH | FOLL_POPULATE | FOLL_MLOCK | FOLL_HWPOISON;
+	if (write)
+		gup_flags |= FOLL_WRITE;
+
+	/*
+	 * See check_vma_flags(): Will return -EFAULT on incompatible mappings
+	 * or with insufficient permissions.
+	 */
+	return __get_user_pages(mm, start, nr_pages, gup_flags,
+				NULL, NULL, locked);
+}
+
 /*
  * __mm_populate - populate and/or mlock pages within a range of address space.
  *
diff --git a/mm/internal.h b/mm/internal.h
index bbf1c1274983..41e8d41a5d1e 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -355,6 +355,9 @@ void __vma_unlink_list(struct mm_struct *mm, struct vm_area_struct *vma);
 #ifdef CONFIG_MMU
 extern long populate_vma_page_range(struct vm_area_struct *vma,
 		unsigned long start, unsigned long end, int *locked);
+extern long faultin_vma_page_range(struct vm_area_struct *vma,
+				   unsigned long start, unsigned long end,
+				   bool write, int *locked);
 extern void munlock_vma_pages_range(struct vm_area_struct *vma,
 			unsigned long start, unsigned long end);
 static inline void munlock_vma_pages_all(struct vm_area_struct *vma)
diff --git a/mm/madvise.c b/mm/madvise.c
index 01fef79ac761..a02cbda942ba 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -53,6 +53,8 @@ static int madvise_need_mmap_write(int behavior)
 	case MADV_COLD:
 	case MADV_PAGEOUT:
 	case MADV_FREE:
+	case MADV_POPULATE_READ:
+	case MADV_POPULATE_WRITE:
 		return 0;
 	default:
 		/* be safe, default to 1. list exceptions explicitly */
@@ -822,6 +824,61 @@ static long madvise_dontneed_free(struct vm_area_struct *vma,
 		return -EINVAL;
 }
 
+static long madvise_populate(struct vm_area_struct *vma,
+			     struct vm_area_struct **prev,
+			     unsigned long start, unsigned long end,
+			     int behavior)
+{
+	const bool write = behavior == MADV_POPULATE_WRITE;
+	struct mm_struct *mm = vma->vm_mm;
+	unsigned long tmp_end;
+	int locked = 1;
+	long pages;
+
+	*prev = vma;
+
+	while (start < end) {
+		/*
+		 * We might have temporarily dropped the lock. For example,
+		 * our VMA might have been split.
+		 */
+		if (!vma || start >= vma->vm_end) {
+			vma = find_vma(mm, start);
+			if (!vma || start < vma->vm_start)
+				return -ENOMEM;
+		}
+
+		tmp_end = min_t(unsigned long, end, vma->vm_end);
+		/* Populate (prefault) page tables readable/writable. */
+		pages = faultin_vma_page_range(vma, start, tmp_end, write,
+					       &locked);
+		if (!locked) {
+			mmap_read_lock(mm);
+			locked = 1;
+			*prev = NULL;
+			vma = NULL;
+		}
+		if (pages < 0) {
+			switch (pages) {
+			case -EINTR:
+				return -EINTR;
+			case -EFAULT: /* Incompatible mappings / permissions. */
+				return -EINVAL;
+			case -EHWPOISON:
+				return -EHWPOISON;
+			default:
+				pr_warn_once("%s: unhandled return value: %ld\n",
+					     __func__, pages);
+				fallthrough;
+			case -ENOMEM:
+				return -ENOMEM;
+			}
+		}
+		start += pages * PAGE_SIZE;
+	}
+	return 0;
+}
+
 /*
  * Application wants to free up the pages and associated backing store.
  * This is effectively punching a hole into the middle of a file.
@@ -935,6 +992,9 @@ madvise_vma(struct vm_area_struct *vma, struct vm_area_struct **prev,
 	case MADV_FREE:
 	case MADV_DONTNEED:
 		return madvise_dontneed_free(vma, prev, start, end, behavior);
+	case MADV_POPULATE_READ:
+	case MADV_POPULATE_WRITE:
+		return madvise_populate(vma, prev, start, end, behavior);
 	default:
 		return madvise_behavior(vma, prev, start, end, behavior);
 	}
@@ -955,6 +1015,8 @@ madvise_behavior_valid(int behavior)
 	case MADV_FREE:
 	case MADV_COLD:
 	case MADV_PAGEOUT:
+	case MADV_POPULATE_READ:
+	case MADV_POPULATE_WRITE:
 #ifdef CONFIG_KSM
 	case MADV_MERGEABLE:
 	case MADV_UNMERGEABLE:
@@ -1042,6 +1104,10 @@ process_madvise_behavior_valid(int behavior)
  *		easily if memory pressure hanppens.
  *  MADV_PAGEOUT - the application is not expected to use this memory soon,
  *		page out the pages in this range immediately.
+ *  MADV_POPULATE_READ - populate (prefault) page tables readable by
+ *		triggering read faults if required
+ *  MADV_POPULATE_WRITE - populate (prefault) page tables writable by
+ *		triggering write faults if required
  *
  * return values:
  *  zero    - success
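From user space, the error mapping in madvise_populate() above surfaces as errno values. The following sketch shows one way a caller might interpret them. It is only an illustration under assumptions: the helper names `try_populate` and `populate_errno_hint` are hypothetical, the hint strings paraphrase the patch rather than quote any documentation, and EINVAL is ambiguous because a kernel without this patch also rejects the unknown advice with EINVAL.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* expose madvise() and MAP_ANONYMOUS */
#endif
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Values taken from the patch; older libc headers may lack them. */
#ifndef MADV_POPULATE_READ
#define MADV_POPULATE_READ 22
#endif
#ifndef MADV_POPULATE_WRITE
#define MADV_POPULATE_WRITE 23
#endif

/* Hypothetical helper: returns 0 on success, else the positive errno. */
static int try_populate(void *addr, size_t len, int advice)
{
	return madvise(addr, len, advice) ? errno : 0;
}

/* Hypothetical helper: rough meaning of each errno, per the patch. */
static const char *populate_errno_hint(int err)
{
	switch (err) {
	case 0:
		return "range populated";
	case EINVAL:
		/* -EFAULT from GUP is translated to -EINVAL by the patch. */
		return "incompatible mapping, missing permissions, "
		       "or advice unknown to this kernel";
	case ENOMEM:
		return "out of memory, or part of the range is unmapped";
	case EINTR:
		return "interrupted while populating";
#ifdef EHWPOISON
	case EHWPOISON:
		return "hit a hardware-poisoned page";
#endif
	default:
		return "unexpected error";
	}
}
```

A caller could log `populate_errno_hint(try_populate(addr, len, MADV_POPULATE_READ))` instead of installing a SIGBUS handler, which is the behavioral difference the patch is after. On failure, some pages of the range may already have been populated.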
I. Background: Sparse Memory Mappings

When we manage sparse memory mappings dynamically in user space - also
sometimes involving MAP_NORESERVE - we want to dynamically populate/
discard memory inside such a sparse memory region. Example users are
hypervisors (especially implementing memory ballooning or similar
technologies like virtio-mem) and memory allocators. In addition, we want
to fail in a nice way (instead of generating SIGBUS) if populating does
not succeed because we are out of backend memory (which can happen easily
with file-based mappings, especially tmpfs and hugetlbfs).

While MADV_DONTNEED, MADV_REMOVE and FALLOC_FL_PUNCH_HOLE allow for
reliably discarding memory for most mapping types, there is no generic
approach to populate page tables and preallocate memory.

Although mmap() supports MAP_POPULATE, it is not applicable to the concept
of sparse memory mappings, where we want to populate/discard dynamically
and avoid expensive/problematic remappings. In addition, MAP_POPULATE
never actually reports errors during the final populate phase - it is
best-effort only.

fallocate() can be used to preallocate file-based memory and fail in a
safe way. However, it cannot really be used for any private mappings on
anonymous files via memfd due to COW semantics. In addition, fallocate()
does not actually populate page tables, so we still always get page faults
on first access - which is sometimes undesired (e.g., real-time workloads)
and requires real prefaulting of page tables, not just a preallocation of
backend storage. There might be interesting use cases for sparse memory
regions along with mlockall(MCL_ONFAULT) which fallocate() cannot satisfy
as it does not prefault page tables.

II. On preallocation/prefaulting from user space

Because we don't have a proper interface, what applications (like QEMU and
databases) end up doing is touching (i.e., reading+writing one byte to not
overwrite existing data) all individual pages.

However, that approach
1) Can result in wear on storage backing, because we end up reading/writing
   each page; this is especially a problem for dax/pmem.
2) Can result in mmap_sem contention when prefaulting via multiple
   threads.
3) Requires expensive signal handling, especially to catch SIGBUS in case
   of hugetlbfs/shmem/file-backed memory. For example, this is
   problematic in hypervisors like QEMU where SIGBUS handlers might already
   be used by other subsystems concurrently to, e.g., handle hardware
   errors. "Simply" doing preallocation concurrently from another thread is
   not that easy.

III. On MADV_WILLNEED

Extending MADV_WILLNEED is not an option because
1. It would change the semantics: "Expect access in the near future." and
   "might be a good idea to read some pages" vs. "Definitely populate/
   preallocate all memory and definitely fail on errors.".
2. Existing users (like virtio-balloon in QEMU when deflating the balloon)
   don't want populate/prealloc semantics. They treat this rather as a hint
   to give a little performance boost without too much overhead - and don't
   expect that a lot of memory might get consumed or a lot of time
   might be spent.

IV. MADV_POPULATE_READ and MADV_POPULATE_WRITE

Let's introduce MADV_POPULATE_READ and MADV_POPULATE_WRITE, inspired by
MAP_POPULATE, with the following semantics:
1. MADV_POPULATE_READ can be used to prefault page tables just like
   manually reading each individual page. This will not break any COW
   mappings. The shared zero page might get mapped and no backend storage
   might get preallocated -- allocation might be deferred to
   write-fault time. Especially shared file mappings require an explicit
   fallocate() upfront to actually preallocate backend memory (blocks in
   the file system) in case the file might have holes.
2. If MADV_POPULATE_READ succeeds, all page tables have been populated
   (prefaulted) readable once.
3. MADV_POPULATE_WRITE can be used to preallocate backend memory and
   prefault page tables just like manually writing (or
   reading+writing) each individual page. This will break any COW
   mappings -- e.g., the shared zeropage is never populated.
4. If MADV_POPULATE_WRITE succeeds, all page tables have been populated
   (prefaulted) writable once.
5. MADV_POPULATE_READ and MADV_POPULATE_WRITE cannot be applied to special
   mappings marked with VM_PFNMAP and VM_IO. Also, proper access
   permissions (e.g., PROT_READ, PROT_WRITE) are required. If any such
   mapping is encountered, madvise() fails with -EINVAL.
6. If MADV_POPULATE_READ or MADV_POPULATE_WRITE fails, some page tables
   might have been populated.
7. MADV_POPULATE_READ and MADV_POPULATE_WRITE will return -EHWPOISON
   when encountering a HW poisoned page in the range.
8. Similar to MAP_POPULATE, MADV_POPULATE_READ and MADV_POPULATE_WRITE
   cannot protect from the OOM (Out Of Memory) handler killing the
   process.

While the use case for MADV_POPULATE_WRITE is fairly obvious (e.g.,
preallocate memory and prefault page tables for VMs), one issue is that
whenever we prefault pages writable, the pages have to be marked dirty,
because the CPU could dirty them any time. While not a real problem for
hugetlbfs or dax/pmem, it can be a problem for shared file mappings: each
page will be marked dirty and has to be written back later when evicting.

MADV_POPULATE_READ allows for optimizing this scenario: Pre-read a whole
mapping from backend storage without marking it dirty, such that eviction
won't have to write it back. As discussed above, shared file mappings
might require an explicit fallocate() upfront to achieve
preallocation+prepopulation.
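
For illustration only, the semantics listed above could be exercised from
user space roughly as follows. This is a minimal sketch, not part of the
patch: populate_range() and populate_strerror() are hypothetical helper
names, and the fallback values 22/23 are an assumption for libc headers that
do not yet carry the new advice values. On a kernel without this series,
madvise() is expected to reject the unknown advice with EINVAL.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Assumption: fallback values for headers that predate the new advice
 * values (22/23, as expected in asm-generic/mman-common.h).
 */
#ifndef MADV_POPULATE_READ
#define MADV_POPULATE_READ 22
#endif
#ifndef MADV_POPULATE_WRITE
#define MADV_POPULATE_WRITE 23
#endif

/*
 * Hypothetical helper: prefault [addr, addr + len) readable (write == 0)
 * or writable (write != 0).
 *
 * Returns 0 on success or a negative errno value. On failure, some page
 * tables might already have been populated.
 */
static int populate_range(void *addr, size_t len, int write)
{
	int advice = write ? MADV_POPULATE_WRITE : MADV_POPULATE_READ;

	if (madvise(addr, len, advice) == 0)
		return 0;
	return -errno;
}

/* Hypothetical helper: describe a populate_range() result. */
static const char *populate_strerror(int err)
{
	switch (err) {
	case 0:
		return "populated";
	case -EINVAL:
		return "incompatible mapping/permissions or unsupported advice";
	case -ENOMEM:
		return "out of memory or range not fully mapped";
	case -EHWPOISON:
		return "HW poisoned page in range";
	case -EINTR:
		return "interrupted";
	default:
		return "unexpected error";
	}
}
```

A caller would typically invoke populate_range(addr, len, 1) right after
creating or resizing a sparse region and treat any non-zero result as a
regular error path, instead of installing a SIGBUS handler.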
Although sparse memory mappings are the primary use case, this will also
be useful for other preallocate/prefault use cases where MAP_POPULATE is
not desired or the semantics of MAP_POPULATE are not sufficient: as one
example, QEMU users can trigger preallocation/prefaulting of guest RAM
after the mapping was created -- and don't want errors to be silently
suppressed.

Looking at the history, MADV_POPULATE was already proposed in 2013 [1],
however, the main motivation back then was performance improvements --
which should also still be the case.

V. Single-threaded performance comparison

I did a short experiment, prefaulting page tables on completely *empty
mappings/files* and repeated the experiment 10 times. The results
correspond to the shortest execution time. In general, the performance
benefit for huge pages is negligible with small mappings.

V.1: Private mappings

POPULATE_READ and POPULATE_WRITE are fastest. Note that
Reading/POPULATE_READ will populate the shared zeropage where applicable --
which results in short population times.

The fastest way to allocate backend storage (here: swap or huge pages) and
prefault page tables is POPULATE_WRITE.

V.2: Shared mappings

fallocate() is fastest; however, it doesn't prefault page tables.
POPULATE_WRITE is faster than simple writes and read/writes.
POPULATE_READ is faster than simple reads.

Without a fd, the fastest way to allocate backend storage and prefault
page tables is POPULATE_WRITE. With an fd, the fastest way is usually
FALLOCATE+POPULATE_READ or FALLOCATE+POPULATE_WRITE respectively; one
exception is actual files: FALLOCATE+Read is slightly faster than
FALLOCATE+POPULATE_READ.

The fastest way to allocate backend storage and prefault page tables is
FALLOCATE+POPULATE_WRITE -- except when dealing with actual files; then,
FALLOCATE+POPULATE_READ is fastest and won't directly mark all pages as
dirty.
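
As a rough sketch of the FALLOCATE+POPULATE_READ / FALLOCATE+POPULATE_WRITE
combination for fd-backed shared mappings: the helper name
map_prealloc_shared() is hypothetical, the fallback values 22/23 are an
assumption for older headers, and the benchmark program used for the numbers
is not shown in this mail, so this is not that code.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>

/* Assumption: fallback values for headers without the new advice values. */
#ifndef MADV_POPULATE_READ
#define MADV_POPULATE_READ 22
#endif
#ifndef MADV_POPULATE_WRITE
#define MADV_POPULATE_WRITE 23
#endif

/*
 * Hypothetical helper: preallocate backend storage for the first len bytes
 * of fd, map them MAP_SHARED and prefault the page tables.
 *
 * write == 0 uses MADV_POPULATE_READ (pages are not marked dirty),
 * write != 0 uses MADV_POPULATE_WRITE.
 *
 * Returns the mapping, or MAP_FAILED with errno set.
 */
static void *map_prealloc_shared(int fd, size_t len, int write)
{
	int advice = write ? MADV_POPULATE_WRITE : MADV_POPULATE_READ;
	int saved_errno;
	void *addr;

	/* Fails cleanly (e.g., ENOSPC) instead of raising SIGBUS later. */
	if (fallocate(fd, 0, 0, (off_t)len) != 0)
		return MAP_FAILED;

	addr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (addr == MAP_FAILED)
		return MAP_FAILED;

	/* fallocate() did not touch page tables; prefault them now. */
	if (madvise(addr, len, advice) != 0) {
		saved_errno = errno;
		munmap(addr, len);
		errno = saved_errno;
		return MAP_FAILED;
	}
	return addr;
}
```

For actual files, passing write == 0 follows the observation above that
FALLOCATE+POPULATE_READ avoids marking all pages dirty; for memfd, tmpfs and
hugetlbfs, write != 0 corresponds to FALLOCATE+POPULATE_WRITE.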
V.3: Detailed results

==================================================
2 MiB MAP_PRIVATE:
**************************************************
Anon 4 KiB     : Read                     :     0.119 ms
Anon 4 KiB     : Write                    :     0.222 ms
Anon 4 KiB     : Read/Write               :     0.380 ms
Anon 4 KiB     : POPULATE_READ            :     0.060 ms
Anon 4 KiB     : POPULATE_WRITE           :     0.158 ms
Memfd 4 KiB    : Read                     :     0.034 ms
Memfd 4 KiB    : Write                    :     0.310 ms
Memfd 4 KiB    : Read/Write               :     0.362 ms
Memfd 4 KiB    : POPULATE_READ            :     0.039 ms
Memfd 4 KiB    : POPULATE_WRITE           :     0.229 ms
Memfd 2 MiB    : Read                     :     0.030 ms
Memfd 2 MiB    : Write                    :     0.030 ms
Memfd 2 MiB    : Read/Write               :     0.030 ms
Memfd 2 MiB    : POPULATE_READ            :     0.030 ms
Memfd 2 MiB    : POPULATE_WRITE           :     0.030 ms
tmpfs          : Read                     :     0.033 ms
tmpfs          : Write                    :     0.313 ms
tmpfs          : Read/Write               :     0.406 ms
tmpfs          : POPULATE_READ            :     0.039 ms
tmpfs          : POPULATE_WRITE           :     0.285 ms
file           : Read                     :     0.033 ms
file           : Write                    :     0.351 ms
file           : Read/Write               :     0.408 ms
file           : POPULATE_READ            :     0.039 ms
file           : POPULATE_WRITE           :     0.290 ms
hugetlbfs      : Read                     :     0.030 ms
hugetlbfs      : Write                    :     0.030 ms
hugetlbfs      : Read/Write               :     0.030 ms
hugetlbfs      : POPULATE_READ            :     0.030 ms
hugetlbfs      : POPULATE_WRITE           :     0.030 ms
**************************************************
4096 MiB MAP_PRIVATE:
**************************************************
Anon 4 KiB     : Read                     :   237.940 ms
Anon 4 KiB     : Write                    :   708.409 ms
Anon 4 KiB     : Read/Write               :  1054.041 ms
Anon 4 KiB     : POPULATE_READ            :   124.310 ms
Anon 4 KiB     : POPULATE_WRITE           :   572.582 ms
Memfd 4 KiB    : Read                     :   136.928 ms
Memfd 4 KiB    : Write                    :   963.898 ms
Memfd 4 KiB    : Read/Write               :  1106.561 ms
Memfd 4 KiB    : POPULATE_READ            :    78.450 ms
Memfd 4 KiB    : POPULATE_WRITE           :   805.881 ms
Memfd 2 MiB    : Read                     :   357.116 ms
Memfd 2 MiB    : Write                    :   357.210 ms
Memfd 2 MiB    : Read/Write               :   357.606 ms
Memfd 2 MiB    : POPULATE_READ            :   356.094 ms
Memfd 2 MiB    : POPULATE_WRITE           :   356.937 ms
tmpfs          : Read                     :   137.536 ms
tmpfs          : Write                    :   954.362 ms
tmpfs          : Read/Write               :  1105.954 ms
tmpfs          : POPULATE_READ            :    80.289 ms
tmpfs          : POPULATE_WRITE           :   822.826 ms
file           : Read                     :   137.874 ms
file           : Write                    :   987.025 ms
file           : Read/Write               :  1107.439 ms
file           : POPULATE_READ            :    80.413 ms
file           : POPULATE_WRITE           :   857.622 ms
hugetlbfs      : Read                     :   355.607 ms
hugetlbfs      : Write                    :   355.729 ms
hugetlbfs      : Read/Write               :   356.127 ms
hugetlbfs      : POPULATE_READ            :   354.585 ms
hugetlbfs      : POPULATE_WRITE           :   355.138 ms
**************************************************
2 MiB MAP_SHARED:
**************************************************
Anon 4 KiB     : Read                     :     0.394 ms
Anon 4 KiB     : Write                    :     0.348 ms
Anon 4 KiB     : Read/Write               :     0.400 ms
Anon 4 KiB     : POPULATE_READ            :     0.326 ms
Anon 4 KiB     : POPULATE_WRITE           :     0.273 ms
Anon 2 MiB     : Read                     :     0.030 ms
Anon 2 MiB     : Write                    :     0.030 ms
Anon 2 MiB     : Read/Write               :     0.030 ms
Anon 2 MiB     : POPULATE_READ            :     0.030 ms
Anon 2 MiB     : POPULATE_WRITE           :     0.030 ms
Memfd 4 KiB    : Read                     :     0.412 ms
Memfd 4 KiB    : Write                    :     0.372 ms
Memfd 4 KiB    : Read/Write               :     0.419 ms
Memfd 4 KiB    : POPULATE_READ            :     0.343 ms
Memfd 4 KiB    : POPULATE_WRITE           :     0.288 ms
Memfd 4 KiB    : FALLOCATE                :     0.137 ms
Memfd 4 KiB    : FALLOCATE+Read           :     0.446 ms
Memfd 4 KiB    : FALLOCATE+Write          :     0.330 ms
Memfd 4 KiB    : FALLOCATE+Read/Write     :     0.454 ms
Memfd 4 KiB    : FALLOCATE+POPULATE_READ  :     0.379 ms
Memfd 4 KiB    : FALLOCATE+POPULATE_WRITE :     0.268 ms
Memfd 2 MiB    : Read                     :     0.030 ms
Memfd 2 MiB    : Write                    :     0.030 ms
Memfd 2 MiB    : Read/Write               :     0.030 ms
Memfd 2 MiB    : POPULATE_READ            :     0.030 ms
Memfd 2 MiB    : POPULATE_WRITE           :     0.030 ms
Memfd 2 MiB    : FALLOCATE                :     0.030 ms
Memfd 2 MiB    : FALLOCATE+Read           :     0.031 ms
Memfd 2 MiB    : FALLOCATE+Write          :     0.031 ms
Memfd 2 MiB    : FALLOCATE+Read/Write     :     0.031 ms
Memfd 2 MiB    : FALLOCATE+POPULATE_READ  :     0.030 ms
Memfd 2 MiB    : FALLOCATE+POPULATE_WRITE :     0.030 ms
tmpfs          : Read                     :     0.416 ms
tmpfs          : Write                    :     0.369 ms
tmpfs          : Read/Write               :     0.425 ms
tmpfs          : POPULATE_READ            :     0.346 ms
tmpfs          : POPULATE_WRITE           :     0.295 ms
tmpfs          : FALLOCATE                :     0.139 ms
tmpfs          : FALLOCATE+Read           :     0.447 ms
tmpfs          : FALLOCATE+Write          :     0.333 ms
tmpfs          : FALLOCATE+Read/Write     :     0.454 ms
tmpfs          : FALLOCATE+POPULATE_READ  :     0.380 ms
tmpfs          : FALLOCATE+POPULATE_WRITE :     0.272 ms
file           : Read                     :     0.191 ms
file           : Write                    :     0.511 ms
file           : Read/Write               :     0.524 ms
file           : POPULATE_READ            :     0.196 ms
file           : POPULATE_WRITE           :     0.434 ms
file           : FALLOCATE                :     0.004 ms
file           : FALLOCATE+Read           :     0.197 ms
file           : FALLOCATE+Write          :     0.554 ms
file           : FALLOCATE+Read/Write     :     0.480 ms
file           : FALLOCATE+POPULATE_READ  :     0.201 ms
file           : FALLOCATE+POPULATE_WRITE :     0.381 ms
hugetlbfs      : Read                     :     0.030 ms
hugetlbfs      : Write                    :     0.030 ms
hugetlbfs      : Read/Write               :     0.030 ms
hugetlbfs      : POPULATE_READ            :     0.030 ms
hugetlbfs      : POPULATE_WRITE           :     0.030 ms
hugetlbfs      : FALLOCATE                :     0.030 ms
hugetlbfs      : FALLOCATE+Read           :     0.031 ms
hugetlbfs      : FALLOCATE+Write          :     0.031 ms
hugetlbfs      : FALLOCATE+Read/Write     :     0.030 ms
hugetlbfs      : FALLOCATE+POPULATE_READ  :     0.030 ms
hugetlbfs      : FALLOCATE+POPULATE_WRITE :     0.030 ms
**************************************************
4096 MiB MAP_SHARED:
**************************************************
Anon 4 KiB     : Read                     :  1053.090 ms
Anon 4 KiB     : Write                    :   913.642 ms
Anon 4 KiB     : Read/Write               :  1060.350 ms
Anon 4 KiB     : POPULATE_READ            :   893.691 ms
Anon 4 KiB     : POPULATE_WRITE           :   782.885 ms
Anon 2 MiB     : Read                     :   358.553 ms
Anon 2 MiB     : Write                    :   358.419 ms
Anon 2 MiB     : Read/Write               :   357.992 ms
Anon 2 MiB     : POPULATE_READ            :   357.533 ms
Anon 2 MiB     : POPULATE_WRITE           :   357.808 ms
Memfd 4 KiB    : Read                     :  1078.144 ms
Memfd 4 KiB    : Write                    :   942.036 ms
Memfd 4 KiB    : Read/Write               :  1100.391 ms
Memfd 4 KiB    : POPULATE_READ            :   925.829 ms
Memfd 4 KiB    : POPULATE_WRITE           :   804.394 ms
Memfd 4 KiB    : FALLOCATE                :   304.632 ms
Memfd 4 KiB    : FALLOCATE+Read           :  1163.359 ms
Memfd 4 KiB    : FALLOCATE+Write          :   933.186 ms
Memfd 4 KiB    : FALLOCATE+Read/Write     :  1187.304 ms
Memfd 4 KiB    : FALLOCATE+POPULATE_READ  :  1013.660 ms
Memfd 4 KiB    : FALLOCATE+POPULATE_WRITE :   794.560 ms
Memfd 2 MiB    : Read                     :   358.131 ms
Memfd 2 MiB    : Write                    :   358.099 ms
Memfd 2 MiB    : Read/Write               :   358.250 ms
Memfd 2 MiB    : POPULATE_READ            :   357.563 ms
Memfd 2 MiB    : POPULATE_WRITE           :   357.334 ms
Memfd 2 MiB    : FALLOCATE                :   356.735 ms
Memfd 2 MiB    : FALLOCATE+Read           :   358.152 ms
Memfd 2 MiB    : FALLOCATE+Write          :   358.331 ms
Memfd 2 MiB    : FALLOCATE+Read/Write     :   358.018 ms
Memfd 2 MiB    : FALLOCATE+POPULATE_READ  :   357.286 ms
Memfd 2 MiB    : FALLOCATE+POPULATE_WRITE :   357.523 ms
tmpfs          : Read                     :  1087.265 ms
tmpfs          : Write                    :   950.840 ms
tmpfs          : Read/Write               :  1107.567 ms
tmpfs          : POPULATE_READ            :   922.605 ms
tmpfs          : POPULATE_WRITE           :   810.094 ms
tmpfs          : FALLOCATE                :   306.320 ms
tmpfs          : FALLOCATE+Read           :  1169.796 ms
tmpfs          : FALLOCATE+Write          :   933.730 ms
tmpfs          : FALLOCATE+Read/Write     :  1191.610 ms
tmpfs          : FALLOCATE+POPULATE_READ  :  1020.474 ms
tmpfs          : FALLOCATE+POPULATE_WRITE :   798.945 ms
file           : Read                     :   654.101 ms
file           : Write                    :  1259.142 ms
file           : Read/Write               :  1289.509 ms
file           : POPULATE_READ            :   661.642 ms
file           : POPULATE_WRITE           :  1106.816 ms
file           : FALLOCATE                :     1.864 ms
file           : FALLOCATE+Read           :   656.328 ms
file           : FALLOCATE+Write          :  1153.300 ms
file           : FALLOCATE+Read/Write     :  1180.613 ms
file           : FALLOCATE+POPULATE_READ  :   668.347 ms
file           : FALLOCATE+POPULATE_WRITE :   996.143 ms
hugetlbfs      : Read                     :   357.245 ms
hugetlbfs      : Write                    :   357.413 ms
hugetlbfs      : Read/Write               :   357.120 ms
hugetlbfs      : POPULATE_READ            :   356.321 ms
hugetlbfs      : POPULATE_WRITE           :   356.693 ms
hugetlbfs      : FALLOCATE                :   355.927 ms
hugetlbfs      : FALLOCATE+Read           :   357.074 ms
hugetlbfs      : FALLOCATE+Write          :   357.120 ms
hugetlbfs      : FALLOCATE+Read/Write     :   356.983 ms
hugetlbfs      : FALLOCATE+POPULATE_READ  :   356.413 ms
hugetlbfs      : FALLOCATE+POPULATE_WRITE :   356.266 ms
**************************************************

[1] https://lkml.org/lkml/2013/6/27/698

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Jann Horn <jannh@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Chris Zankel <chris@zankel.net>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Rolf Eike Beer <eike-kernel@sf-tec.de>
Cc: linux-alpha@vger.kernel.org
Cc: linux-mips@vger.kernel.org
Cc: linux-parisc@vger.kernel.org
Cc: linux-xtensa@linux-xtensa.org
Cc: linux-arch@vger.kernel.org
Cc: Linux API <linux-api@vger.kernel.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 arch/alpha/include/uapi/asm/mman.h     |  3 ++
 arch/mips/include/uapi/asm/mman.h      |  3 ++
 arch/parisc/include/uapi/asm/mman.h    |  3 ++
 arch/xtensa/include/uapi/asm/mman.h    |  3 ++
 include/uapi/asm-generic/mman-common.h |  3 ++
 mm/gup.c                               | 58 ++++++++++++++++++++++
 mm/internal.h                          |  3 ++
 mm/madvise.c                           | 66 ++++++++++++++++++++++++++
 8 files changed, 142 insertions(+)