Message ID | 20200123014627.71720-1-bgeffon@google.com (mailing list archive)
---|---
State | New, archived
Series | mm: Add MREMAP_DONTUNMAP to mremap().
> On Jan 22, 2020, at 5:46 PM, Brian Geffon <bgeffon@google.com> wrote:
>
> MREMAP_DONTUNMAP is an additional flag that can be used with
> MREMAP_FIXED to move a mapping to a new address. Normally, mremap(2)
> would then tear down the old vma so subsequent accesses to the vma
> cause a segfault. However, with this new flag it will keep the old
> vma, zapping its PTEs, so any access to the old VMA after that point
> will result in a pagefault.

This needs a vastly better description. Perhaps:

When remapping an anonymous, private mapping, if MREMAP_DONTUNMAP is set,
the source mapping will not be removed. Instead it will be cleared as if a
brand new anonymous, private mapping had been created atomically as part of
the mremap() call. If a userfaultfd was watching the source, it will
continue to watch the new mapping. For a mapping that is shared or not
anonymous, MREMAP_DONTUNMAP will cause the mremap() call to fail.

Or is it something else?

> This feature will find a use in ChromeOS along with userfaultfd.
> Specifically we will want to register a VMA with userfaultfd and then
> pull it out from under a running process. By using MREMAP_DONTUNMAP we
> don't have to worry about mprotecting and then potentially racing with
> VMA permission changes from a running process.

Does this mean you yank it out but you want to replace it simultaneously?

> This feature also has a use case in Android. Lokesh Gidra has said
> that "As part of using userfaultfd for GC, we'll have to move the
> physical pages of the java heap to a separate location. For this purpose
> mremap will be used. Without the MREMAP_DONTUNMAP flag, when I mremap
> the java heap, its virtual mapping will be removed as well. Therefore,
> we'll require performing mmap immediately after. This is not only time
> consuming but also opens a time window where a native thread may call
> mmap and reserve the java heap's address range for its own usage. This
> flag solves the problem."

Cute.
On Wed, Jan 22, 2020 at 7:02 PM Andy Lutomirski <luto@amacapital.net> wrote:
>
> > On Jan 22, 2020, at 5:46 PM, Brian Geffon <bgeffon@google.com> wrote:
> >
> > MREMAP_DONTUNMAP is an additional flag that can be used with
> > MREMAP_FIXED to move a mapping to a new address. [...]
>
> This needs a vastly better description. Perhaps:
>
> When remapping an anonymous, private mapping, if MREMAP_DONTUNMAP is set,
> the source mapping will not be removed. Instead it will be cleared as if a
> brand new anonymous, private mapping had been created atomically as part of
> the mremap() call. If a userfaultfd was watching the source, it will
> continue to watch the new mapping. For a mapping that is shared or not
> anonymous, MREMAP_DONTUNMAP will cause the mremap() call to fail.

This is the exact behaviour I'm looking for.

> Or is it something else?
>
> [...]
Andy,

Thanks, yes, that's a much clearer description of the feature. I'll make
sure to update the description with subsequent patches and with later man
page updates.

Brian

On Wed, Jan 22, 2020 at 7:02 PM Andy Lutomirski <luto@amacapital.net> wrote:
> [...]
Hi Brian,

url:    https://github.com/0day-ci/linux/commits/Brian-Geffon/mm-Add-MREMAP_DONTUNMAP-to-mremap/20200125-013342
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 4703d9119972bf586d2cca76ec6438f819ffa30e

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>

smatch warnings:
mm/mremap.c:561 mremap_to() error: potentially dereferencing uninitialized 'vma'.

# https://github.com/0day-ci/linux/commit/98663ca05501623c3da7f0f30be8ba7d632cf010
git remote add linux-review https://github.com/0day-ci/linux
git remote update linux-review
git checkout 98663ca05501623c3da7f0f30be8ba7d632cf010

vim +/vma +561 mm/mremap.c

81909b842107ef Michel Lespinasse  2013-02-22  506  static unsigned long mremap_to(unsigned long addr, unsigned long old_len,
72f87654c69690 Pavel Emelyanov    2017-02-22  507  		unsigned long new_addr, unsigned long new_len, bool *locked,
98663ca0550162 Brian Geffon       2020-01-22  508  		unsigned long flags, struct vm_userfaultfd_ctx *uf,
b22823719302e8 Mike Rapoport      2017-08-02  509  		struct list_head *uf_unmap_early,
897ab3e0c49e24 Mike Rapoport      2017-02-24  510  		struct list_head *uf_unmap)
ecc1a8993751de Al Viro            2009-11-24  511  {
ecc1a8993751de Al Viro            2009-11-24  512  	struct mm_struct *mm = current->mm;
ecc1a8993751de Al Viro            2009-11-24  513  	struct vm_area_struct *vma;
ecc1a8993751de Al Viro            2009-11-24  514  	unsigned long ret = -EINVAL;
ecc1a8993751de Al Viro            2009-11-24  515  	unsigned long charged = 0;
097eed103862f9 Al Viro            2009-11-24  516  	unsigned long map_flags;
ecc1a8993751de Al Viro            2009-11-24  517  
f19cb115a25f3f Alexander Kuleshov 2015-11-05  518  	if (offset_in_page(new_addr))
ecc1a8993751de Al Viro            2009-11-24  519  		goto out;
ecc1a8993751de Al Viro            2009-11-24  520  
ecc1a8993751de Al Viro            2009-11-24  521  	if (new_len > TASK_SIZE || new_addr > TASK_SIZE - new_len)
ecc1a8993751de Al Viro            2009-11-24  522  		goto out;
ecc1a8993751de Al Viro            2009-11-24  523  
9943242ca46814 Oleg Nesterov      2015-09-04  524  	/* Ensure the old/new locations do not overlap */
9943242ca46814 Oleg Nesterov      2015-09-04  525  	if (addr + old_len > new_addr && new_addr + new_len > addr)
ecc1a8993751de Al Viro            2009-11-24  526  		goto out;
ecc1a8993751de Al Viro            2009-11-24  527  
ea2c3f6f554561 Oscar Salvador     2019-03-05  528  	/*
ea2c3f6f554561 Oscar Salvador     2019-03-05  529  	 * move_vma() need us to stay 4 maps below the threshold, otherwise
ea2c3f6f554561 Oscar Salvador     2019-03-05  530  	 * it will bail out at the very beginning.
ea2c3f6f554561 Oscar Salvador     2019-03-05  531  	 * That is a problem if we have already unmaped the regions here
ea2c3f6f554561 Oscar Salvador     2019-03-05  532  	 * (new_addr, and old_addr), because userspace will not know the
ea2c3f6f554561 Oscar Salvador     2019-03-05  533  	 * state of the vma's after it gets -ENOMEM.
ea2c3f6f554561 Oscar Salvador     2019-03-05  534  	 * So, to avoid such scenario we can pre-compute if the whole
ea2c3f6f554561 Oscar Salvador     2019-03-05  535  	 * operation has high chances to success map-wise.
ea2c3f6f554561 Oscar Salvador     2019-03-05  536  	 * Worst-scenario case is when both vma's (new_addr and old_addr) get
ea2c3f6f554561 Oscar Salvador     2019-03-05  537  	 * split in 3 before unmaping it.
ea2c3f6f554561 Oscar Salvador     2019-03-05  538  	 * That means 2 more maps (1 for each) to the ones we already hold.
ea2c3f6f554561 Oscar Salvador     2019-03-05  539  	 * Check whether current map count plus 2 still leads us to 4 maps below
ea2c3f6f554561 Oscar Salvador     2019-03-05  540  	 * the threshold, otherwise return -ENOMEM here to be more safe.
ea2c3f6f554561 Oscar Salvador     2019-03-05  541  	 */
ea2c3f6f554561 Oscar Salvador     2019-03-05  542  	if ((mm->map_count + 2) >= sysctl_max_map_count - 3)
ea2c3f6f554561 Oscar Salvador     2019-03-05  543  		return -ENOMEM;
ea2c3f6f554561 Oscar Salvador     2019-03-05  544  
b22823719302e8 Mike Rapoport      2017-08-02  545  	ret = do_munmap(mm, new_addr, new_len, uf_unmap_early);
ecc1a8993751de Al Viro            2009-11-24  546  	if (ret)
ecc1a8993751de Al Viro            2009-11-24  547  		goto out;
ecc1a8993751de Al Viro            2009-11-24  548  
ecc1a8993751de Al Viro            2009-11-24  549  	if (old_len >= new_len) {
897ab3e0c49e24 Mike Rapoport      2017-02-24  550  		ret = do_munmap(mm, addr+new_len, old_len - new_len, uf_unmap);
ecc1a8993751de Al Viro            2009-11-24  551  		if (ret && old_len != new_len)
ecc1a8993751de Al Viro            2009-11-24  552  			goto out;
ecc1a8993751de Al Viro            2009-11-24  553  		old_len = new_len;
ecc1a8993751de Al Viro            2009-11-24  554  	}
ecc1a8993751de Al Viro            2009-11-24  555  
98663ca0550162 Brian Geffon       2020-01-22  556  	/*
98663ca0550162 Brian Geffon       2020-01-22  557  	 * MREMAP_DONTUNMAP expands by old_len + (new_len - old_len), we will
98663ca0550162 Brian Geffon       2020-01-22  558  	 * check that we can expand by old_len and vma_to_resize will handle
98663ca0550162 Brian Geffon       2020-01-22  559  	 * the vma growing.
98663ca0550162 Brian Geffon       2020-01-22  560  	 */
98663ca0550162 Brian Geffon       2020-01-22 @561  	if (unlikely(flags & MREMAP_DONTUNMAP && !may_expand_vm(mm,
98663ca0550162 Brian Geffon       2020-01-22  562  			vma->vm_flags, old_len >> PAGE_SHIFT))) {
                                                   			^^^^^^^^^^^^^
98663ca0550162 Brian Geffon       2020-01-22  563  		ret = -ENOMEM;
98663ca0550162 Brian Geffon       2020-01-22  564  		goto out;
98663ca0550162 Brian Geffon       2020-01-22  565  	}
98663ca0550162 Brian Geffon       2020-01-22  566  
ecc1a8993751de Al Viro            2009-11-24  567  	vma = vma_to_resize(addr, old_len, new_len, &charged);
                                                   	^^^^^^^^^^^^^^^^^^^^
ecc1a8993751de Al Viro            2009-11-24  568  	if (IS_ERR(vma)) {
ecc1a8993751de Al Viro            2009-11-24  569  		ret = PTR_ERR(vma);
ecc1a8993751de Al Viro            2009-11-24  570  		goto out;
ecc1a8993751de Al Viro            2009-11-24  571  	}
ecc1a8993751de Al Viro            2009-11-24  572  
097eed103862f9 Al Viro            2009-11-24  573  	map_flags = MAP_FIXED;

---
0-DAY kernel test infrastructure              Open Source Technology Center
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org   Intel Corporation
diff --git a/include/uapi/linux/mman.h b/include/uapi/linux/mman.h
index fc1a64c3447b..923cc162609c 100644
--- a/include/uapi/linux/mman.h
+++ b/include/uapi/linux/mman.h
@@ -5,8 +5,9 @@
 #include <asm/mman.h>
 #include <asm-generic/hugetlb_encode.h>
 
-#define MREMAP_MAYMOVE	1
-#define MREMAP_FIXED	2
+#define MREMAP_MAYMOVE		1
+#define MREMAP_FIXED		2
+#define MREMAP_DONTUNMAP	4
 
 #define OVERCOMMIT_GUESS		0
 #define OVERCOMMIT_ALWAYS		1
diff --git a/mm/mremap.c b/mm/mremap.c
index 122938dcec15..bf97c3eb538b 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -318,8 +318,8 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
 static unsigned long move_vma(struct vm_area_struct *vma,
 		unsigned long old_addr, unsigned long old_len,
 		unsigned long new_len, unsigned long new_addr,
-		bool *locked, struct vm_userfaultfd_ctx *uf,
-		struct list_head *uf_unmap)
+		bool *locked, unsigned long flags,
+		struct vm_userfaultfd_ctx *uf, struct list_head *uf_unmap)
 {
 	struct mm_struct *mm = vma->vm_mm;
 	struct vm_area_struct *new_vma;
@@ -408,6 +408,13 @@ static unsigned long move_vma(struct vm_area_struct *vma,
 	if (unlikely(vma->vm_flags & VM_PFNMAP))
 		untrack_pfn_moved(vma);
 
+	if (unlikely(!err && (flags & MREMAP_DONTUNMAP))) {
+		if (vm_flags & VM_ACCOUNT)
+			vma->vm_flags |= VM_ACCOUNT;
+
+		goto out;
+	}
+
 	if (do_munmap(mm, old_addr, old_len, uf_unmap) < 0) {
 		/* OOM: unable to split vma, just get accounts right */
 		vm_unacct_memory(excess >> PAGE_SHIFT);
@@ -422,6 +429,7 @@ static unsigned long move_vma(struct vm_area_struct *vma,
 		vma->vm_next->vm_flags |= VM_ACCOUNT;
 	}
 
+out:
 	if (vm_flags & VM_LOCKED) {
 		mm->locked_vm += new_len >> PAGE_SHIFT;
 		*locked = true;
@@ -497,7 +505,7 @@ static struct vm_area_struct *vma_to_resize(unsigned long addr,
 
 static unsigned long mremap_to(unsigned long addr, unsigned long old_len,
 		unsigned long new_addr, unsigned long new_len, bool *locked,
-		struct vm_userfaultfd_ctx *uf,
+		unsigned long flags, struct vm_userfaultfd_ctx *uf,
 		struct list_head *uf_unmap_early,
 		struct list_head *uf_unmap)
 {
@@ -545,6 +553,17 @@ static unsigned long mremap_to(unsigned long addr, unsigned long old_len,
 		old_len = new_len;
 	}
 
+	/*
+	 * MREMAP_DONTUNMAP expands by old_len + (new_len - old_len), we will
+	 * check that we can expand by old_len and vma_to_resize will handle
+	 * the vma growing.
+	 */
+	if (unlikely(flags & MREMAP_DONTUNMAP && !may_expand_vm(mm,
+			vma->vm_flags, old_len >> PAGE_SHIFT))) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
 	vma = vma_to_resize(addr, old_len, new_len, &charged);
 	if (IS_ERR(vma)) {
 		ret = PTR_ERR(vma);
@@ -561,7 +580,7 @@ static unsigned long mremap_to(unsigned long addr, unsigned long old_len,
 	if (IS_ERR_VALUE(ret))
 		goto out1;
 
-	ret = move_vma(vma, addr, old_len, new_len, new_addr, locked, uf,
+	ret = move_vma(vma, addr, old_len, new_len, new_addr, locked, flags, uf,
 		       uf_unmap);
 	if (!(offset_in_page(ret)))
 		goto out;
@@ -609,12 +628,15 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
 	addr = untagged_addr(addr);
 	new_addr = untagged_addr(new_addr);
 
-	if (flags & ~(MREMAP_FIXED | MREMAP_MAYMOVE))
+	if (flags & ~(MREMAP_FIXED | MREMAP_MAYMOVE | MREMAP_DONTUNMAP))
 		return ret;
 
 	if (flags & MREMAP_FIXED && !(flags & MREMAP_MAYMOVE))
 		return ret;
 
+	if (flags & MREMAP_DONTUNMAP && !(flags & MREMAP_FIXED))
+		return ret;
+
 	if (offset_in_page(addr))
 		return ret;
 
@@ -634,7 +656,8 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
 
 	if (flags & MREMAP_FIXED) {
 		ret = mremap_to(addr, old_len, new_addr, new_len,
-				&locked, &uf, &uf_unmap_early, &uf_unmap);
+				&locked, flags, &uf, &uf_unmap_early,
+				&uf_unmap);
 		goto out;
 	}
 
@@ -712,7 +735,7 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
 	}
 
 	ret = move_vma(vma, addr, old_len, new_len, new_addr,
-		       &locked, &uf, &uf_unmap);
+		       &locked, flags, &uf, &uf_unmap);
 	}
 out:
 	if (offset_in_page(ret)) {
MREMAP_DONTUNMAP is an additional flag that can be used with
MREMAP_FIXED to move a mapping to a new address. Normally, mremap(2)
would then tear down the old vma so subsequent accesses to the vma
cause a segfault. However, with this new flag it will keep the old
vma, zapping its PTEs, so any access to the old VMA after that point
will result in a pagefault.

This feature will find a use in ChromeOS along with userfaultfd.
Specifically we will want to register a VMA with userfaultfd and then
pull it out from under a running process. By using MREMAP_DONTUNMAP we
don't have to worry about mprotecting and then potentially racing with
VMA permission changes from a running process.

This feature also has a use case in Android. Lokesh Gidra has said
that "As part of using userfaultfd for GC, we'll have to move the
physical pages of the java heap to a separate location. For this
purpose mremap will be used. Without the MREMAP_DONTUNMAP flag, when I
mremap the java heap, its virtual mapping will be removed as well.
Therefore, we'll require performing mmap immediately after. This is
not only time consuming but also opens a time window where a native
thread may call mmap and reserve the java heap's address range for its
own usage. This flag solves the problem."

Signed-off-by: Brian Geffon <bgeffon@google.com>
---
 include/uapi/linux/mman.h |  5 +++--
 mm/mremap.c               | 37 ++++++++++++++++++++++++++++++-------
 2 files changed, 33 insertions(+), 9 deletions(-)