From patchwork Fri Feb 14 04:09:51 2020
X-Patchwork-Submitter: Brian Geffon
X-Patchwork-Id: 11381653
Date: Thu, 13 Feb 2020 20:09:51 -0800
In-Reply-To: <20200207201856.46070-1-bgeffon@google.com>
Message-Id: <20200214040952.43195-1-bgeffon@google.com>
References: <20200207201856.46070-1-bgeffon@google.com>
Subject: [PATCH v5 1/2] mm: Add MREMAP_DONTUNMAP to mremap().
From: Brian Geffon
To: Andrew Morton
Cc: "Michael S . Tsirkin", Brian Geffon, Arnd Bergmann,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 linux-api@vger.kernel.org, Andy Lutomirski, Will Deacon,
 Andrea Arcangeli, Sonny Rao, Minchan Kim, Joel Fernandes, Yu Zhao,
 Jesse Barnes, Nathan Chancellor, Florian Weimer, "Kirill A . Shutemov"

When remapping an anonymous, private mapping, if MREMAP_DONTUNMAP is
set, the source mapping will not be removed. The remap operation
will be performed as it would have been normally by moving over the
page tables to the new mapping.
The old vma will have any locked flags cleared, have no pagetables,
and any userfaultfds that were watching that range will continue
watching it.

For a mapping that is shared or not anonymous, MREMAP_DONTUNMAP will
cause the mremap() call to fail. Because MREMAP_DONTUNMAP always results
in moving a VMA, you MUST use the MREMAP_MAYMOVE flag. The final result
is two equally sized VMAs where the destination contains the PTEs of the
source.

We hope to use this in Chrome OS, where with userfaultfd we could write
an anonymous mapping to disk without having to STOP the process or worry
about VMA permission changes.

This feature also has a use case in Android. Lokesh Gidra has said
that "As part of using userfaultfd for GC, We'll have to move the physical
pages of the java heap to a separate location. For this purpose mremap
will be used. Without the MREMAP_DONTUNMAP flag, when I mremap the java
heap, its virtual mapping will be removed as well. Therefore, we'll
require performing mmap immediately after. This is not only time consuming
but also opens a time window where a native thread may call mmap and
reserve the java heap's address range for its own usage. This flag
solves the problem."

v4 -> v5:
  - Correct commit message to more accurately reflect the behavior.
  - Clear VM_LOCKED and VM_LOCKEDONFAULT on the old vma.
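
For illustration only (not part of the patch), the semantics described
above can be exercised from userspace roughly as follows. This is a
minimal sketch under stated assumptions: the helper name dontunmap_demo()
and its return convention are hypothetical, the fallback flag values
mirror the uapi header touched by this patch, and the raw syscall(2) call
is used only so the sketch builds against libc headers that predate the
flag.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* for syscall() and the MREMAP_* flags */
#endif
#include <errno.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef MREMAP_MAYMOVE
#define MREMAP_MAYMOVE 1
#endif
#ifndef MREMAP_DONTUNMAP
#define MREMAP_DONTUNMAP 4	/* value added by this patch */
#endif

/*
 * Hypothetical helper (name and return codes are assumptions):
 *   0  the pages moved and the source VMA is still mapped but empty
 *   1  the kernel rejected the flag with EINVAL (no support)
 *  -1  any other failure or unexpected contents
 */
int dontunmap_demo(size_t num_pages)
{
	size_t len = num_pages * (size_t)sysconf(_SC_PAGE_SIZE);
	char *src, *dst;
	size_t i;
	long r;
	int ret = 0;

	/* The flag only applies to private, anonymous mappings. */
	src = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (src == MAP_FAILED)
		return -1;
	memset(src, 'a', len);

	/* MREMAP_MAYMOVE is mandatory alongside MREMAP_DONTUNMAP. */
	r = syscall(SYS_mremap, src, len, len,
		    (unsigned long)(MREMAP_MAYMOVE | MREMAP_DONTUNMAP), 0UL);
	if (r == -1) {
		ret = (errno == EINVAL) ? 1 : -1;
		munmap(src, len);
		return ret;
	}
	dst = (char *)r;

	/*
	 * The destination now owns the original pages; the source has
	 * no page tables, so touching it faults in fresh zero pages.
	 */
	for (i = 0; i < len; i++)
		if (dst[i] != 'a' || src[i] != 0)
			ret = -1;

	munmap(dst, len);
	munmap(src, len);
	return ret;
}
```

A caller would treat a return of 1 as "skip" rather than as a failure,
which is the same distinction the selftest in patch 2/2 makes.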

Signed-off-by: Brian Geffon
---
 include/uapi/linux/mman.h |   5 +-
 mm/mremap.c               | 106 ++++++++++++++++++++++++++++++--------
 2 files changed, 88 insertions(+), 23 deletions(-)

diff --git a/include/uapi/linux/mman.h b/include/uapi/linux/mman.h
index fc1a64c3447b..923cc162609c 100644
--- a/include/uapi/linux/mman.h
+++ b/include/uapi/linux/mman.h
@@ -5,8 +5,9 @@
 #include <asm/mman.h>
 #include <asm-generic/hugetlb_encode.h>
 
-#define MREMAP_MAYMOVE	1
-#define MREMAP_FIXED	2
+#define MREMAP_MAYMOVE		1
+#define MREMAP_FIXED		2
+#define MREMAP_DONTUNMAP	4
 
 #define OVERCOMMIT_GUESS		0
 #define OVERCOMMIT_ALWAYS		1
diff --git a/mm/mremap.c b/mm/mremap.c
index 1fc8a29fbe3f..a2a792fdbc64 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -318,8 +318,8 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
 static unsigned long move_vma(struct vm_area_struct *vma,
 		unsigned long old_addr, unsigned long old_len,
 		unsigned long new_len, unsigned long new_addr,
-		bool *locked, struct vm_userfaultfd_ctx *uf,
-		struct list_head *uf_unmap)
+		bool *locked, unsigned long flags,
+		struct vm_userfaultfd_ctx *uf, struct list_head *uf_unmap)
 {
 	struct mm_struct *mm = vma->vm_mm;
 	struct vm_area_struct *new_vma;
@@ -408,11 +408,49 @@ static unsigned long move_vma(struct vm_area_struct *vma,
 	if (unlikely(vma->vm_flags & VM_PFNMAP))
 		untrack_pfn_moved(vma);
 
+	if (unlikely(!err && (flags & MREMAP_DONTUNMAP))) {
+		if (vm_flags & VM_ACCOUNT) {
+			/* Always put back VM_ACCOUNT since we won't unmap */
+			vma->vm_flags |= VM_ACCOUNT;
+
+			vm_acct_memory(vma_pages(new_vma));
+		}
+
+		/*
+		 * locked_vm accounting: if the mapping remained the same size
+		 * it will have just moved and we don't need to touch locked_vm
+		 * because we skip the do_unmap. If the mapping shrunk before
+		 * being moved then the do_unmap on that portion will have
+		 * adjusted vm_locked. Only if the mapping grows do we need to
+		 * do something special; the reason is locked_vm only accounts
+		 * for old_len, but we're now adding new_len - old_len locked
+		 * bytes to the new mapping.
+		 */
+		if (vm_flags & VM_LOCKED) {
+			/* We always clear VM_LOCKED[ONFAULT] on the old vma */
+			vma->vm_flags &= VM_LOCKED_CLEAR_MASK;
+
+			if (new_len > old_len) {
+				mm->locked_vm +=
+					(new_len - old_len) >> PAGE_SHIFT;
+				*locked = true;
+			}
+		}
+
+		goto out;
+	}
+
 	if (do_munmap(mm, old_addr, old_len, uf_unmap) < 0) {
 		/* OOM: unable to split vma, just get accounts right */
 		vm_unacct_memory(excess >> PAGE_SHIFT);
 		excess = 0;
 	}
+
+	if (vm_flags & VM_LOCKED) {
+		mm->locked_vm += new_len >> PAGE_SHIFT;
+		*locked = true;
+	}
+out:
 	mm->hiwater_vm = hiwater_vm;
 
 	/* Restore VM_ACCOUNT if one or two pieces of vma left */
@@ -422,16 +460,12 @@ static unsigned long move_vma(struct vm_area_struct *vma,
 			vma->vm_next->vm_flags |= VM_ACCOUNT;
 	}
 
-	if (vm_flags & VM_LOCKED) {
-		mm->locked_vm += new_len >> PAGE_SHIFT;
-		*locked = true;
-	}
-
 	return new_addr;
 }
 
 static struct vm_area_struct *vma_to_resize(unsigned long addr,
-	unsigned long old_len, unsigned long new_len, unsigned long *p)
+	unsigned long old_len, unsigned long new_len, unsigned long flags,
+	unsigned long *p)
 {
 	struct mm_struct *mm = current->mm;
 	struct vm_area_struct *vma = find_vma(mm, addr);
@@ -453,6 +487,10 @@ static struct vm_area_struct *vma_to_resize(unsigned long addr,
 		return ERR_PTR(-EINVAL);
 	}
 
+	if (flags & MREMAP_DONTUNMAP && (!vma_is_anonymous(vma) ||
+			vma->vm_flags & VM_SHARED))
+		return ERR_PTR(-EINVAL);
+
 	if (is_vm_hugetlb_page(vma))
 		return ERR_PTR(-EINVAL);
 
@@ -497,7 +535,7 @@ static struct vm_area_struct *vma_to_resize(unsigned long addr,
 
 static unsigned long mremap_to(unsigned long addr, unsigned long old_len,
 		unsigned long new_addr, unsigned long new_len, bool *locked,
-		struct vm_userfaultfd_ctx *uf,
+		unsigned long flags, struct vm_userfaultfd_ctx *uf,
 		struct list_head *uf_unmap_early,
 		struct list_head *uf_unmap)
 {
@@ -505,7 +543,7 @@ static unsigned long mremap_to(unsigned long addr, unsigned long old_len,
 	struct vm_area_struct *vma;
 	unsigned long ret = -EINVAL;
 	unsigned long charged = 0;
-	unsigned long map_flags;
+	unsigned long map_flags = 0;
 
 	if (offset_in_page(new_addr))
 		goto out;
@@ -534,9 +572,11 @@ static unsigned long mremap_to(unsigned long addr, unsigned long old_len,
 	if ((mm->map_count + 2) >= sysctl_max_map_count - 3)
 		return -ENOMEM;
 
-	ret = do_munmap(mm, new_addr, new_len, uf_unmap_early);
-	if (ret)
-		goto out;
+	if (flags & MREMAP_FIXED) {
+		ret = do_munmap(mm, new_addr, new_len, uf_unmap_early);
+		if (ret)
+			goto out;
+	}
 
 	if (old_len >= new_len) {
 		ret = do_munmap(mm, addr+new_len, old_len - new_len, uf_unmap);
@@ -545,13 +585,26 @@ static unsigned long mremap_to(unsigned long addr, unsigned long old_len,
 		old_len = new_len;
 	}
 
-	vma = vma_to_resize(addr, old_len, new_len, &charged);
+	vma = vma_to_resize(addr, old_len, new_len, flags, &charged);
 	if (IS_ERR(vma)) {
 		ret = PTR_ERR(vma);
 		goto out;
 	}
 
-	map_flags = MAP_FIXED;
+	/*
+	 * MREMAP_DONTUNMAP expands by new_len - (new_len - old_len), we will
+	 * check that we can expand by new_len and vma_to_resize will handle
+	 * the vma growing which is (new_len - old_len).
+	 */
+	if (flags & MREMAP_DONTUNMAP &&
+		!may_expand_vm(mm, vma->vm_flags, new_len >> PAGE_SHIFT)) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	if (flags & MREMAP_FIXED)
+		map_flags |= MAP_FIXED;
+
 	if (vma->vm_flags & VM_MAYSHARE)
 		map_flags |= MAP_SHARED;
 
@@ -561,10 +614,16 @@ static unsigned long mremap_to(unsigned long addr, unsigned long old_len,
 	if (offset_in_page(ret))
 		goto out1;
 
-	ret = move_vma(vma, addr, old_len, new_len, new_addr, locked, uf,
+	/* We got a new mapping */
+	if (!(flags & MREMAP_FIXED))
+		new_addr = ret;
+
+	ret = move_vma(vma, addr, old_len, new_len, new_addr, locked, flags, uf,
 		       uf_unmap);
+
 	if (!(offset_in_page(ret)))
 		goto out;
+
 out1:
 	vm_unacct_memory(charged);
 
@@ -609,12 +668,16 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
 	addr = untagged_addr(addr);
 	new_addr = untagged_addr(new_addr);
 
-	if (flags & ~(MREMAP_FIXED | MREMAP_MAYMOVE))
+	if (flags & ~(MREMAP_FIXED | MREMAP_MAYMOVE | MREMAP_DONTUNMAP))
 		return ret;
 
 	if (flags & MREMAP_FIXED && !(flags & MREMAP_MAYMOVE))
 		return ret;
 
+	/* MREMAP_DONTUNMAP is always a move */
+	if (flags & MREMAP_DONTUNMAP && !(flags & MREMAP_MAYMOVE))
+		return ret;
+
 	if (offset_in_page(addr))
 		return ret;
 
@@ -632,9 +695,10 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
 	if (down_write_killable(&current->mm->mmap_sem))
 		return -EINTR;
 
-	if (flags & MREMAP_FIXED) {
+	if (flags & MREMAP_FIXED || flags & MREMAP_DONTUNMAP) {
 		ret = mremap_to(addr, old_len, new_addr, new_len,
-				&locked, &uf, &uf_unmap_early, &uf_unmap);
+				&locked, flags, &uf, &uf_unmap_early,
+				&uf_unmap);
 		goto out;
 	}
 
@@ -662,7 +726,7 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
 	/*
 	 * Ok, we need to grow..
 	 */
-	vma = vma_to_resize(addr, old_len, new_len, &charged);
+	vma = vma_to_resize(addr, old_len, new_len, flags, &charged);
 	if (IS_ERR(vma)) {
 		ret = PTR_ERR(vma);
 		goto out;
@@ -712,7 +776,7 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
 		}
 
 		ret = move_vma(vma, addr, old_len, new_len, new_addr,
-			       &locked, &uf, &uf_unmap);
+			       &locked, flags, &uf, &uf_unmap);
 	}
 out:
 	if (offset_in_page(ret)) {

From patchwork Fri Feb 14 04:09:52 2020
X-Patchwork-Submitter: Brian Geffon
X-Patchwork-Id: 11381655
Date: Thu, 13 Feb 2020 20:09:52 -0800
In-Reply-To: <20200214040952.43195-1-bgeffon@google.com>
Message-Id: <20200214040952.43195-2-bgeffon@google.com>
References: <20200207201856.46070-1-bgeffon@google.com>
 <20200214040952.43195-1-bgeffon@google.com>
Subject: [PATCH v5 2/2] selftest: Add MREMAP_DONTUNMAP selftest.
From: Brian Geffon
To: Andrew Morton
Cc: "Michael S . Tsirkin", Brian Geffon, Arnd Bergmann,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 linux-api@vger.kernel.org, Andy Lutomirski, Will Deacon,
 Andrea Arcangeli, Sonny Rao, Minchan Kim, Joel Fernandes, Yu Zhao,
 Jesse Barnes, Nathan Chancellor, Florian Weimer, "Kirill A . Shutemov"

Add a few simple self tests for the new flag MREMAP_DONTUNMAP. They are
simple smoke tests which also demonstrate the behavior.

Signed-off-by: Brian Geffon
---
 tools/testing/selftests/vm/Makefile           |   1 +
 tools/testing/selftests/vm/mremap_dontunmap.c | 326 ++++++++++++++++++
 tools/testing/selftests/vm/run_vmtests        |  15 +
 3 files changed, 342 insertions(+)
 create mode 100644 tools/testing/selftests/vm/mremap_dontunmap.c

diff --git a/tools/testing/selftests/vm/Makefile b/tools/testing/selftests/vm/Makefile
index 9534dc2bc929..4b2b969fc3c7 100644
--- a/tools/testing/selftests/vm/Makefile
+++ b/tools/testing/selftests/vm/Makefile
@@ -12,6 +12,7 @@ TEST_GEN_FILES += map_fixed_noreplace
 TEST_GEN_FILES += map_populate
 TEST_GEN_FILES += mlock-random-test
 TEST_GEN_FILES += mlock2-tests
+TEST_GEN_FILES += mremap_dontunmap
 TEST_GEN_FILES += on-fault-limit
 TEST_GEN_FILES += thuge-gen
 TEST_GEN_FILES += transhuge-stress
diff --git a/tools/testing/selftests/vm/mremap_dontunmap.c b/tools/testing/selftests/vm/mremap_dontunmap.c
new file mode 100644
index 000000000000..de2a861c7c6d
--- /dev/null
+++ b/tools/testing/selftests/vm/mremap_dontunmap.c
@@ -0,0 +1,326 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * Tests for mremap w/ MREMAP_DONTUNMAP.
+ *
+ * Copyright 2020, Brian Geffon
+ */
+#define _GNU_SOURCE
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include "../kselftest.h"
+
+#ifndef MREMAP_DONTUNMAP
+#define MREMAP_DONTUNMAP 4
+#endif
+
+unsigned long page_size;
+char *page_buffer;
+
+static void dump_maps(void)
+{
+	char cmd[32];
+
+	snprintf(cmd, sizeof(cmd), "cat /proc/%d/maps", getpid());
+	system(cmd);
+}
+
+#define BUG_ON(condition, description) \
+	do { \
+		if (condition) { \
+			fprintf(stderr, "[FAIL]\t%s():%d\t%s:%s\n", __func__, \
+				__LINE__, (description), strerror(errno)); \
+			dump_maps(); \
+			exit(1); \
+		} \
+	} while (0)
+
+// Try a simple operation to "test" for kernel support; this prevents
+// reporting tests as failed when it's run on an older kernel.
+static int kernel_support_for_mremap_dontunmap()
+{
+	int ret = 0;
+	unsigned long num_pages = 1;
+	void *source_mapping = mmap(NULL, num_pages * page_size, PROT_NONE,
+				    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+	BUG_ON(source_mapping == MAP_FAILED, "mmap");
+
+	// This simple remap should only fail if MREMAP_DONTUNMAP isn't
+	// supported.
+	void *dest_mapping =
+	    mremap(source_mapping, num_pages * page_size, num_pages * page_size,
+		   MREMAP_DONTUNMAP | MREMAP_MAYMOVE, 0);
+	if (dest_mapping == MAP_FAILED) {
+		ret = errno;
+	} else {
+		BUG_ON(munmap(dest_mapping, num_pages * page_size) == -1,
+		       "unable to unmap destination mapping");
+	}
+
+	BUG_ON(munmap(source_mapping, num_pages * page_size) == -1,
+	       "unable to unmap source mapping");
+	return ret;
+}
+
+// This helper will just validate that an entire mapping contains the expected
+// byte.
+static int check_region_contains_byte(void *addr, unsigned long size, char byte)
+{
+	BUG_ON(size & (page_size - 1),
+	       "check_region_contains_byte expects page multiples");
+	BUG_ON((unsigned long)addr & (page_size - 1),
+	       "check_region_contains_byte expects page alignment");
+
+	memset(page_buffer, byte, page_size);
+
+	unsigned long num_pages = size / page_size;
+	unsigned long i;
+
+	// Compare each page checking that it contains our expected byte.
+	for (i = 0; i < num_pages; ++i) {
+		int ret =
+		    memcmp(addr + (i * page_size), page_buffer, page_size);
+		if (ret) {
+			return ret;
+		}
+	}
+
+	return 0;
+}
+
+// This test validates that MREMAP_DONTUNMAP moves the pagetables while leaving
+// the source mapping mapped.
+static void mremap_dontunmap_simple()
+{
+	unsigned long num_pages = 5;
+
+	void *source_mapping =
+	    mmap(NULL, num_pages * page_size, PROT_READ | PROT_WRITE,
+		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+	BUG_ON(source_mapping == MAP_FAILED, "mmap");
+
+	memset(source_mapping, 'a', num_pages * page_size);
+
+	// Try to just move the whole mapping anywhere (not fixed).
+	void *dest_mapping =
+	    mremap(source_mapping, num_pages * page_size, num_pages * page_size,
+		   MREMAP_DONTUNMAP | MREMAP_MAYMOVE, NULL);
+	BUG_ON(dest_mapping == MAP_FAILED, "mremap");
+
+	// Validate that the pages have been moved, we know they were moved if
+	// the dest_mapping contains a's.
+	BUG_ON(check_region_contains_byte
+	       (dest_mapping, num_pages * page_size, 'a') != 0,
+	       "pages did not migrate");
+	BUG_ON(check_region_contains_byte
+	       (source_mapping, num_pages * page_size, 0) != 0,
+	       "source should have no ptes");
+
+	BUG_ON(munmap(dest_mapping, num_pages * page_size) == -1,
+	       "unable to unmap destination mapping");
+	BUG_ON(munmap(source_mapping, num_pages * page_size) == -1,
+	       "unable to unmap source mapping");
+}
+
+// This test validates MREMAP_DONTUNMAP will move page tables to a specific
+// destination using MREMAP_FIXED, also while validating that the source
+// remains intact.
+static void mremap_dontunmap_simple_fixed()
+{
+	unsigned long num_pages = 5;
+
+	// Since we want to guarantee that we can remap to a point, we will
+	// create a mapping up front.
+	void *dest_mapping =
+	    mmap(NULL, num_pages * page_size, PROT_READ | PROT_WRITE,
+		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+	BUG_ON(dest_mapping == MAP_FAILED, "mmap");
+	memset(dest_mapping, 'X', num_pages * page_size);
+
+	void *source_mapping =
+	    mmap(NULL, num_pages * page_size, PROT_READ | PROT_WRITE,
+		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+	BUG_ON(source_mapping == MAP_FAILED, "mmap");
+	memset(source_mapping, 'a', num_pages * page_size);
+
+	void *remapped_mapping =
+	    mremap(source_mapping, num_pages * page_size, num_pages * page_size,
+		   MREMAP_FIXED | MREMAP_DONTUNMAP | MREMAP_MAYMOVE,
+		   dest_mapping);
+	BUG_ON(remapped_mapping == MAP_FAILED, "mremap");
+	BUG_ON(remapped_mapping != dest_mapping,
+	       "mremap should have placed the remapped mapping at dest_mapping");
+
+	// The dest mapping will have been unmapped by mremap so we expect the
+	// Xs to be gone and replaced with a's.
+	BUG_ON(check_region_contains_byte
+	       (dest_mapping, num_pages * page_size, 'a') != 0,
+	       "pages did not migrate");
+
+	// And the source mapping will have had its ptes dropped.
+	BUG_ON(check_region_contains_byte
+	       (source_mapping, num_pages * page_size, 0) != 0,
+	       "source should have no ptes");
+
+	BUG_ON(munmap(dest_mapping, num_pages * page_size) == -1,
+	       "unable to unmap destination mapping");
+	BUG_ON(munmap(source_mapping, num_pages * page_size) == -1,
+	       "unable to unmap source mapping");
+}
+
+// This test validates that we can MREMAP_DONTUNMAP for a portion of an
+// existing mapping.
+static void mremap_dontunmap_partial_mapping()
+{
+	/*
+	 *  source mapping:
+	 *  --------------
+	 *  | aaaaaaaaaa |
+	 *  --------------
+	 *  to become:
+	 *  --------------
+	 *  | aaaaa00000 |
+	 *  --------------
+	 *  With the destination mapping containing 5 pages of As.
+	 *  ---------
+	 *  | aaaaa |
+	 *  ---------
+	 */
+	unsigned long num_pages = 10;
+	void *source_mapping =
+	    mmap(NULL, num_pages * page_size, PROT_READ | PROT_WRITE,
+		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+	BUG_ON(source_mapping == MAP_FAILED, "mmap");
+	memset(source_mapping, 'a', num_pages * page_size);
+
+	// We will grab the last 5 pages of the source and move them.
+	void *dest_mapping =
+	    mremap(source_mapping + (5 * page_size), 5 * page_size,
+		   5 * page_size,
+		   MREMAP_DONTUNMAP | MREMAP_MAYMOVE, NULL);
+	BUG_ON(dest_mapping == MAP_FAILED, "mremap");
+
+	// We expect the first 5 pages of the source to contain a's and the
+	// final 5 pages to contain zeros.
+	BUG_ON(check_region_contains_byte(source_mapping, 5 * page_size, 'a') !=
+	       0, "first 5 pages of source should have original pages");
+	BUG_ON(check_region_contains_byte
+	       (source_mapping + (5 * page_size), 5 * page_size, 0) != 0,
+	       "final 5 pages of source should have no ptes");
+
+	// Finally we expect the destination to have 5 pages worth of a's.
+	BUG_ON(check_region_contains_byte(dest_mapping, 5 * page_size, 'a') !=
+	       0, "dest mapping should contain ptes from the source");
+
+	BUG_ON(munmap(dest_mapping, 5 * page_size) == -1,
+	       "unable to unmap destination mapping");
+	BUG_ON(munmap(source_mapping, num_pages * page_size) == -1,
+	       "unable to unmap source mapping");
+}
+
+// This test validates that we can shrink an existing mapping via the normal
+// mremap behavior along with the MREMAP_DONTUNMAP flag.
+static void mremap_dontunmap_shrink_mapping()
+{
+	/*
+	 * We shrink the source by 5 pages while remapping.
+	 *  source mapping:
+	 *  --------------
+	 *  | aaaaaaaaaa |
+	 *  --------------
+	 *  to become:
+	 *  ---------
+	 *  | 00000 |
+	 *  ---------
+	 *  With the destination mapping containing 5 pages of As followed by
+	 *  the original pages of Xs.
+	 *  --------------
+	 *  | aaaaaXXXXX |
+	 *  --------------
+	 */
+
+	unsigned long num_pages = 10;
+
+	// We use MREMAP_FIXED because we don't want the mremap to place the
+	// remapped mapping behind the source, if it did
+	// we wouldn't be able to validate that the mapping was in fact
+	// adjusted.
+	void *dest_mapping =
+	    mmap(NULL, num_pages * page_size, PROT_READ | PROT_WRITE,
+		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+	BUG_ON(dest_mapping == MAP_FAILED, "mmap");
+	memset(dest_mapping, 'X', num_pages * page_size);
+
+	void *source_mapping =
+	    mmap(NULL, num_pages * page_size, PROT_READ | PROT_WRITE,
+		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+	BUG_ON(source_mapping == MAP_FAILED, "mmap");
+	memset(source_mapping, 'a', num_pages * page_size);
+
+	// We are shrinking the mapping while also using MREMAP_DONTUNMAP
+	void *remapped_mapping =
+	    mremap(source_mapping, num_pages * page_size, 5 * page_size,
+		   MREMAP_FIXED | MREMAP_DONTUNMAP | MREMAP_MAYMOVE,
+		   dest_mapping);
+	BUG_ON(remapped_mapping == MAP_FAILED, "mremap");
+	BUG_ON(remapped_mapping != dest_mapping,
+	       "expected mremap to place mapping at dest");
+
+	// The last 5 pages of source should have become unmapped while the
+	// first 5 remain.
+	unsigned char buf[5];
+	int ret = mincore(source_mapping + (5 * page_size), 5 * page_size, buf);
+	BUG_ON((ret != -1 || (ret == -1 && errno != ENOMEM)),
+	       "we expect -ENOMEM from mincore.");
+
+	BUG_ON(check_region_contains_byte(source_mapping, 5 * page_size, 0) !=
+	       0, "source should have no ptes");
+	BUG_ON(check_region_contains_byte(dest_mapping, 5 * page_size, 'a') !=
+	       0, "dest mapping should contain ptes from the source");
+
+	// And the second half of the destination should be unchanged.
+	BUG_ON(check_region_contains_byte(dest_mapping + (5 * page_size),
+					  5 * page_size, 'X') != 0,
+	       "second half of dest shouldn't be touched");
+
+	// Cleanup
+	BUG_ON(munmap(dest_mapping, num_pages * page_size) == -1,
+	       "unable to unmap destination mapping");
+	BUG_ON(munmap(source_mapping, 5 * page_size) == -1,
+	       "unable to unmap source mapping");
+}
+
+int main(void)
+{
+	page_size = sysconf(_SC_PAGE_SIZE);
+
+	// Test for kernel support for MREMAP_DONTUNMAP, skipping the test if
+	// not.
+	if (kernel_support_for_mremap_dontunmap() != 0) {
+		printf("No kernel support for MREMAP_DONTUNMAP\n");
+		return KSFT_SKIP;
+	}
+
+	// Keep a page sized buffer around for when we need it.
+	page_buffer =
+	    mmap(NULL, page_size, PROT_READ | PROT_WRITE,
+		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+	BUG_ON(page_buffer == MAP_FAILED, "unable to mmap a page.");
+
+	mremap_dontunmap_simple();
+	mremap_dontunmap_simple_fixed();
+	mremap_dontunmap_partial_mapping();
+	mremap_dontunmap_shrink_mapping();
+
+	BUG_ON(munmap(page_buffer, page_size) == -1,
+	       "unable to unmap page buffer");
+
+	printf("OK\n");
+	return 0;
+}
diff --git a/tools/testing/selftests/vm/run_vmtests b/tools/testing/selftests/vm/run_vmtests
index 951c507a27f7..d380b95c5de5 100755
--- a/tools/testing/selftests/vm/run_vmtests
+++ b/tools/testing/selftests/vm/run_vmtests
@@ -227,4 +227,19 @@ else
 	exitcode=1
 fi
 
+echo "------------------------------------"
+echo "running MREMAP_DONTUNMAP smoke test"
+echo "------------------------------------"
+./mremap_dontunmap
+ret_val=$?
+
+if [ $ret_val -eq 0 ]; then
+	echo "[PASS]"
+elif [ $ret_val -eq $ksft_skip ]; then
+	echo "[SKIP]"
+	exitcode=$ksft_skip
+else
+	echo "[FAIL]"
+	exitcode=1
+fi
 exit $exitcode