From patchwork Fri May 21 07:44:33 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Mina Almasry
X-Patchwork-Id: 12272315
Date: Fri, 21 May 2021 00:44:33 -0700
Message-Id:
 <20210521074433.931380-1-almasrymina@google.com>
Mime-Version: 1.0
X-Mailer: git-send-email 2.31.1.818.g46aad6cb9e-goog
Subject: [PATCH v3] mm, hugetlb: fix resv_huge_pages underflow on UFFDIO_COPY
From: Mina Almasry
Cc: Mina Almasry, Axel Rasmussen, Peter Xu, linux-mm@kvack.org,
	Mike Kravetz, Andrew Morton, linux-kernel@vger.kernel.org
Authentication-Results: imf29.hostedemail.com;
	dkim=pass header.d=google.com header.s=20161025 header.b=GMrWWk4R;
	dmarc=pass (policy=reject) header.from=google.com;
	spf=pass (imf29.hostedemail.com: domain of 35WSnYAsKCFg0BC0IHOC8D06EE6B4.2ECB8DKN-CCAL02A.EH6@flex--almasrymina.bounces.google.com designates 209.85.222.202 as permitted sender)
	smtp.mailfrom=35WSnYAsKCFg0BC0IHOC8D06EE6B4.2ECB8DKN-CCAL02A.EH6@flex--almasrymina.bounces.google.com
X-Rspamd-Server: rspam05
X-Rspamd-Queue-Id: E5D66FB
X-Stat-Signature: d4rkrnozrzt6uk9mppbxdowosidypeaw
X-HE-Tag: 1621583074-220496
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID:

The userfaultfd hugetlb tests detect a resv_huge_pages underflow. This
happens when hugetlb_mcopy_atomic_pte() is called with !is_continue on
an index for which we already have a page in the cache. When this
happens, we allocate a second page, double consuming the reservation,
and then fail to insert the page into the cache and return -EEXIST.

To fix this, we first check whether there is a page in the cache which
already consumed the reservation, and return -EEXIST immediately if so.

Secondly, if we fail to copy the page contents while holding the
hugetlb_fault_mutex, we will drop the mutex and return to the caller
after allocating a page that consumed a reservation. In this case there
may be a fault that double consumes the reservation. To handle this, we
free the allocated page, fix the reservations, and allocate a temporary
hugetlb page and return that to the caller.
When the caller does the copy outside of the lock, we again check the
cache, allocate a page consuming the reservation, and copy over the
contents.

Test:
Hacked the code locally such that resv_huge_pages underflows produce a
warning and copy_huge_page_from_user() always fails, then:

./tools/testing/selftests/vm/userfaultfd hugetlb_shared 10 2 /tmp/kokonut_test/huge/userfaultfd_test && echo test success
./tools/testing/selftests/vm/userfaultfd hugetlb 10 2 /tmp/kokonut_test/huge/userfaultfd_test && echo test success

Both tests succeed and produce no warnings. After the tests run, the
number of free/resv hugepages is correct.

Signed-off-by: Mina Almasry
Cc: Axel Rasmussen
Cc: Peter Xu
Cc: linux-mm@kvack.org
Cc: Mike Kravetz
Cc: Andrew Morton
Cc: linux-kernel@vger.kernel.org

---
 include/linux/hugetlb.h |   4 ++
 mm/hugetlb.c            | 103 ++++++++++++++++++++++++++++++++++++----
 mm/migrate.c            |  39 +++------------
 3 files changed, 103 insertions(+), 43 deletions(-)

-- 
2.31.1.818.g46aad6cb9e-goog

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index b92f25ccef58..427974510965 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -194,6 +194,8 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
 bool is_hugetlb_entry_migration(pte_t pte);
 void hugetlb_unshare_all_pmds(struct vm_area_struct *vma);
 
+void hugetlb_copy_page(struct page *dst, struct page *src);
+
 #else /* !CONFIG_HUGETLB_PAGE */
 
 static inline void reset_vma_resv_huge_pages(struct vm_area_struct *vma)
@@ -379,6 +381,8 @@ static inline vm_fault_t hugetlb_fault(struct mm_struct *mm,
 
 static inline void hugetlb_unshare_all_pmds(struct vm_area_struct *vma) { }
 
+static inline void hugetlb_copy_page(struct page *dst, struct page *src) { }
+
 #endif /* !CONFIG_HUGETLB_PAGE */
 /*
  * hugepages at page global directory. If arch support
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 629aa4c2259c..cb041c97a558 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -81,6 +81,45 @@ struct mutex *hugetlb_fault_mutex_table ____cacheline_aligned_in_smp;
 /* Forward declaration */
 static int hugetlb_acct_memory(struct hstate *h, long delta);
 
+/*
+ * Gigantic pages are so large that we do not guarantee that page++ pointer
+ * arithmetic will work across the entire page. We need something more
+ * specialized.
+ */
+static void __copy_gigantic_page(struct page *dst, struct page *src,
+				int nr_pages)
+{
+	int i;
+	struct page *dst_base = dst;
+	struct page *src_base = src;
+
+	for (i = 0; i < nr_pages;) {
+		cond_resched();
+		copy_highpage(dst, src);
+
+		i++;
+		dst = mem_map_next(dst, dst_base, i);
+		src = mem_map_next(src, src_base, i);
+	}
+}
+
+void hugetlb_copy_page(struct page *dst, struct page *src)
+{
+	int i;
+	struct hstate *h = page_hstate(src);
+	int nr_pages = pages_per_huge_page(h);
+
+	if (unlikely(nr_pages > MAX_ORDER_NR_PAGES)) {
+		__copy_gigantic_page(dst, src, nr_pages);
+		return;
+	}
+
+	for (i = 0; i < nr_pages; i++) {
+		cond_resched();
+		copy_highpage(dst + i, src + i);
+	}
+}
+
 static inline bool subpool_is_free(struct hugepage_subpool *spool)
 {
 	if (spool->count)
@@ -4868,19 +4907,20 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 			    struct page **pagep)
 {
 	bool is_continue = (mode == MCOPY_ATOMIC_CONTINUE);
-	struct address_space *mapping;
-	pgoff_t idx;
+	struct hstate *h = hstate_vma(dst_vma);
+	struct address_space *mapping = dst_vma->vm_file->f_mapping;
+	pgoff_t idx = vma_hugecache_offset(h, dst_vma, dst_addr);
 	unsigned long size;
 	int vm_shared = dst_vma->vm_flags & VM_SHARED;
-	struct hstate *h = hstate_vma(dst_vma);
 	pte_t _dst_pte;
 	spinlock_t *ptl;
-	int ret;
+	int ret = -ENOMEM;
 	struct page *page;
 	int writable;
-
-	mapping = dst_vma->vm_file->f_mapping;
-	idx = vma_hugecache_offset(h, dst_vma, dst_addr);
+	struct mempolicy *mpol;
+	nodemask_t *nodemask;
+	gfp_t gfp_mask = htlb_alloc_mask(h);
+	int node = huge_node(dst_vma, dst_addr, gfp_mask, &mpol, &nodemask);
 
 	if (is_continue) {
 		ret = -EFAULT;
@@ -4888,7 +4928,14 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 		if (!page)
 			goto out;
 	} else if (!*pagep) {
-		ret = -ENOMEM;
+		/* If a page already exists, then it's UFFDIO_COPY for
+		 * a non-missing case. Return -EEXIST.
+		 */
+		if (hugetlbfs_pagecache_present(h, dst_vma, dst_addr)) {
+			ret = -EEXIST;
+			goto out;
+		}
+
 		page = alloc_huge_page(dst_vma, dst_addr, 0);
 		if (IS_ERR(page))
 			goto out;
@@ -4900,12 +4947,48 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 		/* fallback to copy_from_user outside mmap_lock */
 		if (unlikely(ret)) {
 			ret = -ENOENT;
+			/* Free the allocated page which may have
+			 * consumed a reservation.
+			 */
+			restore_reserve_on_error(h, dst_vma, dst_addr, page);
+			if (!HPageRestoreReserve(page)) {
+				if (unlikely(hugetlb_unreserve_pages(
+					    mapping->host, idx, idx + 1, 1)))
+					hugetlb_fix_reserve_counts(
+						mapping->host);
+			}
+			put_page(page);
+
+			/* Allocate a temporary page to hold the copied
+			 * contents.
+			 */
+			page = alloc_migrate_huge_page(h, gfp_mask, node,
+						       nodemask);
+			if (IS_ERR(page)) {
+				ret = -ENOMEM;
+				goto out;
+			}
 			*pagep = page;
-			/* don't free the page */
+			/* Set the outparam pagep and return to the caller to
+			 * copy the contents outside the lock. Don't free the
+			 * page.
+			 */
 			goto out;
 		}
 	} else {
-		page = *pagep;
+		if (hugetlbfs_pagecache_present(h, dst_vma, dst_addr)) {
+			put_page(*pagep);
+			ret = -EEXIST;
+			goto out;
+		}
+
+		page = alloc_huge_page(dst_vma, dst_addr, 0);
+		if (IS_ERR(page)) {
+			ret = -ENOMEM;
+			goto out;
+		}
+		__copy_gigantic_page(page, *pagep, pages_per_huge_page(h));
+		put_page(*pagep);
 		*pagep = NULL;
 	}
 
diff --git a/mm/migrate.c b/mm/migrate.c
index 6b37d00890ca..d3437f9a608d 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -528,28 +528,6 @@ int migrate_huge_page_move_mapping(struct address_space *mapping,
 	return MIGRATEPAGE_SUCCESS;
 }
 
-/*
- * Gigantic pages are so large that we do not guarantee that page++ pointer
- * arithmetic will work across the entire page. We need something more
- * specialized.
- */
-static void __copy_gigantic_page(struct page *dst, struct page *src,
-				int nr_pages)
-{
-	int i;
-	struct page *dst_base = dst;
-	struct page *src_base = src;
-
-	for (i = 0; i < nr_pages; ) {
-		cond_resched();
-		copy_highpage(dst, src);
-
-		i++;
-		dst = mem_map_next(dst, dst_base, i);
-		src = mem_map_next(src, src_base, i);
-	}
-}
-
 static void copy_huge_page(struct page *dst, struct page *src)
 {
 	int i;
@@ -557,19 +535,14 @@ static void copy_huge_page(struct page *dst, struct page *src)
 
 	if (PageHuge(src)) {
 		/* hugetlbfs page */
-		struct hstate *h = page_hstate(src);
-		nr_pages = pages_per_huge_page(h);
-
-		if (unlikely(nr_pages > MAX_ORDER_NR_PAGES)) {
-			__copy_gigantic_page(dst, src, nr_pages);
-			return;
-		}
-	} else {
-		/* thp page */
-		BUG_ON(!PageTransHuge(src));
-		nr_pages = thp_nr_pages(src);
+		hugetlb_copy_page(dst, src);
+		return;
 	}
 
+	/* thp page */
+	BUG_ON(!PageTransHuge(src));
+	nr_pages = thp_nr_pages(src);
+
 	for (i = 0; i < nr_pages; i++) {
 		cond_resched();
 		copy_highpage(dst + i, src + i);
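
Illustration (not part of the patch): the ordering problem described in
the commit message can be sketched as a small user-space model. This is
only a sketch under stated assumptions. Every name in it (toy_hugetlb,
toy_alloc, toy_copy_unchecked, toy_copy_checked, TOY_EEXIST, TOY_ENOMEM)
is hypothetical and none of it is kernel code. The counter is a signed
long so that an underflow shows up as -1, and the model assumes that the
-EEXIST error path returns the page to the pool without giving the
reservation back.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_ENOMEM 12
#define TOY_EEXIST 17

/* Toy model (hypothetical, not kernel code) of a single hugetlb page index. */
struct toy_hugetlb {
	long resv_huge_pages;	/* signed so that an underflow shows up as -1 */
	long free_huge_pages;	/* pages left in the pool */
	bool page_in_cache;	/* is this index already populated? */
};

/* Take a page from the pool. In this model that always consumes a reservation. */
static int toy_alloc(struct toy_hugetlb *t)
{
	if (t->free_huge_pages == 0)
		return -TOY_ENOMEM;
	t->free_huge_pages--;
	t->resv_huge_pages--;
	return 0;
}

/* Unpatched ordering: allocate first, notice the cached page only on insert. */
static int toy_copy_unchecked(struct toy_hugetlb *t)
{
	int ret = toy_alloc(t);

	if (ret)
		return ret;
	if (t->page_in_cache) {
		/* Assumption: the page is freed, the reservation is not restored. */
		t->free_huge_pages++;
		return -TOY_EEXIST;
	}
	t->page_in_cache = true;
	return 0;
}

/* Patched ordering: check the cache before touching the reservation. */
static int toy_copy_checked(struct toy_hugetlb *t)
{
	int ret;

	if (t->page_in_cache)
		return -TOY_EEXIST;
	ret = toy_alloc(t);
	if (ret)
		return ret;
	t->page_in_cache = true;
	return 0;
}
```

Starting from one reservation and two free pages, calling
toy_copy_unchecked() twice on the same index leaves resv_huge_pages at
-1, while calling toy_copy_checked() twice leaves it at 0 and the second
call simply returns -TOY_EEXIST.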