From patchwork Thu Feb 27 04:06:33 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Hugh Dickins
X-Patchwork-Id: 11407717
Date: Wed, 26 Feb 2020 20:06:33 -0800 (PST)
From: Hugh Dickins
To: Andrew Morton
Cc: Yang Shi, Hugh Dickins, Alexander Duyck, "Michael S. Tsirkin",
    David Hildenbrand, "Kirill A. Shutemov", Matthew Wilcox,
    Andrea Arcangeli, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH] huge tmpfs: try to split_huge_page() when punching hole

Yang Shi writes:

Currently, when truncating a shmem file, if the range is partly in a THP
(start or end is in the middle of a THP), the pages will just get cleared
rather than freed, unless the range covers the whole THP.  Even though all
the subpages are truncated (randomly or sequentially), the THP may still
be kept in the page cache.

This might be fine for some use cases which prefer preserving THPs, but
balloon inflation is handled at base page size.  So when using shmem THP
as a memory backend, QEMU inflation doesn't work as expected, since it
doesn't free memory; but the inflation use case really needs to get the
memory freed.  (Anonymous THPs will also not get freed right away, but
will be freed eventually when all subpages are unmapped; whereas shmem
THPs still stay in the page cache.)

Split the THP right away when doing a partial hole punch, and if the
split fails just clear the pages, so that a read of the punched area will
return zeroes.

Hugh Dickins adds:

Our earlier "team of pages" huge tmpfs implementation worked in the way
that Yang Shi proposes; and we have been using this patch to continue to
split the huge page when hole-punched or truncated, since converting over
to the compound page implementation.  Although huge tmpfs gives out huge
pages when available, if the user specifically asks to truncate or punch
a hole (perhaps to free memory, perhaps to reduce the memcg charge), then
the filesystem should do so as best it can, splitting the huge page.
That is not always possible: any additional reference to the huge page
prevents split_huge_page() from succeeding, so the result can be flaky.
But in practice it works successfully enough that we've not seen any
problem from that.

Add shmem_punch_compound() to encapsulate the decision of when a split is
needed, and to do the split if so.  Using this simplifies the flow in
shmem_undo_range(); and the first (trylock) pass does not need to do any
page clearing on failure, because the second pass will either succeed or
do that clearing.  Following the example of zero_user_segment() when
clearing a partial page, add flush_dcache_page() and set_page_dirty()
when clearing a hole - though I'm not certain that either is needed.

But: split_huge_page() would be sure to fail if shmem_undo_range()'s
pagevec holds further references to the huge page.  The easiest way to
fix that is for find_get_entries() to return early, as soon as it has put
one compound head or tail into the pagevec.  At first this felt like a
hack; but on examination, this convention better suits all its callers -
or will do, if the slight one-page-per-pagevec slowdown in
shmem_unlock_mapping() and shmem_seek_hole_data() is transformed into a
512-page-per-pagevec speedup by checking for compound pages there.

Signed-off-by: Hugh Dickins
---
 mm/filemap.c |   14 ++++++-
 mm/shmem.c   |   98 +++++++++++++++++++++----------------------
 mm/swap.c    |    4 ++
 3 files changed, 60 insertions(+), 56 deletions(-)

--- 5.6-rc3/mm/filemap.c	2020-02-09 17:36:41.758976480 -0800
+++ linux/mm/filemap.c	2020-02-25 20:08:13.178755732 -0800
@@ -1697,6 +1697,11 @@ EXPORT_SYMBOL(pagecache_get_page);
  * Any shadow entries of evicted pages, or swap entries from
  * shmem/tmpfs, are included in the returned array.
  *
+ * If it finds a Transparent Huge Page, head or tail, find_get_entries()
+ * stops at that page: the caller is likely to have a better way to handle
+ * the compound page as a whole, and then skip its extent, than repeatedly
+ * calling find_get_entries() to return all its tails.
+ *
  * Return: the number of pages and shadow entries which were found.
  */
 unsigned find_get_entries(struct address_space *mapping,
@@ -1728,8 +1733,15 @@ unsigned find_get_entries(struct address
 		/* Has the page moved or been split? */
 		if (unlikely(page != xas_reload(&xas)))
 			goto put_page;
-		page = find_subpage(page, xas.xa_index);
 
+		/*
+		 * Terminate early on finding a THP, to allow the caller to
+		 * handle it all at once; but continue if this is hugetlbfs.
+		 */
+		if (PageTransHuge(page) && !PageHuge(page)) {
+			page = find_subpage(page, xas.xa_index);
+			nr_entries = ret + 1;
+		}
 export:
 		indices[ret] = xas.xa_index;
 		entries[ret] = page;
--- 5.6-rc3/mm/shmem.c	2020-02-09 17:36:41.798976778 -0800
+++ linux/mm/shmem.c	2020-02-25 20:08:13.182755758 -0800
@@ -789,6 +789,32 @@ void shmem_unlock_mapping(struct address
 }
 
 /*
+ * Check whether a hole-punch or truncation needs to split a huge page,
+ * returning true if no split was required, or the split has been successful.
+ *
+ * Eviction (or truncation to 0 size) should never need to split a huge page;
+ * but in rare cases might do so, if shmem_undo_range() failed to trylock on
+ * head, and then succeeded to trylock on tail.
+ *
+ * A split can only succeed when there are no additional references on the
+ * huge page: so the split below relies upon find_get_entries() having stopped
+ * when it found a subpage of the huge page, without getting further references.
+ */
+static bool shmem_punch_compound(struct page *page, pgoff_t start, pgoff_t end)
+{
+	if (!PageTransCompound(page))
+		return true;
+
+	/* Just proceed to delete a huge page wholly within the range punched */
+	if (PageHead(page) &&
+	    page->index >= start && page->index + HPAGE_PMD_NR <= end)
+		return true;
+
+	/* Try to split huge page, so we can truly punch the hole or truncate */
+	return split_huge_page(page) >= 0;
+}
+
+/*
  * Remove range of pages and swap entries from page cache, and free them.
  * If !unfalloc, truncate or punch hole; if unfalloc, undo failed fallocate.
  */
@@ -838,31 +864,11 @@ static void shmem_undo_range(struct inod
 			if (!trylock_page(page))
 				continue;
 
-			if (PageTransTail(page)) {
-				/* Middle of THP: zero out the page */
-				clear_highpage(page);
-				unlock_page(page);
-				continue;
-			} else if (PageTransHuge(page)) {
-				if (index == round_down(end, HPAGE_PMD_NR)) {
-					/*
-					 * Range ends in the middle of THP:
-					 * zero out the page
-					 */
-					clear_highpage(page);
-					unlock_page(page);
-					continue;
-				}
-				index += HPAGE_PMD_NR - 1;
-				i += HPAGE_PMD_NR - 1;
-			}
-
-			if (!unfalloc || !PageUptodate(page)) {
-				VM_BUG_ON_PAGE(PageTail(page), page);
-				if (page_mapping(page) == mapping) {
-					VM_BUG_ON_PAGE(PageWriteback(page), page);
+			if ((!unfalloc || !PageUptodate(page)) &&
+			    page_mapping(page) == mapping) {
+				VM_BUG_ON_PAGE(PageWriteback(page), page);
+				if (shmem_punch_compound(page, start, end))
 					truncate_inode_page(mapping, page);
-				}
 			}
 			unlock_page(page);
 		}
@@ -936,43 +942,25 @@ static void shmem_undo_range(struct inod
 
 			lock_page(page);
 
-			if (PageTransTail(page)) {
-				/* Middle of THP: zero out the page */
-				clear_highpage(page);
-				unlock_page(page);
-				/*
-				 * Partial thp truncate due 'start' in middle
-				 * of THP: don't need to look on these pages
-				 * again on !pvec.nr restart.
-				 */
-				if (index != round_down(end, HPAGE_PMD_NR))
-					start++;
-				continue;
-			} else if (PageTransHuge(page)) {
-				if (index == round_down(end, HPAGE_PMD_NR)) {
-					/*
-					 * Range ends in the middle of THP:
-					 * zero out the page
-					 */
-					clear_highpage(page);
-					unlock_page(page);
-					continue;
-				}
-				index += HPAGE_PMD_NR - 1;
-				i += HPAGE_PMD_NR - 1;
-			}
-
 			if (!unfalloc || !PageUptodate(page)) {
-				VM_BUG_ON_PAGE(PageTail(page), page);
-				if (page_mapping(page) == mapping) {
-					VM_BUG_ON_PAGE(PageWriteback(page), page);
-					truncate_inode_page(mapping, page);
-				} else {
+				if (page_mapping(page) != mapping) {
 					/* Page was replaced by swap: retry */
 					unlock_page(page);
 					index--;
 					break;
 				}
+				VM_BUG_ON_PAGE(PageWriteback(page), page);
+				if (shmem_punch_compound(page, start, end))
+					truncate_inode_page(mapping, page);
+				else {
+					/* Wipe the page and don't get stuck */
+					clear_highpage(page);
+					flush_dcache_page(page);
+					set_page_dirty(page);
+					if (index <
+					    round_up(start, HPAGE_PMD_NR))
+						start = index + 1;
+				}
 			}
 			unlock_page(page);
 		}
--- 5.6-rc3/mm/swap.c	2020-02-09 17:36:41.806976836 -0800
+++ linux/mm/swap.c	2020-02-25 20:08:13.182755758 -0800
@@ -1005,6 +1005,10 @@ EXPORT_SYMBOL(__pagevec_lru_add);
  * ascending indexes.  There may be holes in the indices due to
  * not-present entries.
  *
+ * Only one subpage of a Transparent Huge Page is returned in one call:
+ * allowing truncate_inode_pages_range() to evict the whole THP without
+ * cycling through a pagevec of extra references.
+ *
  * pagevec_lookup_entries() returns the number of entries which were
  * found.
  */
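
Not part of the patch itself, but in case it helps review or testing: a
minimal userspace sketch of the partial hole-punch discussed above.  The
file path, sizes and 2MB PMD assumption are made up (x86_64, tmpfs at
/dev/shm); whether the file is actually backed by huge pages depends on
the mount's huge= option or shmem_enabled, so treat it as an illustration
only.

/* punch-partial.c: sketch of a hole punch that starts and ends mid-THP */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define MB (1024UL * 1024)

int main(void)
{
	const char *path = "/dev/shm/punch-test";	/* hypothetical tmpfs file */
	size_t size = 4 * MB;				/* two PMD-sized extents on x86_64 */
	int fd = open(path, O_CREAT | O_RDWR | O_TRUNC, 0600);

	if (fd < 0 || ftruncate(fd, size) < 0) {
		perror("open/ftruncate");
		return 1;
	}

	/* Fault the whole file in, so shmem may back it with huge pages */
	char *map = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED) {
		perror("mmap");
		return 1;
	}
	memset(map, 0xaa, size);

	/*
	 * Punch [1MB, 3MB): it begins and ends in the middle of a 2MB
	 * extent, so neither huge page is wholly within the hole.
	 */
	if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
		      1 * MB, 2 * MB) < 0) {
		perror("fallocate(PUNCH_HOLE)");
		return 1;
	}

	/* Reads of the punched range must now return zeroes */
	printf("byte at 1MB after punch: %#x\n", (unsigned char)map[1 * MB]);

	munmap(map, size);
	close(fd);
	unlink(path);
	return 0;
}

With huge pages in use, such a punch previously just cleared the covered
subpages while the huge pages stayed in the page cache; with this patch,
shmem tries split_huge_page() so the punched memory can really be freed.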